The Box-Cox transformation is defined as:

T(Y) = (Y^λ − 1) / λ if λ ≠ 0, or log(Y) if λ = 0.

where Y is the response variable and λ is the transformation parameter. Candidate values of λ, typically between -5 and 5, are evaluated, and the optimal value for each variable is selected (in SciPy's implementation, the value that maximizes the log-likelihood function).
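The definition and the search for λ can be sketched in plain Python. This is a minimal illustration, not the library's code: it reads the power term as Y raised to λ, scores each candidate λ with the standard Box-Cox log-likelihood, and uses a simple grid over [-5, 5]. The function names (box_cox, log_likelihood, best_lambda) and the sample data are made up for this example.

```python
import math

def box_cox(y, lam):
    """Box-Cox transform of a single strictly positive value y."""
    if y <= 0:
        raise ValueError("Box-Cox requires strictly positive data")
    if lam == 0:
        return math.log(y)
    return (y ** lam - 1.0) / lam

def log_likelihood(data, lam):
    """Box-Cox log-likelihood (up to an additive constant) for one lambda."""
    n = len(data)
    t = [box_cox(y, lam) for y in data]
    mean = sum(t) / n
    var = sum((v - mean) ** 2 for v in t) / n  # population variance of T(Y)
    return (lam - 1.0) * sum(math.log(y) for y in data) - 0.5 * n * math.log(var)

def best_lambda(data, low=-5.0, high=5.0, steps=100):
    """Grid search: return the candidate lambda with the highest log-likelihood."""
    grid = [low + (high - low) * i / steps for i in range(steps + 1)]
    return max(grid, key=lambda lam: log_likelihood(data, lam))

# Illustrative, right-skewed sample (hypothetical values).
data = [1.0, 1.5, 2.0, 3.0, 4.5, 7.0, 11.0, 18.0, 30.0, 50.0]
lam = best_lambda(data)
transformed = [box_cox(y, lam) for y in data]
```

A grid is used here only because it is easy to read; a numerical optimizer over λ would serve the same purpose with finer resolution.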

API Reference

class feature_engine.variable_transformers.BoxCoxTransformer(variables=None)

The BoxCoxTransformer() applies the BoxCox transformation to numerical variables.

The BoxCox transformation implemented by this transformer is that of SciPy.stats (scipy.stats.boxcox).

The BoxCoxTransformer() works only with strictly positive numerical variables (> 0). The transformation takes log(Y) when λ is 0, and SciPy's boxcox rejects data that contains zero or negative values.

A list of variables can be passed as an argument. Alternatively, the transformer will automatically select and transform all numerical variables.


variables (list, default=None) – The list of numerical variables that will be transformed. If None, the transformer will automatically find and select all numerical variables.
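The behaviour of the variables parameter can be approximated with pandas. This is a sketch of the idea only; it assumes that "numerical" means any numeric dtype and it is not taken from the library's source. The helper name resolve_variables and the toy dataframe are hypothetical.

```python
import pandas as pd

def resolve_variables(X, variables=None):
    """Return the columns to transform: the given list, or all numeric columns."""
    if variables is None:
        # No list supplied: pick every column with a numeric dtype.
        return list(X.select_dtypes(include="number").columns)
    missing = [v for v in variables if v not in X.columns]
    if missing:
        raise KeyError(f"Variables not found in dataframe: {missing}")
    return list(variables)

# Toy dataframe with made-up values; 'Street' is categorical and is skipped.
df = pd.DataFrame({
    "LotArea": [8450, 9600],
    "GrLivArea": [1710, 1262],
    "Street": ["Pave", "Pave"],
})

print(resolve_variables(df))               # ['LotArea', 'GrLivArea']
print(resolve_variables(df, ["LotArea"]))  # ['LotArea']
```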


The dictionary containing the {variable: best exponent for the BoxCox transformation} pairs. These are determined automatically when the transformer is fitted.



fit(X, y=None)

Learns the optimal lambda for the BoxCox transformation.

  • X (pandas dataframe of shape = [n_samples, n_features]) – The training input samples. Can be the entire dataframe, not just the variables to transform.

  • y (None) – y is not needed in this transformer. You can pass y or None.


transform(X)

Applies the BoxCox transformation, using the lambda learned for each variable during fit().


X (pandas dataframe of shape = [n_samples, n_features]) – The data to be transformed.


X_transformed – The dataframe with the transformed variables.

Return type

pandas dataframe of shape = [n_samples, n_features]
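The fit and transform steps described above can be sketched with scipy.stats.boxcox directly. This is an illustration of the learn-then-apply pattern under stated assumptions, not the transformer's actual implementation: the helper names fit_lambdas and apply_lambdas and the toy dataframes are invented for the example.

```python
import pandas as pd
from scipy import stats

def fit_lambdas(X, variables):
    """Learn one Box-Cox lambda per variable from the training data."""
    lambdas = {}
    for var in variables:
        # With lmbda=None, scipy returns (transformed values, fitted lambda).
        _, lambdas[var] = stats.boxcox(X[var].to_numpy(dtype=float))
    return lambdas

def apply_lambdas(X, lambdas):
    """Return a copy of X with each listed variable transformed by its stored lambda."""
    X = X.copy()
    for var, lam in lambdas.items():
        # With lmbda given, scipy returns only the transformed values.
        X[var] = stats.boxcox(X[var].to_numpy(dtype=float), lmbda=lam)
    return X

# Toy data (hypothetical values); every entry is strictly positive.
train = pd.DataFrame({"a": [1.0, 2.0, 3.5, 7.0, 20.0],
                      "b": [3.0, 5.0, 9.0, 17.0, 40.0]})
test = pd.DataFrame({"a": [3.0, 6.0], "b": [4.0, 10.0]})

lambdas = fit_lambdas(train, ["a", "b"])   # learned on the training set only
test_t = apply_lambdas(test, lambdas)      # same lambdas reused on new data
```

Learning the lambdas on the training set and reusing them on the test set keeps the two sets on the same scale and avoids leaking information from the test data.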

Example Use

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from feature_engine import variable_transformers as vt

# Load dataset
data = pd.read_csv('houseprice.csv')

# Separate into train and test sets
X_train, X_test, y_train, y_test =  train_test_split(
            data.drop(['Id', 'SalePrice'], axis=1),
            data['SalePrice'], test_size=0.3, random_state=0)

# set up the variable transformer
tf = vt.BoxCoxTransformer(variables = ['LotArea', 'GrLivArea'])

# fit the transformer
tf.fit(X_train)

# transform the data
train_t = tf.transform(X_train)
test_t = tf.transform(X_test)

# un-transformed variable
X_train['LotArea'].hist(bins=50)
plt.show()

# transformed variable
train_t['LotArea'].hist(bins=50)
plt.show()