LogTransformer

The LogTransformer() applies the logarithm to the indicated variables. Note that the logarithm can only be applied to positive values. Thus, if a variable contains 0 or negative values, this transformer will return an error.
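To see why non-positive values are a problem, here is a minimal sketch using only NumPy and pandas on made-up data. The helper `non_positive_columns` is hypothetical and not part of the library; it only illustrates the kind of check one might run before applying the transformer.

```python
import numpy as np
import pandas as pd

# Toy data (made up): one strictly positive column, one with 0 and a negative value
df = pd.DataFrame({
    "positive": [1.0, 10.0, 100.0],
    "mixed": [5.0, 0.0, -3.0],
})

def non_positive_columns(frame, variables):
    """Hypothetical helper: names of the variables that contain 0 or negative values."""
    return [var for var in variables if (frame[var] <= 0).any()]

bad = non_positive_columns(df, ["positive", "mixed"])
print(bad)

# Plain NumPy does not raise here: log(0) gives -inf and log(-3) gives nan,
# which is why such values need to be caught before the transformation
with np.errstate(divide="ignore", invalid="ignore"):
    logged = np.log(df["mixed"])
print(logged)
```

Variables flagged this way would need to be shifted, filtered or handled with a different transformation before the logarithm can be applied.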

Example

Let’s load the house prices dataset and separate it into train and test sets (more details about the dataset here).

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from feature_engine import transformation as vt

# Load dataset
data = pd.read_csv('houseprice.csv')

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'],
    test_size=0.3,
    random_state=0,
)

Now we want to apply the logarithm to two of the variables in the dataset using the LogTransformer().

# set up the variable transformer
tf = vt.LogTransformer(variables=['LotArea', 'GrLivArea'])

# fit the transformer
tf.fit(X_train)

With fit(), this transformer does not learn any parameters. We can now go ahead and transform the variables.

# transform the data
train_t = tf.transform(X_train)
test_t = tf.transform(X_test)
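Since no parameters are learned, the transformation should amount to taking the logarithm of the selected columns and leaving the rest untouched. The following is a rough sketch of that idea in plain NumPy and pandas, assuming the natural logarithm is used; the DataFrame `toy` and its values are made up for illustration and this is not the library's implementation.

```python
import numpy as np
import pandas as pd

# Made-up values standing in for the two variables used above
toy = pd.DataFrame({
    "LotArea": [8450.0, 9600.0, 11250.0],
    "GrLivArea": [1710.0, 1262.0, 1786.0],
    "YrSold": [2008, 2007, 2008],
})
variables = ["LotArea", "GrLivArea"]

# Work on a copy so the original data is left untouched
toy_t = toy.copy()
toy_t[variables] = np.log(toy_t[variables])

# Exponentiating the transformed columns recovers the original values
recovered = np.exp(toy_t[variables])
print(toy_t)
```

Columns that are not listed in `variables`, such as `YrSold` here, are left unchanged.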

Next, we make a histogram of the original variable distribution:

# un-transformed variable
X_train['LotArea'].hist(bins=50)
[Figure: histogram of LotArea before the transformation]

And now, we can explore the distribution of the variable after the logarithm transformation:

# transformed variable
train_t['LotArea'].hist(bins=50)
[Figure: histogram of LotArea after the logarithm transformation]

Note that the transformed variable has a more Gaussian-looking distribution.
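One way to put a number on "more Gaussian" is the skewness, which is close to 0 for a symmetric distribution. The sketch below uses synthetic, right-skewed data drawn from a log-normal distribution rather than the house prices dataset, so the values are illustrative only.

```python
import numpy as np
import pandas as pd

# Synthetic right-skewed variable (an assumption standing in for a variable like LotArea)
rng = np.random.default_rng(0)
raw = pd.Series(rng.lognormal(mean=9.0, sigma=0.5, size=1000))

# Skewness before and after the logarithm
skew_raw = raw.skew()
skew_log = np.log(raw).skew()

print(f"skewness before: {skew_raw:.2f}")
print(f"skewness after:  {skew_log:.2f}")
```

The same comparison could be run on X_train['LotArea'] and train_t['LotArea'] from the example above.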

More details

You can find more details about the LogTransformer() here:

All notebooks can be found in a dedicated repository.