LogTransformer

API Reference

class feature_engine.variable_transformers.LogTransformer(base='e', variables=None)[source]

The LogTransformer() applies the natural logarithm or the base 10 logarithm to numerical variables. The natural logarithm is the logarithm in base e.

The LogTransformer() only works with strictly positive numerical values. If a variable contains a zero or a negative value, the transformer will raise an error.

A list of variables can be passed as an argument. Alternatively, the transformer will automatically select and transform all numerical variables.

Parameters
  • base (string, default='e') – Indicates if the natural or base 10 logarithm should be applied. Can take values ‘e’ or ‘10’.

  • variables (list, default=None) – The list of numerical variables to be transformed. If None, the transformer will find and select all numerical variables.
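The effect of the two parameters can be illustrated with plain pandas and NumPy. The following is a minimal sketch of the behaviour described above, not the library's implementation; the helper name log_transform_sketch and the toy dataframe are hypothetical.

```python
import numpy as np
import pandas as pd


def log_transform_sketch(df, base="e", variables=None):
    """Hypothetical helper that mimics the documented parameters."""
    if base not in ("e", "10"):
        raise ValueError("base can take only the values 'e' or '10'")
    # With variables=None, fall back to every numerical column.
    if variables is None:
        variables = list(df.select_dtypes(include="number").columns)
    out = df.copy()
    log_fn = np.log if base == "e" else np.log10
    out[variables] = log_fn(out[variables])
    return out


# Toy data: one numerical column and one non-numerical column.
toy = pd.DataFrame({"area": [1.0, 10.0, 100.0], "zone": ["a", "b", "c"]})
natural = log_transform_sketch(toy)             # natural log of 'area'
decimal = log_transform_sketch(toy, base="10")  # base 10 log of 'area'
```

In this sketch the non-numerical column zone is skipped by the automatic selection, and only area is transformed.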

fit(X, y=None)[source]

Selects the numerical variables and determines whether the logarithm can be applied to them by checking that all of their values are positive.

Parameters
  • X (pandas dataframe of shape = [n_samples, n_features]) – The training input samples. Can be the entire dataframe, not just the variables to transform.

  • y (None) – y is not needed in this transformer. You can pass y or None.
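The positivity check that fit() is described as performing could look roughly like the sketch below. This is an assumption about the general approach, written with plain pandas; the function name check_positive_sketch, the error message and the toy data are hypothetical and are not taken from the library.

```python
import pandas as pd


def check_positive_sketch(X, variables):
    """Hypothetical check: raise if any selected value is zero or negative."""
    offending = [var for var in variables if (X[var] <= 0).any()]
    if offending:
        raise ValueError(
            f"Variables with zero or negative values: {offending}"
        )
    return variables


# Toy data: 'area' is strictly positive, 'balance' is not.
X = pd.DataFrame({"area": [50, 80, 120], "balance": [120, 0, -35]})

checked = check_positive_sketch(X, ["area"])  # passes silently

try:
    check_positive_sketch(X, ["balance"])
except ValueError as err:
    message = str(err)  # names the offending variable
```

Running the check at fit time means that an unsuitable variable is reported before any data is transformed.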

transform(X)[source]

Transforms the selected variables using the logarithm.

Parameters
  • X (pandas dataframe of shape = [n_samples, n_features]) – The data to transform.

Returns

X_transformed – The log transformed dataframe.

Return type

pandas dataframe of shape = [n_samples, n_features]
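Because the returned dataframe has the same shape as the input, columns outside the selected variables are expected to pass through unchanged. A minimal sketch of that contract with plain pandas and NumPy follows; the toy data and the variables list are hypothetical, and copying the input first is an assumption rather than a documented guarantee.

```python
import numpy as np
import pandas as pd

# Toy input: only 'area' is selected for transformation.
X = pd.DataFrame({"area": [1.0, 10.0, 100.0], "rooms": [3, 4, 5]})
variables = ["area"]  # hypothetical selection

X_transformed = X.copy()  # assumption: the input is not modified in place
X_transformed[variables] = np.log(X_transformed[variables])

same_shape = X_transformed.shape == X.shape
```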

Example Use

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from feature_engine import variable_transformers as vt

# Load dataset
data = pd.read_csv('houseprice.csv')

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'],
    test_size=0.3,
    random_state=0,
)

# set up the variable transformer
tf = vt.LogTransformer(variables=['LotArea', 'GrLivArea'])

# fit the transformer
tf.fit(X_train)

# transform the data
train_t = tf.transform(X_train)
test_t = tf.transform(X_test)

# un-transformed variable
X_train['LotArea'].hist(bins=50)
[Figure: histogram of LotArea before the log transformation]
# transformed variable
train_t['LotArea'].hist(bins=50)
[Figure: histogram of LotArea after the log transformation]