SklearnTransformerWrapper

Applies Scikit-learn transformers, such as the SimpleImputer, the OrdinalEncoder or most scalers, to a selected subset of features only.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from feature_engine.wrappers import SklearnTransformerWrapper

# Load dataset
data = pd.read_csv('houseprice.csv')

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'], test_size=0.3, random_state=0)

# Set up the wrapper with the SimpleImputer
imputer = SklearnTransformerWrapper(
    transformer=SimpleImputer(strategy='mean'),
    variables=['LotFrontage', 'MasVnrArea'],
)

# fit the wrapper + SimpleImputer
imputer.fit(X_train)

# transform the data
X_train = imputer.transform(X_train)
X_test = imputer.transform(X_test)

For more details, see the examples in the Jupyter notebooks in our repository.

API Reference

class feature_engine.wrappers.SklearnTransformerWrapper(variables=None, transformer=None)

Wrapper for Scikit-learn pre-processing transformers, such as SimpleImputer() or OrdinalEncoder(), that applies the transformer to a selected group of variables.

Parameters
  • variables (list, default=None) –

    The list of variables to be transformed.

    If None, the wrapper will select all numerical variables for every transformer except the SimpleImputer, OrdinalEncoder and OneHotEncoder, for which it will select all variables in the dataset.

  • transformer (sklearn transformer, default=None) – The desired Scikit-learn transformer.

fit(X, y=None)

The fit method allows Scikit-learn transformers to learn the required parameters from the training data set.

If transformer is OneHotEncoder, OrdinalEncoder or SimpleImputer, all variables indicated in the variables parameter will be transformed. When the variables parameter is None, the SklearnTransformerWrapper will automatically select and transform all features in the dataset, numerical or otherwise.

For all other Scikit-learn transformers, only numerical variables will be transformed. The SklearnTransformerWrapper will check that the variables indicated in the variables parameter are numerical or, if variables is None, it will automatically select the numerical variables in the dataset.

transform(X)

Apply the transformation to the dataframe. Only the selected features will be modified.

If transformer is OneHotEncoder, dummy features are concatenated to the source dataset. Note that the original categorical variables are not removed from the dataset after encoding. To remove them, use Feature-engine’s OneHotCategoricalEncoder instead.
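
The concatenation described above can be pictured with plain pandas and Scikit-learn. This is only a sketch of the described effect, not the wrapper's actual implementation; the toy DataFrame and the naming scheme for the dummy columns are assumptions made for illustration:

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data; column names and values are made up for illustration
df = pd.DataFrame({
    "colour": ["red", "blue", "red"],
    "size": [1, 2, 3],
})

encoder = OneHotEncoder()

# The encoder returns a sparse matrix by default; convert it to a dense array
dummies = encoder.fit_transform(df[["colour"]]).toarray()

# Name the dummy columns after the learned categories (illustrative naming)
names = [f"colour_{category}" for category in encoder.categories_[0]]
dummies = pd.DataFrame(dummies, columns=names, index=df.index)

# Concatenate the dummies to the source data; "colour" itself is kept
encoded = pd.concat([df, dummies], axis=1)
print(encoded)
```

In this sketch the result holds both the original colour column and the new dummy columns, which mirrors the behaviour noted for the wrapper.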