SklearnTransformerWrapper

API Reference

class feature_engine.wrappers.SklearnTransformerWrapper(variables=None, transformer=None)

Wrapper that applies Scikit-learn pre-processing transformers, such as SimpleImputer() or OrdinalEncoder(), to a selected group of variables.

Parameters
variables: list, default=None

The list of variables to be transformed.

If None, the wrapper selects all numerical variables, unless the transformer is the SimpleImputer, OrdinalEncoder or OneHotEncoder, in which case it selects all variables in the dataset.

transformer: sklearn transformer, default=None

The desired Scikit-learn transformer.

Methods

fit:

Fit Scikit-learn transformers

transform:

Transforms with Scikit-learn transformers

fit_transform:

Fit to data, then transform it.

fit(X, y=None)

The fit method allows Scikit-learn transformers to learn the required parameters from the training data set.

If the transformer is OneHotEncoder, OrdinalEncoder or SimpleImputer, all variables indicated in the `variables` parameter will be transformed. When `variables` is None, the SklearnTransformerWrapper automatically selects and transforms all features in the dataset, numerical or otherwise.

For all other Scikit-learn transformers, only numerical variables will be transformed. The SklearnTransformerWrapper checks that the variables indicated in the `variables` parameter are numerical or, if `variables` is None, automatically selects the numerical variables in the dataset.
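The selection rule described above can be sketched in plain pandas. This is only an illustration of the documented behaviour, not the wrapper's actual implementation: `select_variables` and `ANY_TYPE_TRANSFORMERS` are hypothetical names, and raising a TypeError for non-numerical variables is an assumption.

```python
import pandas as pd

# Transformers that the documentation says may act on variables of any type.
ANY_TYPE_TRANSFORMERS = {"OneHotEncoder", "OrdinalEncoder", "SimpleImputer"}


def select_variables(X, transformer_name, variables=None):
    """Hypothetical helper that mirrors the documented selection rule."""
    if transformer_name in ANY_TYPE_TRANSFORMERS:
        # Any variable type is accepted; default to every column.
        return list(X.columns) if variables is None else list(variables)

    numeric = list(X.select_dtypes(include="number").columns)
    if variables is None:
        # Default to the numerical columns only.
        return numeric

    # Assumption: reject user-supplied variables that are not numerical.
    non_numeric = [v for v in variables if v not in numeric]
    if non_numeric:
        raise TypeError(f"Non-numerical variables: {non_numeric}")
    return list(variables)


X = pd.DataFrame(
    {"age": [20, 30], "city": ["Rome", "Lima"], "income": [1.5, 2.5]}
)
scaler_vars = select_variables(X, "StandardScaler")
imputer_vars = select_variables(X, "SimpleImputer")
```

With this toy dataframe, a scaler would act on the two numerical columns only, while an imputer would act on all three.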

Parameters
X: Pandas DataFrame

The dataset used to fit the transformer.

y: pandas Series, default=None

This parameter exists only for compatibility with sklearn.pipeline.Pipeline.

Returns
self
Raises
TypeError

If the input is not a Pandas DataFrame

transform(X)

Apply the transformation to the dataframe. Only the selected features will be modified.

If the transformer is OneHotEncoder, the dummy features are concatenated to the source dataset. Note that the original categorical variables are not removed from the dataset after encoding. If you need them removed, use Feature-engine’s OneHotEncoder instead.
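A rough equivalent of this concatenation, written with plain pandas and Scikit-learn, might look as follows. It is a sketch of the described outcome rather than the wrapper's own code; the toy data and the `colour_<category>` column names are assumptions, and the names the wrapper assigns to the dummy features may differ.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Hypothetical toy data: one categorical and one numerical column.
X = pd.DataFrame({"colour": ["red", "blue", "red"], "size": [1, 2, 3]})

# Encode the categorical column; the default output is a sparse matrix,
# so convert it to a dense array before building a dataframe.
encoder = OneHotEncoder()
encoded = encoder.fit_transform(X[["colour"]]).toarray()

# Build illustrative column names from the categories learned during fit.
dummy_names = [f"colour_{cat}" for cat in encoder.categories_[0]]
dummies = pd.DataFrame(encoded, columns=dummy_names, index=X.index)

# Concatenate the dummies to the source data; 'colour' is kept.
X_encoded = pd.concat([X, dummies], axis=1)
```

The resulting dataframe still contains the original `colour` column alongside the new dummy columns, which is the behaviour the note above warns about.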

Parameters
X: Pandas DataFrame

The data to transform

Returns
X: Pandas DataFrame

The transformed dataset.

Raises
TypeError

If the input is not a Pandas DataFrame

Example

The wrapper applies Scikit-learn transformers, such as the SimpleImputer, the OrdinalEncoder or most scalers, only to a selected subset of features.

import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from feature_engine.wrappers import SklearnTransformerWrapper

# Load dataset
data = pd.read_csv('houseprice.csv')

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'], test_size=0.3, random_state=0)

# set up the wrapper with the SimpleImputer
imputer = SklearnTransformerWrapper(
    transformer=SimpleImputer(strategy='mean'),
    variables=['LotFrontage', 'MasVnrArea'],
)

# fit the wrapper + SimpleImputer
imputer.fit(X_train)

# transform the data
X_train = imputer.transform(X_train)
X_test = imputer.transform(X_test)