SklearnTransformerWrapper
API Reference
class feature_engine.wrappers.SklearnTransformerWrapper(variables=None, transformer=None)
Wrapper for Scikit-learn pre-processing transformers, such as SimpleImputer() or OrdinalEncoder(), that allows the transformer to be applied to a selected group of variables.
- Parameters
- variables: list, default=None
The list of variables to be transformed.
If None, the wrapper will select all numerical variables for every transformer except SimpleImputer, OrdinalEncoder and OneHotEncoder, for which it will select all variables in the dataset.
- transformer: sklearn transformer, default=None
The desired Scikit-learn transformer.
Methods
fit:
Fit Scikit-learn transformers
transform:
Transforms with Scikit-learn transformers
fit_transform:
Fit to data, then transform it.
fit(X, y=None)
The fit method allows Scikit-learn transformers to learn the required parameters from the training data set.
If transformer is OneHotEncoder, OrdinalEncoder or SimpleImputer, all variables indicated in the `variables` parameter will be transformed. When the variables parameter is None, the wrapper will automatically select and transform all features in the dataset, numerical or otherwise.
For all other Scikit-learn transformers, only numerical variables will be transformed. The wrapper checks that the variables indicated in the variables parameter are numerical or, if variables is None, automatically selects the numerical variables in the data set.
- Parameters
- X: pandas DataFrame
The dataset used to fit the transformer.
- y: pandas Series, default=None
This parameter exists only for compatibility with sklearn.pipeline.Pipeline.
- Returns
- self
- Raises
- TypeError
If the input is not a Pandas DataFrame
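The variable-selection rule described for fit can be sketched in a few lines. This is only an illustration of the rule as stated above, not Feature-engine's actual implementation: the helper name `select_variables`, the constant `ALL_VARIABLE_TRANSFORMERS`, the toy DataFrame and the choice of TypeError for non-numerical variables are all assumptions made for this sketch.

```python
import pandas as pd
from sklearn.impute import SimpleImputer
from sklearn.preprocessing import StandardScaler

# Transformers for which, per the description above, all variables are eligible.
ALL_VARIABLE_TRANSFORMERS = ("SimpleImputer", "OrdinalEncoder", "OneHotEncoder")


def select_variables(X, transformer, variables=None):
    """Hypothetical helper mirroring the selection rule described above."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError("X must be a pandas DataFrame")

    accepts_all = type(transformer).__name__ in ALL_VARIABLE_TRANSFORMERS
    numeric = list(X.select_dtypes(include="number").columns)

    if variables is None:
        # No variables given: all columns, or only the numerical ones.
        return list(X.columns) if accepts_all else numeric

    if not accepts_all:
        # Variables given for a numerical-only transformer: check them.
        non_numeric = [v for v in variables if v not in numeric]
        if non_numeric:
            # Assumed error type, chosen for this sketch only.
            raise TypeError(f"Non-numerical variables: {non_numeric}")

    return list(variables)


# Toy data: one numerical and one categorical column.
df = pd.DataFrame({"age": [20.0, None, 40.0], "city": ["x", "y", None]})

imputer_vars = select_variables(df, SimpleImputer())   # all columns
scaler_vars = select_variables(df, StandardScaler())   # numerical columns only
```

With the toy data, the imputer is given both columns while the scaler is given only `age`.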
transform(X)
Apply the transformation to the dataframe. Only the selected features will be modified.
If transformer is OneHotEncoder, the dummy features are concatenated to the source dataset. Note that the original categorical variables are not removed from the dataset after encoding. To have them removed, use Feature-engine’s OneHotEncoder instead.
- Parameters
- X: pandas DataFrame
The data to transform
- Returns
- X: pandas DataFrame
The transformed dataset.
- Raises
- TypeError
If the input is not a Pandas DataFrame
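The OneHotEncoder behaviour described for transform (dummy features concatenated, original categorical variables kept) can be reproduced with plain pandas and Scikit-learn. The sketch below does not call the wrapper and is not its implementation. The toy data and the `colour_<category>` column names are assumptions made for illustration.

```python
import pandas as pd
from sklearn.preprocessing import OneHotEncoder

# Toy data: one categorical and one numerical column.
X = pd.DataFrame({"colour": ["red", "blue", "red"], "size": [1, 2, 3]})

encoder = OneHotEncoder()
# The encoder returns a sparse matrix by default; convert it to a dense array.
dummies = encoder.fit_transform(X[["colour"]]).toarray()

# Build illustrative column names from the learned categories.
dummy_names = [f"colour_{category}" for category in encoder.categories_[0]]
dummies_df = pd.DataFrame(dummies, columns=dummy_names, index=X.index)

# Concatenate the dummies to the source data: 'colour' is still present.
Xt = pd.concat([X, dummies_df], axis=1)

# Dropping the original categorical variable is a separate, explicit step.
Xt_dropped = Xt.drop(columns=["colour"])
```

Here `Xt` holds both the original `colour` column and its dummy columns, while `Xt_dropped` keeps only the dummies and `size`.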
Example
The wrapper applies Scikit-learn transformers, such as the SimpleImputer, the OrdinalEncoder or most scalers, only to the selected subset of features.
import pandas as pd
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.impute import SimpleImputer
from feature_engine.wrappers import SklearnTransformerWrapper
# Load dataset
data = pd.read_csv('houseprice.csv')
# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'], test_size=0.3, random_state=0)
# set up the wrapper with the SimpleImputer
imputer = SklearnTransformerWrapper(transformer=SimpleImputer(strategy='mean'),
                                    variables=['LotFrontage', 'MasVnrArea'])
# fit the wrapper + SimpleImputer
imputer.fit(X_train)
# transform the data
X_train = imputer.transform(X_train)
X_test = imputer.transform(X_test)