API Reference

class feature_engine.selection.RecursiveFeatureElimination(estimator=RandomForestClassifier(), scoring='roc_auc', cv=3, threshold=0.01, variables=None)

RecursiveFeatureElimination selects features following a recursive process.

The process is as follows:

  1. Train an estimator using all the features.

  2. Rank the features according to their importance, derived from the estimator.

  3. Remove one feature (the least important) and fit a new estimator with the remaining features.

  4. Calculate the performance of the new estimator.

  5. Calculate the difference in performance between the new and the original estimator.

  6. If the performance drops beyond the threshold, then that feature is important and will be kept. Otherwise, that feature is removed.

  7. Repeat steps 3-6 until all features have been evaluated.

Model training and performance calculation are done with cross-validation.
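
The steps above can be sketched with scikit-learn alone. This is a minimal illustration, not feature_engine's actual implementation: the helper name `recursive_elimination_sketch`, the synthetic dataset, and the choice to reset the baseline to the reduced model after each removal are all assumptions made for the example.

```python
import numpy as np
import pandas as pd
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score


def recursive_elimination_sketch(estimator, X, y, scoring="r2", cv=3, threshold=0.01):
    """Hypothetical helper that mirrors steps 1-7 (illustration only)."""
    # Steps 1-2: fit on all features and rank them, least important first.
    estimator.fit(X, y)
    if hasattr(estimator, "feature_importances_"):
        importance = np.asarray(estimator.feature_importances_)
    else:
        # Average absolute coefficients across outputs/classes if coef_ is 2-D.
        importance = np.abs(np.atleast_2d(estimator.coef_)).mean(axis=0)
    ranked = pd.Series(importance, index=X.columns).sort_values().index

    # Cross-validated performance with all features is the first baseline.
    baseline = cross_val_score(estimator, X, y, scoring=scoring, cv=cv).mean()

    remaining = list(X.columns)
    drifts = {}
    for feature in ranked:
        if len(remaining) == 1:
            break  # never drop the last remaining feature
        candidate = [col for col in remaining if col != feature]
        # Steps 3-5: refit without the feature and measure the performance drop.
        score = cross_val_score(estimator, X[candidate], y, scoring=scoring, cv=cv).mean()
        drifts[feature] = baseline - score
        # Step 6: a drop within the threshold means the feature can be removed.
        if drifts[feature] <= threshold:
            remaining = candidate
            baseline = score  # assumption: compare later features to the reduced model
    return remaining, drifts


# Synthetic data, used only for the illustration.
X_arr, y = make_regression(n_samples=200, n_features=6, n_informative=3,
                           noise=1.0, random_state=0)
X = pd.DataFrame(X_arr, columns=[f"x{i}" for i in range(6)])

selected, drifts = recursive_elimination_sketch(LinearRegression(), X, y, threshold=0.01)
print(selected)
print(drifts)
```

Because every candidate model is scored with cross-validation, each evaluated feature costs `cv` model fits in this sketch.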

variables: str or list, default=None

The list of variables to be evaluated. If None, the transformer will evaluate all numerical features in the dataset.

estimator: object, default = RandomForestClassifier()

A Scikit-learn estimator for regression or classification. The estimator must have either a feature_importances_ or coef_ attribute after fitting.
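
A quick way to check whether an estimator meets this requirement is to fit it and look for either attribute. The sketch below is only an illustration: the helper `exposes_importance` and the synthetic data are assumptions, not part of feature_engine.

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier

# Small synthetic classification problem, used only for the check.
X, y = make_classification(n_samples=200, n_features=5, random_state=0)


def exposes_importance(estimator):
    """Hypothetical helper: True if the fitted estimator has feature_importances_ or coef_."""
    fitted = estimator.fit(X, y)
    return hasattr(fitted, "feature_importances_") or hasattr(fitted, "coef_")


checks = {
    "RandomForestClassifier": exposes_importance(
        RandomForestClassifier(n_estimators=10, random_state=0)
    ),
    "LogisticRegression": exposes_importance(LogisticRegression(max_iter=1000)),
    "KNeighborsClassifier": exposes_importance(KNeighborsClassifier()),
}
print(checks)
```

Tree ensembles expose feature_importances_ and linear models expose coef_, while a nearest-neighbours classifier exposes neither, so it would not be a suitable estimator here.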

scoring: str, default='roc_auc'

Metric used to evaluate the performance of the estimator. Comes from sklearn.metrics. See the scikit-learn model evaluation documentation for more options.

threshold: float, int, default = 0.01

The value that defines if a feature will be kept or removed. Note that for metrics like roc_auc, r2 and accuracy, the threshold will be a float between 0 and 1. For metrics like the mean squared error and the root mean squared error, the threshold will typically be a much larger number, because these metrics are expressed on the scale of the target. The threshold must be defined by the user. Bigger thresholds will select fewer features.
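
To see why the scale of the threshold depends on the metric, it can help to compare the cross-validated scores themselves. The sketch below uses scikit-learn directly with the diabetes dataset and a linear regression, chosen here only as an example. Note that scikit-learn reports error metrics negated, under scorer names such as "neg_mean_squared_error".

```python
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

X, y = load_diabetes(return_X_y=True)
model = LinearRegression()

# r2 is bounded above by 1, so meaningful drifts are small fractions.
r2 = cross_val_score(model, X, y, scoring="r2", cv=3).mean()

# The mean squared error is in squared target units, so drifts can be large.
neg_mse = cross_val_score(model, X, y, scoring="neg_mean_squared_error", cv=3).mean()

print(f"r2: {r2:.3f}")
print(f"neg_mean_squared_error: {neg_mse:.1f}")
```

A threshold of 0.01 is a noticeable change in r2, but a negligible one relative to the mean squared error on this dataset.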

cv: int, default=3

Number of cross-validation folds used to fit the estimator.


initial_model_performance_ :

Performance of the model trained using the original dataset.

feature_importances_ :

Pandas Series with the feature importance.


performance_drifts_ :

Dictionary with the performance drift per examined feature.


features_to_drop_ :

List with the features to remove from the dataset.



fit :

Find the important features.

transform :

Reduce X to the selected features.

fit_transform :

Fit to data, then transform it.

fit(X, y)

Find the important features. Note that the selector trains various models at each round of selection, so it might take a while.

X: pandas dataframe of shape = [n_samples, n_features]

The input dataframe.

y: array-like of shape (n_samples)

Target variable. Required to train the estimator.


transform(X)

Return dataframe with selected features.

X: pandas dataframe of shape = [n_samples, n_features]

The input dataframe.

X_transformed: pandas dataframe of shape = [n_samples, n_selected_features]

Pandas dataframe with the selected features.




import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from feature_engine.selection import RecursiveFeatureElimination

# load dataset
diabetes_X, diabetes_y = load_diabetes(return_X_y=True)
X = pd.DataFrame(diabetes_X)
y = pd.DataFrame(diabetes_y)

# initialize linear regression estimator
linear_model = LinearRegression()

# initialize feature selector
tr = RecursiveFeatureElimination(estimator=linear_model, scoring="r2", cv=3)

# fit transformer
Xt = tr.fit_transform(X, y)

# get the initial linear model performance, using all features
tr.initial_model_performance_

# get the performance drift of each feature
tr.performance_drifts_
{0: -0.0032796652347705235,
 9: -0.00028200591588534163,
 6: -0.0006752869546966522,
 7: 0.00013883578730117252,
 1: 0.011956170569096924,
 3: 0.028634492035512438,
 5: 0.012639090879036363,
 2: 0.06630127204137715,
 8: 0.1093736570697495,
 4: 0.024318093565432353}

# get the selected features
list(Xt.columns)
[1, 3, 5, 2, 8, 4]

# show the first rows of the reduced dataframe
print(Xt.head())
          1         3         5         2         8         4
0  0.050680  0.021872 -0.034821  0.061696  0.019908 -0.044223
1 -0.044642 -0.026328 -0.019163 -0.051474 -0.068330 -0.008449
2  0.050680 -0.005671 -0.034194  0.044451  0.002864 -0.045599
3 -0.044642 -0.036656  0.024991 -0.011595  0.022692  0.012191
4 -0.044642  0.021872  0.015596 -0.036385 -0.031991  0.003935