RecursiveFeatureElimination implements recursive feature elimination. Recursive feature elimination (RFE) is a backward feature selection process. In Feature-engine’s implementation of RFE, a feature will be kept or removed based on the performance of a machine learning model without that feature. This differs from Scikit-learn’s implementation of RFE where a feature will be kept or removed based on the feature importance.

This technique begins by building a model on the entire set of variables, then calculates and stores a model performance metric, and finally computes an importance score for each variable. Features are ranked by the model’s coef_ or feature_importances_ attributes.

In the next step, the least important feature is removed, the model is re-built, and a new performance metric is determined. If the performance drops by more than the threshold, the feature is kept, because eliminating it clearly harmed the model; otherwise, it is removed.

The procedure then removes the second least important feature, trains a new model, determines a new performance metric, and so on, until it has evaluated all the features, from the least to the most important.

Note that, in Feature-engine’s implementation of RFE, the feature importance is used just to rank features and thus determine the order in which the features will be eliminated. But whether to retain a feature is determined based on the decrease in the performance of the model after the feature elimination.
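The procedure described above can be sketched in a few lines of plain Python. This is a simplified illustration, not Feature-engine’s actual code: the function name rfe_by_performance, the score_fn callable (which in practice would return a cross-validated metric such as r2) and the toy weights are all hypothetical. The sketch also assumes that the baseline performance is refreshed after every removal and that at least one feature is always kept.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def rfe_by_performance(
    ranked_features: Sequence[str],
    score_fn: Callable[[List[str]], float],
    threshold: float = 0.01,
) -> Tuple[List[str], Dict[str, float]]:
    """Backward elimination driven by the drop in model performance.

    ranked_features: feature names ordered from least to most important.
    score_fn: returns a performance score (higher is better) for a subset.
    threshold: minimum performance drop required to keep a feature.
    """
    remaining = list(ranked_features)
    baseline = score_fn(remaining)  # performance with all the features
    drifts: Dict[str, float] = {}

    for feature in ranked_features:
        if len(remaining) == 1:
            break  # assumption: never remove the last remaining feature
        candidate = [f for f in remaining if f != feature]
        candidate_score = score_fn(candidate)
        drift = baseline - candidate_score  # positive means performance dropped
        drifts[feature] = drift
        if drift <= threshold:
            # Removing the feature did not hurt enough: drop it for good
            # and use the smaller model as the new baseline (assumption).
            remaining = candidate
            baseline = candidate_score
    return remaining, drifts


# Toy scoring function: each feature adds a fixed amount to the score.
weights = {"a": 0.5, "b": 0.2, "c": 0.005, "d": 0.0}


def toy_score(features: List[str]) -> float:
    return sum(weights[f] for f in features)


selected, drifts = rfe_by_performance(["d", "c", "b", "a"], toy_score)
print(selected)  # ['b', 'a']
```

Here, the importance ranking only fixes the order of the loop, while the comparison of the drift against the threshold decides which features stay.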

By recursively eliminating features, RFE attempts to eliminate dependencies and collinearity that may exist among the features.


Feature-engine’s RFE has 2 parameters that need to be determined somewhat arbitrarily by the user: the first one is the machine learning model whose performance will be evaluated. The second is the threshold, that is, the minimum drop in performance that removing a feature must cause for that feature to be retained.

RFE is not machine learning model agnostic: the feature selection depends on the model, and different models may have different subsets of optimal features. Thus, it is recommended that you use the machine learning model that you finally intend to build.

Regarding the threshold, this parameter needs a bit of hand tuning. Higher thresholds return fewer features, because a feature is retained only if removing it causes a performance drop bigger than the threshold.


Let’s see how to use this transformer with the diabetes dataset that comes in Scikit-learn. First, we load the data:

import pandas as pd
from sklearn.datasets import load_diabetes
from sklearn.linear_model import LinearRegression
from feature_engine.selection import RecursiveFeatureElimination

# load dataset
diabetes_X, diabetes_y = load_diabetes(return_X_y=True)
X = pd.DataFrame(diabetes_X)
y = pd.Series(diabetes_y)

Now, we set up RecursiveFeatureElimination to select features based on the r2 returned by a Linear Regression model, using 3-fold cross-validation. In this case, we leave the parameter threshold at its default value, which is 0.01.

# initialize linear regression estimator
linear_model = LinearRegression()

# initialize feature selector
tr = RecursiveFeatureElimination(estimator=linear_model, scoring="r2", cv=3)

With fit(), the transformer finds the most useful features, that is, features that when removed cause a drop in model performance bigger than 0.01. With transform(), the transformer removes the remaining features from the dataset.

# fit transformer
Xt = tr.fit_transform(X, y)

RecursiveFeatureElimination stores the performance of the model trained using all the features in its attribute initial_model_performance_:

# get the initial linear model performance, using all features
tr.initial_model_performance_

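RecursiveFeatureElimination also stores the change in performance caused by removing each feature, in its attribute performance_drifts_. A positive value means that the performance dropped when the feature was removed.

# get the performance drift of each feature
tr.performance_drifts_
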
{0: -0.0032796652347705235,
 9: -0.00028200591588534163,
 6: -0.0006752869546966522,
 7: 0.00013883578730117252,
 1: 0.011956170569096924,
 3: 0.028634492035512438,
 5: 0.012639090879036363,
 2: 0.06630127204137715,
 8: 0.1093736570697495,
 4: 0.024318093565432353}

RecursiveFeatureElimination also stores the features that will be dropped based on the given threshold, in its attribute features_to_drop_.

# the features to remove
tr.features_to_drop_

[0, 6, 7, 9]

If we now print the transformed data, we see that the features above were removed.

print(Xt.head())

          1         3         5         2         8         4
0  0.050680  0.021872 -0.034821  0.061696  0.019908 -0.044223
1 -0.044642 -0.026328 -0.019163 -0.051474 -0.068330 -0.008449
2  0.050680 -0.005671 -0.034194  0.044451  0.002864 -0.045599
3 -0.044642 -0.036656  0.024991 -0.011595  0.022692  0.012191
4 -0.044642  0.021872  0.015596 -0.036385 -0.031991  0.003935
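
The selection can be double-checked by hand from the values printed above. The sketch below copies the performance drifts reported earlier and applies the rule described in this guide: a feature is kept when its drift is bigger than the threshold. It illustrates the rule only; it is not Feature-engine’s internal code.

```python
# Performance drifts copied from the example output above.
performance_drifts = {
    0: -0.0032796652347705235,
    9: -0.00028200591588534163,
    6: -0.0006752869546966522,
    7: 0.00013883578730117252,
    1: 0.011956170569096924,
    3: 0.028634492035512438,
    5: 0.012639090879036363,
    2: 0.06630127204137715,
    8: 0.1093736570697495,
    4: 0.024318093565432353,
}
threshold = 0.01  # default value used in the example

# Keep a feature only if removing it cost more than the threshold.
features_kept = [f for f, drift in performance_drifts.items() if drift > threshold]
features_to_drop = sorted(f for f in performance_drifts if f not in features_kept)

print(features_to_drop)  # [0, 6, 7, 9]
print(features_kept)     # [1, 3, 5, 2, 8, 4]
```

Both lists agree with the features to remove and with the columns of the transformed data shown above.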