DropConstantFeatures

class feature_engine.selection.DropConstantFeatures(variables=None, tol=1, missing_values='raise')

DropConstantFeatures() drops constant and quasi-constant variables from a dataframe. Constant variables show the same value in all the observations in the dataset. Quasi-constant variables show the same value in almost all the observations in the dataset.

This transformer works with numerical and categorical variables. The user can indicate a list of variables to examine. Alternatively, the transformer will evaluate all the variables in the dataset.

The transformer will first identify and store the constant and quasi-constant variables. Next, the transformer will drop these variables from a dataframe.

More details in the User Guide.

Parameters
variables: list, default=None

The list of variables to evaluate. If None, the transformer will evaluate all variables in the dataset.

tol: float or int, default=1

Threshold to detect constant and quasi-constant features. A variable is considered constant or quasi-constant, and is dropped, when its most frequent value appears in a proportion of observations equal to or greater than tol. If tol=1, the transformer removes only constant variables. If tol is smaller than 1, it also removes quasi-constant variables. For example, if tol=0.98, the transformer will remove variables that show the same value in at least 98% of the observations.
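The threshold rule can be illustrated with plain pandas. This is a sketch of the rule as described here, not the transformer's actual implementation. The helper predominant_share and the toy columns are hypothetical, and the comparison is assumed to be inclusive, which matches the 98% example.

```python
import pandas as pd


def predominant_share(series: pd.Series) -> float:
    """Return the fraction of rows taken by the most frequent value."""
    # value_counts sorts by frequency, most frequent first.
    return float(series.value_counts(normalize=True).iloc[0])


# Toy data (hypothetical), 100 rows per column.
X = pd.DataFrame(
    {
        "mostly_zero": [0] * 98 + [1] * 2,  # top value covers 98% of rows
        "balanced": [0, 1] * 50,            # top value covers 50% of rows
    }
)

tol = 0.98

# Share of the most frequent value in each column.
shares = {col: predominant_share(X[col]) for col in X.columns}

# Assumption: a column is flagged when its share reaches tol.
flagged = [col for col, share in shares.items() if share >= tol]

print(shares)
print(flagged)
```

Lowering tol flags more variables. With tol=0.5 both columns in this toy dataframe would be flagged.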

missing_values: str, default='raise'

Whether to raise an error when missing values are found, ignore them, or treat them as an additional value of the variable. Takes values ‘raise’, ‘ignore’, ‘include’.

Attributes
features_to_drop_:

List with constant and quasi-constant features.

variables_:

The variables that will be considered for the feature selection.

n_features_in_:

The number of features in the train set used in fit.

See also

sklearn.feature_selection.VarianceThreshold

Notes

This transformer is similar in concept to VarianceThreshold from Scikit-learn, but it evaluates the number of unique values instead of the variance.

Methods

fit:

Find constant and quasi-constant features.

transform:

Remove constant and quasi-constant features.

fit_transform:

Fit to the data. Then transform it.

fit(X, y=None)

Find constant and quasi-constant features.

Parameters
X: pandas dataframe of shape = [n_samples, n_features]

The input dataframe.

y: None

y is not needed for this transformer. You can pass y or None.

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
X: array-like of shape (n_samples, n_features)

Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params: dict

Additional fit parameters.

Returns
X_new: ndarray of shape (n_samples, n_features_new)

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters
**params: dict

Estimator parameters.

Returns
self: estimator instance

Estimator instance.

transform(X)

Return dataframe with selected features.

Parameters
X: pandas dataframe of shape = [n_samples, n_features]

The input dataframe.

Returns
X_new: pandas dataframe of shape = [n_samples, n_selected_features]

Pandas dataframe with the selected features.

Return type

DataFrame