CombineWithReferenceFeature

class feature_engine.creation.CombineWithReferenceFeature(variables_to_combine, reference_variables, operations=['sub'], new_variables_names=None, missing_values='ignore', drop_original=False)

CombineWithReferenceFeature() applies basic mathematical operations between a group of variables and one or more reference features. It adds one or more features containing the results of the operations to the dataframe.

In other words, CombineWithReferenceFeature() adds the reference variables to a group of features, subtracts them from the features, multiplies the features by them, or divides the features by them, and returns the results as new variables in the dataframe.

The transformed dataframe will contain the original set of variables plus the additional features named in the new_variables_names list (unless drop_original=True, in which case the original variables are removed).
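The behaviour described above can be pictured with plain pandas. The sketch below is not the library's implementation: combine_with_reference and the OPS mapping are hypothetical names, and the var_op_ref column naming (with underscores) is an assumption. It loops over every operation, reference variable and variable to combine, and writes one new column per combination while keeping the original columns.

```python
import pandas as pd

# Hypothetical mapping from operation name to the pandas Series method it uses.
OPS = {
    "sub": pd.Series.sub,
    "div": pd.Series.div,
    "add": pd.Series.add,
    "mul": pd.Series.mul,
}


def combine_with_reference(X, variables_to_combine, reference_variables, operations=("sub",)):
    """Hypothetical helper: combine each variable with each reference variable."""
    X = X.copy()  # leave the caller's dataframe untouched
    for op in operations:
        for ref in reference_variables:
            for var in variables_to_combine:
                # The computation is always: variable <op> reference variable.
                X[f"{var}_{op}_{ref}"] = OPS[op](X[var], X[ref])
    return X


df = pd.DataFrame({"income": [3000.0, 4500.0], "rent": [1000.0, 1500.0]})
out = combine_with_reference(df, ["income"], ["rent"], ["sub", "div"])
print(out)
```

The new columns hold income - rent and income / rent, and the input dataframe is not modified.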

More details in the User Guide.

Parameters
variables_to_combine: list

The list of numerical variables to combine with the reference variables.

reference_variables: list

The list of numerical reference variables that will be added to, multiplied with, or subtracted from the variables_to_combine, or used as the denominator in division.

operations: list, default=['sub']

The list of basic mathematical operations to be used in the transformation.

If None, all of ['sub', 'div', 'add', 'mul'] will be performed. Alternatively, you can pass a list of the operations to carry out. Each operation must be a string and one of 'sub', 'div', 'add' or 'mul'.
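The rule for the operations parameter can be sketched as a small validation function. This is a minimal illustration, not the library's code: resolve_operations and ALLOWED_OPERATIONS are hypothetical names, and the exact error message is an assumption. None expands to all four operations; any other value must be a list whose entries are all allowed strings.

```python
# Hypothetical constant listing the four supported operations.
ALLOWED_OPERATIONS = ["sub", "div", "add", "mul"]


def resolve_operations(operations):
    """Hypothetical check: expand None and reject unsupported operations."""
    if operations is None:
        # None means "perform every supported operation".
        return list(ALLOWED_OPERATIONS)
    if not isinstance(operations, list):
        raise ValueError("operations must be None or a list of strings")
    invalid = [op for op in operations if op not in ALLOWED_OPERATIONS]
    if invalid:
        raise ValueError(f"Unsupported operations {invalid}; choose from {ALLOWED_OPERATIONS}")
    return operations


print(resolve_operations(None))    # ['sub', 'div', 'add', 'mul']
print(resolve_operations(["div"])) # ['div']
```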

Each operation will result in a new variable that will be added to the transformed dataset.

new_variables_names: list, default=None

Names of the new variables. If you pass a list with the names for the new features (recommended), it must contain exactly as many names as the number of new features created by the transformer. The number of new features is the number of operations, times the number of reference_variables, times the number of variables_to_combine.
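The count rule above is a plain product, so it is easy to check before building the transformer. The helpers below are hypothetical (they are not part of Feature-engine) and only illustrate the arithmetic and a length check on new_variables_names.

```python
def n_new_features(operations, reference_variables, variables_to_combine):
    # operations x reference variables x variables to combine
    return len(operations) * len(reference_variables) * len(variables_to_combine)


def check_new_variables_names(new_variables_names, operations, reference_variables, variables_to_combine):
    """Hypothetical check: raise if the user supplied the wrong number of names."""
    if new_variables_names is None:
        return  # default names will be generated instead
    expected = n_new_features(operations, reference_variables, variables_to_combine)
    if len(new_variables_names) != expected:
        raise ValueError(
            f"Expected {expected} names in new_variables_names, got {len(new_variables_names)}."
        )


# 2 operations x 1 reference variable x 3 variables to combine = 6 new features
print(n_new_features(["sub", "div"], ["income"], ["rent", "debt", "savings"]))  # 6
```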

If new_variables_names is None, the transformer assigns a default name to each new feature, built as var + operation + ref_var.
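To see which default names to expect, the var + operation + ref_var pattern can be enumerated with itertools.product. This is a sketch under two assumptions that the text above does not confirm: the parts are joined with underscores, and names are generated in the order operation, then reference variable, then variable to combine. default_names is a hypothetical helper.

```python
from itertools import product


def default_names(variables_to_combine, reference_variables, operations):
    """Hypothetical helper listing var + operation + ref_var names.

    Assumptions: "_" separator and operation -> reference -> variable order.
    """
    return [
        f"{var}_{op}_{ref}"
        for op, ref, var in product(operations, reference_variables, variables_to_combine)
    ]


print(default_names(["debt", "rent"], ["income"], ["div"]))
# ['debt_div_income', 'rent_div_income']
```

A list like this can also serve as a starting template when writing a custom new_variables_names list of the right length.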

missing_values: string, default='ignore'

Indicates whether missing values should be ignored or should raise an error. If 'ignore', the transformer will ignore missing data when transforming the data. If 'raise', the transformer will raise an error if the training data or the datasets to transform contain missing values.
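The two modes of missing_values can be sketched with pandas as follows. check_missing is a hypothetical helper and the error messages are assumptions; the library's own check may differ in detail. With 'raise', any NaN in the selected columns stops the process; with 'ignore', nothing is checked and NaN simply propagates through the arithmetic, as pandas does by default.

```python
import pandas as pd


def check_missing(X, variables, missing_values="ignore"):
    """Hypothetical sketch of the 'ignore' / 'raise' behaviour."""
    if missing_values not in ("ignore", "raise"):
        raise ValueError("missing_values takes only values 'ignore' or 'raise'")
    if missing_values == "raise" and X[variables].isnull().any().any():
        raise ValueError("Some of the variables contain missing values.")


df = pd.DataFrame({"income": [3000.0, float("nan")], "rent": [1000.0, 1200.0]})
check_missing(df, ["income", "rent"], "ignore")  # no error

# With 'ignore', NaN propagates into the result of the operation.
print((df["income"] - df["rent"]).tolist())  # [2000.0, nan]
```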

drop_original: bool, default=False

If True, the original variables will be dropped from the dataframe after their combination.
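A possible reading of drop_original, sketched with pandas. The text above does not say exactly which columns count as "original variables"; this sketch assumes both variables_to_combine and reference_variables are dropped, while unrelated columns and the new features are kept. drop_originals is a hypothetical helper.

```python
import pandas as pd


def drop_originals(X, variables_to_combine, reference_variables):
    """Hypothetical helper. Assumption: both groups count as 'original variables'."""
    # dict.fromkeys removes duplicates while preserving order.
    to_drop = list(dict.fromkeys(variables_to_combine + reference_variables))
    return X.drop(columns=to_drop)


df = pd.DataFrame({"debt": [500.0], "income": [2500.0], "age": [40]})
df["debt_div_income"] = df["debt"] / df["income"]
print(drop_originals(df, ["debt"], ["income"]).columns.tolist())
# ['age', 'debt_div_income']
```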

Attributes
n_features_in_:

The number of features in the train set used in fit.

Notes

Although the transformer in essence allows us to combine any feature with any of the allowed mathematical operations, its use is intended mostly for the creation of new features based on some domain knowledge. Typical examples within the financial sector are:

  • Ratio of debt to income to create the debt_to_income_ratio.

  • Subtraction of rent from income to obtain the disposable_income.
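The two financial examples map onto the transformer's parameters as follows: the ratio corresponds to variables_to_combine=['debt'], reference_variables=['income'], operations=['div'], and the disposable income to variables_to_combine=['income'], reference_variables=['rent'], operations=['sub']. The sketch below computes the same two features directly with pandas so the arithmetic is explicit; the column values are made-up illustration data.

```python
import pandas as pd

df = pd.DataFrame({
    "income": [4000.0, 2500.0],
    "debt": [1000.0, 1250.0],
    "rent": [1200.0, 900.0],
})

# debt / income: debt is the variable, income the reference (division).
df["debt_to_income_ratio"] = df["debt"].div(df["income"])

# income - rent: income is the variable, rent the reference (subtraction).
df["disposable_income"] = df["income"].sub(df["rent"])

print(df)
```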

Methods

fit:

This transformer does not learn parameters.

transform:

Combine the variables with the mathematical operations.

fit_transform:

Fit to the data, then transform it.

fit(X, y=None)

This transformer does not learn any parameter.

Parameters
X: pandas dataframe of shape = [n_samples, n_features]

The training input samples. Can be the entire dataframe, not just the variables to transform.

y: pandas Series or np.array, default=None

It is not needed in this transformer. You can pass y or None.
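A fit that "does not learn any parameter" typically still records n_features_in_ and returns self, so it can be chained and used inside scikit-learn pipelines. The class below is a hypothetical, stripped-down stand-in (only the 'sub' operation, names assumed to follow var_sub_ref), not the library's class. It also shows that fit_transform gives the same result as calling fit and then transform on the same data.

```python
import pandas as pd


class StatelessCombiner:
    """Hypothetical stand-in illustrating a fit that learns nothing."""

    def __init__(self, variables_to_combine, reference_variables):
        self.variables_to_combine = variables_to_combine
        self.reference_variables = reference_variables

    def fit(self, X, y=None):
        # y is accepted for API compatibility and ignored.
        self.n_features_in_ = X.shape[1]
        return self

    def transform(self, X):
        X = X.copy()
        for ref in self.reference_variables:
            for var in self.variables_to_combine:
                X[f"{var}_sub_{ref}"] = X[var] - X[ref]
        return X

    def fit_transform(self, X, y=None):
        # Equivalent to fit followed by transform on the same data.
        return self.fit(X, y).transform(X)


df = pd.DataFrame({"income": [3000.0, 4500.0], "rent": [1000.0, 1500.0]})
combiner = StatelessCombiner(["income"], ["rent"])
out = combiner.fit_transform(df)
print(combiner.n_features_in_)          # 2
print(out["income_sub_rent"].tolist())  # [2000.0, 3000.0]
```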

fit_transform(X, y=None, **fit_params)

Fit to data, then transform it.

Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.

Parameters
X: array-like of shape (n_samples, n_features)

Input samples.

y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None

Target values (None for unsupervised transformations).

**fit_params: dict

Additional fit parameters.

Returns
X_new: ndarray of shape (n_samples, n_features_new)

Transformed array.

get_params(deep=True)

Get parameters for this estimator.

Parameters
deep: bool, default=True

If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns
params: dict

Parameter names mapped to their values.

set_params(**params)

Set the parameters of this estimator.

The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.
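The <component>__<parameter> convention can be illustrated by how such a key splits into a step name and a parameter name. split_nested_key is a hypothetical helper, and "combiner" is a made-up pipeline step name; scikit-learn performs an equivalent split internally when routing nested parameters.

```python
def split_nested_key(key):
    """Hypothetical helper: split '<component>__<parameter>' at the first '__'."""
    component, sep, parameter = key.partition("__")
    if not sep:
        return None, key  # a plain parameter of the estimator itself
    return component, parameter


print(split_nested_key("combiner__operations"))  # ('combiner', 'operations')
print(split_nested_key("drop_original"))         # (None, 'drop_original')
```

In a Pipeline whose step is named "combiner", a call such as pipe.set_params(combiner__operations=['div']) would therefore update the operations parameter of that step only.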

Parameters
**params: dict

Estimator parameters.

Returns
self: estimator instance

Estimator instance.

transform(X)

Combine the variables with the mathematical operations.

Parameters
X: pandas dataframe of shape = [n_samples, n_features]

The data to transform.

Returns
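X_new: pandas dataframe, shape = [n_samples, n_features + n_new_features], where n_new_features is the number of operations times the number of reference_variables times the number of variables_to_combine (fewer columns remain if drop_original=True).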

The dataframe with the new variables.

Return type: pandas.core.frame.DataFrame