CombineWithReferenceFeature
- class feature_engine.creation.CombineWithReferenceFeature(variables_to_combine, reference_variables, operations=['sub'], new_variables_names=None, missing_values='ignore', drop_original=False)
CombineWithReferenceFeature() applies basic mathematical operations between a group of variables and one or more reference features. It adds one or more additional features to the dataframe with the result of the operations.
In other words, CombineWithReferenceFeature() combines each variable in a group of features with each reference variable by addition, subtraction, multiplication or division, and returns the results as new variables in the dataframe.
The transformed dataframe will contain the additional features indicated in the new_variables_names list plus the original set of variables.
More details in the User Guide.
- Parameters
- variables_to_combine: list
The list of numerical variables to combine with the reference variables.
- reference_variables: list
The list of numerical reference variables that will be added to, multiplied with, or subtracted from the variables_to_combine, or used as denominator for division.
- operations: list, default=['sub']
The list of basic mathematical operations to be used in the transformation.
If None, all of ['sub', 'div', 'add', 'mul'] will be performed. Alternatively, you can enter a list of operations to carry out. Each operation should be a string and must be one of the elements in ['sub', 'div', 'add', 'mul'].
Each operation will result in a new variable that will be added to the transformed dataset.
- new_variables_names: list, default=None
Names of the new variables. If passing a list with the names for the new features (recommended), you must enter as many names as new features created by the transformer. The number of new features is the number of operations, times the number of reference_variables, times the number of variables_to_combine.
If new_variables_names is None, the transformer will assign an arbitrary name to the features. The name will be var + operation + ref_var.
- missing_values: string, default='ignore'
Indicates whether missing values should be ignored or should raise an error. If 'ignore', the transformer will ignore missing data when transforming the data. If 'raise', the transformer will raise an error if the training set or the datasets to transform contain missing values.
- drop_original: bool, default=False
If True, the original variables will be dropped from the dataframe after their combination.
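The count of new features described under new_variables_names can be checked before writing the list of names. The following is a minimal plain-Python sketch of that arithmetic, not the library's own code; the column names and the helper expected_n_features are hypothetical and exist only for illustration.

```python
from itertools import product


def expected_n_features(variables_to_combine, reference_variables, operations):
    """Number of new features: operations x reference variables x variables to combine."""
    return len(operations) * len(reference_variables) * len(variables_to_combine)


# Hypothetical column names, chosen only for this example.
variables_to_combine = ["income", "savings"]
reference_variables = ["debt"]
operations = ["sub", "div"]

n_new = expected_n_features(variables_to_combine, reference_variables, operations)

# Enumerate every (variable, operation, reference) combination.
# One entry of new_variables_names is needed for each of them.
combinations = list(product(variables_to_combine, operations, reference_variables))

# The enumeration and the closed-form count must agree.
assert len(combinations) == n_new
```

With two variables to combine, one reference variable and two operations, the list passed to new_variables_names would need four names. The order in which the transformer pairs names with features is not stated here, so the enumeration order above is an assumption.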
- Attributes
- n_features_in_:
The number of features in the train set used in fit.
Notes
Although the transformer in essence allows us to combine any feature with any of the allowed mathematical operations, its use is intended mostly for the creation of new features based on some domain knowledge. Typical examples within the financial sector are:
Ratio between income and debt to create the debt_to_income_ratio.
Subtraction of rent from income to obtain the disposable_income.
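The two financial examples can be sketched directly in pandas to show what the resulting columns contain. This is an illustration of the arithmetic only, under the assumption that 'div' uses the reference variable as denominator and 'sub' subtracts the reference variable from the variable to combine, as described in the parameters; it is not the transformer's implementation, and the data and column names are made up.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
X = pd.DataFrame(
    {
        "income": [4000.0, 5200.0, 3100.0],
        "debt": [1000.0, 2600.0, 620.0],
        "rent": [1200.0, 1500.0, 900.0],
    }
)

# Work on a copy so the original dataframe is left untouched.
X_new = X.copy()

# 'div': variable to combine = debt, reference variable = income (denominator).
X_new["debt_to_income_ratio"] = X["debt"] / X["income"]

# 'sub': variable to combine = income, reference variable = rent (subtracted).
X_new["disposable_income"] = X["income"] - X["rent"]

# The original three columns are kept and two new ones are appended,
# which corresponds to drop_original=False.
print(X_new)
```

Dropping "income", "debt" and "rent" from X_new afterwards would correspond to what drop_original=True is described as doing.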
Methods
fit:
This transformer does not learn parameters.
transform:
Combine the variables with the mathematical operations.
fit_transform:
Fit to the data, then transform it.
- fit(X, y=None)
This transformer does not learn any parameters.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The training input samples. Can be the entire dataframe, not just the variables to transform.
- y: pandas Series, or np.array. Default=None.
It is not needed in this transformer. You can pass y or None.
- fit_transform(X, y=None, **fit_params)
Fit to data, then transform it.
Fits transformer to X and y with optional parameters fit_params and returns a transformed version of X.
- Parameters
- X: array-like of shape (n_samples, n_features)
Input samples.
- y: array-like of shape (n_samples,) or (n_samples, n_outputs), default=None
Target values (None for unsupervised transformations).
- **fit_params: dict
Additional fit parameters.
- Returns
- X_new: ndarray of shape (n_samples, n_features_new)
Transformed array.
- get_params(deep=True)
Get parameters for this estimator.
- Parameters
- deep: bool, default=True
If True, will return the parameters for this estimator and contained subobjects that are estimators.
- Returns
- params: dict
Parameter names mapped to their values.
- set_params(**params)
Set the parameters of this estimator.
The method works on simple estimators as well as on nested objects (such as Pipeline). The latter have parameters of the form <component>__<parameter> so that it's possible to update each component of a nested object.
- Parameters
- **params: dict
Estimator parameters.
- Returns
- self: estimator instance
Estimator instance.
- transform(X)
Combine the variables with the mathematical operations.
- Parameters
- X: pandas dataframe of shape = [n_samples, n_features]
The data to transform.
- Returns
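- X_new: pandas dataframe, shape = [n_samples, n_features + n_new_features]
The dataframe with the new variables, where n_new_features is the number of operations times the number of reference_variables times the number of variables_to_combine.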
- rtype
DataFrame