CombineWithReferenceFeature
API Reference

class feature_engine.creation.CombineWithReferenceFeature(variables_to_combine, reference_variables, operations=['sub'], new_variables_names=None, missing_values='ignore')
CombineWithReferenceFeature() applies basic mathematical operations between one or more reference variables and a group of variables, returning one or more additional features as a result. That is, it adds, multiplies, subtracts or divides a group of features to or by a group of reference variables and stores the results in new variables.
For example, if we have the variables number_payments_first_quarter, number_payments_second_quarter, number_payments_third_quarter, number_payments_fourth_quarter, and total_payments, we can use CombineWithReferenceFeature() to determine the percentage of total payments per quarter as follows:
transformer = CombineWithReferenceFeature(
    variables_to_combine=[
        'number_payments_first_quarter',
        'number_payments_second_quarter',
        'number_payments_third_quarter',
        'number_payments_fourth_quarter',
    ],
    reference_variables=['total_payments'],
    operations=['div'],
    new_variables_names=[
        'perc_payments_first_quarter',
        'perc_payments_second_quarter',
        'perc_payments_third_quarter',
        'perc_payments_fourth_quarter',
    ]
)
Xt = transformer.fit_transform(X)
The transformed X, Xt, will contain the additional features indicated in the new_variables_names list plus the original set of variables.
 Parameters
 variables_to_combine : list
The list of numerical variables to be combined with the reference variables.
 reference_variables : list
The list of numerical reference variables that will be added to, multiplied by, or subtracted from the variables_to_combine, or used as the denominator for division.
 operations : list, default=['sub']
The list of basic mathematical operations to be used in transformation.
If None, all of ['sub', 'div', 'add', 'mul'] will be performed over the variables. Alternatively, the user can enter the list of operations to carry out.
Each operation should be a string and must be one of the elements from the list: ['sub', 'div', 'add', 'mul'].
Each operation will result in a new variable that will be added to the transformed dataset.
 new_variables_names : list, default=None
Names of the newly created variables. The user can enter a list with the names for the newly created features (recommended). The user must enter as many names as new features created by the transformer. The number of new features is the number of operations times the number of reference variables times the number of variables to combine.
Thus, if you want to perform 2 operations, sub and div, combining 4 variables with 2 reference variables, you should enter 2 X 4 X 2 new variable names.
The order of the names provided by the user should match the order in which the transformer performs the operations: first 'sub', then 'div', then 'add' and finally 'mul'.
If new_variables_names=None, the transformer will assign an arbitrary name to each newly created feature.
 missing_values : string, default='ignore'
Indicates whether missing values should be ignored or raise an error. If missing_values='ignore', the transformer will ignore missing data when transforming the data. If missing_values='raise', the transformer will raise an error if the training set or the datasets to transform contain missing values.
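The naming rule for new_variables_names can be checked with a few lines of plain Python. This is a minimal sketch, not the transformer's own code: expected_new_feature_count and sketch_feature_names are hypothetical helpers, the column names are made up, and the ordering of variables inside each operation is an assumption. Only the operation order (sub, div, add, mul) and the count formula come from the description above.

```python
from itertools import product

# Order in which the documentation says operations are carried out.
OPERATION_ORDER = ["sub", "div", "add", "mul"]


def expected_new_feature_count(variables_to_combine, reference_variables, operations):
    """Number of names to pass: operations x reference variables x variables to combine."""
    return len(operations) * len(reference_variables) * len(variables_to_combine)


def sketch_feature_names(variables_to_combine, reference_variables, operations):
    """Build illustrative names, grouped by operation in the documented order.

    The ordering of variables inside each operation is an assumption made
    for this sketch; check it against the transformer's actual output.
    """
    ordered_ops = [op for op in OPERATION_ORDER if op in operations]
    return [
        f"{var}_{op}_{ref}"
        for op, ref, var in product(ordered_ops, reference_variables, variables_to_combine)
    ]


variables = ["var_a", "var_b", "var_c", "var_d"]  # hypothetical column names
references = ["ref_1", "ref_2"]                   # hypothetical column names
operations = ["div", "sub"]                       # user order; 'sub' is still listed first

n_names = expected_new_feature_count(variables, references, operations)
names = sketch_feature_names(variables, references, operations)
print(n_names)   # 2 * 2 * 4 = 16
print(names[0])  # var_a_sub_ref_1
```

Counting the names before constructing the transformer is a cheap way to catch a list that is one entry short or long.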
Notes
Although the transformer in essence allows us to combine any feature with any of the allowed mathematical operations, its use is intended mostly for the creation of new features based on some domain knowledge. Typical examples within the financial sector are:
Ratio between income and debt to create the debt_to_income_ratio.
Subtraction of rent from income to obtain the disposable_income.
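The two financial examples correspond to the 'div' and 'sub' operations. The sketch below reproduces the arithmetic with plain pandas rather than with the transformer itself. The column names income, debt and rent and all values are made up for illustration, and which column plays the reference role is an assumption.

```python
import pandas as pd

# Made-up data; column names are illustrative only.
df = pd.DataFrame(
    {
        "income": [4000.0, 2500.0, 6000.0],
        "debt": [1000.0, 500.0, 3000.0],
        "rent": [1200.0, 900.0, 2000.0],
    }
)

# 'div': the variable to combine (debt) divided by the reference variable (income).
df["debt_to_income_ratio"] = df["debt"] / df["income"]

# 'sub': the reference variable (rent) subtracted from the variable to combine (income).
df["disposable_income"] = df["income"] - df["rent"]

print(df[["debt_to_income_ratio", "disposable_income"]])
```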
Methods
fit :
This transformer does not learn parameters.
transform :
Combine the variables with the mathematical operations.
fit_transform :
Fit to the data, then transform it.

fit(X, y=None)
This transformer does not learn any parameters. It only performs dataframe checks.
 Parameters
 X : pandas dataframe of shape = [n_samples, n_features]
 The training input samples.
 Can be the entire dataframe, not just the variables to transform.
 y : pandas Series or np.array, default=None
It is not needed in this transformer. You can pass y or None.
 Returns
 self
 Raises
 TypeError
If the input is not a Pandas DataFrame
If any user provided variables are not numerical
 ValueError
If any of the reference variables contain null values and the mathematical operation is ‘div’.

transform(X)
Combine the variables with the mathematical operations.
 Parameters
 X : pandas dataframe of shape = [n_samples, n_features]
 The data to transform.
 Returns
 X : pandas dataframe, shape = [n_samples, n_features + n_operations]
 The dataframe with the operations results added as columns.
 Return type
DataFrame
Example
CombineWithReferenceFeature() combines a group of variables with a group of reference variables utilizing basic mathematical operations (subtraction, division, addition and multiplication), returning one or more additional features in the dataframe as a result.
In this example, we subtract one variable from another in the house prices dataset: LotFrontage is subtracted from LotArea.
import pandas as pd
from sklearn.model_selection import train_test_split

from feature_engine.creation import CombineWithReferenceFeature

# Load the house prices data and replace missing values with 0
data = pd.read_csv('houseprice.csv').fillna(0)

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'],
    test_size=0.3,
    random_state=0,
)

# Set up the transformer: LotPartial = LotArea - LotFrontage
combinator = CombineWithReferenceFeature(
    variables_to_combine=['LotArea'],
    reference_variables=['LotFrontage'],
    operations=['sub'],
    new_variables_names=['LotPartial'],
)

# Fit (dataframe checks only), then add the new feature
combinator.fit(X_train, y_train)
X_train_ = combinator.transform(X_train)

print(X_train_[["LotPartial", "LotFrontage", "LotArea"]].head())
      LotPartial  LotFrontage  LotArea
64        9375.0          0.0     9375
682       2887.0          0.0     2887
960       7157.0         50.0     7207
1384      9000.0         60.0     9060
1100      8340.0         60.0     8400
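As a quick sanity check, the printed rows can be reproduced without the dataset file. The sketch below rebuilds the five rows shown above in a small dataframe and applies the same subtraction with plain pandas. It is an illustration of the arithmetic only, not of the transformer's internals. The LotFrontage values of 0.0 presumably come from the fillna(0) call in the example, which is why LotPartial equals LotArea in those rows.

```python
import pandas as pd

# The five rows printed above, keyed by their original index.
shown = pd.DataFrame(
    {
        "LotFrontage": [0.0, 0.0, 50.0, 60.0, 60.0],
        "LotArea": [9375, 2887, 7207, 9060, 8400],
    },
    index=[64, 682, 960, 1384, 1100],
)

# 'sub' with LotArea as the variable to combine and LotFrontage as the reference.
shown["LotPartial"] = shown["LotArea"] - shown["LotFrontage"]

print(shown[["LotPartial", "LotFrontage", "LotArea"]])
```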