MathematicalCombinator

The MathematicalCombinator() applies basic mathematical operations across features, returning one or more additional features as a result.

import pandas as pd
from sklearn.model_selection import train_test_split

from feature_engine import mathematical_combination as mc

data = pd.read_csv('houseprice.csv').fillna(0)

X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'],
    test_size=0.3,
    random_state=0
)

math_combinator = mc.MathematicalCombinator(
    variables=['LotFrontage', 'LotArea'],
    math_operations=['sum'],
    new_variables_names=['LotTotal']
)

math_combinator.fit(X_train, y_train)
X_train_ = math_combinator.transform(X_train)
print(math_combinator.combination_dict_)
{'LotTotal': 'sum'}
print(X_train_.loc[:,['LotFrontage', 'LotArea', 'LotTotal']].head())
      LotFrontage  LotArea  LotTotal
64            0.0     9375    9375.0
682           0.0     2887    2887.0
960          50.0     7207    7257.0
1384         60.0     9060    9120.0
1100         60.0     8400    8460.0

API Reference

class feature_engine.mathematical_combination.MathematicalCombinator(variables=None, math_operations=['sum', 'prod', 'mean', 'std', 'max', 'min'], new_variables_names=None)

The MathematicalCombinator() applies basic mathematical operations across features, returning one or more additional features as a result.

For example, if we have the variables number_payments_first_quarter, number_payments_second_quarter, number_payments_third_quarter and number_payments_fourth_quarter, we can use the MathematicalCombinator to calculate the total number of payments and mean number of payments as follows:

transformer = MathematicalCombinator(
    variables=[
        'number_payments_first_quarter',
        'number_payments_second_quarter',
        'number_payments_third_quarter',
        'number_payments_fourth_quarter'
    ],
    math_operations=[
        'sum',
        'mean'
    ],
    new_variables_names=[
        'total_number_payments',
        'mean_number_payments'
    ]
)

transformer.fit_transform(X)

The transformed X will contain the additional features total_number_payments and mean_number_payments, plus the original set of variables.
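For intuition, the two new features correspond to row-wise aggregations over the selected columns. The following is a minimal sketch in plain pandas, assuming a small made-up dataframe (the values are illustrative only); it mirrors the result described above and is not the transformer's internal code.

```python
import pandas as pd

# Made-up payment counts for three customers (illustrative values only).
X = pd.DataFrame({
    'number_payments_first_quarter': [1, 0, 2],
    'number_payments_second_quarter': [2, 1, 2],
    'number_payments_third_quarter': [3, 0, 2],
    'number_payments_fourth_quarter': [2, 3, 2],
})
quarters = list(X.columns)

X_new = X.copy()
# Row-wise aggregation across the four quarterly columns.
X_new['total_number_payments'] = X[quarters].sum(axis=1)
X_new['mean_number_payments'] = X[quarters].mean(axis=1)

print(X_new[['total_number_payments', 'mean_number_payments']])
```

The original four columns remain in X_new, alongside the two new ones.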

Parameters
  • variables (list, default=None) – The list of numerical variables to be transformed. If None, the transformer will find and select all numerical variables.

  • math_operations (list, default=['sum', 'prod', 'mean', 'std', 'max', 'min']) –

    The list of basic math operations to be used in transformation.

    Each operation should be a string and must be one of: ['sum', 'prod', 'mean', 'std', 'max', 'min'].

    Each operation will result in a new variable that will be added to the transformed dataset.

  • new_variables_names (list, default=None) –

    Names of the newly created variables. You can pass a single name or a list of names (recommended), with exactly one name for each operation listed in the math_operations parameter. For example, to compute the mean and the sum of the features, enter 2 new variable names; to compute only the mean, enter 1 name; to apply all six operations, enter 6 names.

    The order of the names must match the order of the operations in math_operations. That is, if you set math_operations=['mean', 'prod'], the first new variable name is assigned to the mean of the variables and the second to their product.

    If new_variables_names=None, the transformer assigns each new feature a default name that starts with the name of the mathematical operation, followed by the combined variables separated by -.
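The pairing of names and operations described above can be sketched in pure Python. The helper build_combination_dict below is hypothetical and is not part of feature_engine. The exact default-name format (operation name followed by the joined variables in parentheses) is an assumption made for illustration; the text above only states that the name starts with the operation and lists the variables separated by -.

```python
def build_combination_dict(variables, math_operations, new_variables_names=None):
    """Hypothetical helper: map each new feature name to its operation."""
    if new_variables_names is not None:
        if len(new_variables_names) != len(math_operations):
            raise ValueError("Provide one name per math operation.")
        # Names are paired with operations by position.
        return dict(zip(new_variables_names, math_operations))
    # Assumed default format: operation name, then the variables joined by '-'.
    joined = '-'.join(str(v) for v in variables)
    return {f"{op}({joined})": op for op in math_operations}


variables = ['LotFrontage', 'LotArea']
named = build_combination_dict(variables, ['mean', 'prod'], ['lot_mean', 'lot_prod'])
default = build_combination_dict(variables, ['mean', 'prod'])
print(named)
print(default)
```

With explicit names, 'lot_mean' maps to 'mean' and 'lot_prod' maps to 'prod', following the order of math_operations. A mismatch between the number of names and the number of operations raises an error in this sketch.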

fit(X, y=None)

Performs dataframe checks. Selects the variables to transform if none were indicated by the user. Creates a dictionary that maps each new column to its transformation.

Parameters

X (pandas dataframe of shape = [n_samples, n_features]) – The training input samples. Can be the entire dataframe, not just the variables to transform.

y (None) – y is not needed in this transformer. You can pass y or None.

transform(X)

Transforms the source dataset.

Adds one column per operation, each computed across the selected variables.

Parameters

X (pandas dataframe of shape = [n_samples, n_features]) – The data to transform.

Returns

X_transformed – The dataframe with the results of the operations added.

Return type

pandas dataframe of shape = [n_samples, n_features + n_operations]
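The shape relationship above (n_features + n_operations columns) can be checked with a small stand-in. The function add_combinations is hypothetical and only approximates what transform() is described as doing, using pandas row-wise aggregation; the dataframe and column names are made up.

```python
import pandas as pd


def add_combinations(X, variables, combination_dict):
    """Hypothetical stand-in for transform(): one new column per operation."""
    X = X.copy()  # leave the caller's dataframe untouched
    for new_name, operation in combination_dict.items():
        # Apply the named pandas aggregation row-wise over the chosen columns.
        X[new_name] = X[variables].agg(operation, axis=1)
    return X


X = pd.DataFrame({'a': [1.0, 2.0], 'b': [3.0, 4.0], 'c': [5.0, 6.0]})
combos = {'ab_sum': 'sum', 'ab_max': 'max'}
X_t = add_combinations(X, ['a', 'b'], combos)
print(X.shape, X_t.shape)
```

Here two operations are applied to a dataframe with three features, so the result has five columns, and column c passes through unchanged even though it is not combined.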