The MathematicalCombinator() applies basic mathematical operations across features, returning one or more additional features as a result.
import pandas as pd
from sklearn.model_selection import train_test_split

from feature_engine import mathematical_combination as mc

# Load the data and replace missing values with 0
data = pd.read_csv('houseprice.csv').fillna(0)

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'],
    test_size=0.3,
    random_state=0,
)

# Add LotTotal as the row-wise sum of LotFrontage and LotArea
math_combinator = mc.MathematicalCombinator(
    variables=['LotFrontage', 'LotArea'],
    math_operations=['sum'],
    new_variables_names=['LotTotal'],
)

math_combinator.fit(X_train, y_train)
X_train_ = math_combinator.transform(X_train)

print(X_train_.loc[:, ['LotFrontage', 'LotArea', 'LotTotal']].head())
      LotFrontage  LotArea  LotTotal
64            0.0     9375    9375.0
682           0.0     2887    2887.0
960          50.0     7207    7257.0
1384         60.0     9060    9120.0
1100         60.0     8400    8460.0
MathematicalCombinator(variables=None, math_operations=['sum', 'prod', 'mean', 'std', 'max', 'min'], new_variables_names=None)
The MathematicalCombinator() applies basic mathematical operations across features, returning 1 or more additional features as a result.
For example, if we have the variables number_payments_first_quarter, number_payments_second_quarter, number_payments_third_quarter and number_payments_fourth_quarter, we can use the MathematicalCombinator to calculate the total number of payments and mean number of payments as follows:
from feature_engine.mathematical_combination import MathematicalCombinator

transformer = MathematicalCombinator(
    variables=[
        'number_payments_first_quarter',
        'number_payments_second_quarter',
        'number_payments_third_quarter',
        'number_payments_fourth_quarter',
    ],
    math_operations=['sum', 'mean'],
    # One name per operation, in the same order as math_operations
    new_variables_names=['total_number_payments', 'mean_number_payments'],
)

transformer.fit_transform(X)
The transformed X will contain the additional features total_number_payments and mean_number_payments, plus the original set of variables.
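To make the expected result concrete, the sketch below reproduces the two new columns with plain pandas on made-up data. It does not call MathematicalCombinator; the toy values and the names quarter_columns and X_expected are assumptions introduced only for illustration.

```python
import pandas as pd

# Toy data: the values are invented for illustration.
X = pd.DataFrame({
    "number_payments_first_quarter": [1, 0, 4],
    "number_payments_second_quarter": [2, 0, 4],
    "number_payments_third_quarter": [3, 2, 4],
    "number_payments_fourth_quarter": [2, 2, 4],
})

quarter_columns = list(X.columns)

# Row-wise equivalents of math_operations=['sum', 'mean'].
# The original four columns are kept; two columns are appended.
X_expected = X.copy()
X_expected["total_number_payments"] = X[quarter_columns].sum(axis=1)
X_expected["mean_number_payments"] = X[quarter_columns].mean(axis=1)

print(X_expected[["total_number_payments", "mean_number_payments"]])
```

Each operation is applied across the selected columns of a single row, so the new features have one value per sample.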
variables (list, default=None) – The list of numerical variables to be transformed. If None, the transformer will find and select all numerical variables.
math_operations (list, default=['sum', 'prod', 'mean', 'std', 'max', 'min']) –
The list of basic math operations to be used in transformation.
Each operation must be a string and one of: ['sum', 'prod', 'mean', 'std', 'max', 'min'].
Each operation will result in a new variable that will be added to the transformed dataset.
new_variables_names (list, default=None) –
Names of the newly created variables. You can pass a single name or a list of names (recommended). Provide exactly one name for each operation listed in math_operations: two names if you compute the mean and the sum, one name if you compute only the mean, and six names if you perform all six operations.
The order of the names must match the order of the operations in math_operations. For example, with math_operations = ['mean', 'prod'], the first name is assigned to the mean of the variables and the second to their product.
If new_variables_names=None, the transformer names each new feature automatically: the name of the mathematical operation, followed by the names of the combined variables separated by -.
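The pairing rules above can be sketched in a few lines of plain Python. This is an illustration under stated assumptions, not the library's code: resolve_names is a hypothetical helper, and the exact layout of the automatic name (here operation(var1-var2)) is an assumption, since the text only says that the name starts with the operation and joins the variables with -.

```python
PERMITTED_OPERATIONS = ["sum", "prod", "mean", "std", "max", "min"]


def resolve_names(variables, math_operations, new_variables_names=None):
    """Hypothetical helper: map each new variable name to its operation."""
    for operation in math_operations:
        if operation not in PERMITTED_OPERATIONS:
            raise ValueError(f"Unsupported operation: {operation!r}")

    if new_variables_names is None:
        # Assumed layout of the automatic name: operation(var1-var2-...)
        joined = "-".join(variables)
        return {f"{op}({joined})": op for op in math_operations}

    if len(new_variables_names) != len(math_operations):
        raise ValueError("Provide exactly one name per operation.")

    # Names are matched to operations by position.
    return dict(zip(new_variables_names, math_operations))


named = resolve_names(["a", "b"], ["mean", "prod"], ["avg_ab", "prod_ab"])
automatic = resolve_names(["a", "b"], ["mean", "prod"])
print(named)
print(automatic)
```

Matching by position is why the order of new_variables_names matters: swapping the two names would label the product as the mean.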
Performs dataframe checks. Selects the variables to transform if none were indicated by the user. Creates a dictionary of column-to-transformation mappings.
X (pandas dataframe of shape = [n_samples, n_features]) – The training input samples. Can be the entire dataframe, not just the variables to transform.
y is not needed in this transformer. You can pass y or None.
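As a rough illustration of the variable selection step, the sketch below shows one plausible way to pick all numerical columns when no variables are given. It relies on pandas select_dtypes and is not necessarily how the transformer does it internally; find_numerical_variables is a hypothetical helper name.

```python
import pandas as pd


def find_numerical_variables(X, variables=None):
    """Hypothetical helper: return the given variables, or all numeric columns."""
    if variables is not None:
        return list(variables)
    # include="number" selects integer and float columns
    return list(X.select_dtypes(include="number").columns)


X = pd.DataFrame({
    "LotFrontage": [0.0, 50.0],
    "LotArea": [9375, 7207],
    "Street": ["Pave", "Grvl"],
})

selected_by_default = find_numerical_variables(X)
selected_by_user = find_numerical_variables(X, ["LotArea"])
print(selected_by_default)
print(selected_by_user)
```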
Transforms the source dataset.
Adds one column per operation, holding the result of applying that operation across the selected variables.
X (pandas dataframe of shape = [n_samples, n_features]) – The data to transform.
X_transformed – The dataframe with the operation results added.
Return type – pandas dataframe of shape = [n_samples, n_features + n_operations]
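The output shape can be checked with a small pandas sketch that appends one row-wise result per operation. It mimics the described behaviour without using the transformer; combine is a hypothetical helper, and the toy data and column names are assumptions.

```python
import pandas as pd


def combine(X, variables, math_operations, new_variables_names):
    """Hypothetical helper: append one row-wise result column per operation."""
    X = X.copy()  # leave the input dataframe untouched
    for name, operation in zip(new_variables_names, math_operations):
        # e.g. operation == "sum" calls X[variables].sum(axis=1)
        X[name] = getattr(X[variables], operation)(axis=1)
    return X


X = pd.DataFrame({"a": [1, 2], "b": [3, 4], "label": ["x", "y"]})
operations = ["sum", "prod", "mean", "std", "max", "min"]
names = [f"{operation}_a_b" for operation in operations]

X_out = combine(X, ["a", "b"], operations, names)

# n_features + n_operations columns: 3 original + 6 new
print(X_out.shape)
```

Columns that are not listed in variables, such as label here, pass through unchanged.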