AddMissingIndicator

API Reference

class feature_engine.imputation.AddMissingIndicator(missing_only=True, variables=None)

AddMissingIndicator() adds a binary variable, or missing indicator, that flags whether the value of a variable is missing.

AddMissingIndicator() adds one missing indicator per variable: either for each variable indicated by the user, or for each variable that shows missing data in the train set.

AddMissingIndicator() works with both numerical and categorical variables. The user can pass a list of the variables for which missing indicators should be added. Alternatively, the imputer will select all variables in the training set that show missing data and add a missing indicator to each of them.
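The mechanism can be sketched in plain pandas. The helper `add_missing_indicators` below is hypothetical and is not Feature-engine's implementation; it assumes the indicators are stored as 0/1 integers in columns named with the ‘_na’ suffix described under transform().

```python
import numpy as np
import pandas as pd


def add_missing_indicators(X: pd.DataFrame, variables: list) -> pd.DataFrame:
    """Hypothetical helper: add one 0/1 column per variable flagging missing values."""
    X = X.copy()  # leave the caller's dataframe untouched
    for var in variables:
        # isna() behaves the same for numerical and categorical (object) columns
        X[var + "_na"] = X[var].isna().astype(int)
    return X


df = pd.DataFrame({
    "LotFrontage": [65.0, np.nan, 80.0],  # numerical, one missing value
    "Alley": [np.nan, "Grvl", np.nan],    # categorical, two missing values
})
out = add_missing_indicators(df, ["LotFrontage", "Alley"])
print(out[["LotFrontage_na", "Alley_na"]])
```

The original columns are kept unchanged; the indicators are appended at the end of the dataframe.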

Parameters
missing_only: bool, default=True

Indicates whether missing indicators should be added only to variables with missing data, or to all variables.

True: indicators will be created only for those variables that showed missing data during fit.

False: indicators will be created for all variables.

variables: list, default=None

The list of variables for which missing indicators will be added. If None, the imputer will select all variables in the dataset, keeping only those with missing data when missing_only=True.

**Note**
The transformer will first select all variables, or all user-entered
variables, and if missing_only=True, it will re-select from this group
only those that show missing data during fit.
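The selection rule in the Note can be sketched as follows. `select_indicator_variables` is a hypothetical helper written for illustration, assuming that "missing data" means at least one NaN in the column; it is not part of Feature-engine's API.

```python
import numpy as np
import pandas as pd


def select_indicator_variables(X, variables=None, missing_only=True):
    """Hypothetical helper mirroring the two-step selection described in the Note."""
    # Step 1: start from the user's list, or from every column if none was given
    candidates = list(X.columns) if variables is None else list(variables)
    # Step 2: with missing_only=True, keep only candidates that contain NaN in X
    if missing_only:
        candidates = [var for var in candidates if X[var].isna().any()]
    return candidates


X_train = pd.DataFrame({
    "LotArea": [8450, 9600, 11250],       # no missing data
    "LotFrontage": [65.0, np.nan, 68.0],  # missing data
    "Alley": [np.nan, np.nan, "Pave"],    # missing data
})

print(select_indicator_variables(X_train))                      # ['LotFrontage', 'Alley']
print(select_indicator_variables(X_train, missing_only=False))  # ['LotArea', 'LotFrontage', 'Alley']
print(select_indicator_variables(X_train, variables=["LotArea", "Alley"]))  # ['Alley']
```

The last call shows the re-selection step: `LotArea` was requested by the user but is dropped because it has no missing data and missing_only defaults to True.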

Attributes

variables_:

List of variables for which the missing indicators will be created.

Methods

fit:

Learn the variables for which the missing indicators will be created.

transform:

Add the missing indicators.

fit_transform:

Fit to the data, then transform it.
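The split between fit and transform can be illustrated with a simplified, hypothetical class, `MissingIndicatorSketch`. It is not the library's code; it only assumes the behaviour described on this page: fit learns `variables_` from the training data, and transform adds one 0/1 `_na` column per learned variable.

```python
import numpy as np
import pandas as pd


class MissingIndicatorSketch:
    """Hypothetical, simplified stand-in for AddMissingIndicator (not the library code)."""

    def __init__(self, missing_only=True, variables=None):
        self.missing_only = missing_only
        self.variables = variables

    def fit(self, X, y=None):
        # y is ignored, as in the documented fit(X, y=None)
        candidates = list(X.columns) if self.variables is None else list(self.variables)
        if self.missing_only:
            candidates = [v for v in candidates if X[v].isna().any()]
        self.variables_ = candidates  # learned from the training data only
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables_:
            X[var + "_na"] = X[var].isna().astype(int)
        return X

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


X_train = pd.DataFrame({"LotFrontage": [65.0, np.nan], "LotArea": [8450, 9600]})
X_test = pd.DataFrame({"LotFrontage": [80.0, 70.0], "LotArea": [np.nan, 11250]})

imputer = MissingIndicatorSketch().fit(X_train)
test_t = imputer.transform(X_test)
print(imputer.variables_)       # ['LotFrontage']
print(test_t.columns.tolist())  # ['LotFrontage', 'LotArea', 'LotFrontage_na']
```

In this sketch, the test set gets a `LotFrontage_na` column (all zeros) even though `LotFrontage` is complete there, because `variables_` was fixed during fit; `LotArea`, which is missing only in the test set, gets no indicator. This keeps the train and test sets with the same columns.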

fit(X, y=None)

Learn the variables for which the missing indicators will be created.

Parameters
X: pandas dataframe of shape = [n_samples, n_features]

The training dataset.

y: pandas Series, default=None

y is not needed by this transformer. You can pass None or y.

Returns
self.variables_: list

The list of variables for which missing indicators will be added.

Raises
TypeError

If the input is not a pandas DataFrame.
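A minimal sketch of the input check described under Raises, assuming only that non-DataFrame input is rejected with a TypeError. `check_is_dataframe` is a hypothetical helper; the exact check and error message used by Feature-engine may differ.

```python
import pandas as pd


def check_is_dataframe(X):
    """Hypothetical input check: reject anything that is not a pandas DataFrame."""
    if not isinstance(X, pd.DataFrame):
        raise TypeError(
            f"X must be a pandas DataFrame, got {type(X).__name__} instead."
        )
    return X


check_is_dataframe(pd.DataFrame({"a": [1.0, None]}))  # passes silently

try:
    check_is_dataframe([[1.0, None]])  # a plain list is rejected
except TypeError as err:
    print(err)  # X must be a pandas DataFrame, got list instead.
```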

transform(X)

Add the binary missing indicators.

Parameters
X: pandas dataframe of shape = [n_samples, n_features]

The dataframe to be transformed.

Returns
X_transformed: pandas dataframe of shape = [n_samples, n_features + len(variables_)]

The dataframe containing the additional binary variables. Each binary variable is named with the original variable name plus ‘_na’.

Return type

DataFrame

Example

AddMissingIndicator() adds a binary variable indicating whether observations are missing (a missing indicator). It adds missing indicators for both categorical and numerical variables. A list of variables for which to add missing indicators can be passed; otherwise, the imputer will select the variables automatically.

The imputer can add binary variables to all variables, or only to those that show missing data in the train set. This is controlled by the parameter missing_only: with missing_only=True (the default), only variables with missing data receive an indicator.

import pandas as pd
from sklearn.model_selection import train_test_split

from feature_engine.imputation import AddMissingIndicator

# Load dataset
data = pd.read_csv('houseprice.csv')

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['Id', 'SalePrice'], axis=1),
    data['SalePrice'],
    test_size=0.3,
    random_state=0,
)

# set up the imputer to add indicators for these four variables
addBinary_imputer = AddMissingIndicator(
    variables=['Alley', 'MasVnrType', 'LotFrontage', 'MasVnrArea'])

# fit the imputer (learns variables_ from the train set)
addBinary_imputer.fit(X_train)

# transform the data: adds the *_na columns to both sets
train_t = addBinary_imputer.transform(X_train)
test_t = addBinary_imputer.transform(X_test)

# inspect the new missing indicators
train_t[['Alley_na', 'MasVnrType_na', 'LotFrontage_na', 'MasVnrArea_na']].head()