Missing Data Imputation

Feature-engine’s missing data imputers replace missing data with parameters estimated from the data or with arbitrary values defined by the user.

Summary of the main characteristics of Feature-engine’s imputers

Transformer                 Numerical variables   Categorical variables   Description
MeanMedianImputer()         ✓                     ×                       Replaces missing values by the mean or median
ArbitraryNumberImputer()    ✓                     ×                       Replaces missing values by an arbitrary value
EndTailImputer()            ✓                     ×                       Replaces missing values by a value at the end of the distribution
CategoricalImputer()        ✓                     ✓                       Replaces missing values by the most frequent category or by an arbitrary value
RandomSampleImputer()       ✓                     ✓                       Replaces missing values by random value extractions from the variable
AddMissingIndicator()       ✓                     ✓                       Adds a binary variable to flag missing observations
DropMissingData()           ✓                     ✓                       Removes observations with missing data from the dataset

The CategoricalImputer() performs procedures suitable for categorical variables. From version 1.1.0, it also accepts numerical variables as input, for cases where variables that are categorical by nature are coded as numbers.