Categorical Encoding

Feature-engine’s categorical encoders replace the string values of categorical variables with estimated or arbitrary numbers. The following table summarizes the main encoders’ functionality.

Summary of Feature-engine’s encoders’ characteristics

Transformer	Regression	Classification	Multi-class	Description
`OneHotEncoder()`	√	√	√	Adds dummy variables to represent each category
`OrdinalEncoder()`	√	√	√	Replaces categories with an integer
`CountFrequencyEncoder()`	√	√	√	Replaces categories with their count or frequency
`MeanEncoder()`	√	√	x	Replaces categories with the target mean value
`WoEEncoder()`	x	√	x	Replaces categories with the weight of evidence
`DecisionTreeEncoder()`	√	√	√	Replaces categories with the predictions of a decision tree
`RareLabelEncoder()`	√	√	√	Groups infrequent categories into a single one
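To make the two target-free rows of the table concrete, here is a minimal plain-Python sketch of count/frequency encoding and rare-label grouping. It is not Feature-engine's implementation: the function names and the `encoding_method`, `tol` and `replace_with` parameters are chosen for illustration, and the real encoders work on pandas DataFrames and offer more options.

```python
from collections import Counter

# Hypothetical helpers for illustration; they are not Feature-engine's API.


def count_frequency_encode(values, encoding_method="count"):
    """Replace each category with its count or its relative frequency."""
    counts = Counter(values)
    if encoding_method == "count":
        mapping = dict(counts)
    elif encoding_method == "frequency":
        total = len(values)
        mapping = {cat: n / total for cat, n in counts.items()}
    else:
        raise ValueError("encoding_method must be 'count' or 'frequency'")
    return [mapping[v] for v in values], mapping


def group_rare_labels(values, tol=0.05, replace_with="Rare"):
    """Replace categories whose frequency is below `tol` with one shared label."""
    counts = Counter(values)
    total = len(values)
    frequent = {cat for cat, n in counts.items() if n / total >= tol}
    return [v if v in frequent else replace_with for v in values]


colors = ["red"] * 6 + ["blue"] * 3 + ["green"]
encoded, mapping = count_frequency_encode(colors, encoding_method="frequency")
print(mapping)                                 # {'red': 0.6, 'blue': 0.3, 'green': 0.1}
print(group_rare_labels(colors, tol=0.2)[-1])  # Rare
```

Neither function looks at the target, which is why these techniques suit regression, binary and multi-class problems alike.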

By default, Feature-engine’s categorical encoders work only with categorical variables. From version 1.1.0, you can set the parameter ignore_format to True to make the transformers also accept numerical variables as input.
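The effect of ignore_format can be pictured as a variable-selection step. The helper below is hypothetical and works on a plain dict of column lists, treating a column as categorical only if all its values are strings; Feature-engine itself inspects pandas dtypes rather than Python value types, so this is a sketch of the idea, not of the library's code.

```python
# Hypothetical helper: mimics how an encoder might choose which columns to
# encode. Columns are plain Python lists here, not pandas Series.


def select_variables(columns, variables=None, ignore_format=False):
    """Return the column names an encoder would act on.

    columns: dict mapping column name -> list of values.
    variables: optional list of column names requested by the user.
    """
    candidates = list(columns) if variables is None else list(variables)
    if ignore_format:
        # Accept every candidate, numeric columns included.
        return candidates
    # Default: keep only columns whose values are all strings.
    categorical = [
        name for name in candidates
        if all(isinstance(v, str) for v in columns[name])
    ]
    if variables is not None and len(categorical) != len(candidates):
        raise TypeError("Some of the requested variables are not categorical.")
    return categorical


# store_id is numeric but really identifies a category.
data = {"city": ["London", "Paris", "London"], "store_id": [101, 205, 101]}
print(select_variables(data))                      # ['city']
print(select_variables(data, ignore_format=True))  # ['city', 'store_id']
```

Numeric codes such as store identifiers are the typical reason to switch the format check off.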

Monotonicity

Most of Feature-engine’s encoders return, or attempt to return, monotonic relationships between the encoded variable and the target. In a monotonic relationship, the values of one variable consistently increase, or consistently decrease, as the values of the other variable increase. See the following illustration for examples:

Monotonic relationships tend to improve the performance of linear models and lead to shallower decision trees.
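One way an encoder can produce a monotonic relationship is to rank the categories by their target mean before assigning integers. The sketch below assumes a numeric target and uses hypothetical helper names; it illustrates the idea behind ordered ordinal encoding rather than reproducing Feature-engine's code.

```python
from collections import defaultdict

# Hypothetical helpers for illustration; not Feature-engine's API.


def target_means(categories, target):
    """Mean of the (numeric) target for each category."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for cat, y in zip(categories, target):
        sums[cat] += y
        counts[cat] += 1
    return {cat: sums[cat] / counts[cat] for cat in sums}


def ordered_ordinal_mapping(categories, target):
    """Assign integers 0..k-1 to categories ranked by increasing target mean."""
    means = target_means(categories, target)
    ranked = sorted(means, key=means.get)
    return {cat: rank for rank, cat in enumerate(ranked)}


def is_monotonic_increasing(xs):
    return all(a <= b for a, b in zip(xs, xs[1:]))


cats = ["B", "A", "C", "A", "B", "C", "C"]
y = [3.0, 1.0, 9.0, 2.0, 5.0, 7.0, 8.0]
mapping = ordered_ordinal_mapping(cats, y)
print(mapping)  # {'A': 0, 'B': 1, 'C': 2}

# Target mean for each encoded integer, in order of the integer codes.
means = target_means(cats, y)
means_by_code = [means[c] for c in sorted(mapping, key=mapping.get)]
print(is_monotonic_increasing(means_by_code))  # True
```

Because the integers follow the target mean, a linear model can capture the category effect with a single coefficient, and a tree needs fewer splits to separate low-mean from high-mean categories.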

Regression vs Classification

Most of Feature-engine’s encoders are suitable for both regression and classification, with the exception of the WoEEncoder() and the PRatioEncoder(), which are designed solely for binary classification.
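The reason the weight of evidence needs a binary target shows up in its usual definition, which compares each category's share of the positive class with its share of the negative class: WoE(c) = ln( P(X=c | y=1) / P(X=c | y=0) ). This sketch assumes the target is coded as 0/1 and simply raises an error, with no smoothing, when a category lacks one of the classes; the function name is hypothetical and how Feature-engine handles those edge cases is not covered here.

```python
import math
from collections import Counter

# Hypothetical helper for illustration; not Feature-engine's WoEEncoder.


def woe_mapping(categories, target):
    """Weight of evidence per category for a binary (0/1) target."""
    if set(target) - {0, 1}:
        raise ValueError("WoE needs a binary target coded as 0/1.")
    pos = Counter(c for c, y in zip(categories, target) if y == 1)
    neg = Counter(c for c, y in zip(categories, target) if y == 0)
    n_pos, n_neg = sum(pos.values()), sum(neg.values())
    mapping = {}
    for cat in dict.fromkeys(categories):  # unique categories, first-seen order
        if pos[cat] == 0 or neg[cat] == 0:
            # The ratio would be 0 or undefined, so the log is not finite.
            raise ValueError(f"Category {cat!r} is missing one of the classes.")
        mapping[cat] = math.log((pos[cat] / n_pos) / (neg[cat] / n_neg))
    return mapping


cats = ["a", "a", "a", "b", "b", "b"]
y = [1, 1, 0, 0, 0, 1]
print({k: round(v, 3) for k, v in woe_mapping(cats, y).items()})
# {'a': 0.693, 'b': -0.693}
```

With three or more classes there is no single "positive" versus "negative" split, so the ratio above has no direct equivalent.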

Multi-class classification

Finally, some of Feature-engine’s encoders can handle multi-class targets off-the-shelf, for example, the OneHotEncoder(), the CountFrequencyEncoder() and the DecisionTreeEncoder().
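One-hot encoding never looks at the target, which is why the number of classes does not matter to it. A minimal plain-Python sketch, assuming a hypothetical function name and dummy columns keyed by the category itself:

```python
# Hypothetical helper for illustration; not Feature-engine's OneHotEncoder.


def one_hot_encode(values):
    """Return one 0/1 dummy column per category; no target is used."""
    categories = list(dict.fromkeys(values))  # unique, first-seen order
    return {cat: [int(v == cat) for v in values] for cat in categories}


dummies = one_hot_encode(["cat", "dog", "cat", "bird"])
print(dummies)
# {'cat': [1, 0, 1, 0], 'dog': [0, 1, 0, 0], 'bird': [0, 0, 0, 1]}
```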

Note that, while the MeanEncoder() and the OrdinalEncoder() will operate with multi-class targets, the mean of the classes may not be meaningful, which defeats the purpose of these encoding techniques.
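A small worked example shows the problem. Assuming three classes coded as 0, 1 and 2, a category split evenly between classes 0 and 2 receives the same mean as a category that only ever has class 1, so the encoding cannot tell them apart:

```python
# Multi-class target coded as integers 0, 1, 2 (the codes are arbitrary).
rows = [
    ("north", 0), ("north", 2), ("north", 0), ("north", 2),
    ("south", 1), ("south", 1), ("south", 1), ("south", 1),
]

# Collect the class labels observed for each category.
by_category = {}
for cat, label in rows:
    by_category.setdefault(cat, []).append(label)

# Mean of the class codes per category, as a mean encoder would compute it.
encoding = {cat: sum(labels) / len(labels) for cat, labels in by_category.items()}
print(encoding)  # {'north': 1.0, 'south': 1.0}
```

The class codes are arbitrary as well: relabeling the classes would change every mean, so the encoded values carry no stable ordering.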

Encoders
