Categorical Encoding

Feature-engine’s categorical encoders replace the category strings of a variable with estimated or arbitrary numbers. The following image summarizes the encoders’ main functionality.


Summary of the main characteristics of Feature-engine’s encoders

By default, Feature-engine’s categorical encoders work only with categorical variables, that is, variables of type object or categorical. From version 1.1.0, you can set the parameter ignore_format to True to make the transformers also accept numerical variables as input.


Most of Feature-engine’s encoders will return, or attempt to return, monotonic relationships between the encoded variable and the target. A monotonic relationship is one in which, as the values of one variable increase, the values of the other variable either consistently increase or consistently decrease. See the following illustration for examples:


Monotonic relationships tend to improve the performance of linear models and allow decision trees to be shallower.
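To make the idea concrete, here is a library-free sketch of how numbering categories by their mean target value yields a monotonic relationship between the encoded variable and the target. It illustrates the concept only and is not Feature-engine’s implementation; the data and all variable names are hypothetical.

```python
from collections import defaultdict

# Hypothetical data: one categorical variable and a numerical target.
categories = ["A", "B", "C", "A", "B", "C", "A", "B"]
target = [10.0, 30.0, 20.0, 12.0, 28.0, 22.0, 14.0, 32.0]

# Mean of the target per category.
sums = defaultdict(float)
counts = defaultdict(int)
for cat, y in zip(categories, target):
    sums[cat] += y
    counts[cat] += 1
target_mean = {cat: sums[cat] / counts[cat] for cat in sums}

# Number the categories from lowest to highest target mean.
ordered = sorted(target_mean, key=target_mean.get)
mapping = {cat: rank for rank, cat in enumerate(ordered)}
encoded = [mapping[cat] for cat in categories]

# The mean target never decreases as the encoded value increases.
means_by_code = [target_mean[cat] for cat in ordered]
is_monotonic = all(a <= b for a, b in zip(means_by_code, means_by_code[1:]))
print(mapping, is_monotonic)
```

Had the categories been numbered alphabetically instead, the mean target would rise from A to B and then fall for C, so the relationship would not be monotonic.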

Regression vs Classification

Most of Feature-engine’s encoders are suitable for both regression and classification, with the exception of the WoEEncoder() and the PRatioEncoder(), which are designed solely for binary classification.
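The restriction to binary targets follows from how these encodings are usually defined. As a sketch, using the common textbook definitions rather than a statement of the exact implementation, the weight of evidence and the probability ratio of a category x_j are:

```latex
\mathrm{WoE}(x_j) = \ln\!\left(\frac{P(X = x_j \mid Y = 1)}{P(X = x_j \mid Y = 0)}\right),
\qquad
\mathrm{PRatio}(x_j) = \frac{P(Y = 1 \mid X = x_j)}{P(Y = 0 \mid X = x_j)}
```

Both expressions compare exactly two classes, Y = 1 and Y = 0, so they have no direct counterpart for a continuous target or for a target with more than two classes.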

Multi-class classification

Finally, some of Feature-engine’s encoders can handle multi-class targets off-the-shelf, for example the OneHotEncoder(), the CountFrequencyEncoder() and the DecisionTreeEncoder().

Note that the MeanEncoder() and the OrdinalEncoder() will operate with multi-class targets, but the mean of the class labels may not be meaningful, and this defeats the purpose of these encoding techniques.
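The following library-free sketch shows one way the mean of class labels can mislead: the per-category mean, and therefore the ordering of the categories, changes when the same classes are simply numbered differently. The data, the two label codings and the helper function are hypothetical and not part of Feature-engine.

```python
def mean_by_category(categories, labels, coding):
    """Mean of the numerically coded class labels for each category."""
    totals, counts = {}, {}
    for cat, label in zip(categories, labels):
        totals[cat] = totals.get(cat, 0) + coding[label]
        counts[cat] = counts.get(cat, 0) + 1
    return {cat: totals[cat] / counts[cat] for cat in totals}

# Hypothetical data: a categorical variable and a three-class target.
shop = ["north", "north", "south", "south", "east", "east"]
pet = ["bird", "bird", "dog", "dog", "cat", "dog"]

# Two equally arbitrary ways of numbering the same three classes.
coding_a = {"cat": 0, "dog": 1, "bird": 2}
coding_b = {"bird": 0, "cat": 1, "dog": 2}

means_a = mean_by_category(shop, pet, coding_a)
means_b = mean_by_category(shop, pet, coding_b)

# Category order implied by each coding, from lowest to highest mean.
order_a = sorted(means_a, key=means_a.get)
order_b = sorted(means_b, key=means_b.get)
print(order_a, order_b)
```

Because the class numbers carry no inherent order, neither ranking of the categories reflects a real relationship with the target.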


Additional categorical encoding transformations are available in the open-source package Category encoders.