Categorical Encoding
Feature-engine’s categorical encoders replace the strings in categorical variables with estimated or arbitrary numbers. The following image summarizes the main encoders’ functionality.
Summary of the characteristics of Feature-engine’s encoders
Transformer | Regression | Classification | Multi-class | Description |
---|---|---|---|---|
OneHotEncoder() | √ | √ | √ | Adds dummy variables to represent each category |
OrdinalEncoder() | √ | √ | √ | Replaces categories with an integer |
CountFrequencyEncoder() | √ | √ | √ | Replaces categories with their count or frequency |
MeanEncoder() | √ | √ | x | Replaces categories with the target mean value |
WoEEncoder() | x | √ | x | Replaces categories with the weight of the evidence |
DecisionTreeEncoder() | √ | √ | √ | Replaces categories with the predictions of a decision tree |
RareLabelEncoder() | √ | √ | √ | Groups infrequent categories into a single one |
Feature-engine’s categorical encoders work only with categorical variables by default. From version 1.1.0, you have the option to set the parameter ignore_format to True, and make the transformers also accept numerical variables as input.
Monotonicity
Most of Feature-engine’s encoders will return, or attempt to return, monotonic relationships between the encoded variable and the target. A monotonic relationship is one in which the value of one variable consistently increases, or consistently decreases, as the value of the other variable increases. The following illustration shows examples:
Monotonic relationships tend to improve the performance of linear models and allow decision trees to be shallower.
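To make the idea concrete, here is a plain-Python sketch, not Feature-engine's implementation, of one way to obtain a monotonic encoding: sort the categories by their target mean and number them in that order. The city and price values are invented for illustration.

```python
from collections import defaultdict

# Toy data (hypothetical): one categorical variable and a numerical target.
city = ["London", "Leeds", "Bristol", "London", "Leeds", "Bristol", "London", "Leeds"]
price = [10.0, 3.0, 6.0, 12.0, 5.0, 8.0, 11.0, 4.0]

# Target mean per category.
totals = defaultdict(float)
counts = defaultdict(int)
for c, y in zip(city, price):
    totals[c] += y
    counts[c] += 1
target_mean = {c: totals[c] / counts[c] for c in totals}

# Number the categories from lowest to highest target mean.
ordered = sorted(target_mean, key=target_mean.get)
mapping = {c: i for i, c in enumerate(ordered)}
encoded = [mapping[c] for c in city]


def is_monotonic_increasing(values):
    """Return True if no value is smaller than the one before it."""
    return all(a <= b for a, b in zip(values, values[1:]))


# Target mean listed by increasing encoded value: it never decreases.
means_by_code = [target_mean[c] for c in ordered]
print(mapping)
print(is_monotonic_increasing(means_by_code))
```

Because the integers follow the order of the target means, a larger encoded value never corresponds to a smaller mean target, which is the monotonic relationship described above.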
Regression vs Classification
Most of Feature-engine’s encoders are suitable for both regression and classification, with the exception of the WoEEncoder() and the PRatioEncoder(), which are designed solely for binary classification.
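The restriction to binary targets follows from how the weight of evidence is usually defined: the log of the ratio between a category's share of the positive cases and its share of the negative cases. The sketch below computes it in plain Python on invented data. It uses this common definition and is an illustration only, so Feature-engine's implementation may differ in detail.

```python
import math
from collections import Counter

# Toy data (hypothetical): a categorical variable and a binary target.
category = ["A", "A", "A", "B", "B", "B", "C", "C", "C", "C"]
target = [1, 1, 0, 1, 0, 0, 1, 0, 1, 1]

# Count positives and negatives per category.
pos_per_cat = Counter(c for c, y in zip(category, target) if y == 1)
neg_per_cat = Counter(c for c, y in zip(category, target) if y == 0)
total_pos = sum(pos_per_cat.values())
total_neg = sum(neg_per_cat.values())

# WoE = ln( share of positives in category / share of negatives in category ).
# It is undefined when a category has no positives or no negatives.
woe = {
    c: math.log((pos_per_cat[c] / total_pos) / (neg_per_cat[c] / total_neg))
    for c in sorted(set(category))
}
print(woe)
```

The formula needs exactly two groups, positives and negatives, so it has no direct equivalent for a continuous or multi-class target.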
Multi-class classification
Finally, some of Feature-engine’s encoders can handle multi-class targets off-the-shelf, for example the OneHotEncoder(), the CountFrequencyEncoder() and the DecisionTreeEncoder().
Note that the MeanEncoder() and the OrdinalEncoder() will operate with multi-class targets, but the mean of the class labels may not be meaningful, and this will defeat the purpose of these encoding techniques.
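A small plain-Python illustration of why the mean of class labels can mislead, using invented data: two categories with very different class distributions end up with the same mean.

```python
# Toy data (hypothetical): class labels 0, 1 and 2 observed for two categories.
labels_by_category = {
    "A": [0, 2, 0, 2],  # only classes 0 and 2 occur
    "B": [1, 1, 1, 1],  # only class 1 occurs
}

# Mean of the class labels per category, as a mean encoding would compute it.
label_mean = {
    cat: sum(labels) / len(labels)
    for cat, labels in labels_by_category.items()
}
print(label_mean)
```

Both categories receive the same encoded value even though they never share a class, because the label integers are identifiers and not quantities.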
Encoders