Version 1.5.X#
Version 1.5.2#
Deployed: 21st November 2022
Contributors#
In this release, we expand the functionality of existing classes and the documentation.
New functionality#
The StringSimilarityEncoder can now create similarity variables based on keywords entered by the user (Gleb Levitski)
The Winsorizer and OutlierTrimmer now automatically adjust the value of the fold parameter based on the capping_method (pxn39)
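To illustrate the idea behind keyword-based similarity variables, here is a minimal sketch that uses only the Python standard library. It is not Feature-engine's implementation: the helper names similarity and encode_with_keywords, the similarity_ column prefix and the use of difflib.SequenceMatcher.ratio() as the similarity measure are all assumptions made for illustration.

```python
from difflib import SequenceMatcher


def similarity(a: str, b: str) -> float:
    """Return a similarity score between 0 and 1 for two strings."""
    return SequenceMatcher(None, a, b).ratio()


def encode_with_keywords(values, keywords):
    """Build one similarity column per user-supplied keyword.

    Hypothetical helper: returns a dict mapping a column name to the
    list of similarity scores for each value.
    """
    return {
        f"similarity_{kw}": [similarity(v, kw) for v in values]
        for kw in keywords
    }


# Each category string is compared against every keyword.
encoded = encode_with_keywords(["dog", "dig", "cat"], ["dog", "cat"])
```

Each keyword yields one numerical column, so the number of new variables is set by the keyword list rather than by the cardinality of the original variable.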
Bug fixes#
Fixes type check errors raised by newer versions (Gleb Levitski)
Documentation#
Add example code snippets to the categorical encoding API docs (Alfonso Tobar)
Add example code snippets to the imputation module API docs (Alfonso Tobar)
Add example code snippets to the discretisation module API docs (Alfonso Tobar)
Add example code snippets to the creation module API docs (Alfonso Tobar)
Add example code snippets to the datetime module API docs (Alfonso Tobar)
Update the user guide docs for the forecasting feature transformers (Soledad Galli)
Update the user guide docs for datetime features and cyclical features (Soledad Galli)
Fix badges in README (Gleb Levitski)
Version 1.5.0#
Deployed: 17th October 2022
Contributors#
In this release, we fix a bug that made get_feature_names_out incompatible with the Scikit-learn pipeline.
In addition, thanks to Gleb Levitski, we’ve got a new encoder to replace categories by string similarity variables. Gleb Levitski also made a number of code enhancements to various transformers across the library, making a lot of new functionality available.
Finally, we’d like to thank Alfonso Tobar, David Cortes and Morgan Sell for creating new transformers, fixing bugs and expanding the functionality of Feature-engine.
Thank you so much to all contributors and to those of you who created issues flagging bugs or requesting new functionality.
New transformers#
StringSimilarityEncoder: encodes categorical variables based on string similarity (Gleb Levitski)
MatchCategories: ensures that variables of type pandas categorical have the same categories in the train and test sets (David Cortes)
SelectByInformationValue: selects features based on the information value (Morgan Sell and Soledad Galli)
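For readers unfamiliar with the information value, the sketch below computes it for a single categorical feature and a binary target using the usual weight-of-evidence formulation. It is a plain-Python illustration under stated assumptions, not the code used by SelectByInformationValue: the function name information_value is hypothetical, the target is assumed to be coded as 0 and 1, and every category is assumed to contain observations of both classes.

```python
import math
from collections import Counter


def information_value(categories, target):
    """Information value of a categorical feature for a binary (0/1) target.

    IV = sum over categories of (p - n) * ln(p / n), where p and n are the
    shares of positive and negative observations that fall in the category.
    """
    pos = Counter(c for c, t in zip(categories, target) if t == 1)
    neg = Counter(c for c, t in zip(categories, target) if t == 0)
    total_pos, total_neg = sum(pos.values()), sum(neg.values())

    iv = 0.0
    for cat in set(categories):
        p = pos[cat] / total_pos
        n = neg[cat] / total_neg
        if p == 0 or n == 0:
            # The logarithm is undefined when a class is missing in a category.
            raise ValueError(f"category {cat!r} lacks one of the classes")
        iv += (p - n) * math.log(p / n)
    return iv


# A feature that separates the classes reasonably well.
iv = information_value(["a"] * 4 + ["b"] * 4, [1, 1, 1, 0, 1, 0, 0, 0])

# A feature that carries no information about the target.
iv_uninformative = information_value(["a", "a", "b", "b"], [1, 0, 1, 0])
```

A selector built on this quantity would keep the features whose information value exceeds a chosen threshold.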
New functionality#
The MeanEncoder can now implement smoothing during the encoding to handle high cardinality (Gleb Levitski)
The MeanEncoder can now encode unseen categories (Gleb Levitski)
The OrdinalEncoder can now encode unseen categories (Soledad Galli)
The CountFrequencyEncoder can now encode unseen categories (David Cortes)
All outlier transformers can now detect outliers based on the MAD rule (Gleb Levitski)
Add automatic calculation of PSI threshold in DropHighPSIFeatures (Gleb Levitski)
All feature selection transformers now have the method get_support() (Soledad Galli)
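The first two items above can be pictured with a small standard-library sketch of smoothed mean encoding with a fallback for unseen categories. This is an illustration only, not the MeanEncoder source: the function names, the weighting n / (n + smoothing) and the choice of the global target mean as the value for unseen categories are assumptions.

```python
from collections import defaultdict


def fit_smoothed_means(categories, target, smoothing=1.0):
    """Learn a smoothed target mean per category.

    Each category mean is blended with the global mean; categories with few
    observations are pulled more strongly towards the global mean.
    """
    global_mean = sum(target) / len(target)
    sums, counts = defaultdict(float), defaultdict(int)
    for c, t in zip(categories, target):
        sums[c] += t
        counts[c] += 1

    encoding = {}
    for c, n in counts.items():
        weight = n / (n + smoothing)  # assumed weighting scheme
        encoding[c] = weight * (sums[c] / n) + (1 - weight) * global_mean
    return encoding, global_mean


def transform(categories, encoding, fallback):
    """Replace categories by their encoding; unseen ones get the fallback."""
    return [encoding.get(c, fallback) for c in categories]


encoding, global_mean = fit_smoothed_means(["a", "a", "a", "b"], [1, 1, 0, 0])

# "c" was not present during fit, so it receives the global mean.
encoded_values = transform(["a", "c"], encoding, fallback=global_mean)
```

With smoothing, the rare category "b" is encoded closer to the global mean than its raw mean of 0 would suggest, which is what makes the approach useful for high cardinality variables.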
Bug fixes#
get_feature_names_out is now compatible with the Scikit-learn pipeline in all transformers (Soledad Galli)
The inverse_transform method in encoders now correctly handles unseen categories or raises not implemented errors (Soledad Galli)
Fixes output of SklearnTransformerWrapper for OneHotEncoder and PolynomialFeatures (Alfonso Tobar)
Documentation#
Add more resources to documentation (Soledad Galli)
User guide for StringSimilarityEncoder (Gleb Levitski)
New Jupyter notebook for StringSimilarityEncoder (Gleb Levitski)
User guide for SelectByInformationValue (Morgan Sell and Soledad Galli)
Deprecations#
Parameter errors in encoders is now replaced by unseen (Soledad Galli)
The classes MathematicalCombination, CombineWithFeatureReference and CyclicalTransformer are removed (Soledad Galli)
We are deprecating PRatioEncoder in version 1.5 and it will be removed in version 1.6 (Soledad Galli)
Code improvements#
Adds code coverage test (Soledad Galli)
Changes logic of encoding unseen categories to work with inverse_transform (Soledad Galli)
Increases code coverage for encoders (Soledad Galli)
Removes CategoricalInitExpandedMixin (Soledad Galli)
Removes checks for encoding dictionaries in all encoders (Soledad Galli)
Refactors creation module (Soledad Galli)
Refactors docstring module (Soledad Galli)
Refactors variable handling module (Soledad Galli)
Refactors numerical dictionary checks (Soledad Galli)
Refactors base transformers module (Soledad Galli)
Makes dataframe checks more performant (Soledad Galli)
Replaces pd.concat by groupby in all target-based encoders (Soledad Galli)