Version 1.5.X#

Version 1.5.2#

Deployed: 21st November 2022

Contributors#

In this release, we expand the functionality of existing classes and the documentation.

New functionality#

  • The StringSimilarityEncoder can now create similarity variables based on keywords entered by the user (Gleb Levitski)

  • The Winsorizer and OutlierTrimmer now automatically adjust the value of the fold parameter based on the capping_method (pxn39)
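As a rough illustration of keyword-based similarity variables, the sketch below scores each category against a user-supplied list of keywords using `difflib.SequenceMatcher` from the Python standard library. The helper `similarity_to_keywords`, the `pet_` column prefix and the sample data are hypothetical, and the similarity measure is an assumption of this sketch rather than a statement of what StringSimilarityEncoder does internally.

```python
from difflib import SequenceMatcher


def similarity_to_keywords(values, keywords):
    """Return one dict per value holding its similarity to each keyword."""
    rows = []
    for value in values:
        row = {
            # ratio() is 1.0 for identical strings and 0.0 when nothing matches
            f"pet_{kw}": SequenceMatcher(None, value, kw).ratio()
            for kw in keywords
        }
        rows.append(row)
    return rows


pets = ["dog", "dig", "cat", "bird"]
encoded = similarity_to_keywords(pets, keywords=["dog", "cat"])
for pet, row in zip(pets, encoded):
    print(pet, row)
```

Each keyword produces one new numerical variable, so the number of output columns is set by the keyword list rather than by the cardinality of the original variable.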

Bug fixes#

Documentation#

  • Add example code snippets to the categorical encoding API docs (Alfonso Tobar)

  • Add example code snippets to the imputation module API docs (Alfonso Tobar)

  • Add example code snippets to the discretisation module API docs (Alfonso Tobar)

  • Add example code snippets to the creation module API docs (Alfonso Tobar)

  • Add example code snippets to the datetime module API docs (Alfonso Tobar)

  • Update the user guide docs for the forecasting feature transformers (Soledad Galli)

  • Update the user guide docs for datetime features and cyclical features (Soledad Galli)

  • Fix badges in README (Gleb Levitski)

Version 1.5.0#

Deployed: 17th October 2022

Contributors#

In this release, we fix a bug that made get_feature_names_out incompatible with the Scikit-learn pipeline.

In addition, thanks to Gleb Levitski, we have a new encoder that replaces categories with string similarity variables. Gleb Levitski also enhanced the code of various transformers across the library, adding much of the new functionality listed below.

Finally, we’d like to thank Alfonso Tobar, David Cortes and Morgan Sell for creating new transformers, fixing bugs and expanding the functionality of Feature-engine.

Thank you so much to all contributors and to those of you who created issues flagging bugs or requesting new functionality.

New transformers#

  • StringSimilarityEncoder: encodes categorical variables based on string similarity (Gleb Levitski)

  • MatchCategories: ensures that variables of type pandas categorical have the same categories in the train and test sets (David Cortes)

  • SelectByInformationValue: selects features based on the information value (Morgan Sell and Soledad Galli)
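To make the idea of selecting by information value more concrete, here is a minimal sketch of how the information value (IV) of a categorical feature against a binary target is commonly computed: the sum, over categories, of the difference between the share of events and the share of non-events, multiplied by the weight of evidence. The function `information_value` and the toy data are hypothetical, and the handling of categories with zero events or non-events (skipped here) is an assumption of this sketch, not necessarily what SelectByInformationValue does.

```python
import math
from collections import Counter


def information_value(categories, target):
    """IV of a categorical feature against a binary target (1 = event)."""
    events = Counter(c for c, t in zip(categories, target) if t == 1)
    non_events = Counter(c for c, t in zip(categories, target) if t == 0)
    total_events = sum(events.values())
    total_non_events = sum(non_events.values())

    iv = 0.0
    for category in set(categories):
        p_event = events[category] / total_events
        p_non_event = non_events[category] / total_non_events
        if p_event == 0 or p_non_event == 0:
            # Weight of evidence is undefined here; this sketch skips the category
            continue
        woe = math.log(p_event / p_non_event)
        iv += (p_event - p_non_event) * woe
    return iv


feature = ["a"] * 4 + ["b"] * 4
iv_informative = information_value(feature, [1, 1, 1, 0, 1, 0, 0, 0])
iv_flat = information_value(feature, [1, 0, 1, 0, 1, 0, 1, 0])
print(iv_informative, iv_flat)
```

A selector built on this quantity would keep the features whose IV exceeds a chosen threshold; the second call shows that a feature with the same event rate in every category scores zero.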

New functionality#

  • The MeanEncoder can now implement smoothing during the encoding to handle high cardinality (Gleb Levitski)

  • The MeanEncoder can now encode unseen categories (Gleb Levitski)

  • The OrdinalEncoder can now encode unseen categories (Soledad Galli)

  • The CountFrequencyEncoder can now encode unseen categories (David Cortes)

  • All outlier transformers can now detect outliers based on the MAD rule (Gleb Levitski)

  • Add automatic calculation of PSI threshold in DropHighPSIFeatures (Gleb Levitski)

  • All feature selection transformers now have the method get_support() (Soledad Galli)
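The smoothing and unseen-category items above can be sketched in plain Python. The example blends each category's target mean with the global mean, weighting by category size, and falls back to the global mean for categories not seen during fitting. The functions `fit_smoothed_means` and `encode`, the weight formula `n / (n + smoothing)` and the fallback value are assumptions of this sketch; it shows a common formulation, not necessarily the exact one used by MeanEncoder.

```python
from collections import defaultdict


def fit_smoothed_means(categories, target, smoothing=0.0):
    """Learn a category -> encoded value mapping and the global prior."""
    sums, counts = defaultdict(float), defaultdict(int)
    for category, y in zip(categories, target):
        sums[category] += y
        counts[category] += 1
    prior = sum(target) / len(target)

    mapping = {}
    for category, n in counts.items():
        weight = n / (n + smoothing)  # rare categories get a smaller weight
        mapping[category] = weight * (sums[category] / n) + (1 - weight) * prior
    return mapping, prior


def encode(values, mapping, prior):
    """Replace categories by their learned value; unseen ones get the prior."""
    return [mapping.get(v, prior) for v in values]


city = ["A", "A", "A", "A", "B"]
y = [1, 0, 1, 0, 1]
mapping, prior = fit_smoothed_means(city, y, smoothing=1.0)
encoded = encode(["A", "B", "C"], mapping, prior)  # "C" was never seen
print(encoded)
```

With smoothing set to 0 the mapping reduces to the plain per-category mean; larger values pull small categories such as "B" towards the global mean, which is what makes the encoding more stable for high-cardinality variables.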

Bug fixes#

  • get_feature_names_out is now compatible with the Scikit-learn pipeline in all transformers (Soledad Galli)

  • The inverse_transform method in encoders now correctly handles unseen categories, or raises an error where the inverse transformation is not implemented (Soledad Galli)

  • Fixes output of SklearnTransformerWrapper for OneHotEncoder and PolynomialFeatures (Alfonso Tobar)

Documentation#

Deprecations#

  • The parameter errors in encoders has been replaced by the parameter unseen (Soledad Galli)

  • The classes MathematicalCombination, CombineWithFeatureReference and CyclicalTransformer are removed (Soledad Galli)

  • PRatioEncoder is deprecated in version 1.5 and will be removed in version 1.6 (Soledad Galli)

Code improvements#