Version 1.5.X
=============

Version 1.5.2
-------------

Deployed: 21st November 2022

Contributors
~~~~~~~~~~~~

- `Gleb Levitski `_
- `Alfonso Tobar `_
- `pxn39 `_
- `Soledad Galli `_

In this release, we expand the functionality of existing classes and extend the documentation.

New functionality
~~~~~~~~~~~~~~~~~

- The `StringSimilarityEncoder` can now create similarity variables based on keywords entered by the user (`Gleb Levitski `_)
- The `Winsorizer` and `OutlierTrimmer` now automatically adjust the value of the `fold` parameter based on the `capping_method` (`pxn39 `_)

Bug fixes
~~~~~~~~~

- Fix type check errors raised by newer versions (`Gleb Levitski `_)

Documentation
~~~~~~~~~~~~~

- Add example code snippets to the categorical encoding API docs (`Alfonso Tobar `_)
- Add example code snippets to the imputation module API docs (`Alfonso Tobar `_)
- Add example code snippets to the discretisation module API docs (`Alfonso Tobar `_)
- Add example code snippets to the creation module API docs (`Alfonso Tobar `_)
- Add example code snippets to the datetime module API docs (`Alfonso Tobar `_)
- Update the user guide docs for the forecasting feature transformers (`Soledad Galli `_)
- Update the user guide docs for datetime features and cyclical features (`Soledad Galli `_)
- Fix badges in README (`Gleb Levitski `_)

Version 1.5.0
-------------

Deployed: 17th October 2022

Contributors
~~~~~~~~~~~~

- `Gleb Levitski `_
- `David Cortes `_
- `Alfonso Tobar `_
- `Morgan Sell `_
- `Soledad Galli `_

In this release, we fix a bug that made `get_feature_names_out` incompatible with the Scikit-learn pipeline.

In addition, thanks to `Gleb Levitski `_, we've got a new encoder that replaces categories with string similarity variables. `Gleb Levitski `_ also made a number of code enhancements to various transformers across the library, making a lot of new functionality available.
Finally, we'd like to thank `Alfonso Tobar `_, `David Cortes `_ and `Morgan Sell `_ for creating new transformers, fixing bugs and expanding the functionality of Feature-engine.

Thank you so much to all contributors and to those of you who created issues flagging bugs or requesting new functionality.

New transformers
~~~~~~~~~~~~~~~~

- **StringSimilarityEncoder**: encodes categorical variables based on string similarity (`Gleb Levitski `_)
- **MatchCategories**: matches the categories of pandas categorical variables in the train and test sets (`David Cortes `_)
- **SelectByInformationValue**: selects features based on the information value (`Morgan Sell `_ and `Soledad Galli `_)

New functionality
~~~~~~~~~~~~~~~~~

- The `MeanEncoder` can now implement smoothing during the encoding to handle high cardinality (`Gleb Levitski `_)
- The `MeanEncoder` can now encode unseen categories (`Gleb Levitski `_)
- The `OrdinalEncoder` can now encode unseen categories (`Soledad Galli `_)
- The `CountFrequencyEncoder` can now encode unseen categories (`David Cortes `_)
- All outlier transformers can now detect outliers based on the MAD rule (`Gleb Levitski `_)
- Add automatic calculation of PSI threshold in `DropHighPSIFeatures` (`Gleb Levitski `_)
- All feature selection transformers now have the method `get_support()` (`Soledad Galli `_)

Bug fixes
~~~~~~~~~

- `get_feature_names_out` is now compatible with the Scikit-learn pipeline in all transformers (`Soledad Galli `_)
- The `inverse_transform` method in encoders now correctly handles unseen categories or raises not-implemented errors (`Soledad Galli `_)
- Fixes output of `SklearnTransformerWrapper` for `OneHotEncoder` and `PolynomialFeatures` (`Alfonso Tobar `_)

Documentation
~~~~~~~~~~~~~

- Add more resources to documentation (`Soledad Galli `_)
- User guide for `StringSimilarityEncoder` (`Gleb Levitski `_)
- New Jupyter notebook for `StringSimilarityEncoder` (`Gleb Levitski `_)
- User guide for `SelectByInformationValue` (`Morgan Sell `_ and `Soledad Galli `_)

Deprecations
~~~~~~~~~~~~

- Parameter `errors` in encoders is now replaced by `unseen` (`Soledad Galli `_)
- The classes `MathematicalCombination`, `CombineWithFeatureReference` and `CyclicalTransformer` are removed (`Soledad Galli `_)
- We are deprecating `PRatioEncoder` in version 1.5 and it will be removed in version 1.6 (`Soledad Galli `_)

Code improvements
~~~~~~~~~~~~~~~~~

- Adds code coverage test (`Soledad Galli `_)
- Changes logic of encoding unseen categories to work with `inverse_transform` (`Soledad Galli `_)
- Increases code coverage for encoders (`Soledad Galli `_)
- Removes `CategoricalInitExpandedMixin` (`Soledad Galli `_)
- Removes checks for encoding dictionaries in all encoders (`Soledad Galli `_)
- Refactors creation module (`Soledad Galli `_)
- Refactors docstring module (`Soledad Galli `_)
- Refactors variable handling module (`Soledad Galli `_)
- Refactors numerical dictionary checks (`Soledad Galli `_)
- Refactors base transformers module (`Soledad Galli `_)
- Makes dataframe checks more performant (`Soledad Galli `_)
- Replaces `pd.concat` with `groupby` in all target-based encoders (`Soledad Galli `_)
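
As a rough illustration of the smoothing and unseen-category options listed under New functionality for the `MeanEncoder` in version 1.5.0, the sketch below blends each category's target mean with the global target mean. It is plain Python rather than Feature-engine code: the function names `smoothed_mean_encoding` and `encode`, the weight `n / (n + smoothing)`, and the choice to encode unseen categories with the global mean are assumptions made for this example, not a description of the library's internals.

```python
from collections import defaultdict
from statistics import mean


def smoothed_mean_encoding(categories, targets, smoothing=0.0):
    """Return ({category: encoded value}, global target mean).

    Hypothetical helper: each category mean is shrunk towards the
    global mean, with a weight that grows with the category count.
    """
    prior = mean(targets)  # global target mean
    grouped = defaultdict(list)
    for category, target in zip(categories, targets):
        grouped[category].append(target)

    encoding = {}
    for category, values in grouped.items():
        n = len(values)
        # Assumed weighting: smoothing=0 gives the plain category mean;
        # larger values pull rare categories towards the prior.
        weight = n / (n + smoothing)
        encoding[category] = weight * mean(values) + (1 - weight) * prior
    return encoding, prior


def encode(values, encoding, prior):
    """Replace categories with their encoding; unseen ones get the prior."""
    return [encoding.get(value, prior) for value in values]


mapping, prior = smoothed_mean_encoding(
    ["a", "a", "a", "b"], [1, 1, 0, 0], smoothing=1.0
)
encoded = encode(["a", "b", "c"], mapping, prior)  # "c" was never seen in training
```

With `smoothing=1.0`, the single observation of category `b` is pulled halfway towards the global mean, while category `a`, seen three times, moves much less. This is why smoothing helps with high-cardinality variables, where many categories have few observations.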