Version 1.3.X#
Version 1.3.0#
Deployed: 5th May 2022
Contributors#
In this release, we add the get_feature_names_out functionality to all our transformers! You asked for it, we delivered :)
In addition, we introduce a new module for time series forecasting. This module will host transformers that create features suitable for, well…, time series forecasting. We created three new transformers: LagFeatures, WindowFeatures and ExpandingWindowFeatures. We had extraordinary support from Kishan Manani, an experienced forecaster, and Morgan Sell, who helped us draft the new classes. Thank you both for the incredible work!
We also improved the functionality of our feature creation classes. To do this, we are deprecating our former classes, MathematicalCombination and CombineWithFeatureReference, which are a bit of a mouthful, for the new classes MathFeatures and RelativeFeatures.
We are also renaming the class CyclicalTransformer to CyclicalFeatures.
We’ve also enhanced the functionality of the SelectByTargetMeanPerformance and SklearnTransformerWrapper.
In addition, we’ve had some bug reports and bug fixes that we list below, and a number of enhancements to our current classes.
Thank you so much to all contributors to this release for making this massive release possible!
New modules#
- timeseries-forecasting: this module hosts transformers that create features suitable for time series forecasting (Morgan Sell, Kishan Manani and Soledad Galli)
LagFeatures
WindowFeatures
ExpandingWindowFeatures
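To make the three kinds of features concrete, here is a minimal plain-pandas sketch of the underlying ideas. It does not use the new transformers and is not their implementation; the column name "sales" and the output column names are hypothetical and chosen only for illustration.

```python
import pandas as pd

# Toy daily series; the column name "sales" is hypothetical.
df = pd.DataFrame(
    {"sales": [10.0, 12.0, 11.0, 15.0, 14.0]},
    index=pd.date_range("2022-01-01", periods=5, freq="D"),
)

# Lag feature: the value observed one step earlier.
df["sales_lag_1"] = df["sales"].shift(1)

# Window feature: mean of the previous 2 observations.
# Shifting after rolling keeps the current row out of its own window.
df["sales_window_2_mean"] = df["sales"].rolling(window=2).mean().shift(1)

# Expanding feature: mean of all observations before the current row.
df["sales_expanding_mean"] = df["sales"].expanding().mean().shift(1)

print(df)
```

In all three cases the feature for a given row is computed only from earlier rows, which is what makes such features usable for forecasting.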
New transformers#
LagFeatures: adds lag versions of the features (Morgan Sell, Kishan Manani and Soledad Galli)
WindowFeatures: creates features from operations on past time windows (Morgan Sell, Kishan Manani and Soledad Galli)
ExpandingWindowFeatures: creates features from operations on all past data (Kishan Manani)
MathFeatures: replaces MathematicalCombination and expands its functionality (Soledad Galli)
RelativeFeatures: replaces CombineWithFeatureReference and expands its functionality (Soledad Galli)
CyclicalFeatures: new name for CyclicalTransformer with the same functionality (Soledad Galli)
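As a rough illustration of the kind of features these three classes are concerned with, the following plain-pandas sketch combines variables mathematically, relates a variable to a reference variable, and encodes a cyclical variable with sine and cosine. It is an assumption-laden sketch, not the classes' API: the column names, the output names and the fixed period of 12 are all hypothetical.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"a": [1.0, 2.0, 3.0], "b": [4.0, 5.0, 6.0], "month": [1, 6, 12]}
)

# Mathematical combination across a group of variables.
df["sum_a_b"] = df[["a", "b"]].sum(axis=1)
df["mean_a_b"] = df[["a", "b"]].mean(axis=1)

# A variable expressed relative to a reference variable.
df["a_div_b"] = df["a"] / df["b"]
df["a_sub_b"] = df["a"] - df["b"]

# Cyclical encoding: map the month onto a circle so that
# December and January end up close to each other.
period = 12  # assumed period for this toy example
df["month_sin"] = np.sin(2 * np.pi * df["month"] / period)
df["month_cos"] = np.cos(2 * np.pi * df["month"] / period)

print(df)
```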
New functionality#
All our transformers now have the get_feature_names_out functionality to obtain the names of the output features (Alejandro Giacometti, Morgan Sell and Soledad Galli)
SelectByTargetMeanPerformance now uses cross-validation and supports all possible performance metrics for classification and regression (Morgan Sell and Soledad Galli)
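The idea behind get_feature_names_out can be sketched with a toy transformer. The class below, ToySquarer, is entirely hypothetical and is not part of Feature-engine; it only shows the contract, namely that the method returns the names of the columns that transform() produces, in order.

```python
import pandas as pd


class ToySquarer:
    """Hypothetical transformer that appends squared copies of chosen columns."""

    def __init__(self, variables):
        self.variables = variables

    def fit(self, X, y=None):
        # Record the input column names seen during fit.
        self.feature_names_in_ = list(X.columns)
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables:
            X[f"{var}_squared"] = X[var] ** 2
        return X

    def get_feature_names_out(self):
        # Names of the columns returned by transform(), in order.
        new_names = [f"{var}_squared" for var in self.variables]
        return self.feature_names_in_ + new_names


X = pd.DataFrame({"a": [1, 2], "b": [3, 4]})
transformer = ToySquarer(variables=["a"]).fit(X)
names = transformer.get_feature_names_out()
print(names)
```

Knowing the output names without inspecting the transformed dataframe is useful when a transformer sits in the middle of a pipeline.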
Enhancements#
All our feature selection transformers can now check that the variables were not dropped in a previous selection step (Gilles Verbockhaven)
The DecisionTreeDiscretiser and the DecisionTreeEncoder now check that the user enters a target suitable for regression or classification (Morgan Sell)
The DecisionTreeDiscretiser and the DecisionTreeEncoder now accept all sklearn cross-validation constructors (Soledad Galli)
The SklearnTransformerWrapper now implements the method inverse_transform (Soledad Galli)
The SklearnTransformerWrapper now supports additional transformers, for example, PolynomialFeatures (Soledad Galli)
The CategoricalImputer() now lets you know which variables have more than one mode (Soledad Galli)
The DatetimeFeatures() can now extract features from the dataframe index (Edoardo Argiolas)
Transformers that take y now check that X and y match (Noah Green and Ben Reiniger)
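Extracting datetime features from the index, as mentioned for DatetimeFeatures above, amounts to something like the following plain-pandas sketch. It does not call DatetimeFeatures itself; the column names and the selection of features are assumptions made for the example.

```python
import pandas as pd

# Toy dataframe whose index, not a column, carries the timestamps.
df = pd.DataFrame(
    {"value": [1, 2, 3]},
    index=pd.to_datetime(["2022-01-31", "2022-05-05", "2022-12-25"]),
)

# Derive calendar features directly from the DatetimeIndex.
features = pd.DataFrame(
    {
        "month": df.index.month,
        "day_of_week": df.index.dayofweek,  # Monday is 0
        "year": df.index.year,
    },
    index=df.index,
)

print(features)
```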
Bug fixes#
The SklearnTransformerWrapper now works with cross-validation when using the one hot encoder (Noah Green)
The SelectByShuffling now evaluates the initial performance and the performance after shuffling in the same data parts (Gilles Verbockhaven)
Discretisers: when setting return_boundaries=True the interval limits are now returned as strings and the variables as object data type (Soledad Galli)
DecisionTreeEncoder now enforces passing y to fit() (Soledad Galli)
DropMissingData can now take a string in the variables parameter (Soledad Galli)
DropFeatures now accepts a string as input of the features_to_drop parameter (Noah Green)
Categorical encoders now work correctly with numpy arrays as inputs (Noah Green and Ben Reiniger)
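To picture what "interval limits returned as strings" means for the discretisers, here is a small plain-pandas sketch. It uses pd.cut rather than any Feature-engine discretiser, and the variable name and bin edges are made up for the example.

```python
import pandas as pd

ages = pd.Series([5, 25, 45, 65], name="age")

# Sort the values into intervals; pandas returns a categorical of Interval objects.
binned = pd.cut(ages, bins=[0, 18, 40, 100])

# Cast the intervals to plain strings such as "(0, 18]".
as_text = binned.astype(str)

print(as_text.tolist())
```

String labels of this kind can be passed on to categorical encoders, which is harder to do with Interval objects.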
Documentation#
Improved user guide for SelectByTargetMeanPerformance with lots of tips for troubleshooting (Soledad Galli)
Added guides on how to use MathFeatures and RelativeFeatures (Soledad Galli)
Expanded user guide on how to use CyclicalFeatures with explanation and demos of what these features are (Soledad Galli)
Added a Jupyter notebook with a demo of the CyclicalFeatures class (Soledad Galli)
We now display all available methods in the documentation methods summary (Soledad Galli)
Fixed a typo in the ArbitraryNumberImputer documentation (Tim Vink)
Deprecations#
We are deprecating MathematicalCombination, CombineWithFeatureReference and CyclicalTransformer in version 1.3 and they will be removed in version 1.4
Feature-engine no longer works with Python 3.6 due to its dependence on the latest versions of Scikit-learn
In MatchColumns the attribute input_features_ was replaced by feature_names_in_ to adopt the Scikit-learn convention
Code improvements#
Imputers: removed looping over every variable to replace NaN. Now passing the imputer dictionary to DataFrame.fillna() (Soledad Galli)
AddMissingIndicators: removed looping over every variable to add missing indicators. Now using pd.isna() (Soledad Galli)
CategoricalImputer now captures all modes in one go, without looping over variables (Soledad Galli)
Removed workaround to import docstrings for the transform() method in various transformers (Soledad Galli)
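The loop-free approach described above can be sketched in plain pandas as follows. This is an illustration of the technique, not the library's code: the column names, the imputation values and the "_na" suffix are assumptions made for the example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(
    {"age": [20.0, np.nan, 40.0], "city": ["Rome", None, "Lima"]}
)

# One replacement value per variable, as an imputer might learn during fit().
imputer_dict = {"age": 30.0, "city": "Missing"}

# Missing indicators for all chosen columns in a single vectorised call.
indicators = df[["age", "city"]].isna().astype(int).add_suffix("_na")

# A single fillna call with a dictionary replaces the per-variable loop.
imputed = df.fillna(value=imputer_dict)

result = pd.concat([imputed, indicators], axis=1)
print(result)
```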
For developers#
Created functions and docstrings for common descriptions of methods and attributes (Soledad Galli)
We introduce the use of common tests that are applied to all transformers (Soledad Galli)
Experimental#
New experimental, currently private module: prediction, which hosts classes that are used by the SelectByTargetMeanPerformance feature selection transformer. The estimators in this module have functionality that exceeds that required by the selector, in that they can output estimates of the target by taking the average across a group of variables.
New private module, prediction, with a regression and a classification estimator (Morgan Sell and Soledad Galli)
TargetMeanRegressor: estimates the target based on the average target mean value per class or interval, across variables (Morgan Sell and Soledad Galli)
TargetMeanClassifier: estimates the target based on the average target mean value per class or interval, across variables (Morgan Sell and Soledad Galli)
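The estimation idea described above, the mean target value per category averaged across variables, can be sketched in plain pandas. This is only an illustration under simplifying assumptions (categorical variables only, no intervals, no unseen categories); it does not use the private prediction module, and all names are hypothetical.

```python
import pandas as pd

X = pd.DataFrame(
    {
        "colour": ["red", "red", "blue", "blue"],
        "size": ["S", "L", "S", "L"],
    }
)
y = pd.Series([1.0, 3.0, 5.0, 7.0], name="target")

# "Fit": learn the mean target value per category, for each variable.
means = {col: y.groupby(X[col]).mean() for col in X.columns}

# "Predict": replace each category by its learned mean,
# then average the per-variable estimates across variables.
per_variable = pd.DataFrame(
    {col: X[col].map(means[col]) for col in X.columns}
)
prediction = per_variable.mean(axis=1)

print(prediction.tolist())
```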