Version 1.7.X#

Version 1.7.0#

Deployed: 24th March 2024

Contributors#

There are a few big additions in this new release. First, we introduce a new Pipeline that supports transformers that remove rows from the dataset during the data transformation. From now on, you can use DropMissingData, OutlierTrimmer, LagFeatures and WindowFeatures as part of a feature engineering pipeline that will transform your variables, re-align the target to the remaining rows if necessary, and then fit a model. All in one go!

In addition, transformers that remove rows from the dataset, like DropMissingData, OutlierTrimmer, LagFeatures and WindowFeatures, can now adjust the target value to the remaining rows through the new method transform_x_y.

The third big improvement is a massive speed optimization of our correlation transformers, which now find and remove correlated features at double the speed and also let you easily identify the feature against which the correlated ones were determined.
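To make the idea of "the feature against which the correlated ones were determined" concrete, here is a pandas-only sketch of a greedy correlation search. It is an illustration under stated assumptions, not the library's implementation: the function name correlated_groups, the alphabetical ordering, and the toy data are all hypothetical.

```python
import pandas as pd


def correlated_groups(X: pd.DataFrame, threshold: float = 0.8) -> dict:
    """Map each retained feature to the set of features correlated with it."""
    corr = X.corr().abs()          # absolute Pearson correlation matrix
    features = sorted(X.columns)   # alphabetical order fixes which feature "wins"
    examined = set()
    groups = {}
    for feature in features:
        if feature in examined:
            continue
        examined.add(feature)
        # Only features not yet assigned to a group can join this one.
        partners = {
            other
            for other in features
            if other not in examined and corr.loc[feature, other] > threshold
        }
        if partners:
            groups[feature] = partners
            examined.update(partners)
    return groups


# Toy data (hypothetical): b tracks a closely, d is the mirror image of c.
X = pd.DataFrame(
    {
        "a": [1, 2, 3, 4, 5, 6],
        "b": [2.1, 3.9, 6.2, 8.0, 9.9, 12.1],
        "c": [5, 1, 4, 2, 6, 3],
        "d": [-5, -1, -4, -2, -6, -3],
    }
)

groups = correlated_groups(X, threshold=0.8)
to_drop = sorted(set().union(*groups.values()))
print(groups)
print(to_drop)
```

The keys of the dictionary are the features that would be kept, and the values are the ones that would be dropped. Changing the sort order (for example by variance instead of alphabetically) changes which member of each group is retained.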

Other than that, we did a lot of work to catch up with the latest developments in Scikit-learn and pandas, to ensure that our transformers remain compatible. That being said, we are a small team and maintenance is hard for us, so we’ve deprecated support for earlier releases of these libraries.

Read on to find out more about what we’ve been up to!

New functionality#

  • We now have a Pipeline() and make_pipeline that support transformers that remove rows from the dataset (Soledad Galli)

  • DropMissingData, OutlierTrimmer, LagFeatures, ExpandingWindowFeatures and WindowFeatures have the method transform_x_y to remove rows from the data and then adjust the target variable (Soledad Galli)

Enhancements#

  • DropCorrelatedFeatures() and SmartCorrelatedSelection() have a new attribute that indicates which feature will be retained from each correlated group (Soledad Galli, dlaprins)

  • DropCorrelatedFeatures() and SmartCorrelatedSelection() are twice as fast and can sort variables by variance, by cardinality or alphabetically before the search (Soledad Galli, dlaprins)

  • LagFeatures can now impute the NaN values that lagging introduces (Soledad Galli)
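To show where those NaN values come from and what imputing them means, here is a pandas-only sketch. It does not use LagFeatures itself, since these notes do not name the parameter involved; the column names and the fill value of 0.0 are hypothetical.

```python
import pandas as pd

# Toy daily series (hypothetical values).
sales = pd.DataFrame(
    {"sales": [10.0, 12.0, 9.0, 15.0]},
    index=pd.date_range("2024-01-01", periods=4, freq="D"),
)

lagged = sales.assign(
    # A lag of 1 has no value to pull from for the first row, so it is NaN.
    sales_lag_1=sales["sales"].shift(1),
    # The same lag, with the introduced gap imputed by a constant.
    sales_lag_1_filled=sales["sales"].shift(1, fill_value=0.0),
)

print(lagged)
```

Imputing keeps every row in the dataset, whereas dropping the rows with NaN shortens it and requires re-aligning the target.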

Bug fixes#

In addition to these bug fixes, we fixed other bugs related to new versions of, and deprecations in, pandas and scikit-learn.

Code improvements#

Documentation#

Deprecations#

  • We removed support for Python 3.8 (Soledad Galli)

  • We bumped the dependencies on pandas and Scikit-learn to their latest versions (Soledad Galli)