Feature-engine: A Feature Engineering for Machine Learning library¶
Feature-engine is a Python library that contains multiple transformers to engineer features for use in machine learning models. Feature-engine preserves Scikit-learn functionality with the fit() and transform() methods, which learn parameters from the data and then transform it.
Feature-engine includes transformers for:
- Missing value imputation
- Categorical variable encoding
- Outlier capping
- Numerical variable transformation
Feature-engine lets you select which variables to engineer within each transformer.
Feature-engine’s transformers can be assembled within a Scikit-learn Pipeline, making it possible to save and deploy a single object (.pkl) containing the entire machine learning pipeline.
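That pipeline idea can be sketched with Scikit-learn’s own Pipeline and stand-in steps (Feature-engine transformers slot into the same positions; the SimpleImputer and LogisticRegression here are placeholders, not Feature-engine classes):

```python
import pickle

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# one pipeline object: imputation step followed by a model
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("model", LogisticRegression()),
])

X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)

# the whole fitted pipeline serialises to a single .pkl blob
blob = pickle.dumps(pipe)
restored = pickle.loads(blob)
```

Because the imputer lives inside the pipeline, the deployed object applies the same learned medians to new data as it did in training.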
Feature-engine is a Python 3 package and works with Python 3.5 or later. Earlier versions have not been tested. The simplest way to install Feature-engine is from PyPI with pip, Python’s preferred package installer.
$ pip install feature-engine
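The fit()/transform() convention mentioned above can be sketched with a hand-rolled median imputer in plain pandas (this is an illustrative stand-in, not Feature-engine’s own class, and the parameter names are assumptions for the sketch):

```python
import pandas as pd

class MedianImputerSketch:
    """Illustrative stand-in: learns medians in fit(), fills NaNs in transform()."""

    def __init__(self, variables):
        self.variables = variables  # columns to impute

    def fit(self, X, y=None):
        # learn one median per selected column from the training data
        self.medians_ = X[self.variables].median().to_dict()
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables:
            X[var] = X[var].fillna(self.medians_[var])
        return X

df = pd.DataFrame({"age": [20.0, 30.0, None, 40.0]})
imputer = MedianImputerSketch(variables=["age"])
imputed = imputer.fit(df).transform(df)  # missing age filled with 30.0
```

Learning the statistic in fit() and applying it in transform() is what prevents information from the test set leaking into the imputation values.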
Interested in contributing to Feature-engine? That is great news! Feature-engine is a welcoming and inclusive project and it would be great to have you onboard. We follow the Python Software Foundation Code of Conduct.
Regardless of your skill level you can help us. We appreciate bug reports, user testing, feature requests, bug fixes, addition of tests, product enhancements, and documentation improvements.
More details on how to contribute will come soon! Meanwhile, feel free to fork the GitHub repo and make pull requests, create an issue, or send feedback. More details on how to reach us are in the Getting Help section below.
Thank you for your contributions!
Missing Data Imputation: Imputers¶
- MeanMedianImputer: replaces missing data in numerical variables by mean or median
- ArbitraryNumberImputer: replaces missing data in numerical variables by an arbitrary value
- EndTailImputer: replaces missing data in numerical variables by numbers at the distribution tails
- CategoricalVariableImputer: replaces missing data in categorical variables with the string ‘Missing’
- FrequentCategoryImputer: replaces missing data in categorical variables by the mode
- RandomSampleImputer: replaces missing data with random samples of the variable
- AddNaNBinaryImputer: adds a binary missing indicator to flag observations with missing data
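Two of the ideas above can be sketched together in plain pandas: adding a binary missing indicator (as AddNaNBinaryImputer does) and then filling with an arbitrary number (as ArbitraryNumberImputer does). The column names and the value -999 are illustrative choices, not the library’s defaults:

```python
import pandas as pd

df = pd.DataFrame({"income": [2500.0, None, 3100.0, None]})

# binary flag marking the rows where 'income' was missing
df["income_na"] = df["income"].isna().astype(int)

# then replace the missing values with an arbitrary number
df["income"] = df["income"].fillna(-999)
```

The indicator preserves the information that a value was missing, which can itself be predictive, even after the gap has been filled.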
Categorical Variable Encoders: Encoders¶
- OneHotCategoricalEncoder: performs one-hot encoding, optionally of the most popular categories only
- CountFrequencyCategoricalEncoder: replaces categories by their observation count or frequency (percentage)
- OrdinalCategoricalEncoder: replaces categories by numbers arbitrarily or ordered by target
- MeanCategoricalEncoder: replaces categories by the target mean
- WoERatioCategoricalEncoder: replaces categories by the weight of evidence
- RareLabelCategoricalEncoder: groups infrequent categories into a single category
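As an illustration of the count-encoding idea, here it is hand-rolled in pandas (this sketches what CountFrequencyCategoricalEncoder does when counting observations; it is not the library’s API, and the column names are made up):

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["London", "Paris", "London", "London", "Paris", "Rome"]}
)

# learn the per-category observation counts from the training data,
# then replace each category by its count
counts = df["city"].value_counts().to_dict()
df["city_encoded"] = df["city"].map(counts)
```

In a real workflow the counts would be learned on the training set only and mapped onto new data, which is exactly what the fit()/transform() split provides.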
Numerical Variable Transformation: Transformers¶
- LogTransformer: performs logarithmic transformation of numerical variables
- ReciprocalTransformer: performs reciprocal transformation of numerical variables
- PowerTransformer: performs power transformation of numerical variables
- BoxCoxTransformer: performs Box-Cox transformation of numerical variables
- YeoJohnsonTransformer: performs Yeo-Johnson transformation of numerical variables
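These transformations are standard mathematical maps. For example, the log transformation can be applied directly with NumPy (the variable name is illustrative; this is not Feature-engine code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"amount": [1.0, 10.0, 100.0]})

# the natural log is only defined for strictly positive values,
# which is why a log-style transformer must validate the data first
assert (df["amount"] > 0).all()
df["amount_log"] = np.log(df["amount"])
```

Transformations like this are typically used to reduce skew so that variables better meet the assumptions of linear models.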
Variable Discretisation: Discretisers¶
Getting Help¶
Can’t get something to work? Here are places you can find help.
- The docs (you’re here!).
- Stack Overflow. If you ask a question, please tag it with “feature-engine”.
- If you are enrolled in the Feature Engineering for Machine Learning course on Udemy, post a question in the relevant section.
Feature-engine is licensed under the open source BSD 3-Clause license.
- Quick Start
- Missing data imputation: imputers
- Categorical variable encoding: encoders
- Variable transformation: variable transformers
- Variable discretisation: discretisers
- Outlier capping: cappers