Feature-engine: A Feature Engineering for Machine Learning library¶
Feature-engine is a Python library that contains multiple transformers to engineer features for use in machine learning models. Feature-engine preserves Scikit-learn functionality with the fit() and transform() methods, which learn parameters from the data and then transform it.
Feature-engine includes transformers for:
- Missing value imputation
- Categorical variable encoding
- Outlier capping
- Numerical variable transformation
Feature-engine lets you select which variables to engineer within each transformer.
Feature-engine’s transformers can be assembled within a Scikit-learn Pipeline, making it possible to save and deploy a single object (.pkl) containing the entire machine learning pipeline.
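That pipeline idea can be sketched with Scikit-learn’s own Pipeline and stand-in steps (Feature-engine transformers slot into the same positions; the SimpleImputer and LogisticRegression here are placeholders, not Feature-engine classes):

```python
import pickle

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline

# one pipeline object: imputation step followed by a model
pipe = Pipeline([
    ("imputer", SimpleImputer(strategy="median")),
    ("model", LogisticRegression()),
])

X = np.array([[1.0], [2.0], [np.nan], [4.0]])
y = np.array([0, 0, 1, 1])
pipe.fit(X, y)

# the whole fitted pipeline serialises to a single .pkl blob
blob = pickle.dumps(pipe)
restored = pickle.loads(blob)
```

Because the imputer lives inside the pipeline, the deployed object applies the same learned medians to new data as it did in training.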
Feature-engine is a Python 3 package and works with Python 3.5 or later. Earlier versions have not been tested. The simplest way to install Feature-engine is from PyPI with pip, Python’s preferred package installer.
$ pip install feature-engine
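The fit()/transform() convention mentioned above can be sketched with a hand-rolled median imputer in plain pandas (this is an illustrative stand-in, not Feature-engine’s own class, and the parameter names are assumptions for the sketch):

```python
import pandas as pd

class MedianImputerSketch:
    """Illustrative stand-in: learns medians in fit(), fills NaNs in transform()."""

    def __init__(self, variables):
        self.variables = variables  # columns to impute

    def fit(self, X, y=None):
        # learn one median per selected column from the training data
        self.medians_ = X[self.variables].median().to_dict()
        return self

    def transform(self, X):
        X = X.copy()
        for var in self.variables:
            X[var] = X[var].fillna(self.medians_[var])
        return X

df = pd.DataFrame({"age": [20.0, 30.0, None, 40.0]})
imputer = MedianImputerSketch(variables=["age"])
imputed = imputer.fit(df).transform(df)  # missing age filled with 30.0
```

Learning the statistic in fit() and applying it in transform() is what prevents information from the test set leaking into the imputation values.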
Interested in contributing to Feature-engine? That is great news! Feature-engine is a welcoming and inclusive project and it would be great to have you onboard. We follow the Python Software Foundation Code of Conduct.
Regardless of your skill level you can help us. We appreciate bug reports, user testing, feature requests, bug fixes, addition of tests, product enhancements, and documentation improvements.
More details on how to contribute will come soon! Meanwhile, feel free to fork the GitHub repo and make pull requests, create an issue, or send feedback. More details on how to reach us are in the Getting Help section below.
Thank you for your contributions!
Missing Data Imputation: Imputers¶
- MeanMedianImputer: replaces missing data in numerical variables by mean or median
- ArbitraryNumberImputer: replaces missing data in numerical variables by an arbitrary value
- EndTailImputer: replaces missing data in numerical variables by numbers at the distribution tails
- CategoricalVariableImputer: replaces missing data in categorical variables with the string ‘Missing’
- FrequentCategoryImputer: replaces missing data in categorical variables by the mode
- RandomSampleImputer: replaces missing data with random samples of the variable
- AddNaNBinaryImputer: adds a binary missing indicator to flag observations with missing data
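Two of the ideas above can be sketched together in plain pandas: adding a binary missing indicator (as AddNaNBinaryImputer does) and then filling with an arbitrary number (as ArbitraryNumberImputer does). The column names and the value -999 are illustrative choices, not the library’s defaults:

```python
import pandas as pd

df = pd.DataFrame({"income": [2500.0, None, 3100.0, None]})

# binary flag marking the rows where 'income' was missing
df["income_na"] = df["income"].isna().astype(int)

# then replace the missing values with an arbitrary number
df["income"] = df["income"].fillna(-999)
```

The indicator preserves the information that a value was missing, which can itself be predictive, even after the gap has been filled.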
Categorical Variable Encoders: Encoders¶
- OneHotCategoricalEncoder: performs one-hot encoding, optionally of the most popular categories only
- CountFrequencyCategoricalEncoder: replaces categories by their observation count or frequency (percentage)
- OrdinalCategoricalEncoder: replaces categories by numbers arbitrarily or ordered by target
- MeanCategoricalEncoder: replaces categories by the target mean
- WoERatioCategoricalEncoder: replaces categories by the weight of evidence
- RareLabelCategoricalEncoder: groups infrequent categories into a single category
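As an illustration of the count-encoding idea, here it is hand-rolled in pandas (this sketches what CountFrequencyCategoricalEncoder does when counting observations; it is not the library’s API, and the column names are made up):

```python
import pandas as pd

df = pd.DataFrame(
    {"city": ["London", "Paris", "London", "London", "Paris", "Rome"]}
)

# learn the per-category observation counts from the training data,
# then replace each category by its count
counts = df["city"].value_counts().to_dict()
df["city_encoded"] = df["city"].map(counts)
```

In a real workflow the counts would be learned on the training set only and mapped onto new data, which is exactly what the fit()/transform() split provides.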
Numerical Variable Transformation: Transformers¶
- LogTransformer: performs logarithmic transformation of numerical variables
- ReciprocalTransformer: performs reciprocal transformation of numerical variables
- PowerTransformer: performs power transformation of numerical variables
- BoxCoxTransformer: performs Box-Cox transformation of numerical variables
- YeoJohnsonTransformer: performs Yeo-Johnson transformation of numerical variables
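These transformations are standard mathematical maps. For example, the log transformation can be applied directly with NumPy (the variable name is illustrative; this is not Feature-engine code):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"amount": [1.0, 10.0, 100.0]})

# the natural log is only defined for strictly positive values,
# which is why a log-style transformer must validate the data first
assert (df["amount"] > 0).all()
df["amount_log"] = np.log(df["amount"])
```

Transformations like this are typically used to reduce skew so that variables better meet the assumptions of linear models.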
Variable Discretisation: Discretisers¶
Getting Help¶
Can’t get something to work? Here are places you can find help.
- The docs (you’re here!).
- Stack Overflow. If you ask a question, please tag it with “feature-engine”.
- If you are enrolled in the Feature Engineering for Machine Learning course on Udemy, post a question in the relevant section.
Feature-engine is licensed under the open source BSD 3-Clause license.
- Quick Start
- Missing data imputation: imputers
- Categorical variable encoding: encoders
- Variable transformation: variable transformers
- Variable discretisation: discretisers
- Outlier capping: cappers