DropFeatures

DropFeatures() drops a list of variables from the original dataframe. The user can pass a single variable as a string, or a list of variables to be dropped.

import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

from feature_engine.feature_selection import DropFeatures

# Load dataset
def load_titanic():
    data = pd.read_csv('https://www.openml.org/data/get_csv/16826755/phpMYEkMl')
    data = data.replace('?', np.nan)
    data['cabin'] = data['cabin'].astype(str).str[0]
    data['pclass'] = data['pclass'].astype('O')
    # assign the result back; in-place fillna on a column selection is unreliable in recent pandas
    data['embarked'] = data['embarked'].fillna('C')
    return data

# load data as pandas dataframe
data = load_titanic()

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
            data.drop(['survived', 'name', 'ticket'], axis=1),
            data['survived'], test_size=0.3, random_state=0)

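# set up the transformer
# 'survived', 'name' and 'ticket' were already removed from X_train above,
# so only columns that are still present are listed here
transformer = DropFeatures(
    features_to_drop=['sibsp', 'parch', 'fare', 'body', 'home.dest']
)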

# fit the transformer
transformer.fit(X_train)

# transform the data
train_t = transformer.transform(X_train)

train_t.columns
Index(['pclass', 'sex', 'age', 'cabin', 'embarked', 'boat'],
      dtype='object')

API Reference

class feature_engine.feature_selection.DropFeatures(features_to_drop=None)

DropFeatures() drops the list of variable(s) indicated by the user from the original dataframe and returns the remaining variables.

Parameters

features_to_drop (str or list, default=None) – Variable(s) to be dropped from the dataframe
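The two accepted forms of features_to_drop can be illustrated with plain pandas. This is only a sketch of the idea, not feature_engine's own code: as_feature_list is a hypothetical helper and the small dataframe is made up for the example.

```python
import pandas as pd


def as_feature_list(features_to_drop):
    """Hypothetical helper: wrap a single column name in a list."""
    if isinstance(features_to_drop, str):
        return [features_to_drop]
    return list(features_to_drop)


# Toy dataframe, invented for illustration
df = pd.DataFrame({
    "age": [22, 38],
    "fare": [7.25, 71.28],
    "sex": ["male", "female"],
})

# A single variable passed as a string
single = df.drop(columns=as_feature_list("fare"))

# Several variables passed as a list
several = df.drop(columns=as_feature_list(["fare", "sex"]))

print(list(single.columns))   # ['age', 'sex']
print(list(several.columns))  # ['age']
```

Normalising the string to a one-element list up front means the rest of the code only has to handle the list case.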

fit(X, y=None)

Verifies that the input X is a pandas dataframe.

Parameters
  • X (pandas dataframe of shape = [n_samples, n_features]) – The input dataframe

  • y (None) – y is not needed for this transformer. You can pass y or None.
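The kind of check that fit() is described as performing could look roughly like the sketch below. check_is_dataframe is a hypothetical stand-in written for this example, and the exception type and message are assumptions; the library's actual check may differ.

```python
import pandas as pd


def check_is_dataframe(X):
    """Hypothetical stand-in for the input check described for fit()."""
    if not isinstance(X, pd.DataFrame):
        # Assumption: a TypeError is raised; the real library may behave differently
        raise TypeError("X must be a pandas dataframe")
    return X


# A dataframe passes the check unchanged
checked = check_is_dataframe(pd.DataFrame({"age": [22, 38]}))

# A plain nested list is rejected
try:
    check_is_dataframe([[22], [38]])
    rejected = False
except TypeError:
    rejected = True

print(rejected)  # True
```

Because nothing is learned from the data, y plays no role in a check like this, which matches the note that y can be passed or left as None.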

transform(X)

Drops the variable or list of variables indicated by the user from the original dataframe and returns a new dataframe with the remaining subset of variables.

Parameters

X (pandas dataframe) – The input dataframe from which features will be dropped

Returns

X_transformed – The transformed dataframe with the remaining subset of variables.

Return type

pandas dataframe of shape = [n_samples, n_features - len(features_to_drop)]
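The return shape stated above can be checked with a small pandas-only sketch. It uses DataFrame.drop as a stand-in for transform(), which is an assumption made for illustration, and the toy data is invented.

```python
import pandas as pd

# Toy input with 3 samples and 4 features, invented for illustration
X = pd.DataFrame({
    "pclass": [1, 3, 2],
    "sex": ["female", "male", "male"],
    "age": [29.0, 22.0, 35.0],
    "fare": [211.34, 7.25, 13.0],
})
features_to_drop = ["age", "fare"]

# Stand-in for transform(): drop the columns and return a new dataframe
X_transformed = X.drop(columns=features_to_drop)

n_samples, n_features = X.shape
print(X_transformed.shape)  # (3, 2)
print(X.shape)              # (3, 4), the input dataframe is left unchanged
```

The number of rows is preserved and the number of columns shrinks by len(features_to_drop), provided every listed column exists in the input.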