# ArbitraryOutlierCapper

The ArbitraryOutlierCapper censors variable values at user-defined maximum and minimum values. For more details, read the API Reference below.

```
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.model_selection import train_test_split

from feature_engine import outlier_removers as outr

# Load and prepare the Titanic dataset (available from OpenML)
def load_titanic():
    data = pd.read_csv(
        'https://www.openml.org/data/get_csv/16826755/phpMYEkMl')
    data = data.replace('?', np.nan)
    data['cabin'] = data['cabin'].astype(str).str[0]
    data['pclass'] = data['pclass'].astype('O')
    data['embarked'].fillna('C', inplace=True)
    data['fare'] = data['fare'].astype('float')
    data['fare'].fillna(data['fare'].median(), inplace=True)
    data['age'] = data['age'].astype('float')
    data['age'].fillna(data['age'].median(), inplace=True)
    return data

data = load_titanic()

# Separate into train and test sets
X_train, X_test, y_train, y_test = train_test_split(
    data.drop(['survived', 'name', 'ticket'], axis=1),
    data['survived'], test_size=0.3, random_state=0)

# set up the capper
capper = outr.ArbitraryOutlierCapper(
    max_capping_dict={'age': 50, 'fare': 200}, min_capping_dict=None)

# fit the capper
capper.fit(X_train)

# transform the data
train_t = capper.transform(X_train)
test_t = capper.transform(X_test)

capper.right_tail_caps_
```
```
{'age': 50, 'fare': 200}
```

```
train_t[['fare', 'age']].max()
```

```
fare    200.0
age      50.0
dtype: float64
```
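Under the hood, capping at a fixed maximum is equivalent to a pandas `clip()`. The sketch below (toy values standing in for the Titanic columns, not the actual dataset) reproduces the transform for a `max_capping_dict` only:

```python
import pandas as pd

# toy data standing in for the Titanic 'age' and 'fare' columns
df = pd.DataFrame({"age": [22.0, 65.0, 48.0],
                   "fare": [7.25, 512.33, 150.0]})

max_caps = {"age": 50, "fare": 200}

# cap each listed variable at its user-supplied maximum
capped = df.copy()
for col, cap in max_caps.items():
    capped[col] = capped[col].clip(upper=cap)

print(capped["age"].max())   # 50.0
print(capped["fare"].max())  # 200.0
```

Note that the original dataframe is left untouched; only the copy is censored.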

## API Reference

class `feature_engine.outlier_removers.ArbitraryOutlierCapper(max_capping_dict=None, min_capping_dict=None, missing_values='raise')`

The ArbitraryOutlierCapper() caps the maximum and/or minimum values of a variable at arbitrary values indicated by the user.

The user must provide the maximum or minimum values that will be used to cap each variable in a dictionary of the form {feature: capping value}.

Parameters
• max_capping_dict (dictionary, default=None) – user-specified capping values for the right tail of the distribution (maximum values).

• min_capping_dict (dictionary, default=None) – user-specified capping values for the left tail of the distribution (minimum values).

• missing_values (string, default='raise') – indicates whether missing values should be ignored or should raise an error. If missing_values='raise', the transformer will raise an error if the training set or the datasets to transform contain missing values.
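The `missing_values='raise'` contract can be pictured as a simple NaN check before capping. The `check_na` helper below is a hypothetical sketch, not part of feature_engine:

```python
import numpy as np
import pandas as pd

def check_na(X: pd.DataFrame, missing_values: str = "raise") -> None:
    # mirror the transformer's contract: error out on NaN unless ignoring
    if missing_values == "raise" and X.isnull().any().any():
        raise ValueError("The dataframe contains missing values.")

X = pd.DataFrame({"age": [30.0, np.nan]})

try:
    check_na(X)                       # default: raises on NaN
except ValueError as e:
    print("raised:", e)

check_na(X, missing_values="ignore")  # no error; NaN is tolerated
```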

`fit(X, y=None)`
Parameters
• X (pandas dataframe of shape = [n_samples, n_features]) – The training input samples.

• y (None) – y is not needed in this transformer. You can pass y or None.

`right_tail_caps_`

The dictionary containing the maximum values at which variables will be capped.

Type: dictionary

`left_tail_caps_`

The dictionary containing the minimum values at which variables will be capped.

Type: dictionary

`transform(X)`

Caps the variable values, that is, censors outliers.

Parameters

X (pandas dataframe of shape = [n_samples, n_features]) – The data to be transformed.

Returns

X_transformed – The dataframe with the capped variables.

Return type

pandas dataframe of shape = [n_samples, n_features]
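Putting `fit()` and `transform()` together, the transformer's behaviour can be sketched as a minimal re-implementation. `SimpleCapper` is a hypothetical stand-in to illustrate the API, not the feature_engine code:

```python
import pandas as pd

class SimpleCapper:
    """Minimal sketch of an arbitrary outlier capper (illustrative only)."""

    def __init__(self, max_capping_dict=None, min_capping_dict=None):
        self.max_capping_dict = max_capping_dict or {}
        self.min_capping_dict = min_capping_dict or {}

    def fit(self, X, y=None):
        # store the user-supplied caps, mirroring the documented attributes
        self.right_tail_caps_ = dict(self.max_capping_dict)
        self.left_tail_caps_ = dict(self.min_capping_dict)
        return self

    def transform(self, X):
        # censor outliers: clip each listed variable at its cap
        X = X.copy()
        for col, cap in self.right_tail_caps_.items():
            X[col] = X[col].clip(upper=cap)
        for col, cap in self.left_tail_caps_.items():
            X[col] = X[col].clip(lower=cap)
        return X

# usage with toy values
df = pd.DataFrame({"age": [22.0, 65.0], "fare": [3.0, 512.0]})
capper = SimpleCapper(max_capping_dict={"age": 50, "fare": 200},
                      min_capping_dict={"fare": 5})
out = capper.fit(df).transform(df)
print(out["age"].tolist())   # [22.0, 50.0]
print(out["fare"].tolist())  # [5.0, 200.0]
```

As in the real transformer, `fit()` learns nothing from the data; it only stores the user-supplied dictionaries.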