.. _make_pipeline:

.. currentmodule:: feature_engine.pipeline

make_pipeline
=============

:class:`make_pipeline` is a shorthand for :class:`Pipeline`. While to set up
:class:`Pipeline` we create tuples with step names and transformers or estimators,
with :class:`make_pipeline` we simply pass a sequence of transformers and
estimators, and the step names are assigned automatically.

Setting up a Pipeline with make_pipeline
----------------------------------------

In the following example, we set up a `Pipeline` that drops missing data, then
replaces categories with ordinal numbers, and finally fits a Lasso regression model.

.. code:: python

    import numpy as np
    import pandas as pd

    from feature_engine.imputation import DropMissingData
    from feature_engine.encoding import OrdinalEncoder
    from feature_engine.pipeline import make_pipeline
    from sklearn.linear_model import Lasso

    X = pd.DataFrame(
        dict(
            x1=[2, 1, 1, 0, np.nan],
            x2=["a", np.nan, "b", np.nan, "a"],
        )
    )
    y = pd.Series([1, 2, 3, 4, 5])

    pipe = make_pipeline(
        DropMissingData(),
        OrdinalEncoder(encoding_method="arbitrary"),
        Lasso(random_state=10),
    )

    # predict
    pipe.fit(X, y)
    preds_pipe = pipe.predict(X)
    preds_pipe

In the output, we see the predictions made by the pipeline:

.. code:: python

    array([2., 2.])

The step names were assigned automatically:

.. code:: python

    print(pipe)

.. code:: python

    Pipeline(steps=[('dropmissingdata', DropMissingData()),
                    ('ordinalencoder',
                     OrdinalEncoder(encoding_method='arbitrary')),
                    ('lasso', Lasso(random_state=10))])

The pipeline returned by :class:`make_pipeline` has exactly the same
characteristics as :class:`Pipeline`. Hence, for additional guidelines, check out
the :class:`Pipeline` documentation.
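Since the step names are assigned automatically, they are also the keys you would
use to retrieve the fitted steps through the pipeline's `named_steps` attribute.
As a minimal sketch, assuming the pipeline fitted above:

.. code:: python

    # Retrieve fitted steps through the automatically assigned names.
    encoder = pipe.named_steps["ordinalencoder"]
    print(encoder.encoder_dict_)  # category-to-number mappings learned per variable

    regressor = pipe.named_steps["lasso"]
    print(regressor.coef_)  # coefficients of the fitted Lasso

This can be handy to inspect the mappings learned by the encoder or the
coefficients of the regression model.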
Forecasting
-----------

Let's set up another pipeline to do direct forecasting:

.. code:: python

    import numpy as np
    import matplotlib.pyplot as plt
    import pandas as pd

    from sklearn.linear_model import Lasso
    from sklearn.metrics import root_mean_squared_error
    from sklearn.multioutput import MultiOutputRegressor

    from feature_engine.timeseries.forecasting import (
        LagFeatures,
        WindowFeatures,
    )
    from feature_engine.pipeline import make_pipeline

We'll use the Australian electricity demand dataset described here:

    Godahewa, Rakshitha, Bergmeir, Christoph, Webb, Geoff, Hyndman, Rob, &
    Montero-Manso, Pablo. (2021). Australian Electricity Demand Dataset
    (Version 1) [Data set]. Zenodo. https://doi.org/10.5281/zenodo.4659727

.. code:: python

    url = "https://raw.githubusercontent.com/tidyverts/tsibbledata/master/data-raw/vic_elec/VIC2015/demand.csv"
    df = pd.read_csv(url)
    df.drop(columns=["Industrial"], inplace=True)

    # Convert the integer Date to an actual date with datetime type
    df["date"] = df["Date"].apply(
        lambda x: pd.Timestamp("1899-12-30") + pd.Timedelta(x, unit="days")
    )

    # Create a timestamp from the integer Period representing 30 minute intervals
    df["date_time"] = df["date"] + \
        pd.to_timedelta((df["Period"] - 1) * 30, unit="m")

    df.dropna(inplace=True)

    # Rename columns
    df = df[["date_time", "OperationalLessIndustrial"]]
    df.columns = ["date_time", "demand"]

    # Resample to hourly
    df = (
        df.set_index("date_time")
        .resample("h")
        .agg({"demand": "sum"})
    )

    print(df.head())

Here, we see the first rows of data:

.. code:: python

                              demand
    date_time
    2002-01-01 00:00:00  6919.366092
    2002-01-01 01:00:00  7165.974188
    2002-01-01 02:00:00  6406.542994
    2002-01-01 03:00:00  5815.537828
    2002-01-01 04:00:00  5497.732922

We'll predict the next 3 hours of energy demand using direct forecasting.
Let's create the target variable:

.. code:: python

    horizon = 3
    y = pd.DataFrame(index=df.index)
    for h in range(horizon):
        y[f"h_{h}"] = df.shift(periods=-h, freq="h")
    y.dropna(inplace=True)
    df = df.loc[y.index]
    print(y.head())

This is our target variable:

.. code:: python

                                 h_0          h_1          h_2
    date_time
    2002-01-01 00:00:00  6919.366092  7165.974188  6406.542994
    2002-01-01 01:00:00  7165.974188  6406.542994  5815.537828
    2002-01-01 02:00:00  6406.542994  5815.537828  5497.732922
    2002-01-01 03:00:00  5815.537828  5497.732922  5385.851060
    2002-01-01 04:00:00  5497.732922  5385.851060  5574.731890

Next, we split the data into a training set and a test set:

.. code:: python

    end_train = '2014-12-31 23:59:59'
    X_train = df.loc[:end_train]
    y_train = y.loc[:end_train]

    begin_test = '2014-12-31 17:59:59'
    X_test = df.loc[begin_test:]
    y_test = y.loc[begin_test:]

Next, we set up `LagFeatures` and `WindowFeatures` to create features from lags
and windows:

.. code:: python

    lagf = LagFeatures(
        variables=["demand"],
        periods=[1, 3, 6],
        missing_values="ignore",
        drop_na=True,
    )

    winf = WindowFeatures(
        variables=["demand"],
        window=["3h"],
        freq="1h",
        functions=["mean"],
        missing_values="ignore",
        drop_original=True,
        drop_na=True,
    )

We wrap the Lasso regression within the multioutput regressor to predict
multiple targets:

.. code:: python

    lasso = MultiOutputRegressor(Lasso(random_state=0, max_iter=10))

Now, we assemble the `Pipeline`:

.. code:: python

    pipe = make_pipeline(lagf, winf, lasso)
    print(pipe)

The step names were assigned automatically:

.. code:: python

    Pipeline(steps=[('lagfeatures',
                     LagFeatures(drop_na=True, missing_values='ignore',
                                 periods=[1, 3, 6], variables=['demand'])),
                    ('windowfeatures',
                     WindowFeatures(drop_na=True, drop_original=True, freq='1h',
                                    functions=['mean'], missing_values='ignore',
                                    variables=['demand'], window=['3h'])),
                    ('multioutputregressor',
                     MultiOutputRegressor(estimator=Lasso(max_iter=10,
                                                          random_state=0)))])

Let's fit the Pipeline:

.. code:: python

    pipe.fit(X_train, y_train)

Now, we can make forecasts for the test set:

.. code:: python

    forecasts = pd.DataFrame(
        pipe.predict(X_test),
        columns=[f"step_{i+1}" for i in range(3)],
    )
    print(forecasts.head())

For each hour in the test set, we see the predicted energy demand over the next
3 hours:

.. code:: python

            step_1       step_2       step_3
    0  8031.043352  8262.804811  8484.551733
    1  7017.158081  7160.568853  7496.282999
    2  6587.938171  6806.903940  7212.741943
    3  6503.807479  6789.946587  7195.796841
    4  6646.981390  6970.501840  7308.359237

To learn more about direct forecasting and how to create features, check out our
courses:

.. figure:: ../../images/fetsf.png
   :width: 300
   :figclass: align-center
   :align: left
   :target: https://www.trainindata.com/p/feature-engineering-for-forecasting

   Feature Engineering for Time Series Forecasting

.. figure:: ../../images/fwml.png
   :width: 300
   :figclass: align-center
   :align: right
   :target: https://www.courses.trainindata.com/p/forecasting-with-machine-learning

   Forecasting with Machine Learning

.. figure:: ../../images/feml.png
   :width: 300
   :figclass: align-center
   :align: left
   :target: https://www.trainindata.com/p/feature-engineering-for-machine-learning

   Feature Engineering for Machine Learning

|
|
|
|
|
|
|
|
|
|

.. figure:: ../../images/cookbook.png
   :width: 200
   :figclass: align-center
   :align: left
   :target: https://packt.link/0ewSo

   Python Feature Engineering Cookbook

|
|
|
|
|
|
|
|
|
|
|
|
|

Both our book and courses are suitable for beginners and more advanced data
scientists alike. By purchasing them you are supporting Sole, the main developer
of Feature-engine.