LazyProphet: Time Series Forecasting with LightGBM

When we think about boosted trees for time series, the M5 competition usually comes to mind: a significant portion of the top ten entries used LightGBM. In the univariate case, however, boosted trees tend to perform poorly, since there are few exogenous features to exploit.

To be clear, the runner-up in the M4 competition did use boosted trees, but only as a meta-model to ensemble other, more traditional time series methods. In the benchmarking code published for M4, the standard boosted-tree baselines look quite bad, sometimes falling short of traditional prediction methods. The sktime package and its paper [1] did an excellent job here:
[Figure: SMAPE benchmark table from the sktime M4 study]

Any model with "XGB" or "RF" in its name uses a tree-based ensemble. XGB-s gives the best tree result on the hourly dataset, an SMAPE of 10.9, yet these models are just sktime's straightforward attempts within their framework, and the M4 winner scored 9.3 on that same dataset. A few numbers from this table are worth keeping in mind: the 10.9 from XGB-s on the hourly dataset, and the "best" tree result on the weekly dataset, 9.0 from RF-ts.

The figure above leads us to our goal: to create a fast, LightGBM-based time series modeling procedure that is suitable for personal use, can decisively beat these numbers, and is comparable to traditional statistical methods in terms of speed.

It sounds difficult, and our first thought might be that the tree itself has to be optimized. But boosted trees are complex, changing them is time-consuming, and the gains are not guaranteed. We do have one advantage, though: we are only fitting a single dataset at a time. So why not start with the features?

Features

Looking at other implementations of trees in a univariate setting, we see some feature engineering: binning, lagged values of the target, simple counters, seasonal dummy variables, and perhaps Fourier functions. That is fine for approximating traditional exponential smoothing and the like, but our goal today is to take the time element and represent it as tabular data that can be handed to a tree model, and this is where LazyProphet comes in. Beyond the usual features, LazyProphet adds one more feature engineering element: "connecting" the points.
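
For intuition, here is a minimal sketch (not LazyProphet's internals; names and window sizes are illustrative) of the standard approach: turning a univariate series into a lag-feature table that a tree model can consume:

import numpy as np
import pandas as pd

def make_lag_table(y, lags=(1, 2, 3, 52)):
    # Turn a univariate series into tabular lag features for a tree model
    df = pd.DataFrame({'y': y})
    for lag in lags:
        df[f'lag_{lag}'] = df['y'].shift(lag)
    df['t'] = np.arange(len(df))  # simple time counter
    return df.dropna()            # drop rows without full lag history

y = pd.Series(np.sin(np.arange(200) * 2 * np.pi / 52))  # toy "weekly" series
table = make_lag_table(y)
X, target = table.drop(columns='y'), table['y']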

It is very simple: take the first point of the time series, draw a line to a point halfway through, then connect that halfway point to the last point. Repeat this a few times while changing which point serves as the "kink" (the intermediate node). That is what we call "connecting" the points.

The picture below illustrates this well. The blue line is the time series; the other lines are just "connecting the dots":

[Figure: a time series (blue) with several piecewise linear "connect-the-dots" lines overlaid]

It turns out that these are just weighted piecewise linear basis functions. One disadvantage is that extrapolation along these lines can go badly wrong. To fix this, we introduce a "decay" factor that penalizes the slope of each line from its midpoint to the final point.
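
As a rough illustration (a sketch of the idea, not LazyProphet's actual implementation), each basis function can be built as a line with slope 1 up to its kink and a decayed slope afterwards:

import numpy as np

def piecewise_basis(n, n_basis=4, decay=0.99):
    # Illustrative "connect-the-dots" basis: each column rises linearly to a
    # kink point, then continues to the end with a slope shrunk by the decay
    t = np.arange(n, dtype=float)
    basis = []
    for k in range(1, n_basis + 1):
        kink = k * n // (n_basis + 1)              # where this function bends
        left = np.minimum(t, kink)                 # slope 1 up to the kink
        right = np.maximum(t - kink, 0)            # slope 1 after the kink...
        basis.append(left + (1 - decay) * right)   # ...multiplied by (1 - decay)
    return np.column_stack(basis)

B = piecewise_basis(104)  # shape (104, 4): one column per "connected" line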

Adding lagged target values and Fourier basis functions to this foundation can approach state-of-the-art performance on some problems. We call it "LazyProphet" because it requires so little of us.

Let's take a look at how it performs in practice.

Code

The datasets used here are all open source and published on the M-competitions GitHub. The data is already split into train and test sets, so we fit directly on the training CSV and evaluate on the test CSV using SMAPE. First, install LazyProphet:

pip install LazyProphet

Once installed, start coding:

import matplotlib.pyplot as plt
import numpy as np
from tqdm import tqdm
import pandas as pd
from LazyProphet import LazyProphet as lp

# Read the weekly M4 data and use the series identifier (V1) as the index
train_df = pd.read_csv(r'm4-weekly-train.csv')
test_df = pd.read_csv(r'm4-weekly-test.csv')
train_df.index = train_df['V1']
train_df = train_df.drop('V1', axis=1)
test_df.index = test_df['V1']
test_df = test_df.drop('V1', axis=1)

With all the necessary packages imported, the weekly data is read in. Next, create a SMAPE function that returns the SMAPE for a given forecast and the actual values:

def smape(A, F):
    # Symmetric mean absolute percentage error, in percent
    return 100 / len(A) * np.sum(2 * np.abs(F - A) / (np.abs(A) + np.abs(F)))
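
A quick sanity check of the metric (illustrative values; a perfect forecast scores 0):

actuals = np.array([10.0, 20.0, 30.0])
print(smape(actuals, actuals))                       # 0.0 -- perfect forecast
print(smape(actuals, np.array([12.0, 18.0, 33.0])))  # ~12.7 -- small errors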

For this experiment we take the average SMAPE across all time series and compare it with the other models. As a sanity check, we also compute the average SMAPE of a naive forecast, which lets us verify that our setup is consistent with what was done in the competition.

smapes = []
naive_smape = []
j = tqdm(range(len(train_df)))
for row in j:
    y = train_df.iloc[row, :].dropna()
    y_test = test_df.iloc[row, :].dropna()
    j.set_description(f'{np.mean(smapes)}, {np.mean(naive_smape)}')
    lp_model = lp.LazyProphet(scale=True,
                              seasonal_period=52,
                              n_basis=10,
                              fourier_order=10,
                              ar=list(range(1, 53)),
                              decay=.99,
                              linear_trend=None,
                              decay_average=False)
    fitted = lp_model.fit(y)
    predictions = lp_model.predict(len(y_test)).reshape(-1)
    smapes.append(smape(y_test.values, pd.Series(predictions).clip(lower=0)))
    # Naive benchmark: repeat the last observed value over the horizon
    naive_smape.append(smape(y_test.values, np.tile(y.iloc[-1], len(y_test))))
print(np.mean(smapes))
print(np.mean(naive_smape))

Before looking at the results, here is a quick introduction to the LazyProphet parameters.

  • scale: Very simple, just whether to scale the data. The default is True.

  • seasonal_period: Controls the Fourier basis functions for seasonality; since this is weekly data, we use 52.

  • n_basis: Controls the weighted piecewise linear basis functions; an integer giving the number of functions to use.

  • fourier_order: The number of sine and cosine pairs used for seasonality.

  • ar: Which lagged target values to use. A list of several lags can be passed; here we use lags 1 through 52.

  • decay: The decay factor used to penalize the "right side" of our basis functions. A setting of 0.99 means the slope is multiplied by (1 - 0.99), i.e. 0.01.

  • linear_trend: A major weakness of trees is that they cannot extrapolate beyond the range of the training data (see the small sketch after this list). To overcome this, there are some off-the-shelf tests for a polynomial trend, and a linear regression is fit to detrend the series. None means test for a trend, True means always detrend, False means never test and never fit a linear trend.

  • decay_average: Not a useful parameter when using the decay rate. It is a curious trick, but don't use it: passing True simply averages all future values of the basis function. This was useful when fitting with ElasticNet, but less so with LightGBM in testing.
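
To see why linear_trend exists, here is a small self-contained demonstration (not from the article) that a tree ensemble flat-lines outside its training range:

import numpy as np
import lightgbm as lgb

# Fit a boosted tree on a pure linear trend, y = 2x
X = np.arange(100, dtype=float).reshape(-1, 1)
y = 2.0 * X.ravel()
model = lgb.LGBMRegressor(n_estimators=50).fit(X, y)

# Every input beyond the training range lands in the same leaves,
# so all of these predictions are identical -- the trend is not extended
print(model.predict(np.array([[150.0], [500.0], [5000.0]])))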

Let's move on to the hourly data:

# Read the hourly M4 data, again indexed by the series identifier
train_df = pd.read_csv(r'm4-hourly-train.csv')
test_df = pd.read_csv(r'm4-hourly-test.csv')
train_df.index = train_df['V1']
train_df = train_df.drop('V1', axis=1)
test_df.index = test_df['V1']
test_df = test_df.drop('V1', axis=1)

smapes = []
naive_smape = []
j = tqdm(range(len(train_df)))
for row in j:
    y = train_df.iloc[row, :].dropna()
    y_test = test_df.iloc[row, :].dropna()
    j.set_description(f'{np.mean(smapes)}, {np.mean(naive_smape)}')
    lp_model = lp.LazyProphet(seasonal_period=[24, 168],
                              n_basis=10,
                              fourier_order=10,
                              ar=list(range(1, 25)),
                              decay=.99)
    fitted = lp_model.fit(y)
    predictions = lp_model.predict(len(y_test)).reshape(-1)
    smapes.append(smape(y_test.values, pd.Series(predictions).clip(lower=0)))
    naive_smape.append(smape(y_test.values, np.tile(y.iloc[-1], len(y_test))))
print(np.mean(smapes))
print(np.mean(naive_smape))

All we really changed are the seasonal_period and ar parameters. When a list is passed to seasonal_period, a seasonal basis function is built for every period in the list. ar was adjusted to fit the new main seasonality of 24.
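
For intuition, here is a sketch (illustrative, not the library's code) of how Fourier features for multiple seasonal periods can be stacked side by side:

import numpy as np

def fourier_features(n, period, order):
    # Sine/cosine pairs for one seasonal period
    t = np.arange(n)
    feats = []
    for k in range(1, order + 1):
        feats.append(np.sin(2 * np.pi * k * t / period))
        feats.append(np.cos(2 * np.pi * k * t / period))
    return np.column_stack(feats)

n = 1000
X_seasonal = np.hstack([fourier_features(n, p, order=10) for p in (24, 168)])
# 2 periods x 10 orders x (sin, cos) -> 40 seasonal columns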

Results

Adding LazyProphet to the sktime results above, the table looks like this:

[Figure: the sktime SMAPE table with LazyProphet's results appended]

LazyProphet beats sktime's best models, which include several tree-based methods. We lose to the M4 winner on the hourly dataset, but averaged overall we outperform ES-RNN. The important thing to realize here is that all of this was done with the default parameters...

boosting_params = {
    "objective": "regression",
    "metric": "rmse",
    "verbosity": -1,
    "boosting_type": "gbdt",
    "seed": 42,
    "linear_tree": False,
    "learning_rate": .15,
    "min_child_samples": 5,
    "num_leaves": 31,
    "num_iterations": 50
}

You can pass your own dictionary of parameters when creating the LazyProphet class, and it can be optimized per time series for further gains, as in the sketch below.
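
As a sketch (assuming the constructor accepts the dict via a boosting_params keyword, which is an assumption based on the defaults shown above), you might slow the learning rate and add iterations:

# Hypothetical tweak: copy the defaults and adjust a couple of values
custom_params = dict(boosting_params)
custom_params["learning_rate"] = 0.05
custom_params["num_iterations"] = 200

lp_model = lp.LazyProphet(scale=True,
                          seasonal_period=52,
                          n_basis=10,
                          fourier_order=10,
                          ar=list(range(1, 53)),
                          decay=.99,
                          boosting_params=custom_params)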

Compare our results with the goals mentioned above:

  • Zero parameter optimization (apart from slight changes for different seasonalities)

  • Each time series fitted separately

  • Predictions generated "lazily" in under a minute on my local machine

  • Beat all the other tree methods in the benchmarks

So far this looks very successful, but the success may not fully carry over elsewhere: other datasets contain much less data, and our method tends to degrade significantly there. In testing, LazyProphet performs better at high frequencies and with larger amounts of data. Still, LazyProphet is a good option to reach for when modeling time series; it can be tried without spending much time coding, and that alone makes it worth the time.

References:

[1] Markus Löning, Franz Király: “Forecasting with sktime: Designing sktime’s New Forecasting API and Applying It to Replicate and Extend the M4 Study”, 2020; arXiv:2005.08067
