Neural Network Time Series Forecasting with PyTorch-Forecasting


Source: 数据STUDIO (Deep Learning for Beginners)
This article is about 5,200 words; the recommended reading time is 8 minutes.
It introduces PyTorch-Forecasting for neural network time series forecasting.


PyTorch-Forecasting [1] makes time series forecasting with neural networks simple for data scientists and researchers.

Why is accurate forecasting so important?

Forecasting time series is important in many contexts and is highly relevant to machine learning practitioners. Take demand forecasting as an example, from which many use cases originate. Almost every manufacturer would benefit from a better understanding of the demand for their products in order to optimize production quantities: underproduce and you lose revenue; overproduce and you are forced to sell the excess at a discount. Closely related is pricing, which is essentially a demand forecast with a particular focus on price elasticity. Pricing is relevant to almost all companies.

Time is of the essence for a host of additional machine learning applications: predictive maintenance, risk scoring, fraud detection, and more. The sequence of events and the timing between them are critical to creating a reliable forecast.

In fact, while time series forecasting may not be as shiny as image recognition or language processing, it is more common in industry. That's because image recognition and language processing are relatively new fields that are often used to power new products, while prediction has been around for decades and is at the heart of many decision-making (support) systems. Adopting high-precision machine learning models, such as those in PyTorch Forecasting, can better support decision-making and even automate it, often directly resulting in millions of dollars in additional profits.

Deep learning emerges as a powerful predictive tool

In recent years, deep learning (neural networks) has surpassed traditional methods in time series forecasting, though later and to a lesser extent than in image and language processing. In fact, in forecasting pure time series (i.e., without covariates such as price alongside demand), deep learning overtook traditional statistical methods only two years ago. With the rapid development of the field, however, the accuracy advantages of neural networks have become significant enough to warrant their increased use in time series forecasting. For example, the state-of-the-art architecture N-BEATS [2] shows an 11% lower sMAPE on the M4 competition dataset than the next-best non-neural-network method (an ensemble of statistical methods). This network is also implemented in PyTorch Forecasting [3].

Furthermore, deep learning has two advantages even over other popular machine learning algorithms such as gradient-boosted trees. First, neural network architectures can be designed with an inherent understanding of time: they automatically make connections between data points that are close in time and can therefore capture complex temporal dependencies. In contrast, traditional machine learning models require manual creation of time series features, such as averages over the past x days, which weakens their ability to model temporal dependencies. Second, most tree-based models output a step function by design. As such, they cannot predict the marginal effect of a change in an input and, moreover, are notoriously unreliable for out-of-domain predictions. For example, if we have only observed prices of 30€ and 50€, a tree-based model cannot assess the effect on demand of changing the price from 30€ to 35€. Consequently, such models often cannot be used directly to optimize inputs, yet that is frequently the whole point of building a machine learning model: the value lies in optimizing covariates. Neural networks, on the other hand, employ continuous activation functions and are particularly good at interpolating in high-dimensional spaces, so they can be used to optimize inputs such as prices. The sketch below illustrates the difference.
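To make the step-function argument concrete, here is a small illustrative sketch on synthetic data (the prices, demand curve, and scikit-learn models are hypothetical choices for illustration, not part of PyTorch Forecasting): a decision tree returns the same prediction for 30€ and 35€ because both fall into the same leaf, while a neural network interpolates between the observed prices.

import numpy as np
from sklearn.neural_network import MLPRegressor
from sklearn.tree import DecisionTreeRegressor

np.random.seed(42)
# synthetic demand observed at only two price points, 30€ and 50€
prices = np.array([[30.0], [50.0]] * 50)
demand = 100 - 1.5 * prices.ravel() + np.random.normal(0, 1, len(prices))

tree = DecisionTreeRegressor().fit(prices, demand)
net = MLPRegressor(hidden_layer_sizes=(16,), max_iter=5000).fit(prices, demand)

# the tree's step function cannot distinguish 30€ from 35€ ...
print(tree.predict([[30.0], [35.0]]))  # two identical values
# ... while the network's continuous activations interpolate between them
print(net.predict([[30.0], [35.0]]))   # two different values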

What is PyTorch Forecasting?


PyTorch Forecasting aims to simplify time series forecasting with neural networks for real-world use cases and research. It provides state-of-the-art time series forecasting architectures that can be trained easily on pandas dataframes.

  • The high-level API greatly reduces the user's workload, because no specific knowledge of how to prepare a training dataset in PyTorch is required. The TimeSeriesDataSet class takes care of variable transformations, missing values, random subsampling, multiple history lengths, and more. You only need to provide the pandas dataframe and specify which variables the model should learn from.

  • The BaseModel class provides general visualization capabilities, such as plotting predictions versus actuals and partial dependency plots. Training progress, in the form of metrics and examples, can be logged automatically to TensorBoard [4].

  • State-of-the-art networks are implemented for forecasting with and without covariates, and they come with specialized built-in interpretability. For example, the Temporal Fusion Transformer [5], which beats Amazon's DeepAR by 36-69% in benchmarks, ships with variable and time importance measures. You can see more of this in the example below.

  • A number of multi-horizon time series metrics are available to evaluate forecasts over multiple forecast horizons.

  • For scalability, the networks are designed to work with PyTorch Lightning [6], allowing out-of-the-box training on CPUs and on single or multiple (distributed) GPUs. The Ranger optimizer is implemented for faster model training.

  • To facilitate experimentation and research, adding networks is straightforward. The code was designed explicitly with PyTorch experts in mind, who will find that even complex ideas can be implemented easily. In fact, one only needs to inherit from the BaseModel class and follow the conventions for the forward method's input and output to enable logging and interpretation capabilities immediately (a minimal sketch follows below).
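For example, a custom network might look like the minimal sketch below. It follows the documented pattern of inheriting from BaseModel; note that helper names such as to_network_output and transform_output reflect recent releases of the package and may differ between versions, so treat this as an assumption-laden outline rather than a definitive implementation.

from typing import Dict

import torch
from torch import nn
from pytorch_forecasting.models import BaseModel


class FullyConnectedModel(BaseModel):
    """Sketch of a custom network: a plain feed-forward net on the encoder history."""

    def __init__(self, input_size: int, output_size: int, hidden_size: int, **kwargs):
        self.save_hyperparameters()  # store arguments for checkpointing
        super().__init__(**kwargs)
        self.network = nn.Sequential(
            nn.Linear(input_size, hidden_size),
            nn.ReLU(),
            nn.Linear(hidden_size, output_size),
        )

    def forward(self, x: Dict[str, torch.Tensor]) -> Dict[str, torch.Tensor]:
        # x is the dictionary produced by TimeSeriesDataSet; here we use the
        # continuous encoder history of the target as the only input
        network_input = x["encoder_cont"].squeeze(-1)
        prediction = self.network(network_input)
        # rescale into target space and wrap in the standardized output format,
        # which enables logging and interpretation out of the box
        prediction = self.transform_output(prediction, target_scale=x["target_scale"])
        return self.to_network_output(prediction=prediction)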

To get started, a detailed tutorial in the documentation shows the end-to-end workflow. I will also discuss a specific example later in this article.

Why do we need this package?

PyTorch Forecasting helps overcome important barriers to using deep learning. While deep learning has taken hold in image and language processing, the same has not been true for time series forecasting. The field is still dominated by traditional statistical methods such as ARIMA and machine learning algorithms such as gradient boosting, with the occasional exception of Bayesian models. There are two reasons why deep learning has not yet become mainstream for time series forecasting, both of which can already be overcome:

  1. Training a neural network almost always requires a GPU, which is not always readily available. Hardware requirements are often a significant hurdle, but this hurdle can be overcome by moving computation to the cloud.

  2. Compared with traditional methods, neural networks are relatively difficult to use, and this is especially true for time series forecasting. There is currently a lack of high-level APIs for popular frameworks such as Facebook's PyTorch or Google's TensorFlow. For traditional machine learning, the scikit-learn ecosystem provides practitioners with a standardized interface.

This second hurdle is considered especially critical in the deep learning community, since this kind of user-unfriendliness requires extensive software engineering to overcome.


A typical testimonial from a deep learning practitioner

In short, PyTorch Forecasting aims to do what fast.ai [7] has done for image recognition and natural language processing, which greatly facilitated the diffusion of neural networks from academia into the real world. PyTorch Forecasting does the equivalent for time series forecasting by providing a high-level API for PyTorch that works directly with pandas dataframes. To ease learning, unlike fast.ai, the package does not create a brand-new API but builds on the well-established PyTorch and PyTorch Lightning APIs.

How to use PyTorch Forecasting?

This small example demonstrates the power of the package and its most important abstractions. We will:

  1. Create a training and validation dataset,

  2. Train the Temporal Fusion Transformer [8], an architecture developed by Oxford University and Google that beats Amazon's DeepAR,

  3. Check the results on the validation set and interpret the trained model.

Note: The code below works only with version 0.4.1 of PyTorch Forecasting and version 0.9.0 of PyTorch Lightning. Minimal modifications are required to run it with the latest releases. A full tutorial [9] with up-to-date code is available in the documentation.


Create datasets for training and validation

First, we need to convert our time series into a pandas dataframe in which each row can be identified by a time step and a time series. Fortunately, most datasets already come in this format. In this article, we use Kaggle's Stallion dataset [10], which describes the sales of various beverages. Our task is to make a six-month forecast of sales by agency (i.e., store) and stock keeping unit (SKU), i.e., product. There are about 21,000 monthly historical sales records. In addition to historical sales, we have information on sales prices, agency locations, special days such as holidays, and sales volume for the industry as a whole.

from pytorch_forecasting.data.examples import get_stallion_data

data = get_stallion_data()  # load data as a pandas dataframe

This dataset already has the correct format, but some important features are missing. Most importantly, we need to add a time index, which is incremented by one every time step. Also, it would be beneficial to add a date feature, which in this case means extracting the month from the date record.

import numpy as np

# add time index
data["time_idx"] = data["date"].dt.year * 12 + data["date"].dt.month
data["time_idx"] -= data["time_idx"].min()

# add additional features
# categories have to be strings
data["month"] = data.date.dt.month.astype(str).astype("category")
data["log_volume"] = np.log(data.volume + 1e-8)
data["avg_volume_by_sku"] = (
    data
    .groupby(["time_idx", "sku"], observed=True)
    .volume.transform("mean")
)
data["avg_volume_by_agency"] = (
    data
    .groupby(["time_idx", "agency"], observed=True)
    .volume.transform("mean")
)

# we want to encode special days as one variable and
# therefore need to first reverse the one-hot encoding
special_days = [
    "easter_day", "good_friday", "new_year", "christmas",
    "labor_day", "independence_day", "revolution_day_memorial",
    "regional_games", "fifa_u_17_world_cup", "football_gold_cup",
    "beer_capital", "music_fest"
]
data[special_days] = (
    data[special_days]
    .apply(lambda x: x.map({0: "-", 1: x.name}))
    .astype("category")
)

# show sample data
data.sample(10, random_state=521)

d7fe96067853632e8426f67a902e0c29.png

A random sample of rows from the dataframe

The next step is to convert the dataframe into a PyTorch Forecasting dataset. Besides telling the dataset which features are categorical versus continuous and which are static versus varying over time, we must also decide how to normalize the data. Here, we standardize each time series separately and indicate that the values are always positive.

We also chose to use the past six months of data as a validation set.

from pytorch_forecasting.data import (
    TimeSeriesDataSet,
    GroupNormalizer
)

max_prediction_length = 6  # forecast 6 months
max_encoder_length = 24  # use 24 months of history
training_cutoff = data["time_idx"].max() - max_prediction_length

training = TimeSeriesDataSet(
    data[lambda x: x.time_idx <= training_cutoff],
    time_idx="time_idx",
    target="volume",
    group_ids=["agency", "sku"],
    min_encoder_length=0,  # allow predictions without history
    max_encoder_length=max_encoder_length,
    min_prediction_length=1,
    max_prediction_length=max_prediction_length,
    static_categoricals=["agency", "sku"],
    static_reals=[
        "avg_population_2017",
        "avg_yearly_household_income_2017"
    ],
    time_varying_known_categoricals=["special_days", "month"],
    # a group of categorical variables can be treated as one variable
    variable_groups={"special_days": special_days},
    time_varying_known_reals=[
        "time_idx",
        "price_regular",
        "discount_in_percent"
    ],
    time_varying_unknown_categoricals=[],
    time_varying_unknown_reals=[
        "volume",
        "log_volume",
        "industry_volume",
        "soda_volume",
        "avg_max_temp",
        "avg_volume_by_agency",
        "avg_volume_by_sku",
    ],
    target_normalizer=GroupNormalizer(
        groups=["agency", "sku"], coerce_positive=1.0
    ),  # use softplus with beta=1.0 and normalize by group
    add_relative_time_idx=True,  # add as feature
    add_target_scales=True,  # add as feature
    add_encoder_length=True,  # add as feature
)

# create validation set (predict=True), which means predicting the last
# max_prediction_length points in time for each series
validation = TimeSeriesDataSet.from_dataset(
    training, data, predict=True, stop_randomization=True
)

# create dataloaders for the model
batch_size = 128
train_dataloader = training.to_dataloader(
    train=True, batch_size=batch_size, num_workers=0
)
val_dataloader = validation.to_dataloader(
    train=False, batch_size=batch_size * 10, num_workers=0
)


Training Temporal Fusion Transformer

Now it's time to create our model. We train the model with PyTorch Lightning. Before training, you can identify the optimal learning rate with its learning rate finder, as sketched below.
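A minimal sketch of the learning rate finder, assuming the PyTorch Lightning 0.9.0 API targeted by this article (later Lightning versions moved this functionality to trainer.tuner) and the trainer and tft objects created in the code below:

# run the learning rate finder on the model and dataloaders defined below
res = trainer.lr_find(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader,
    min_lr=1e-6,
    max_lr=10.0,
)
print(f"suggested learning rate: {res.suggestion()}")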

import pytorch_lightning as pl
from pytorch_lightning.callbacks import (
    EarlyStopping,
    LearningRateLogger
)
from pytorch_lightning.loggers import TensorBoardLogger
from pytorch_forecasting.metrics import QuantileLoss
from pytorch_forecasting.models import TemporalFusionTransformer

# stop training when the loss metric does not improve on the validation set
early_stop_callback = EarlyStopping(
    monitor="val_loss",
    min_delta=1e-4,
    patience=10,
    verbose=False,
    mode="min"
)
lr_logger = LearningRateLogger()  # log the learning rate
logger = TensorBoardLogger("lightning_logs")  # log to tensorboard

# create trainer
trainer = pl.Trainer(
    max_epochs=30,
    gpus=0,  # train on CPU; use gpus=[0] to run on GPU
    gradient_clip_val=0.1,
    early_stop_callback=early_stop_callback,
    limit_train_batches=30,  # run validation every 30 batches
    # fast_dev_run=True,  # comment in to quickly check for bugs
    callbacks=[lr_logger],
    logger=logger,
)

# initialize model
tft = TemporalFusionTransformer.from_dataset(
    training,
    learning_rate=0.03,
    hidden_size=16,  # biggest influence on network size
    attention_head_size=1,
    dropout=0.1,
    hidden_continuous_size=8,
    output_size=7,  # QuantileLoss has 7 quantiles by default
    loss=QuantileLoss(),
    log_interval=10,  # log examples every 10 batches
    reduce_on_plateau_patience=4,  # reduce learning rate automatically
)
tft.size()  # 29.6k parameters in model

# fit network
trainer.fit(
    tft,
    train_dataloader=train_dataloader,
    val_dataloaders=val_dataloader
)

On my computer, training takes about three minutes, but for larger networks and datasets it can take hours. During training, we can monitor progress in TensorBoard, which can be started with tensorboard --logdir=lightning_logs. For example, we can monitor example predictions on the training and validation sets. As you can see in the image below, the predictions look quite accurate. In case you are wondering, the gray lines show how much attention the model pays to different points in time when making a prediction.


TensorBoard panel showing training examples

Evaluate the trained model

After training, we can evaluate the metrics on the validation dataset and look at a few examples to see how the model performs. Given that we used only 21,000 samples, the results are very reassuring and competitive with those of a gradient boosting model.

import torch
from pytorch_forecasting.metrics import MAE

# load the best model according to the validation loss (given that
# we use early stopping, this is not necessarily the last epoch)
best_model_path = trainer.checkpoint_callback.best_model_path
best_tft = TemporalFusionTransformer.load_from_checkpoint(best_model_path)

# calculate mean absolute error on the validation set
actuals = torch.cat([y for x, y in iter(val_dataloader)])
predictions = best_tft.predict(val_dataloader)
MAE(predictions, actuals)

Looking at the worst performers in terms of sMAPE shows us where the model has problems forecasting reliably. These examples can provide important pointers on how to improve the model. This kind of actuals-versus-predictions plot is available for all models.

from pytorch_forecasting.metrics import SMAPE

# calculate the metric by which to display
predictions = best_tft.predict(val_dataloader)
mean_losses = SMAPE(reduction="none")(predictions, actuals).mean(1)
indices = mean_losses.argsort(descending=True)  # sort losses
raw_predictions, x = best_tft.predict(
    val_dataloader, mode="raw", return_x=True
)

# show only two examples for demonstration purposes
for idx in range(2):
    best_tft.plot_prediction(
        x,
        raw_predictions,
        idx=indices[idx],
        add_loss_to_title=SMAPE()
    )

The two worst predictions on the validation set. The white line shows how much attention the transformer pays to a given point in time.

Likewise, we can visualize random examples from our model. Another feature of PyTorch Forecasting is interpretation of trained models. For example, all models allow us to readily compute partial dependency plots. However, for brevity, we show here some of the built-in interpretation capabilities of the Temporal Fusion Transformer, whose variable importances are measured by the very design of the neural network.

interpretation = best_tft.interpret_output(
    raw_predictions, reduction="sum"
)
best_tft.plot_interpretation(interpretation)

As expected, past observed volume is the top variable in the encoder, while price-related variables are among the top predictors in the decoder. Perhaps more interestingly, the agency ranks only fifth among the static variables. However, given that the second and third variables are related to location, we could expect the agency to rank much higher if those two variables were not included in the model.

Summary

It is very easy to train a model and gain insight into its inner workings with PyTorch Forecasting. As a practitioner, you can use the package to train and interpret state-of-the-art models out of the box. With the PyTorch Lightning integration, training and prediction are scalable. As a researcher, you can leverage the package to get automatic tracking and introspection capabilities for your architecture and apply it seamlessly across multiple datasets.

References

[1] PyTorch-Forecasting: https://pytorch-forecasting.readthedocs.io/
[2] N-BEATS: https://openreview.net/forum?id=r1ecqn4YwB
[3] PyTorch Forecasting N-BEATS: https://pytorch-forecasting.readthedocs.io/en/latest/api/pytorch_forecasting.models.nbeats.NBeats.html
[4] TensorBoard: https://www.tensorflow.org/tensorboard
[5] Temporal Fusion Transformer: https://arxiv.org/pdf/1912.09363.pdf
[6] PyTorch Lightning: https://pytorch-lightning.readthedocs.io/
[7] fast.ai: https://www.fast.ai/
[8] Temporal Fusion Transformer: https://arxiv.org/pdf/1912.09363.pdf
[9] Full tutorial: https://pytorch-forecasting.readthedocs.io/en/latest/tutorials/stallion.html
[10] Kaggle's Stallion dataset: https://www.kaggle.com/utathya/future-volume-prediction

Editor: Wang Jing

