Ant Group proposes a new hierarchical time series forecasting method: two-way information fusion combining top-down and bottom-up networks

Today I would like to introduce SLOTH, a hierarchical time series forecasting work published by Ant Group (Alibaba) in February this year. SLOTH uses a top-down convolutional network and a bottom-up attention network to carry out two-way information transfer between upper-layer and lower-layer nodes.


Paper title: SLOTH: Structured Learning and Task-based Optimization for Time Series Forecasting on Hierarchies

Paper link: https://arxiv.org/abs/2302.05650

1

Background Introduction

Hierarchical time series forecasting targets multivariate time series whose individual series form a hierarchical structure. In such a hierarchy, the value of a parent node's series equals the sum of its child nodes' series. Hierarchical forecasting must satisfy two conditions: first, every node in the hierarchy needs to be forecast; second, the forecast of each parent node must equal (or approximately equal) the sum of the forecasts of its child nodes. This requirement is referred to as the hierarchical (coherence) constraint. Hierarchical forecasting is common in real applications. Compared with basic time series forecasting, it must consider not only how to forecast each series accurately, but also how to make the overall set of forecasts satisfy the hierarchical constraints. For a more detailed survey of the hierarchical forecasting problem, please refer to the earlier article, Guide to hierarchical time series forecasting.
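To make the coherence constraint concrete, here is a minimal NumPy sketch using a hypothetical three-level hierarchy (total → {A, B}, A → {A1, A2}); the hierarchy, values, and helper function are illustrative, not from the paper:

```python
import numpy as np

# Hypothetical three-level hierarchy: total -> {A, B}, A -> {A1, A2}.
# The summing matrix S maps the bottom series (A1, A2, B) to all 5 nodes.
S = np.array([
    [1, 1, 1],   # total = A1 + A2 + B
    [1, 1, 0],   # A     = A1 + A2
    [0, 0, 1],   # B
    [1, 0, 0],   # A1
    [0, 1, 0],   # A2
], dtype=float)

bottom = np.array([3.0, 2.0, 4.0])   # bottom-level values (A1, A2, B)
all_nodes = S @ bottom               # values at every node in the hierarchy

def is_coherent(y, S, bottom_idx, tol=1e-8):
    """Forecasts y over all nodes are coherent iff y == S @ y[bottom_idx]."""
    return np.allclose(y, S @ y[bottom_idx], atol=tol)

print(is_coherent(all_nodes, S, [3, 4, 2]))   # True: sums match by construction
```

A set of base forecasts produced independently per node will generally fail this check, which is exactly why a reconciliation step is needed.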


In hierarchical forecasting, the time series at different levels exhibit different characteristics. Upper-layer nodes have coarser granularity, less noise, and more stable series, but carry less information; lower-layer nodes have finer granularity, more noise, and less regular series, but are rich in information. The core of hierarchical forecasting is to combine the respective advantages of the upper and lower series to produce final forecasts that are both accurate and coherent.

2

SLOTH Hierarchical Forecasting Model

The core network structure of the SLOTH hierarchical forecasting model proposed in this paper is shown in the figure below. It mainly consists of four parts: time series feature extraction, a top-down convolutional network, a bottom-up attention network, and basic forecast output.

[Figure: overall architecture of the SLOTH model]

In the time series feature extraction part, the paper uses a GRU to encode the time series of each node in the hierarchy independently, obtaining a hidden-state representation for each node at each time step. All series in the hierarchy share the same GRU parameters.
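The shared-encoder idea can be sketched as follows. The GRU cell below is a minimal NumPy stand-in with randomly initialized (untrained) weights; the hidden size, series length, and node names are assumptions for illustration, not the paper's settings:

```python
import numpy as np

rng = np.random.default_rng(0)
d_in, d_h = 1, 8   # univariate series, hidden size 8 (assumed)

# One shared set of GRU parameters, reused for every node in the hierarchy.
Wz, Uz, bz = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
Wr, Ur, br = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)
Wh, Uh, bh = rng.normal(size=(d_h, d_in)), rng.normal(size=(d_h, d_h)), np.zeros(d_h)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_encode(series):
    """Run the shared GRU over one series; return hidden states (T, d_h)."""
    h, states = np.zeros(d_h), []
    for x in series:
        x = np.atleast_1d(x)
        z = sigmoid(Wz @ x + Uz @ h + bz)              # update gate
        r = sigmoid(Wr @ x + Ur @ h + br)              # reset gate
        h_tilde = np.tanh(Wh @ x + Uh @ (r * h) + bh)  # candidate state
        h = (1 - z) * h + z * h_tilde
        states.append(h)
    return np.stack(states)

# Encode every node's series with the *same* parameters.
hierarchy = {name: rng.normal(size=12) for name in ["total", "A", "B", "A1", "A2"]}
H = {name: gru_encode(s) for name, s in hierarchy.items()}
```

Sharing one encoder keeps the parameter count independent of the number of nodes, which matters when hierarchies are large.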

The main purpose of the top-down convolutional network is to bring the stable series information of the upper layers into the forecasts of the lower-layer nodes. The overall structure is shown in the figure below. For each node, the representations of its ancestor nodes are concatenated into a matrix, and a one-dimensional convolution fuses them. For example, if node D's parent is B and its grandparent is A, the GRU output representations of A, B, and D are stacked into a matrix and passed through a convolution for information extraction. In this way, the stable upper-layer series information is propagated to the lower-layer nodes.

[Figure: top-down convolutional network]
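A rough NumPy sketch of this fusion, following the paper's A/B/D example; the random vectors stand in for GRU hidden states and the random kernel for learned convolution weights, and the kernel size spanning the full ancestor chain is an assumption:

```python
import numpy as np

rng = np.random.default_rng(1)
d_h = 8
# Hidden states at one time step for node D, its parent B, and its
# grandparent A (names follow the example in the text).
h_A, h_B, h_D = (rng.normal(size=d_h) for _ in range(3))

# Stack the ancestor chain root-to-leaf into a (depth, d_h) matrix ...
M = np.stack([h_A, h_B, h_D])             # shape (3, d_h)

# ... and fuse along the depth axis. With a kernel that spans the whole
# chain, the 1-D convolution reduces to a learned depth-wise weighted sum.
kernel = rng.normal(size=(3, d_h)) * 0.1  # stand-in for learned weights
fused_D = (kernel * M).sum(axis=0)        # top-down representation for D
```

The same operation is applied per node, so each lower-layer node receives a representation conditioned on its entire ancestor path.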

The bottom-up attention network applies the rich information of the lower-layer nodes to the upper-layer nodes through attention. For each parent node, its representation attends over its child nodes, fusing the children's representation vectors. The process starts from the second-to-last layer and performs the attention computation layer by layer upward. In addition, the representation produced by the top-down convolution is subtracted and the original representation is added back, removing the influence of top-down information on the bottom-up attention path.

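A minimal sketch of one parent-over-children attention step. The scaled dot-product parameterization (parent as query, children as keys and values) and the exact form of the subtract/add-back step are assumptions; the paper's learned projections are replaced by random stand-in vectors:

```python
import numpy as np

rng = np.random.default_rng(2)
d_h = 8
h_parent = rng.normal(size=d_h)        # parent node's current representation
children = rng.normal(size=(3, d_h))   # representations of 3 child nodes

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

# Parent as query, children as keys/values (scaled dot-product attention).
scores = children @ h_parent / np.sqrt(d_h)
weights = softmax(scores)              # attention weights over children
h_bottom_up = weights @ children       # fused child information

# As described in the text: subtract the top-down component and add back
# the original GRU state so top-down information does not leak into the
# bottom-up path (both vectors below are stand-ins).
h_top_down = rng.normal(size=d_h)      # output of the top-down convolution
h_original = rng.normal(size=d_h)      # raw GRU hidden state
h_parent_new = h_parent + h_bottom_up - h_top_down + h_original
```

Applying this layer by layer from the second-to-last level upward lets information from the noisy but detailed leaves accumulate at the stable upper nodes.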

Finally, the bottom-up attention representation is fused with the initial GRU representation via a weighted combination and fed into an MLP to produce the basic forecast. These basic forecasts then need to be calibrated so that the final forecasts satisfy the hierarchical constraints. The overall optimization objective is to minimize the prediction error while requiring the forecasts to satisfy those constraints. This is a typical constrained optimization problem; the paper converts it directly into its Lagrangian dual form and adds an OptNet layer to implement gradient updates within the network.

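To illustrate what the calibration step must achieve, here is a simple ordinary-least-squares reconciliation on a hypothetical hierarchy (total → {A, B}, A → {A1, A2}). This closed-form projection is a well-known stand-in that enforces the same aggregation constraints; it is not the paper's OptNet-based formulation, and the base forecast values are made up:

```python
import numpy as np

# Summing matrix: rows are (total, A, B, A1, A2), columns are (A1, A2, B).
S = np.array([
    [1, 1, 1],
    [1, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0],
], dtype=float)

y_hat = np.array([9.5, 4.8, 4.3, 3.1, 2.0])   # incoherent base forecasts

# OLS reconciliation: project y_hat onto span(S), the subspace of
# coherent forecasts, via the orthogonal projection matrix P.
P = S @ np.linalg.inv(S.T @ S) @ S.T
y_tilde = P @ y_hat

# The reconciled forecast now satisfies every aggregation constraint.
assert np.allclose(y_tilde, S @ y_tilde[[3, 4, 2]])
```

SLOTH's advantage over such post-hoc projections is that its constrained optimization is solved inside the network, so the reconciliation step is differentiable and trained jointly with the forecaster.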

3

Experimental Results

The paper compares the model's forecasts on multiple datasets. The baselines include basic hierarchical forecasting methods (deep networks combined with bottom-up aggregation) as well as several end-to-end hierarchical forecasting methods, and the results confirm a clear improvement.



Origin blog.csdn.net/qq_33431368/article/details/130633344