"Depth demand forecast (Deep Demand Forecast)"

Deep Demand Forecasting


I have recently been researching demand forecasting. This article summarizes current academic research, real-world scenarios, and my own work experience. Unlike earlier introductions to demand forecasting, it focuses on practical application rather than on the techniques themselves. The title says "deep" for two reasons: the article presents case studies that apply deep learning, and it shares the deeper practical challenges that arise in real scenarios.

Types of demand forecasting scenarios

Demand forecasting is an important part of the supply chain. It mainly serves supply chain functions such as production planning, replenishment, inventory management, and supply chain operations.

I divide demand forecasting scenarios into two types: event-driven and trend-cycle (periodic or trending). The two types are not mutually exclusive. Real-world scenarios usually contain both, with one of them dominant.

Event-driven

Common event-driven scenarios include e-commerce platforms and demand for fast-moving consumer goods (FMCG) and other consumer-facing (2C) products; forecasting page UV and PV also belongs here. E-commerce sales are driven mainly by promotional events such as 618, Juhuasuan, Double 11, and super brand or super category days. Sales are relatively low and stable when no event is running. FMCG sales behave similarly: they are stable on ordinary days, rise sharply around some holidays and promotions, and then fall back to the previous level.

In event-driven demand forecasting, demand is shaped by the events themselves. Demand is stable before and after an event. During the event, merchant advertising and media placement bring traffic that drives rapid sales growth, and sales fall back once the event ends. Over the long run, the trend and cycle in demand are not pronounced. They mainly reflect changes in market share caused by brand influence, reputation, and similar factors. For example, as brand influence grows and quality is assured, market share gradually increases and sales rise steadily over time.

Personally, I consider event-driven demand forecasting closer to supervised learning with a time attribute. The selected features are usually related to events and marketing.

Figure 1: Example of event-driven demand

Trend-cycle

Common trend-cycle scenarios are relatively stable business-facing (2B) manufacturing goods, and 2C goods that are rarely promoted or whose sales are not driven by promotions. Examples include auto parts, medicine, and office consumables. Apart from some holidays, special events, or external factors such as policy changes, demand for these goods is periodic and regular overall.

Trend-cycle demand forecasting falls within time series forecasting. Features are generally derived from the time series itself.

Figure 2: Example of trend-cycle demand

Classifying demand forecasts by dimension

Besides classifying by scenario, we can also classify demand forecasts by the dimension being forecast.

Time-dimension demand forecasting

Time-dimension demand forecasting, as the name suggests, forecasts demand on a time scale: hourly, daily, monthly, or quarterly. The same data may need forecasts at different time scales depending on downstream needs. For example, if a factory's production cycle is one month, we need a monthly forecast, while the marketing department needs daily sales forecasts to guide campaign placement.

Time-dimension forecasting is very common in demand forecasting. Whether the scenario is event-driven or trend-cycle, the business usually requires a forecast along the time dimension.

Event-dimension demand forecasting

Event-dimension demand forecasting starts from the events that affect demand and forecasts the demand that occurs during an event of interest. For example, in e-commerce we focus on sales during promotions. Ordinary dates can likewise be grouped into an "ordinary" event. To produce the forecast, we can aggregate time-dimension forecasts up to the event level, or forecast directly from the event itself.

Event-dimension forecasting is mainly used for event-driven demand.

Extreme-point forecasting

Extreme-point forecasting is analogous to outlier forecasting: holiday peaks and troughs, or an extreme outlier such as Double 11. In most competitions and studies, these extreme points are removed or smoothed during preprocessing (outlier handling, smoothing, and so on) so that they do not hurt overall accuracy. In real production, however, these extreme points carry useful information and should not simply be preprocessed away.

Extreme points can generally be forecast by combining a trend model with post-processing and external auxiliary information. Take Double 11 as an example. Double 11 sales are likely to exceed every value in the historical data. A tree model alone will generally underestimate them, because tree models cannot extrapolate and so cannot predict sales levels never seen in history. We can post-process the tree model's prediction, for example by using the past ratio of Double 11 sales to Double 11 pre-sales, or by comparing with last year's Double 11. Alternatively, we can use event-dimension forecasting: treat the historical extreme points as event data and fit a linear or other model at the event level.
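The ratio-based post-processing mentioned above can be sketched as follows. This is only an illustration under simple assumptions: the function names, the use of an average ratio, and the rule of keeping the larger of the two estimates are all hypothetical choices, not a method taken from any of the cited work.

```python
def ratio_based_estimate(presale_now, past_presales, past_event_sales):
    """Scale this year's pre-sale volume by the average historical
    ratio of event-day sales to pre-sale volume (assumed heuristic)."""
    ratios = [sales / presale
              for sales, presale in zip(past_event_sales, past_presales)]
    return presale_now * sum(ratios) / len(ratios)


def postprocess_tree_prediction(tree_prediction, ratio_estimate):
    """A tree model cannot exceed the sales levels it saw in training,
    so keep whichever estimate is larger (a deliberately simple rule)."""
    return max(tree_prediction, ratio_estimate)
```

In practice the ratio would probably be weighted toward recent years, and the adjusted value would be checked against business knowledge before use.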

Figure 3: Example of extreme demand points

Demand forecasting methodology

Generally speaking, demand forecasting methods fall into four groups: simple methods, traditional time series methods, methods that recast forecasting as supervised learning, and deep learning methods.

  • Simple methods: direct average, moving average, weighted average, exponential smoothing, multiple linear regression
  • Traditional time series methods: ARIMA, Holt-Winters
  • Supervised learning methods: tree models such as XGBoost, LightGBM, CatBoost, and Random Forest
  • Deep learning:
      • Multiple time series forecasting: seq2seq, LSTM and its variants, TPA-LSTM, LSTNet, etc.
      • Multiple time series quantile regression: DeepAR, MQ-RNN, Deep Factors Model

Simple methods are not covered in detail here; see the references and related links. This chapter focuses on traditional time series forecasting and on methods that recast forecasting as supervised learning. Deep learning methods are described in detail in the case studies below.
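For orientation, three of the simple methods listed above fit in a few lines each. This is a minimal sketch in plain Python; the function names are illustrative, and each function returns a one-step-ahead forecast from a list of past demand values.

```python
def moving_average_forecast(history, window):
    """Forecast the next value as the mean of the last `window` values."""
    recent = history[-window:]
    return sum(recent) / len(recent)


def weighted_average_forecast(history, weights):
    """Weighted mean of the last len(weights) values (oldest first)."""
    recent = history[-len(weights):]
    return sum(w * y for w, y in zip(weights, recent)) / sum(weights)


def exponential_smoothing_forecast(history, alpha):
    """Simple exponential smoothing: the level moves toward each new
    observation by a fraction alpha, and the final level is the forecast."""
    level = history[0]
    for y in history[1:]:
        level = alpha * y + (1 - alpha) * level
    return level
```

A larger `alpha` makes the forecast react faster to recent demand; a smaller one smooths more.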

ARIMA (Autoregressive Integrated Moving Average)

ARIMA describes the autocorrelation in the data. It learns how historical demand changes over time and uses that pattern to forecast the future. A basic ARIMA model has three key parameters, ARIMA(p, d, q): p is the number of autoregressive terms, d is the order of differencing, and q is the number of moving-average terms.

How do we choose the parameters? First determine the differencing order d. Why difference at all? Differencing makes the data more stationary (detrended), which the autoregressive and moving-average parts require. The figure shows the original data, the first-order difference, and the second-order difference. After second-order differencing the data has become stationary, so in a case like this we can choose d = 2. The values of p and q can be read from the partial autocorrelation and autocorrelation plots, respectively. Both R and Python now also provide tools that search for the best ARIMA parameters automatically.

a. Original data  b. First-order difference  c. Second-order difference

ARIMA is well suited to data with regular periodic patterns; it does not handle extreme demand forecasts.
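To make the role of d concrete, here is a small sketch of repeated differencing in plain Python. The helper name `difference` is illustrative. A series with a quadratic trend needs two rounds of differencing before it becomes flat, which corresponds to d = 2.

```python
def difference(series, order=1):
    """Apply first-order differencing `order` times.
    Each pass shortens the series by one element."""
    for _ in range(order):
        series = [b - a for a, b in zip(series, series[1:])]
    return series


# A quadratic trend: y_t = t^2 for t = 0..5
quadratic = [t * t for t in range(6)]
first = difference(quadratic, order=1)   # still trending upward
second = difference(quadratic, order=2)  # constant, so the trend is gone
```

On real data the differenced series is noisy rather than exactly constant, so the choice of d is usually confirmed with a plot or a stationarity test.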

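Holt-Winters

Holt-Winters is an exponential smoothing method, usually meaning triple exponential smoothing. It smooths three components: the level ($\ell_t$), the trend ($b_t$), and the seasonal component ($s_t$). Depending on how the seasonality behaves, Holt-Winters comes in an additive and a multiplicative version. If the seasonal variation is roughly constant over the series, choose the additive method. If the seasonal variation is proportional to the level of the series, choose the multiplicative method.

Additive model, with smoothing parameters $\alpha$, $\beta$, $\gamma$, season length $m$, forecast horizon $h$, and $k = \lfloor (h-1)/m \rfloor$:

$$\hat{y}_{t+h|t} = \ell_t + h\,b_t + s_{t+h-m(k+1)}$$
$$\ell_t = \alpha\,(y_t - s_{t-m}) + (1-\alpha)(\ell_{t-1} + b_{t-1})$$
$$b_t = \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1}$$
$$s_t = \gamma\,(y_t - \ell_{t-1} - b_{t-1}) + (1-\gamma)\,s_{t-m}$$

Multiplicative model:

$$\hat{y}_{t+h|t} = (\ell_t + h\,b_t)\,s_{t+h-m(k+1)}$$
$$\ell_t = \alpha\,\frac{y_t}{s_{t-m}} + (1-\alpha)(\ell_{t-1} + b_{t-1})$$
$$b_t = \beta\,(\ell_t - \ell_{t-1}) + (1-\beta)\,b_{t-1}$$
$$s_t = \gamma\,\frac{y_t}{\ell_{t-1} + b_{t-1}} + (1-\gamma)\,s_{t-m}$$

Holt-Winters likewise applies to time series with regular patterns.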
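The additive version can be sketched in plain Python as below. This is a minimal illustration using the standard textbook recursions for level, trend, and seasonal component, not a tuned implementation: the initialisation from the first two seasons is one assumed choice among several, and the function name and argument names are illustrative.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters. `m` is the season length; returns point
    forecasts for the next `horizon` steps after the end of `y`."""
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    # Assumed initialisation: level and trend from the first two seasonal
    # means, seasonal terms as deviations from the first seasonal mean.
    mean1 = sum(y[:m]) / m
    mean2 = sum(y[m:2 * m]) / m
    level = mean1
    trend = (mean2 - mean1) / m
    seasonal = [y[i] - mean1 for i in range(m)]  # one entry per phase
    for t in range(m, len(y)):
        s_prev = seasonal[t % m]                 # s_{t-m}
        prev_level, prev_trend = level, trend
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        seasonal[t % m] = (gamma * (y[t] - prev_level - prev_trend)
                           + (1 - gamma) * s_prev)
    n = len(y)
    # The forecast for step n - 1 + h reuses the latest seasonal term of its phase.
    return [level + h * trend + seasonal[(n - 1 + h) % m]
            for h in range(1, horizon + 1)]
```

For example, on a purely seasonal series such as `[10, 20, 30]` repeated three times with `m=3`, the forecasts continue the same pattern. The multiplicative version replaces the subtractions of the seasonal term with divisions and the final addition with a multiplication.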

Tree models

Among machine learning methods for demand forecasting, tree models such as XGBoost, LightGBM, and CatBoost have become the default first choice. When using a tree model, we can add time series features (for example with the Python package tsfresh) and predict from them. Alternatively, we can build separate models along some dimension: one model for Mondays, one for Tuesdays, one for holidays. Tree models have excelled in time series forecasting competitions, but they have several drawbacks:

  • They cannot extrapolate: they cannot predict values outside the range seen in the historical data
  • They require careful tuning, with many parameters involved
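Two of the points above can be illustrated without any library. The sketch below first turns a series into a supervised learning table of lagged values, then fits a one-split regression tree (a "stump") to a rising series to show why trees cannot extrapolate. Both helpers are hypothetical illustrations, far simpler than XGBoost or LightGBM.

```python
def make_lag_features(series, n_lags):
    """Each row holds the previous n_lags values; the target is the next value."""
    rows, targets = [], []
    for t in range(n_lags, len(series)):
        rows.append(series[t - n_lags:t])
        targets.append(series[t])
    return rows, targets


def fit_stump(xs, ys):
    """Fit a depth-1 regression tree on one feature by minimising squared error.
    Every prediction is the mean of the training targets in one leaf."""
    best = None
    for threshold in sorted(set(xs))[:-1]:
        left = [y for x, y in zip(xs, ys) if x <= threshold]
        right = [y for x, y in zip(xs, ys) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        sse = (sum((y - left_mean) ** 2 for y in left)
               + sum((y - right_mean) ** 2 for y in right))
        if best is None or sse < best[0]:
            best = (sse, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


xs = [1, 2, 3, 4, 5, 6]
ys = [10, 20, 30, 40, 50, 60]      # demand rising linearly with x
predict = fit_stump(xs, ys)
far_future = predict(100)          # a leaf mean, never above max(ys)
```

Because every leaf outputs an average of training targets, the prediction for `x = 100` stays inside the historical range even though the underlying trend keeps rising. Deeper trees and ensembles share this property.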

Case studies

This chapter covers several demand forecasting case studies, including Amazon's SKU forecasting and Uber's ride demand forecasting.

Amazon Probabilistic Forecast

Amazon published three papers on probabilistic demand forecasting in 2017, 2018, and 2019. The models are known as DeepAR, MQ-RNN, and the Deep Factors Model. Amazon also developed a Gluon-based toolkit for probabilistic forecasting on top of AWS and MXNet.

Unlike point forecasting, which predicts a specific future value directly, probabilistic forecasting predicts the distribution or confidence interval of future demand. Its motivation is that demand forecasts serve replenishment. Traditionally, once a point forecast is available, replenishment adds safety stock on top of the predicted value, and that safety stock is computed under the simple assumption that demand is normally distributed. Probabilistic forecasting instead fits a better probability distribution to past data and so serves replenishment better. The details of DeepAR, MQ-RNN, and Deep Factors are described below.

DeepAR: Deep Autoregressive Recurrent Neural Networks

Figure 4: DeepAR model

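DeepAR is built on an RNN (in most cases an LSTM). As the figure shows, at time step $t$ the input consists of the features $x_{i,t}$ of the $i$-th time series at step $t$ and the demand value $z_{i,t-1}$ of the $i$-th series at the previous step $t-1$. During training, the input is the true demand value $z_{i,t-1}$. During prediction the true value is not available, so it is replaced by an estimate drawn from the likelihood, $\hat{z}_{i,t-1} \sim \ell(\cdot \mid \theta_{i,t-1})$. The features and the demand value are concatenated into one vector and fed into the RNN, which produces the hidden state $h_{i,t}$. The mean and variance of the distribution are computed from the hidden state, and the likelihood is evaluated from them. How is the likelihood computed? For continuous-valued demand, we usually assume a Gaussian distribution.

$$\ell_G(z \mid \mu, \sigma) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left(-\frac{(z-\mu)^2}{2\sigma^2}\right)$$
$$\mu(h_{i,t}) = w_\mu^\top h_{i,t} + b_\mu, \qquad \sigma(h_{i,t}) = \log\left(1 + \exp(w_\sigma^\top h_{i,t} + b_\sigma)\right)$$

The standard deviation passes through a softplus activation, mainly to keep the variance from being 0. For count-type demand, we can assume a negative binomial distribution.

$$\ell_{NB}(z \mid \mu, \alpha) = \frac{\Gamma(z + 1/\alpha)}{\Gamma(z+1)\,\Gamma(1/\alpha)} \left(\frac{1}{1+\alpha\mu}\right)^{1/\alpha} \left(\frac{\alpha\mu}{1+\alpha\mu}\right)^{z}$$
$$\mu(h_{i,t}) = \log\left(1 + \exp(w_\mu^\top h_{i,t} + b_\mu)\right), \qquad \alpha(h_{i,t}) = \log\left(1 + \exp(w_\alpha^\top h_{i,t} + b_\alpha)\right)$$

Both parameters of the negative binomial distribution are kept positive by the softplus activation. Where the Gaussian case computes the variance, the negative binomial case computes $\alpha$. $\alpha$ is the shape parameter and scales the variance. The authors found that including this shape parameter speeds up convergence.

Since we are predicting the distribution of future demand with maximum likelihood estimation, the objective is to maximize the likelihood. The corresponding training loss is the negative log-likelihood $\mathcal{L}$:

$$\mathcal{L} = -\sum_{i=1}^{N} \sum_{t=t_0}^{T} \log \ell\left(z_{i,t} \mid \theta(h_{i,t})\right)$$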
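The Gaussian case can be sketched in plain Python as follows. This is only an illustration of the output layer and the per-observation loss, assuming a hidden state `h` is already available as a list of floats; the weight and bias names are hypothetical, and a real model would compute these inside a deep learning framework.

```python
import math


def softplus(x):
    """log(1 + exp(x)), written to stay numerically stable for large |x|."""
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gaussian_params(h, w_mu, b_mu, w_sigma, b_sigma):
    """Map a hidden state to the mean and a strictly positive
    standard deviation of the Gaussian likelihood."""
    mu = sum(w * v for w, v in zip(w_mu, h)) + b_mu
    sigma = softplus(sum(w * v for w, v in zip(w_sigma, h)) + b_sigma)
    return mu, sigma


def gaussian_nll(z, mu, sigma):
    """Negative log-likelihood of one observed demand value z."""
    return (0.5 * math.log(2 * math.pi * sigma ** 2)
            + (z - mu) ** 2 / (2 * sigma ** 2))
```

The training loss is then the sum of `gaussian_nll` over all series and all time steps in the training range.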

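When DeepAR is used to predict multi-horizon data, later predictions depend on earlier predicted values, so good results are sometimes hard to guarantee. We therefore usually run the prediction many times and take the mean and the different quantiles of the results.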
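The repeated multi-step prediction can be sketched as below. This is a toy illustration under stated assumptions: `step_fn` is a hypothetical stand-in for the trained network that maps the previous value to a Gaussian mean and standard deviation, and the empirical quantile rule is a simple one chosen for brevity.

```python
import random


def sample_paths(step_fn, last_value, horizon, n_paths, seed=0):
    """Draw n_paths trajectories; each sampled value is fed back
    as the input for the next step (ancestral sampling)."""
    rng = random.Random(seed)
    paths = []
    for _ in range(n_paths):
        prev, path = last_value, []
        for _ in range(horizon):
            mu, sigma = step_fn(prev)
            prev = rng.gauss(mu, sigma)
            path.append(prev)
        paths.append(path)
    return paths


def summarize(paths, quantiles=(0.1, 0.5, 0.9)):
    """Per-horizon mean and empirical quantiles across the sampled paths."""
    n = len(paths)
    summary = []
    for step_values in zip(*paths):
        ordered = sorted(step_values)
        row = {"mean": sum(ordered) / n}
        for q in quantiles:
            row[q] = ordered[min(int(q * n), n - 1)]
        summary.append(row)
    return summary
```

More paths give smoother quantile estimates, at the cost of more forward passes through the model.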

Unlike most forecasting models, DeepAR uses one function as the training loss and a different one for the final evaluation. The training objective fits the distribution of demand, whereas evaluation needs a quantitative measure of accuracy. To compare different models on the same dataset, the paper mainly reports RMSE and quantile loss. Quantile loss takes a quantile $\rho$ and a span $[L, L+S)$, where $L$ is the offset after the forecast start point and $S$ is the length of the span, and computes the loss for that quantile over that span. With $Z$ the true demand summed over the span and $\hat{Z}^{\rho}$ the predicted $\rho$-quantile of that sum, the formula is:

$$QL_\rho(Z, \hat{Z}^{\rho}) = 2\left[\rho\,(Z - \hat{Z}^{\rho})\,\mathbb{I}_{Z > \hat{Z}^{\rho}} + (1-\rho)\,(\hat{Z}^{\rho} - Z)\,\mathbb{I}_{Z \le \hat{Z}^{\rho}}\right]$$

The final quantile loss is normalized: the losses are summed over all series and divided by the sum of the true values. For example, given a quantile $\rho$ and the span $[0, 8)$, we compute the quantile loss between predicted and true values within that span and compare it across models.
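A quantile loss of this kind can be sketched in a few lines. The version below uses the standard pinball form with a factor of 2 and normalizes by the total true demand; treat the exact scaling as an assumption, since papers differ in such details, and the function names as illustrative.

```python
def quantile_loss(z_true, z_pred, rho):
    """Pinball loss for quantile rho, times 2: under-prediction is
    weighted by rho and over-prediction by (1 - rho)."""
    diff = z_true - z_pred
    return 2 * (rho * diff if diff > 0 else (rho - 1) * diff)


def normalized_quantile_loss(true_sums, pred_sums, rho):
    """Sum the loss over all series, then divide by total true demand."""
    total = sum(quantile_loss(z, p, rho)
                for z, p in zip(true_sums, pred_sums))
    return total / sum(true_sums)
```

With `rho = 0.9`, predicting 8 when the truth is 10 costs far more than predicting 12, which is what pushes the prediction toward the 90th percentile.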

MQ-RNN: Multi-Horizon Quantile Recurrent Neural Network

The main idea is to tie training to the forecast window: to forecast the next period, the model is trained on data selected from a recent period of fixed length. For example, given the inputs up to time $t$, $(x_{:t}, y_{:t})$, the model outputs $\hat{y}_{t+1}, \dots, \hat{y}_{t+K}$, where $K$ is the number of forecast horizons. For instance, train on March data to forecast April demand.

Figure 5: MQ-RNN model

The MQ-RNN structure resembles sequence-to-sequence, with an encoder and a decoder, except that the decoder is not an LSTM. The overall process is as follows:

Input: the actual demand values $y_{:t}$ are embedded and then concatenated with the feature vectors $x_{:t}$ to form the input to the encoder LSTM. The paper also proposes and compares several alternative encoders.

The decoder consists of two MLPs (multi-layer perceptrons). The first, called the global MLP, combines the encoder's hidden state with the future features:

$$(c_{t+1}, \dots, c_{t+K}, c_a) = m_G\left(h_t, x^{(f)}_{t:}\right)$$

Here $c_{t+k}$, $k = 1, \dots, K$, are the horizon-specific contexts and $c_a$ is the horizon-agnostic context. Horizon-specific contexts carry information about a particular future time point; the horizon-agnostic context captures the shared information.

The second, called the local MLP, shares its parameters across horizons. The local MLP outputs the predicted quantiles, one value per quantile for each time point $t+k$:

$$\left(\hat{y}^{(q_1)}_{t+k}, \dots, \hat{y}^{(q_Q)}_{t+k}\right) = m_L\left(c_{t+k}, c_a, x^{(f)}_{t+k}\right)$$

It follows that the training loss is the quantile loss:

$$\sum_{t} \sum_{q} \sum_{k} L_q\left(y_{t+k}, \hat{y}^{(q)}_{t+k}\right)$$

where $L_q(y, \hat{y}) = q\,(y - \hat{y})_+ + (1-q)\,(\hat{y} - y)_+$ is the loss for quantile $q$, the same form as the quantile loss used for DeepAR above.

The main improvement of MQ-RNN over DeepAR is that it outputs the predicted quantiles directly and uses the quantile loss as its objective. The paper's examples show that it can handle cold-start forecasting and sharp outlier spikes. However, MQ-RNN requires choosing a time window. For example, when we forecast the next month every day, we would normally use all the historical data for training and validation. MQ-RNN may instead train on a fixed window of a month, a quarter, or a year. This reduces training time, but the best training window size has to be determined by other means.

Deep Factors Model

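The idea of the Deep Factors Model is to split the factors behind the result into random effects and fixed effects. Each is estimated by a different model, and the two are then combined to obtain the final prediction.

Figure 6: Deep Factors

Global effects resemble the horizon-agnostic context in MQ-RNN: they aim to capture the attributes that the data share. For example, the global factors capture the overall upward or downward trend of the data. The fixed effect is a linear combination of $K$ latent global deep factors, each of which is the output of an RNN:

$$g_k(\cdot) = \mathrm{RNN}_k(\cdot), \qquad k = 1, \dots, K$$

A note here: with most LSTM models, including DeepAR and MQ-RNN, the usual practice is to embed the input first and then feed it into the LSTM, but Deep Factors places the embedding after the RNN output. The paper states that this reduces variance and improves data efficiency.

$$f_i(\cdot) = \sum_{k=1}^{K} w_{i,k}\, g_k(\cdot)$$

Random effects resemble the horizon-specific contexts in MQ-RNN: they aim to capture the local part, for example the fluctuations of the data over a short period:

$$r_i(\cdot) \sim \mathcal{R}_i, \qquad i = 1, \dots, N$$

To model the random effects, the paper builds different local models: DF-RNN, DF-LDS, and DF-GP.

Finally, how do we obtain the prediction? We assume the outputs of the global factors and the random effects can be added.

$$u_i(\cdot) = f_i(\cdot) + r_i(\cdot), \qquad z_{i,t} \sim p\left(z_{i,t} \mid u_i(x_{i,t})\right)$$

Here $p$ can be any assumed demand distribution, such as Gaussian, negative binomial, or Poisson. The model's loss function is the same as DeepAR's, and the evaluation metric is quantile loss.

Amazon's three models are all probabilistic forecasting models. They forecast multiple time series jointly, which can address the cold-start problem for some SKUs (many SKU series are learned together, with product attribute features added), and they are better suited to data with some periodicity.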

Uber Extreme Event Forecaster

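Uber's work is a method for extreme event forecasting. The paper addresses large swings in demand caused by special events such as Thanksgiving, Christmas, and large-scale fraud incidents. The main idea is to split uncertainty into model uncertainty and forecast uncertainty. The model structure is shown in Figure 7. Model uncertainty is learned and estimated with an LSTM autoencoder, which also plays an important role in feature extraction. The output of an intermediate layer of the model is taken as part of the input to the forecaster. The forecaster estimates the forecast uncertainty. The parts of the overall model are trained separately and asynchronously. The paper notes that one can drop the autoencoder and fold feature extraction directly into the forecaster, but the authors' experience is that the separated structure works better.

Figure 7: Uber Extreme Event Forecaster model structure

The model's evaluation metric is SMAPE, and its loss function is MSE.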
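For reference, SMAPE can be computed as in the sketch below. Several variants of SMAPE exist; this one assumes the common form that divides the absolute error by the mean of the absolute actual and forecast values, and it treats a pair of zeros as zero error. That handling is a choice made here, not something taken from the Uber paper.

```python
def smape(actual, forecast):
    """Symmetric mean absolute percentage error, in percent (0 to 200)."""
    total = 0.0
    for a, f in zip(actual, forecast):
        denom = (abs(a) + abs(f)) / 2
        if denom == 0:
            continue  # both values are zero: count as zero error
        total += abs(f - a) / denom
    return 100 * total / len(actual)
```

Unlike MAPE, this metric stays bounded when the actual value is close to zero, which matters for sparse demand series.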

Uber's experience

Preprocessing:

  • Apply the same preprocessing to training and test data
  • Remove trend and seasonality first (detrending, de-seasonalizing)
  • Normalizing the input features (Standard / MinMax scaler) and log-scaling the target y speeds up training
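The preprocessing points above can be sketched in plain Python. The key detail is that the scaling statistics are fitted on the training data only and then reused unchanged on the test data. The function names are illustrative, and the use of `log1p` (so that zero demand stays finite) is an assumption rather than something the source specifies.

```python
import math


def fit_standard_scaler(train_values):
    """Compute mean and standard deviation on the training data only."""
    mean = sum(train_values) / len(train_values)
    var = sum((v - mean) ** 2 for v in train_values) / len(train_values)
    std = math.sqrt(var) or 1.0  # avoid dividing by zero for constant features
    return mean, std


def transform(values, mean, std):
    """Apply the same fitted statistics to training or test data."""
    return [(v - mean) / std for v in values]


def log_scale(y):
    """Compress the target with log(1 + y)."""
    return [math.log1p(v) for v in y]


def inverse_log_scale(y_log):
    """Map predictions back to the original demand scale."""
    return [math.expm1(v) for v in y_log]
```

Predictions made on the log scale must be passed through `inverse_log_scale` before they are compared with actual demand.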

Model structure:

  • The longer the sequence, the greater the effect of dropout
  • Apply dropout to the activations, not to the weights. Apply regularization terms to the weights, not to the activations.

Simple and traditional models are more applicable when:

  • The time series are short and uncorrelated, and the amount of data is small
  • Traditional models are already known to achieve good results
  • Interpretability is important

Deep learning is more applicable when:

  • There are many time series, and each series is long enough
  • The series have latent correlations
  • Interpretability is unimportant
  • The goal is to explore and research cutting-edge techniques

What is forecastability?

Business people and some modeling or algorithm engineers often take the demand data and immediately try various models and tune parameters. They spend a great deal of time trying to raise accuracy for SKUs that are practically impossible to forecast, yet this does not help downstream teams control risk. For example, some SKUs have small volumes and large fluctuations in demand for reasons of their own, so very accurate forecasts are unattainable. It is therefore sometimes worth screening SKUs by forecastability.

Forecastability means that an SKU or product can be forecast by modeling its data and that the forecast can reach a certain accuracy. Different SKUs have different forecastability. If an SKU's forecast accuracy is 30% at best, can we say that a model reaching 30% is a good one? How do we show that this 30% is good?

In fact, if we provide the coefficient of variation (CoV) alongside the accuracy for each SKU, it greatly helps downstream teams control risk. The coefficient of variation measures the stability of the forecasting model. As the formula below shows, it is computed over windows at different times within the same time range, as the standard deviation of the accuracy divided by the mean accuracy:

$$\mathrm{CoV} = \frac{\sigma_{\text{accuracy}}}{\mu_{\text{accuracy}}}$$

Low forecast accuracy together with a low coefficient of variation is uncommon. When accuracy is low, it usually also fluctuates up and down, which makes the variance large. Even so, a forecast with low accuracy but a stable coefficient of variation can help downstream teams control risk and meet the fill rate.
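As a small illustration, the coefficient of variation of forecast accuracy can be computed from per-window accuracies as below. The sketch assumes the standard definition (population standard deviation divided by the mean); the function name and the sample values used afterwards are made up for the example.

```python
import statistics


def accuracy_cov(window_accuracies):
    """Coefficient of variation of accuracy across evaluation windows:
    population standard deviation divided by the mean."""
    mean_acc = statistics.fmean(window_accuracies)
    if mean_acc == 0:
        raise ValueError("mean accuracy is zero; CoV is undefined")
    return statistics.pstdev(window_accuracies) / mean_acc


stable = accuracy_cov([0.30, 0.30, 0.30])    # low accuracy, perfectly stable
unstable = accuracy_cov([0.05, 0.30, 0.55])  # same mean, large swings
```

Both example SKUs average 30% accuracy, but only the first gives downstream teams a forecast error they can plan around.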

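Amazon coding practice

I implemented the three Amazon papers in code, based on my own understanding. Corrections are welcome if anything is wrong, and I would be glad to discuss.

jingw2/demand_forecast (github.com)

References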

Edited on 2019-10-10


Origin www.cnblogs.com/cx2016/p/12101278.html