[Big Data Tribe] Sales time series forecasting based on ARIMA, SVM, and random forest

Original link  http://mp.weixin.qq.com/s/d2Yj6rwJkpKQgIc2qER9TA


In today's DT (data technology) era, data is becoming ever more important, and its core application, prediction, has become a major force in the transformation of the Internet industry and of industry at large. For retail, forecasting is almost the ultimate problem in business intelligence (BI) research. From a machine learning perspective alone, an accurate forecast is fairly easy to achieve; the hard part is combining it with the business to improve corporate profits. Prediction accuracy remains a core pain point.

Business challenges

For customers in fashion industries such as clothing, we follow ZARA's example and divide products roughly into two groups: basic models and fashion models. Basic models change little from year to year and are only slightly affected by international fashion, so they can be produced on a long-term plan. For fashion models, the trend is not decided in any single region, and local buyers are not yet able to predict international fashion trends accurately, so forecasts have to combine factors from different regions. Accordingly, the sales forecasting strategy for new products is planned production for basic models and flexible adjustment for fashion models.


Solution

Mission/Objective

Use analysis of multiple data sources to produce accurate sales forecasts that meet the marketing requirements of the clothing retail business.

Data source preparation

Garbage in, garbage out; gold in, gold out. Missing or low-quality data degrades the model's predictions. Before building a model, collect the data: in addition to the existing sales data, gather supplementary information such as weather, location, and holidays, and then preprocess the collected data.

Even once the data is available, some features and some records cannot be used by the algorithms directly.

Feature transformation

Convert features that cannot be processed directly into clean features the algorithm can handle easily. For example:

Sale date. A raw timestamp means little to the model, so the date is converted into dummy variables for year, month, day, and week.

Product features. The product information sheet should provide the style, color, texture, and whether the product is a limited edition, but these variables are not present, so we extract them from the product name.

These are only some of the features.
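The two transformations above could look roughly like the sketch below. It is only an illustration: the vocabularies (`COLORS`, `MATERIALS`, `STYLES`), the function names, and the sample product name are all hypothetical, and it reads "week" as day of the week, which is an assumption about the original wording.

```python
import re
from datetime import date

# Hypothetical vocabularies; a real catalogue would supply its own lists.
COLORS = ("black", "white", "red", "blue")
MATERIALS = ("cotton", "wool", "silk", "denim")
STYLES = ("dress", "shirt", "jacket", "jeans")
WEEKDAYS = ("mon", "tue", "wed", "thu", "fri", "sat", "sun")


def date_features(d: date) -> dict:
    """Expand a sale date into year, month, day, and 0/1 weekday dummies."""
    feats = {"year": d.year, "month": d.month, "day": d.day}
    for i, name in enumerate(WEEKDAYS):
        feats[f"weekday_{name}"] = int(d.weekday() == i)  # Monday is 0
    return feats


def product_features(name: str) -> dict:
    """Pull color, material, style, and a limited-edition flag from a name."""
    words = re.findall(r"[a-z]+", name.lower())
    return {
        "color": next((w for w in words if w in COLORS), None),
        "material": next((w for w in words if w in MATERIALS), None),
        "style": next((w for w in words if w in STYLES), None),
        "limited_edition": int("limited" in words),
    }


# Made-up sale record, used only to show the output shape.
row = {**date_features(date(2017, 6, 17)),
       **product_features("Limited Edition Red Cotton Dress")}
print(row)
```

Keyword matching like this only works when product names follow a fairly regular pattern; messier names would need a richer lookup.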

Training sample structure

With the relevant features extracted as described above, the training samples look roughly as follows (only some features are listed).

[Figure: table of sample training rows]

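Splitting the training and test sets

Because the final model will predict sales for a future period, we split the training and test sets by time so that the test reflects real conditions more closely. Specifically, suppose we have sales data for 2014-02-01 to 2017-06-17. The sales data from 2014-02-01 to 2016-03-19 is used for training, and the data from 2016-03-20 to 2017-06-17 is used for testing.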
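A minimal sketch of such a time-based split follows. Only the date boundaries come from the text; the record layout, the sales values, and the `split_by_date` helper are made up for illustration.

```python
from datetime import date


def split_by_date(rows, first_test_day):
    """Rows dated before first_test_day go to train, the rest to test."""
    train = [r for r in rows if r["date"] < first_test_day]
    test = [r for r in rows if r["date"] >= first_test_day]
    return train, test


# Hypothetical daily sales records; the sales numbers are invented.
rows = [
    {"date": date(2014, 2, 1), "sales": 12},
    {"date": date(2016, 3, 19), "sales": 30},
    {"date": date(2016, 3, 20), "sales": 28},
    {"date": date(2017, 6, 17), "sales": 41},
]
train, test = split_by_date(rows, first_test_day=date(2016, 3, 20))
print(len(train), len(test))
```

Unlike a random split, this keeps every test record later than every training record, so the model is never evaluated on dates it has effectively already seen.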

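Modeling

ARIMA, commonly applied to stock prices and e-commerce sales

An ARIMA model first transforms a non-stationary time series into a stationary one by differencing, then models the resulting series with an autoregressive (AR) part, which regresses on its own past values, and a moving-average (MA) part, which regresses on past forecast errors.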
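To make the differencing and autoregression steps concrete, here is a deliberately reduced sketch: an ARIMA(1,1,0), meaning one round of differencing and one AR lag, with the MA part left out entirely. It is not the model used in this case study, the sales numbers are made up, and practical work would rely on a statistics library rather than hand-rolled least squares.

```python
def difference(series):
    """First-order differencing: the 'I' (integrated) step with d = 1."""
    return [b - a for a, b in zip(series, series[1:])]


def fit_ar1(diffs):
    """Least-squares fit of diffs[t] = c + phi * diffs[t-1]."""
    x, y = diffs[:-1], diffs[1:]
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    phi = sxy / sxx  # assumes the differences are not all identical
    return mean_y - phi * mean_x, phi


def forecast(series, steps):
    """Forecast the differences with AR(1), then cumulate back to levels."""
    diffs = difference(series)
    c, phi = fit_ar1(diffs)
    level, last_diff, out = series[-1], diffs[-1], []
    for _ in range(steps):
        last_diff = c + phi * last_diff
        level += last_diff
        out.append(level)
    return out


# Made-up weekly sales with an upward trend.
sales = [100, 104, 109, 112, 118, 121, 127, 131, 136, 140]
print(forecast(sales, steps=3))
```

The forecast is produced on the differenced scale and then added back onto the last observed level, which is how the trend removed by differencing is restored.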

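Random forest

A random forest builds a forest of many decision trees in a randomized way, with each tree grown independently of the others. Once the forest is built, a new input sample is passed to every tree. For classification, each tree decides which class the sample belongs to, and the class chosen most often becomes the prediction. For regression tasks such as sales forecasting, the trees' outputs are averaged instead.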
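The voting procedure can be sketched with a toy forest. This is a heavy simplification and not a full random forest: each "tree" is a depth-1 stump trained on a bootstrap sample with one randomly chosen feature, whereas real implementations grow deeper trees and re-draw the feature subset at every split. The features (promotion count, holiday flag) and labels are invented.

```python
import random
from collections import Counter


def majority(labels):
    return Counter(labels).most_common(1)[0][0]


def train_stump(rows, labels, rng):
    """Fit a depth-1 tree on one randomly chosen feature."""
    feature = rng.randrange(len(rows[0]))
    best = None
    for threshold in sorted({r[feature] for r in rows}):
        left = [lab for r, lab in zip(rows, labels) if r[feature] <= threshold]
        right = [lab for r, lab in zip(rows, labels) if r[feature] > threshold]
        if not left or not right:
            continue
        l_lab, r_lab = majority(left), majority(right)
        errors = (sum(lab != l_lab for lab in left)
                  + sum(lab != r_lab for lab in right))
        if best is None or errors < best[0]:
            best = (errors, threshold, l_lab, r_lab)
    if best is None:  # the sample had a single value for this feature
        fallback = majority(labels)
        return lambda row: fallback
    _, threshold, l_lab, r_lab = best
    return lambda row: l_lab if row[feature] <= threshold else r_lab


def train_forest(rows, labels, n_trees=25, seed=0):
    rng = random.Random(seed)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(rows)) for _ in rows]  # bootstrap sample
        forest.append(train_stump([rows[i] for i in idx],
                                  [labels[i] for i in idx], rng))
    return forest


def predict(forest, row):
    """Every tree votes; the most common class wins."""
    return majority([tree(row) for tree in forest])


# Invented samples: (promotions that week, holiday flag) -> sales level.
rows = [(1, 0), (2, 0), (3, 0), (8, 1), (9, 1), (10, 1)]
labels = ["low", "low", "low", "high", "high", "high"]
forest = train_forest(rows, labels)
print(predict(forest, (0.5, 0)), predict(forest, (11, 1)))
```

For a numeric sales target, `predict` would return the mean of the trees' outputs rather than the most common label.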

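Support vector regression (SVR)

SVR is in essence similar to SVM: both have a margin. In SVM the margin separates the two classes, whereas in SVR the margin is a tube around the regression function, and points that fall inside it incur no loss and so do not influence the fit.

Model optimization

1. Optimization before launch: feature extraction, sample selection, and parameter tuning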
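The role of the SVR margin shows up in the loss function it is usually paired with, the epsilon-insensitive loss. The sketch below computes only that loss, not a fitted SVR; the tube half-width `epsilon` and all the numbers are assumptions chosen for illustration.

```python
def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Zero inside the tube |y - f(x)| <= epsilon, linear outside it."""
    return [max(0.0, abs(t - p) - epsilon) for t, p in zip(y_true, y_pred)]


# Made-up actual and predicted sales.
actual = [100, 120, 90, 150]
predicted = [103, 118, 101, 149]
losses = epsilon_insensitive_loss(actual, predicted, epsilon=5.0)
print(losses)
```

Only the third point, which misses by 11, lies outside a tube of half-width 5 and contributes to the loss; the other three sit inside the margin and are ignored.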


2. Iteration after launch: improve the model based on actual A/B testing and suggestions from business staff

[Figure: prediction error of the three models]

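As the figure above shows, in this case the SVM and random forest models have the smallest prediction error. Using the three methods to forecast the sales of one product gives the following plots: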

[Figures: forecast versus actual sales for each of the three methods]

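The predicted sales trend is broadly consistent with the actual sales trend, but over longer forecast horizons the predicted values diverge more widely.

The results should not be judged on sales figures alone. The evaluation has to weigh business alignment, prediction accuracy, model interpretability, and the overall capability of the industry chain; an increase in corporate profit cannot simply serve as the only criterion. In our experience, the forecast is only one weighted input: expert opinion is also needed, and the two are combined according to set weights.

Outlook

Beyond the methods listed above, we are already experimenting with more complex sales forecasting models, such as HMM and deep learning (Long Short-Term Memory networks, convolutional neural networks (CNN), and so on). At the same time, the models need to remain interpretable, deployable, and scalable, avoiding "black box" predictions. We are also trying hybrid machine learning models, such as GLM + SVR and ARIMA + NNET.

Sales forecasting is almost the ultimate problem in business intelligence research. Even if machine learning models improve prediction accuracy on the test set, forecasting future data accurately enough to maximize corporate profit also depends on factors within the company that lie outside the model, such as its overall supply chain capability. How to incorporate such company factors into machine learning models is both a difficulty and a direction for future sales forecasting. There is still some way to go before the ultimate problem of sales forecasting is solved.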
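The article does not say which error measure was used to compare the models. Two common choices, RMSE and MAPE, are sketched below as an assumption; the model names and every number are made up and say nothing about the actual results.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error, in the same units as sales."""
    squared = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error; assumes no actual value is zero."""
    ratios = [abs(a - p) / abs(a) for a, p in zip(actual, predicted)]
    return 100 * sum(ratios) / len(actual)


# Invented test-period sales and forecasts from two hypothetical models.
actual = [100, 200, 400]
forecasts = {"model_a": [110, 190, 400], "model_b": [100, 200, 340]}
for name, predicted in forecasts.items():
    print(name, round(rmse(actual, predicted), 2), round(mape(actual, predicted), 2))
```

The two invented models have the same MAPE but very different RMSE, because RMSE penalizes one large miss more than several small ones. Which measure matters depends on how costly a large miss is for the business.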
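One simple way to combine a model forecast with expert opinion by weight is a weighted average. The sketch below is an assumption about what such a combination might look like, not the authors' formula; the `blend` helper, the weight, and both forecast values are hypothetical.

```python
def blend(model_forecast, expert_forecast, model_weight):
    """Weighted average of a model forecast and an expert estimate."""
    if not 0.0 <= model_weight <= 1.0:
        raise ValueError("model_weight must be between 0 and 1")
    return model_weight * model_forecast + (1.0 - model_weight) * expert_forecast


# Invented numbers: the model says 1200 units, the buyer expects 1000.
final = blend(model_forecast=1200, expert_forecast=1000, model_weight=0.75)
print(final)
```

The weight itself is a business decision; it could be revised as the model's track record against actual sales accumulates.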
