Recommended Python Toolkits for Time Series Analysis (2)

Introduction

In the previous post, Recommended Python Toolkits for Time Series Analysis (1), I introduced three toolkits for time series analysis, covering time series feature engineering, sklearn-style time series modeling, and more advanced time series modeling. This post introduces four more useful toolkits: Prophet, Merlion, Darts and GluonTS.

As in the previous post, I give a brief overview of each toolkit, covering its purpose, main features, strengths and weaknesses, and list the related paper, documentation and GitHub repository for further reference.

01 Prophet

The name Prophet literally means "prophet" or "seer"; applied to time series, it naturally points to forecasting. Prophet is a time series tool released by Facebook in 2017, designed primarily for forecasting. Among the mainstream families of forecasting methods, it belongs to the statistical models. At the time of writing, the latest release is 1.0 and the previous one was 0.7. Starting with 1.0, the package was renamed from fbprophet to prophet, while the forecasting model itself is still called Prophet. The prophet package performs very well, and above all it is highly automated: even with all-default parameters it produces good results, which is why many other time series toolkits integrate it.

Installing prophet can be a bit troublesome, however, mainly because of its pystan dependency. In my experience, installing it directly with conda (conda install prophet) goes smoothly, and the experience is good.

Prophet's basic idea is to decompose a time series into components: a trend; seasonality, which covers not only yearly, monthly, weekly and other calendar-based cycles but also more general periodicity; an error term; and the effects of special dates such as holidays. Compared with other classic statistical forecasting models, Prophet not only breaks the series down more finely but also models trend changepoints, and it can handle overlapping holidays (for example, in some years China's Mid-Autumn Festival falls within the National Day holiday), so that the effect of holidays on the series is captured more fully.
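For reference, the Prophet paper writes this decomposition as an additive model. The notation below follows that paper (piecewise-linear trend variant): g(t) is the trend, s(t) the seasonality expressed as a Fourier series with period P (for example P = 365.25 for yearly or P = 7 for weekly cycles), h(t) the holiday effects, and ε_t the error term. Changepoints s_j switch on the rate adjustments δ_j, and setting γ_j = -s_j δ_j keeps the trend continuous.

```latex
y(t) = g(t) + s(t) + h(t) + \varepsilon_t

s(t) = \sum_{n=1}^{N} \left( a_n \cos\frac{2\pi n t}{P} + b_n \sin\frac{2\pi n t}{P} \right)

g(t) = \left(k + \mathbf{a}(t)^\top \boldsymbol{\delta}\right) t + \left(m + \mathbf{a}(t)^\top \boldsymbol{\gamma}\right),
\qquad a_j(t) = \begin{cases} 1, & t \ge s_j \\ 0, & \text{otherwise} \end{cases}
```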

Prophet takes a pandas DataFrame as input, and the DataFrame must contain two columns: ds, the timestamp, and y, the value of the series. From there you simply call the fit and predict methods. Prophet can also quickly plot the forecast against the actual data: in the figure below, the black dots are the observed values and the blue band is the prediction interval.

[Figure: Prophet forecast plot; black dots are observed values, the blue band is the prediction interval]

References for Prophet:

Paper: https://peerj.com/preprints/3190.pdf
Docs: https://facebook.github.io/prophet/docs
GitHub: https://github.com/facebook/prophet (13.9k stars)

02 Merlion

Merlion is a time series analysis library recently released by Salesforce. It targets two tasks: forecasting and anomaly detection.

I first came across Salesforce through its AutoML tool TransmogrifAI, an automated machine learning framework built on Spark ML.

Because Merlion is the most recent of the tools compared here, it enjoys a latecomer's advantage to some extent. For both forecasting and anomaly detection, Merlion supports univariate and multivariate time series, and it also offers model ensembling and AutoML (roughly, time series modeling with automatic model selection and hyperparameter tuning). The figure below, taken from Merlion's GitHub page, compares its feature coverage with several other time series libraries:

[Figure: feature comparison between Merlion and other time series libraries, from the Merlion GitHub page]

For forecasting, Merlion supports both statistical and machine learning models. The statistical models include common ones such as ARIMA and ETS, and Prophet is integrated as well; the machine learning models are mainly tree-based ensembles such as random forests (RF) and gradient boosting (GB). As mentioned above, Merlion also has built-in AutoML for model selection and hyperparameter tuning, and it makes it easy to combine the forecasts of several models; after all, no single forecasting model wins on every dataset. Like Prophet, Merlion can automatically plot the actual values, the forecast and the confidence interval together, and its plots are somewhat more intuitive than Prophet's, as shown in the figure below.

[Figure: Merlion forecast plot showing the actual values, the forecast and its confidence interval]

Merlion is a tool I used a lot early on. I recommend installing it from source: first download the source code from GitHub, then run pip install on the downloaded folder. Unlike Prophet, because Merlion supports both univariate and multivariate series, it defines its own input data type, TimeSeries; converting a pandas DataFrame into it is nonetheless very easy.

References for Merlion:

Paper: https://arxiv.org/abs/2109.09265
Docs: https://opensource.salesforce.com/Merlion/v1.1.0/index.html
GitHub: https://github.com/salesforce/Merlion (2.3k stars)

03 Darts

Darts is another powerful time series library. It also supports many models and task scenarios behind a highly unified interface, and Prophet is one of its built-in models. The figure below, from the Darts GitHub page, lists the supported models and the forecasting scenarios each one covers:

[Figure: Darts feature table from its GitHub page, listing supported models and forecasting scenarios]

My first impression was that Darts is very similar to Merlion; for example, both define a custom TimeSeries data type as the standard model input. The main differences between the two are nevertheless clear:

  • Merlion supports both forecasting and anomaly detection, whereas Darts focuses on forecasting only;

  • Merlion's models are mainly statistical and traditional machine learning models, while Darts offers a much richer selection. Its biggest highlight is deep learning models, including Transformer, TCN and other newer sequence modeling approaches.

In addition, Darts supports pipelines and automatic hyperparameter tuning, so its engineering support is fairly complete. That said, my personal experience when trying it out was not very good.

References for Darts:

Paper: https://arxiv.org/abs/2110.03224
Docs: https://unit8co.github.io/darts
GitHub: https://github.com/unit8co/darts (3.3k stars)

04 GluonTS

Readers familiar with AutoML will probably know AutoGluon, the AutoML framework released by Amazon. That was how I first heard of Gluon, a deep learning framework launched by Amazon (which, I admit, I have not yet studied in depth). GluonTS is the time series toolkit of the Gluon ecosystem; more precisely, it is a library for probabilistic time series modeling based on deep learning. In terms of tasks, it supports both forecasting and anomaly detection.

Frankly, my own exposure to GluonTS is limited to reading its official paper; I have not actually used the library. Anything I could say about its performance would be second-hand rather than based on hands-on practice, so I will not go into more detail here.

References for GluonTS:

Paper: https://arxiv.org/abs/1906.05264v1
Docs: https://ts.gluon.ai/
GitHub: https://github.com/awslabs/gluon-ts/ (2.4k stars)

05 Summary

Overall, each of the four toolkits has its own characteristics and feature coverage:

  • Prophet has a relatively narrow scope: it handles only univariate time series forecasting and offers only a single model. That model, however, is specialized and mature; the project has about 13.9k stars on GitHub, and it has become a standard built-in model in many other time series toolkits.

  • Merlion targets forecasting and anomaly detection, supports both univariate and multivariate series, and mainly offers statistical and machine learning models. Its highlights are AutoML for time series modeling and ensembling of multiple models.

  • Darts is another comprehensive time series library. It focuses on forecasting but stands out for the range of models it offers; its highlight is support for many deep learning models, including Transformer, TCN and other rising stars of sequence modeling.

  • GluonTS, the time series library of Amazon's Gluon ecosystem, focuses on deep learning models. It covers forecasting and anomaly detection, but is slightly less flexible than Merlion and Darts in terms of model choice.

Together with tsfresh, tslearn and sktime from the previous post, the four tools introduced here (Prophet, Merlion, Darts and GluonTS) should cover most mainstream time series analysis tasks. Beyond the tools themselves, it is even more important to strengthen the underlying theory, so that you can use and control these tools well and grow into a real algorithm engineer.

Source: blog.csdn.net/weixin_43841688/article/details/122295505