Getting started with the Prophet time series forecasting framework: practice notes

1. Overview of Prophet time series forecasting framework

Prophet is a time series forecasting framework open-sourced by Facebook, designed to make time series analysis easier and faster. It can handle time series with multiple seasonalities and unexpected events, and it still produces reasonable forecasts when data is missing or anomalous. Prophet uses an additive model that decomposes a time series into three parts (trend, seasonality, and holidays) and exposes adjustable parameters for each part, so users can build and tune models flexibly.

The main features of Prophet are as follows:

  • Simple and easy to use: Prophet's API is small and easy to understand, which makes it approachable for beginners.
  • Flexibility: Prophet provides a variety of parameters and options that allow users to fine-tune the model.
  • Handles multiple seasonalities and unexpected events: Prophet can model several seasonal patterns at once as well as special events such as holidays.
  • Interpretability: Prophet provides good visualization and interpretation tools that help users understand the prediction results.
  • Robustness: Prophet tolerates missing data and outliers and still produces reasonable prediction results.

Prophet has a wide range of applications, including sales forecasting, financial forecasting, power load forecasting, traffic forecasting, and more. It has become one of the go-to frameworks for time series forecasting for many businesses and data scientists.

2. Prophet algorithm principle and papers

The Prophet paper proposes a modular additive model whose components can be combined freely and whose hyperparameters are easy to interpret. Because forecasts are cheap to visualize and it is easy to see in which direction to tune them, analysts without specialist training can use Prophet to fit models and tune hyperparameters on their own.

2.1. Overview of the paper

The following is an overview of the Prophet paper, "Forecasting at Scale":

  1. Introduction

This chapter introduces the Prophet algorithm, a forecasting model based on time series analysis that can be used on time series with long-term correlation. It predicts trends over a future period, and it can do so at different time scales.

  2. Data preprocessing

In the Prophet algorithm, the input data must be preprocessed into a form suitable for model training. Preprocessing includes filling missing values, handling outliers, normalization, and so on.

  3. Model building

The core of the Prophet algorithm is to build a linear regression model with multiple variables. The model can be expressed as:

$$y = \beta_0 + \beta_1 d_1 + \beta_2 d_2 + \dots + \beta_n d_n + \varepsilon$$

where $y$ represents the future observation, $d_1, \dots, d_n$ represent the current observation values, $n$ represents the number of variables, $\beta_0, \dots, \beta_n$ represent the regression coefficients, and $\varepsilon$ represents the noise term. The parameters of the model can be determined by maximum likelihood estimation.

  4. Model training

The training process of the Prophet algorithm includes two stages: time series decomposition and parameter learning. In the decomposition stage, the time series is split into seasonal components and a trend component. In the parameter learning stage, the regression coefficients are estimated by fitting the model to the historical data (the Prophet library does this with Stan, by default as a maximum a posteriori estimate rather than ordinary least squares).

  5. Model evaluation

The predictive performance of the Prophet algorithm can be evaluated with cross-validation and a held-out test set. Cross-validation divides the data set into several subsets, training on some and testing on the remainder in turn; for time series, each training subset should precede its test subset in time. The test set is used to evaluate the generalization ability of the model.
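
The time-ordered splitting described here can be sketched in plain Python. This is a minimal illustration of rolling-origin evaluation, not Prophet's own code: the helper name rolling_cutoffs and the dates are hypothetical, and the parameter names initial, period, and horizon are borrowed from the cross_validation utility in prophet.diagnostics, whose exact cutoff placement this sketch does not try to reproduce.

```python
from datetime import datetime, timedelta


def rolling_cutoffs(start, end, initial, period, horizon):
    """Return cutoff times for rolling-origin evaluation.

    Each cutoff c defines one fold: train on [start, c],
    then test on (c, c + horizon].
    """
    cutoffs = []
    cutoff = start + initial
    # Keep a cutoff only if a full test horizon still fits before `end`.
    while cutoff + horizon <= end:
        cutoffs.append(cutoff)
        cutoff += period
    return cutoffs


# Hypothetical one-month history.
start = datetime(2023, 1, 1)
end = datetime(2023, 1, 31)
horizon = timedelta(days=3)
cutoffs = rolling_cutoffs(start, end,
                          initial=timedelta(days=14),
                          period=timedelta(days=7),
                          horizon=horizon)
for c in cutoffs:
    print("train up to", c.date(), "| test until", (c + horizon).date())
```

Every fold trains only on data before its cutoff, so the model is never evaluated on observations older than the ones it was fitted on.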

  6. Applications

This chapter also presents practical application cases that show how well the Prophet algorithm forecasts in different fields, including stock prices, temperature, and sales.

  7. Conclusion and outlook

This chapter summarizes the main characteristics and application prospects of the Prophet algorithm. As data volumes keep growing and forecasting needs change, the Prophet algorithm is expected to be used more widely.

2.2. Principle overview

The Prophet model decomposes a time series into four main components: trend, seasonality, holiday effects, and noise. The trend is the long-term direction of the series, the seasonality is its pattern of periodic change, the holiday effect is the influence of special dates or time periods, and the noise is the random variation that cannot be predicted.

The Prophet model can use either an additive model or a multiplicative model.

2.2.1. Additive model

In the additive model, the value of the time series is the sum of the trend, seasonality, and holiday effects plus an error term, that is:

$$y(t) = g(t) + s(t) + h(t) + e(t)$$

where:

  • $g(t)$ represents the trend of the time series,
  • $s(t)$ represents the seasonality of the time series,
  • $h(t)$ represents the holiday effect of the time series,
  • $e(t)$ represents the error term of the model.

In additive models, the trend, seasonality, and holiday effects are expressed in the raw units of the series and are all interpreted as deviations from the mean of the time series.
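
To make the additive decomposition concrete, here is a minimal sketch in plain Python. The three component functions and all of their numbers (a linear trend, a weekly sine cycle, a fixed uplift on day 10) are invented for illustration and are not what Prophet fits internally; the point is only that a forecast is the sum of the components.

```python
import math


def trend(t):
    # g(t): hypothetical linear trend, 100 units plus 0.5 per day
    return 100.0 + 0.5 * t


def weekly_seasonality(t):
    # s(t): hypothetical weekly cycle with an amplitude of 10 units
    return 10.0 * math.sin(2 * math.pi * t / 7)


def holiday_effect(t, holidays=frozenset({10})):
    # h(t): hypothetical fixed uplift of 25 units on holiday days
    return 25.0 if t in holidays else 0.0


def additive_forecast(t):
    # y(t) = g(t) + s(t) + h(t); the error term e(t) is unknown
    # at forecast time and is therefore left out.
    return trend(t) + weekly_seasonality(t) + holiday_effect(t)


for day in (0, 3, 10):
    print(day, round(additive_forecast(day), 2))
```

Days 3 and 10 sit at the same point of the weekly cycle, so the difference between their forecasts is exactly the trend growth over seven days plus the holiday uplift.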

2.2.2. Multiplicative model

In the multiplicative model, the value of the time series is the product of the trend, seasonality, and holiday effects and an error term, namely:

$$y(t) = g(t) \cdot s(t) \cdot h(t) \cdot e(t)$$

where:

  • $g(t)$ represents the trend of the time series,
  • $s(t)$ represents the seasonality of the time series,
  • $h(t)$ represents the holiday effect of the time series,
  • $e(t)$ represents the error term of the model.

In multiplicative models, the trend, seasonality, and holiday effects are all expressed as percentage changes and are all interpreted as deviations from the mean of the time series.

Additive and multiplicative models each have advantages and disadvantages and suit different types of time series. In general, the additive model suits series whose trend and seasonality are independent of the data scale, such as temperature and rainfall, while the multiplicative model suits series whose trend and seasonality are related to the data scale, such as commodity sales volume and stock prices. When forecasting with Prophet, choose the model that matches the series at hand.
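
The difference in scale dependence can be checked numerically. The sketch below is an illustration with made-up numbers, not Prophet code: the helper seasonal_swing is hypothetical, and the seasonal component is a weekly sine cycle applied either as an absolute offset or as a factor around 1. In Prophet itself the choice is made with the seasonality_mode constructor argument, which accepts 'additive' or 'multiplicative'.

```python
import math


def seasonal_swing(trend_level, mode):
    """Peak-to-trough swing of a weekly cycle at a given trend level."""
    values = []
    for day in range(7):
        cycle = math.sin(2 * math.pi * day / 7)
        if mode == "additive":
            # Seasonality is an absolute offset: plus or minus 10 units.
            values.append(trend_level + 10.0 * cycle)
        else:
            # Seasonality is a factor around 1: plus or minus 10 percent.
            values.append(trend_level * (1.0 + 0.10 * cycle))
    return max(values) - min(values)


for level in (100.0, 1000.0):
    print(level,
          round(seasonal_swing(level, "additive"), 2),
          round(seasonal_swing(level, "multiplicative"), 2))
```

The additive swing stays the same when the level grows tenfold, while the multiplicative swing grows tenfold with it, which is the pattern typically seen in sales-like series.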

3. Installation

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple prophet

As of version v1.1, Prophet requires Python 3.7 or above.

4. Quick start

The input to Prophet is always a dataframe with two columns: ds and y.

  • The ds (datestamp) column should be in the format Pandas expects, ideally YYYY-MM-DD for dates and YYYY-MM-DD HH:MM:SS for timestamps.
  • The y column must be numeric and represents the metric we wish to predict.

import pandas as pd
from prophet import Prophet
from prophet.plot import add_changepoints_to_plot
import numpy as np

# Load the source data into df (omitted)
live_df = df[['recordtime','live']].copy()

Following Prophet's conventions, the input columns are renamed to ds and y.

live_df = live_df.rename(columns={'recordtime': 'ds', 'live': 'y'})

Define holidays:

# Define holidays
chinese_holiday = pd.DataFrame({
  'holiday': 'Lunar_festivals',
  'ds': pd.to_datetime(['2023-01-21', '2023-01-22', '2023-01-23',
                        '2023-01-24', '2023-01-25', '2023-01-26',
                        '2023-01-27', '2023-04-05', '2023-06-22',
                        '2023-06-23', '2023-06-24', '2023-09-29',
                        '2023-09-30']),
  'lower_window': 0,
  'upper_window': 1,
})
china_holiday = pd.DataFrame({
  'holiday': 'china',
  'ds': pd.to_datetime(['2023-01-01', '2023-01-02', '2023-05-01',
                        '2023-05-02', '2023-05-03', '2023-10-01',
                        '2023-10-02', '2023-10-03', '2023-10-04',
                        '2023-10-05', '2023-10-06']),
  'lower_window': 0,
  'upper_window': 1,
})
# Combine both holiday frames into the single frame Prophet expects
holidays = pd.concat((chinese_holiday, china_holiday))
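
The lower_window and upper_window columns extend each holiday to a range of days around its date, from lower_window days before to upper_window days after. The sketch below shows which calendar days one row covers; expand_holiday_window is a hypothetical helper written for illustration in plain Python and is not part of Prophet.

```python
from datetime import date, timedelta


def expand_holiday_window(day, lower_window, upper_window):
    """List every date covered by one holiday row.

    lower_window is zero or negative (days before the holiday) and
    upper_window is zero or positive (days after), so the covered
    range is [day + lower_window, day + upper_window].
    """
    return [day + timedelta(days=offset)
            for offset in range(lower_window, upper_window + 1)]


# lower_window=0 and upper_window=1, the values used in the holiday
# frames of this article: the holiday and the day after it.
labour_day = expand_holiday_window(date(2023, 5, 1), 0, 1)
print(labour_day)

# A symmetric window for comparison: the day before, the day, the day after.
spring_festival = expand_holiday_window(date(2023, 1, 21), -1, 1)
print(spring_festival)
```

With the settings used in this article, every listed date therefore also marks the following day as affected by the holiday.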

Built-in holidays:
A collection of built-in country-specific holidays is available through the add_country_holidays method (Python). Once a country name is specified, the major holidays for that country are included in addition to any holidays passed via the holidays parameter above.

We fit the model by instantiating a new Prophet object, passing any forecasting settings to the constructor, and then calling its fit method with the historical dataframe. Fitting should take 1-5 seconds.

model = Prophet(holidays=holidays)
model.fit(live_df)

Predictions are made on a dataframe with a column ds containing the dates for which predictions are wanted. You can use the helper method Prophet.make_future_dataframe() to obtain a dataframe that extends a specified number of periods into the future (in this article, 48 periods at 30-minute intervals). By default it also includes the dates from the historical data, so you can see how well the model fits; here include_history=False is passed, so only the 48 future rows are returned.

future = model.make_future_dataframe(periods=48, freq='30min', include_history=False)
future.tail(48)
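
To see what rows such a call yields, the timestamps can be reproduced by hand. This is a plain-Python sketch of the idea only: future_timestamps is a hypothetical helper, the last observed timestamp is invented, and the real method returns a pandas dataframe with a ds column rather than a list.

```python
from datetime import datetime, timedelta


def future_timestamps(last_observed, periods, step):
    """Generate `periods` timestamps spaced by `step` after `last_observed`."""
    return [last_observed + step * i for i in range(1, periods + 1)]


# Hypothetical last timestamp in the history.
last_observed = datetime(2023, 4, 30, 23, 30)
future_times = future_timestamps(last_observed,
                                 periods=48,
                                 step=timedelta(minutes=30))
print(future_times[0], "->", future_times[-1])
```

Forty-eight half-hour steps cover exactly one day after the end of the history.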

The predict method assigns each future row a predicted value called yhat. If you pass in historical dates, it provides an in-sample fit. Here forecast is a new dataframe containing the forecast values (column yhat) together with columns for the components and the uncertainty intervals.

forecast = model.predict(future)
forecast[['ds', 'yhat', 'yhat_lower', 'yhat_upper']].tail(48)

A forecast can be plotted by calling the Prophet.plot method and passing in the forecast dataframe.

fig1 = model.plot(forecast)

If you want to see the forecast components, you can use the Prophet.plot_components method. By default, you will see the trend, yearly seasonality, and weekly seasonality of the time series. If you include holidays, you'll see them here too.

fig = model.plot_components(forecast)

The trend-related parameters are as follows:

  • growth: the trend function of the model. The two values covered here are linear and logistic.
  • changepoints: the specific dates on which the trend of the model may change, that is, all the dates of potential changepoints. If not specified, the model selects potential changepoints automatically.
  • n_changepoints: the number of potential changepoints to place (default 25). If changepoints is not None, this parameter has no effect.
  • changepoint_range: the proportion of the historical data, counted from the start, in which changepoints may appear (default 0.8). It is used together with n_changepoints. The larger changepoint_range is, the closer to the most recent observation a changepoint can appear. If changepoints is specified, this parameter has no effect.
  • changepoint_prior_scale: the flexibility of automatic changepoint selection (default 0.05). The larger the value, the more readily changepoints appear.
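
How n_changepoints and changepoint_range interact can be sketched in plain Python. The helper candidate_changepoint_indexes below is hypothetical and only mirrors the idea of spreading candidates evenly over the first part of the history; it is an assumption-laden illustration and not Prophet's actual implementation, whose rounding details may differ.

```python
def candidate_changepoint_indexes(n_history, n_changepoints=25,
                                  changepoint_range=0.8):
    """Spread candidate changepoints evenly over the first part of the history.

    Returns row indexes into the history, excluding the very first row.
    """
    # Rows eligible for changepoints: the first changepoint_range share.
    usable = int(n_history * changepoint_range)
    # There cannot be more candidates than eligible rows after the first.
    n = min(n_changepoints, max(usable - 1, 0))
    if n == 0:
        return []
    step = (usable - 1) / n
    # Start at i=1 so the first observation is never a changepoint.
    return [round(step * i) for i in range(1, n + 1)]


indexes = candidate_changepoint_indexes(n_history=100,
                                        n_changepoints=5,
                                        changepoint_range=0.8)
print(indexes)
```

With 100 rows and changepoint_range=0.8, no candidate falls in the last 20 rows, which is why a larger changepoint_range lets the trend bend closer to the most recent data.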

The forecast components and their plots will be analyzed in more detail in subsequent blog posts.

5. Summary

5.1. Easy and quick start for non-professional developers

Prophet is relatively easy to use, and it hides much of the statistical background that time series models normally require.

5.2. Fast

Speed here has two meanings: Prophet is quick to apply, and its models are quick to fit.

5.3. Comparison of Prophet and ARMA in statsmodels

Both Prophet and statsmodels are tools for time series forecasting, but their design ideas are slightly different. Prophet is a prediction method based on an additive model, while statsmodels is a more general time series analysis library that provides a variety of modeling methods, including ARMA, ARIMA, VAR, etc. Here are some comparisons between them:

  1. Applicability: Prophet is designed for forecasting problems that arise in business and social settings, such as sales forecasting and weather forecasting, whereas statsmodels is a more general time series analysis tool, suitable for a wide range of statistical analysis problems.

  2. Accuracy: Prophet is designed to produce accurate predictions in a short amount of time. In contrast, statsmodels may require more parameter tuning and model testing to reach accurate predictions.

  3. Flexibility: Prophet's design makes it very easy to use, but this also leads to certain limitations. Prophet can only be used for time series forecasting and cannot perform other types of time series analysis. On the contrary, statsmodels provides more flexibility and extensibility, and users can solve different types of problems by customizing models and fitting methods.

  4. Interpretability: Prophet's output contains detailed components and decomposition results, enabling users to better understand the basis of model predictions. In contrast, the output of statsmodels may require more statistical knowledge to understand.

In short, Prophet and statsmodels each have their own advantages, and which one fits better depends on the scenario. If you need time series forecasting for business or social needs, or if you are not familiar with the statistics behind time series analysis, Prophet is a good choice. If you need more general time series analysis, or custom models and fitting methods for specific problems, statsmodels may be a better fit.


Origin blog.csdn.net/xiaoyw/article/details/130470530