Seasonality in Time Series: 3 Types and 8 Modeling Approaches

Analyzing and dealing with seasonality is a key task in time series analysis. In this article, we describe three types of seasonality and eight common methods for modeling it.

What is seasonality?

Seasonality is one of the key components of a time series. It refers to systematic movements that repeat with similar intensity over a fixed period of time.

Seasonal variation can be caused by many factors, such as weather, the calendar, or economic conditions, and it shows up across applications. Flights are more expensive in summer because of holidays and travel; consumer spending rises around the December holidays.

Seasonality means that the mean of the series in some periods differs from its mean in others. This makes the series non-stationary, which is why it is important to analyze seasonality when building models.

3 types of seasonality

Three types of seasonal patterns can occur in a time series. Seasonality can be deterministic or stochastic; on the stochastic side, the seasonal pattern may or may not be stationary.

These seasonalities are not mutually exclusive. Time series can have both deterministic and stochastic seasonal components.

1. Deterministic seasonality

A time series with deterministic seasonality has a constant seasonal pattern. It repeats in a predictable way, in both intensity and periodicity:

Similar intensity: in the same season, the level of the seasonal pattern remains roughly constant;

Invariant periodicity: the positions of the peaks and troughs do not change, i.e., the time between repetitions of the seasonal pattern is constant.

For example, the following is a synthetic monthly time series with deterministic seasonality:

 import numpy as np
 
 period = 12  # monthly seasonality
 size = 120
 # fixed Fourier coefficients: the seasonal pattern is deterministic
 beta1 = 0.3
 beta2 = 0.6
 sin1 = np.asarray([np.sin(2 * np.pi * i / period) for i in np.arange(1, size + 1)])
 cos1 = np.asarray([np.cos(2 * np.pi * i / period) for i in np.arange(1, size + 1)])
 
 # random-walk trend component
 xt = np.cumsum(np.random.normal(scale=0.1, size=size))
 
 # synthetic series with deterministic seasonality
 series_det = xt + beta1 * sin1 + beta2 * cos1 + np.random.normal(scale=0.1, size=size)

We can also use Fourier series to model seasonality. A Fourier series is a sum of sine and cosine waves of different frequencies. When seasonality is deterministic, it can be described very accurately by a Fourier series.
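
As a minimal sketch (not from the original article), a Fourier basis with K harmonics for a period of 12 can be built like this; the value of K is an illustrative choice:

 # K pairs of Fourier terms: sine/cosine waves of increasing frequency
 t = np.arange(1, size + 1)
 K = 2  # number of harmonics (illustrative choice)
 fourier_terms = np.column_stack([f(2 * np.pi * k * t / period)
                                  for k in range(1, K + 1)
                                  for f in (np.sin, np.cos)])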

2. Stochastic stationary seasonality

The following snippet builds a synthetic series with stochastic stationary seasonality; compared with the deterministic example, the sine and cosine coefficients now drift over time:

 # Fourier coefficients that drift linearly over time
 beta1 = np.linspace(-.6, .3, num=size)
 beta2 = np.linspace(.6, -.3, num=size)
 sin1 = np.asarray([np.sin(2 * np.pi * i / period) for i in np.arange(1, size + 1)])
 cos1 = np.asarray([np.cos(2 * np.pi * i / period) for i in np.arange(1, size + 1)])
 
 xt = np.cumsum(np.random.normal(scale=0.1, size=size))
 
 # synthetic series with stochastic seasonality
 series_stoc = xt + beta1 * sin1 + beta2 * cos1 + np.random.normal(scale=0.1, size=size)

Stochastic stationary seasonality evolves from one seasonal cycle (such as a year) to the next. While its intensity is harder to predict, the periodicity stays roughly constant.

With deterministic seasonality, the forecast for a given month does not change from year to year. With stochastic stationary seasonality, the best guess for a given month depends on the value of the same month in the previous year.
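
As a quick illustration (a sketch, not from the original article), the seasonal naive baseline encodes exactly this idea: predict each observation with its value from one seasonal period earlier.

 # seasonal naive: forecast each point with the value 12 months before
 snaive_preds = series_stoc[:-period]  # predictions for series_stoc[period:]
 snaive_mae = np.mean(np.abs(series_stoc[period:] - snaive_preds))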

3. Stochastic non-stationary seasonality

With this type, the seasonal pattern can vary significantly across seasonal cycles, and its periodicity can also change over time: the positions of the peaks and troughs shift.

Examples of such seasonal patterns appear in many fields, including consumption series and industrial production data. When a time series has non-stationary seasonality, its changes are difficult to predict.
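
The article does not provide code for this case; following the pattern of the earlier snippets, a hedged sketch is to let the Fourier coefficients follow random walks, so that both the intensity and the peak positions of the pattern drift:

 # illustrative sketch: coefficients follow random walks, so the
 # seasonal pattern (intensity and peak positions) drifts over time
 beta1_ns = np.cumsum(np.random.normal(scale=0.05, size=size))
 beta2_ns = np.cumsum(np.random.normal(scale=0.05, size=size))
 
 series_nonstat = xt + beta1_ns * sin1 + beta2_ns * cos1 + \
     np.random.normal(scale=0.1, size=size)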

Tests for Seasonal Time Series

Visualizing a time series is an easy way to examine seasonal patterns. But visual inspection cannot characterize seasonal patterns systematically, so a more rigorous approach is needed to describe the seasonality of a time series.

1. Measuring seasonal strength

We can quantify the strength of the seasonal pattern as follows:

 import pandas as pd
 from statsmodels.tsa.api import STL
 
 def seasonal_strength(series: pd.Series, period: int = 12) -> float:
     # time series decomposition
     series_decomp = STL(series, period=period).fit()
     
     # variance of residuals + seasonality
     resid_seas_var = (series_decomp.resid + series_decomp.seasonal).var()
     # variance of residuals
     resid_var = series_decomp.resid.var()
 
     # seasonal strength: 1 - Var(residuals) / Var(residuals + seasonality)
     result = 1 - (resid_var / resid_seas_var)
 
     return result

This function estimates the strength of seasonality, whether it is deterministic or stochastic.

 # strong seasonality in the deterministic series
 seasonal_strength(series_det)
 # 0.93
 
 # strong seasonality in the stochastic series
 seasonal_strength(series_stoc)
 # 0.91

If this value exceeds 0.64 [2], a seasonal differencing filter should be applied. Another way to detect seasonality is the QS test, which checks for autocorrelation at the seasonal lags.
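
As a sketch of that rule of thumb:

 # apply seasonal differencing when the seasonal strength exceeds 0.64 [2]
 if seasonal_strength(pd.Series(series_stoc)) > 0.64:
     series_sdiff = pd.Series(series_stoc).diff(periods=period).dropna()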

2. Detecting non-stationary seasonality

Several statistical tests check whether the seasonal pattern is non-stationary.

A common example is the Canova-Hansen (CH) test [1]. Its hypotheses are as follows:

  • H0 (null hypothesis): the seasonal pattern is stationary (no seasonal unit root);
  • H1 (alternative hypothesis): the series contains a seasonal unit root.

The OCSB and HEGY tests are two alternatives to CH. The CH and OCSB tests are available in Python's pmdarima library:

 from pmdarima.arima import nsdiffs
 
 period = 12 # monthly data
 
 # deterministic series
 nsdiffs(x=series_det, m=period, test='ch')
 nsdiffs(x=series_det, m=period, test='ocsb')
 
 # stochastic series
 nsdiffs(x=series_stoc, m=period, test='ch')
 nsdiffs(x=series_stoc, m=period, test='ocsb')

The nsdiffs function returns the number of seasonal differencing steps required to make the series stationary.
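
For example (a sketch), the returned count can drive the differencing directly:

 # apply as many seasonal differences as nsdiffs recommends
 n_sdiff = nsdiffs(x=series_stoc, m=period, test='ch')
 
 y_stat = pd.Series(series_stoc)
 for _ in range(n_sdiff):
     y_stat = y_stat.diff(periods=period).dropna()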

3. Other tests

There are other statistical tests designed for seasonal data. For example, the seasonal Mann-Kendall test is a nonparametric test that checks for monotonic trends in seasonal time series.
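
The original article shows no code for this test; one third-party implementation (an assumption, not mentioned in the source) is the seasonal_test function of the pymannkendall package:

 # seasonal Mann-Kendall test via the pymannkendall package
 import pymannkendall as mk
 
 mk_result = mk.seasonal_test(series_stoc, period=12)
 print(mk_result.trend, mk_result.p)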

Dealing with seasonal patterns

Seasonality refers to patterns that recur over a fixed period. It is an important source of variation, and therefore important to model.

There are various ways of dealing with seasonality. Some remove the seasonal component before modeling: seasonally adjusted data (the time series minus its seasonal component) emphasize long-term effects such as trends or business cycles. Other methods add extra variables that capture the seasonal periodicity.

Before discussing the different methods, let's create a time series and describe its seasonal pattern, continuing from the code above:

 period = 12 # monthly series
 size = 120
 
 beta1 = np.linspace(-.6, .3, num=size)
 beta2 = np.linspace(.6, -.3, num=size)
 sin1 = np.asarray([np.sin(2 * np.pi * i / period) for i in np.arange(1, size + 1)])
 cos1 = np.asarray([np.cos(2 * np.pi * i / period) for i in np.arange(1, size + 1)])
 
 xt = np.cumsum(np.random.normal(scale=0.1, size=size))
 
 yt = xt + beta1 * sin1 + beta2 * cos1 + np.random.normal(scale=0.1, size=size)
 
 # give the series a monthly DatetimeIndex so the date-based
 # transformers used below can extract calendar features
 yt = pd.Series(yt, index=pd.date_range('2000-01-01', periods=size, freq='M'))

We then quantify the strength of the seasonal pattern:

 seasonal_strength(yt, period=12)
 # 0.90

The result is 0.90, confirming that the seasonality is strong.

Then we apply the Canova-Hansen test introduced above to check for a seasonal unit root:

 from pmdarima.arima import nsdiffs
 
 nsdiffs(x=yt, m=period, test='ch')
 # 0

The result is 0, indicating that there is no seasonal unit root; in other words, the seasonal pattern is stationary.

So, how do we deal with seasonal patterns like this?

Seasonal modeling

1. Seasonal dummy variables

Seasonal dummy variables are a set of binary variables. They indicate whether an observation belongs to a given period (e.g. January).

Here is an example of how to create these variables:

 from sktime.transformations.series.date import DateTimeFeatures
 from sklearn.preprocessing import OneHotEncoder
 
 # extract calendar features (e.g. quarter and month) from the index
 monthly_feats = DateTimeFeatures(ts_freq='M',
                                  keep_original_columns=False,
                                  feature_scope='efficient')
 
 datetime_feats = monthly_feats.fit_transform(yt)
 datetime_feats = datetime_feats.drop('year', axis=1)
 
 # note: on scikit-learn >= 1.2, pass sparse_output=False instead of sparse=False
 encoder = OneHotEncoder(drop='first', sparse=False)
 encoded_feats = encoder.fit_transform(datetime_feats)
 
 encoded_feats_df = pd.DataFrame(encoded_feats,
                                 columns=encoder.get_feature_names_out(),
                                 dtype=int)

This code first extracts the quarter and month of each observation and stores them in datetime_feats; one-hot encoding then turns these columns into binary dummy variables (encoded_feats_df).

Seasonal dummy variables are very effective when the seasonality is deterministic: the seasonal pattern is fixed, with essentially unchanged intensity and periodicity. We can also analyze the seasonal effect and its changes by inspecting the estimated coefficients of the dummy variables, which helps the interpretability of the model.

But the disadvantage of seasonal dummy variables is also obvious: they assume that the different periods are independent. January observations are in fact correlated with December observations, and dummy variables are blind to this correlation. As a result, dummy variables can cause many problems if the seasonal pattern changes.

2. Fourier series

Fourier series are periodic, deterministic variables based on sine and cosine waves. In contrast to seasonal dummy variables, these trigonometric functions model seasonality as a smooth periodic pattern, a structure that better reflects reality.

sktime provides a convenient implementation:

 from sktime.transformations.series.fourier import FourierFeatures
 
 fourier = FourierFeatures(sp_list=[12],
                           fourier_terms_list=[4],
                           keep_original_columns=False)
 
 fourier_feats = fourier.fit_transform(yt)

Two main parameters need to be specified here:

  • sp_list: the seasonal periods, passed as a list (e.g. 12 for monthly data);
  • fourier_terms_list: the number of Fourier terms, i.e. how many sine and cosine pairs to include per period. More terms make the representation more flexible and less smooth.

Fourier series are explanatory variables that can be added to a model, and they can be combined with lagged features.
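
As a hedged sketch (not from the original article) of combining both:

 # combining Fourier features with lagged values in one regression model
 from sklearn.linear_model import RidgeCV
 
 X = fourier_feats.copy()
 X['lag1'] = yt.shift(1)
 X['lag2'] = yt.shift(2)
 
 # drop the first rows, whose lags are missing
 X, y = X.iloc[2:], yt.iloc[2:]
 
 ridge = RidgeCV().fit(X, y)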

3. Radial basis function

Radial basis functions (RBFs) are an alternative to Fourier series. The idea is to create repeating bell-shaped curves that capture a recurring pattern.

There is a RepeatingBasisFunction method in the scikit-lego package:

 from sklego.preprocessing import RepeatingBasisFunction
 
 rbf_encoder = RepeatingBasisFunction(n_periods=4,
                                      column='month_of_year',
                                      input_range=(1, 12),
                                      remainder='drop',
                                      width=0.25)
 
 rbf_features = rbf_encoder.fit_transform(datetime_feats)
 rbf_features_df = pd.DataFrame(rbf_features,
                                columns=[f'RBF{i}'
                                         for i in range(rbf_features.shape[1])])

The three most important parameters of this method are as follows:

  • n_periods: the number of basis functions to include;
  • input_range: the value range of the input column; in the example above we use (1, 12), the range of the months;
  • width: the width of each radial basis function, which controls its smoothness.

Like Fourier series, RBF variables can be used as explanatory variables in models.

4. Seasonal autoregression

Autoregression is the basis of most forecasting models. The idea is to use recent past observations (lags) to predict future values, and the concept extends to seasonal models: seasonal autoregressive models include past values from the same season as predictors.

SARIMA is a popular method that applies this idea:

 import pmdarima as pm
 model = pm.auto_arima(yt, m=12, trace=True)
 
 model.summary()
 # Best model:  ARIMA(0,1,0)(1,0,0)[12]
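
Once fitted, the model can forecast directly. A usage sketch:

 # forecasting the next 12 months with the fitted SARIMA model
 sarima_preds = model.predict(n_periods=12)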

Using seasonal lags as explanatory variables is an effective way to model seasonality. But when using this method, seasonal unit roots must be dealt with first, because non-stationary data can cause many problems.

5. Additional variables

Methods such as seasonal dummy variables and Fourier series can capture periodic patterns, but these variables are only stand-ins for the factors that actually drive the seasonality.

We can also model seasonality by adding extra explanatory variables, such as exogenous variables like temperature or the number of working days in a month.
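
As a hedged sketch, such variables can be passed to auto_arima as an exogenous matrix (the X argument in recent pmdarima versions, exogenous in older ones); the working-days column below is a hypothetical placeholder, not real data:

 # a hypothetical exogenous variable: working days per month
 working_days = pd.DataFrame({'working_days':
                              np.random.randint(20, 24, size=len(yt))},
                             index=yt.index)
 
 model_x = pm.auto_arima(yt, X=working_days, m=12)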

6. Seasonal differencing

Seasonality can also be handled by removing it from the data before modeling. This method is called seasonal differencing.

Seasonal differencing means taking the difference between consecutive observations of the same season, e.g. subtracting last January's value from this January's. It is especially useful for removing seasonal unit roots.

Seasonal differencing can be done with pandas' diff method:

 from sklearn.model_selection import train_test_split
 from sktime.forecasting.compose import make_reduction
 from sklearn.linear_model import RidgeCV
 
 # hold out the last 12 observations for testing
 train, test = train_test_split(yt, test_size=12, shuffle=False)
 
 # seasonal differencing: subtract the value from 12 months earlier
 train_sdiff = train.diff(periods=12)[12:]
 
 forecaster = make_reduction(estimator=RidgeCV(),
                             strategy='recursive',
                             window_length=3)
 
 forecaster.fit(train_sdiff)
 diff_pred = forecaster.predict(fh=list(range(1, 13)))

Here we built a Ridge regression model on the differenced series. Forecasts on the original scale are obtained by reverting the differencing operation.
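
As a minimal sketch of that restoration for this 12-step horizon, each differenced forecast gets the observation from one seasonal period earlier added back:

 # revert the seasonal difference: for horizons 1..12, add back the
 # training value from 12 months earlier (the last 12 training points)
 preds_orig = diff_pred.values + train.iloc[-12:].values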

7. Time series decomposition

Seasonality can also be removed using time series decomposition methods such as STL.

What is the difference between differencing and decomposition?

Both differencing and decomposition remove seasonality from a time series, but the transformed data are modeled differently.

When differencing is applied, the model works with the differenced data, so the differencing operation must be reverted to obtain predictions on the original scale.

With a decomposition-based approach, two sets of forecasts are needed: one for the seasonal component and one for the seasonally adjusted data. The final forecast is the sum of the two partial forecasts.

Here's an example of how a decomposition-based approach might work:

 from statsmodels.tsa.api import STL
 from sktime.forecasting.naive import NaiveForecaster
 
 # fitting the seasonal decomposition method
 series_decomp = STL(yt, period=period).fit()
 
 # adjusting the data
 seas_adj = yt - series_decomp.seasonal
 
 # forecasting the non-seasonal part
 forecaster = make_reduction(estimator=RidgeCV(),
                             strategy='recursive',
                             window_length=3)
 
 forecaster.fit(seas_adj)
 
 seas_adj_pred = forecaster.predict(fh=list(range(1, 13)))
 
 # forecasting the seasonal part
 seas_forecaster = NaiveForecaster(strategy='last', sp=12)
 seas_forecaster.fit(series_decomp.seasonal)
 seas_preds = seas_forecaster.predict(fh=list(range(1, 13)))
 
 # combining the forecasts
 preds = seas_adj_pred + seas_preds

In this example, we built a Ridge regression model to forecast the seasonally adjusted data and a seasonal naive model to forecast the seasonal component; the two forecasts are then added together.

8. Dynamic linear model (DLM)

The parameters of a regression model are usually static: they do not change over time (they are time-invariant). A DLM is a special case of linear regression whose main characteristic is that its parameters change over time rather than staying fixed.

A DLM assumes that the structure of a seasonal time series changes over time, so a reasonable approach is to build a model whose parameters vary with the seasons.

Chapter 15 of the book in Ref. [3] provides a neat R example of this approach, using a time-varying MARSS (Multivariate Autoregressive State-Space) model to capture seasonal variation.
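
That example is in R; a rough Python analogue (an assumption, not the article's code) is a structural state-space model with a stochastic seasonal component, e.g. via statsmodels:

 # a local-level model with a stochastic (time-varying) seasonal
 # component, one simple form of a dynamic linear model
 from statsmodels.tsa.api import UnobservedComponents
 
 dlm = UnobservedComponents(yt, level='local level',
                            seasonal=12, stochastic_seasonal=True)
 dlm_res = dlm.fit(disp=False)
 dlm_preds = dlm_res.forecast(steps=12)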

Summary

Time series modeling is not a simple task; it requires weighing multiple factors and techniques. Seasonality can have a major impact on the analysis and forecasting of time series data. Identifying and understanding seasonal patterns helps reveal cyclical changes in the data, devise seasonal adjustment strategies, and produce more accurate forecasts. Good time series modeling usually combines experience and domain knowledge with a flexible use of the techniques above to obtain accurate and reliable models and forecasts.

Author: Vitor Cerqueira

https://avoid.overfit.cn/post/89fc450f115643ad8ef53cd31712ce68

References:

[1] Canova, F. and Hansen, B. E. (1995) “Are seasonal patterns constant over time? A test for seasonal stability”. Journal of Business & Economic Statistics, 13(3), pp. 237–252.

[2] Wang, X., Smith, K. A. and Hyndman, R. J. (2006) “Characteristic-based clustering for time series data”. Data Mining and Knowledge Discovery, 13(3), pp. 335–364.

[3] Holmes, E. E., Scheuerell, M. D. and Ward, E. J. (2020) “Applied time series analysis for fisheries and environmental data”. NOAA Fisheries, Northwest Fisheries Science Center, Seattle, WA.
