Cracking Time Series: A Comprehensive Guide to Moving Averages


Foreword

A time series is a sequence of data points arranged in chronological order. The interval between observations is usually constant (for example 1 second, 5 minutes, 12 hours, 7 days, or 1 year), so a time series can be analyzed and processed as discrete-time data. Time series are widely used in mathematical statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography (EEG), control engineering, aeronautics, communication engineering, and most other areas of applied science and engineering that involve time-based measurements. (Source: Wikipedia)

1. Introduction to time series

1-1. Time series definition

Time series: A time series is a set of data points arranged in chronological order. The observations may be recorded at fine-grained intervals (e.g., every second, minute, or hour) or at coarser regular intervals (e.g., daily, weekly, monthly, quarterly, or yearly).

1-2. Time series characteristics

Time series data have special properties that set them apart from other types of data. Here are some key features:

  • Time dependence: A key property of time series data is the dependence between observations. This means that data at one point in time may be affected by its historical data. For example, today's stock price may be influenced by stock prices over the past few days or months.

  • Seasonality: Many time series data exhibit a seasonal pattern, meaning that there is a certain pattern or trend in the data at certain times of the year. For example, retail sales typically increase during holiday seasons such as Christmas.

  • Trend: A trend is a consistent rising or falling pattern that time-series data exhibits over time. For example, a company's annual sales may show a trend of continued growth.

  • Periodicity: Periodicity (cyclical behavior) refers to fluctuations that recur over time. It differs from seasonality in that cyclical patterns are not tied to the calendar (season, month, day of the week) and their length may vary. For example, an economy may go through cycles of growth and recession that each last several years.

  • Instability (non-stationarity): The statistical properties of many time series, such as their mean or variance, change over time. This can be caused by changes in market conditions, changes in policy, advances in technology, and so on.

1-3. Role of time series analysis

Time series analysis has important applications in many fields. Its main roles are:

  • Forecasting: A major application of time series analysis is forecasting future data points. By understanding past data patterns, we can predict future trends, seasonal patterns, and more. For example, a business might use time series analysis to forecast future sales for better inventory management and resource planning.

  • Anomaly detection: Time series analysis can also be used to detect outliers or abrupt changes in the data. For example, a sudden increase in traffic to a server could mean that the server is under attack or has some other problem.

  • Understanding underlying patterns and relationships: Time series analysis can help us understand the underlying patterns and relationships of the data. For example, we can use time series analysis to understand business cycles, or to understand stock price fluctuations.

  • Policy or program evaluation: Time series analysis can also be used to evaluate the effectiveness of policies or programs. For example, a government might use time-series analysis to assess the impact of tax policies, or to evaluate the effects of public health interventions.

  • Signal processing: In the field of signal processing, time series analysis can be used to extract useful signals, or to remove noise.

2. Statistical methods

2-1. Introduction to moving average method

2-1-1. Basic principles and calculation process

The basic principle of the moving average method: by averaging a run of consecutive data points, it smooths the data and reveals the underlying trend or periodic pattern. This is particularly useful for time series data, because it smooths out short-term fluctuations so that long-term trends are easier to see. The key concept is the "moving window", which defines how many data points go into each average. For example, given daily sales data, a 7-day window means that each value is the average of the most recent 7 days of sales.

The calculation process of the moving average method is as follows:

1. Choose a window size. This determines the number of data points that go into each average.
2. For the current time point, calculate the average of all data points within the window. This average is the moving average at that point in time.
3. Move the window one step forward and repeat step 2 until a moving average has been calculated for all time points.
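The steps above can be sketched as a plain Python loop. This is only a minimal illustration: the helper name `moving_average_steps` and the `daily_sales` numbers are made up for the example, and the window is assumed to be trailing (it ends at the current point).

```python
def moving_average_steps(values, window_size):
    """Trailing moving average written out step by step (illustrative helper)."""
    if not 1 <= window_size <= len(values):
        raise ValueError("window_size must be between 1 and len(values)")
    averages = []
    # Slide the window one position at a time over the series.
    for end in range(window_size, len(values) + 1):
        window = values[end - window_size:end]       # the points currently in the window
        averages.append(sum(window) / window_size)   # their plain average
    return averages


daily_sales = [3, 5, 7, 6, 9, 8, 7]   # made-up numbers, for illustration only
result = moving_average_steps(daily_sales, 3)
print(result)
```

With 7 points and a window of 3 there are 5 complete windows, so the result has 5 values; the first two points have no moving average because the window is not yet full.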

2-1-2. Classification of moving average method

The moving average is a commonly used time series method, primarily used to smooth data and reveal underlying trends or cyclical patterns. Depending on how the average is calculated, it comes in the following main types:

  • Simple Moving Average (SMA): This is the most basic moving average method, which calculates the average value of the data in each window. For example, a 7-day simple moving average is the average of the past 7 days of data.
  • Weighted Moving Average (WMA): In a weighted moving average, each data point has a weight that determines the importance of that data point in the average. Typically, more recent data are given greater weight because they are more reflective of the current situation.
  • Exponential Moving Average (EMA): The exponential moving average is a special weighted moving average whose weights decay exponentially with the age of the data point. The most recent data point receives the greatest weight, and older points receive progressively smaller weights that never quite reach zero.
  • Cumulative Moving Average (CMA): The cumulative moving average is the average of all data from the beginning of the series up to the current point. Its distinguishing feature is that every data point, however old, continues to influence all later averages.

Each of these moving average methods has advantages and disadvantages and suits different situations. Simple moving averages are easy to understand and calculate, but they react slowly to recent changes because every point in the window counts equally. Weighted and exponential moving averages track recent changes more closely, but require choosing weights or a smoothing coefficient. Cumulative moving averages reflect the long-run level of the series, but old data never drop out, so they respond less and less to new data.

2-1-3. Simple moving average method

Simple Moving Average (SMA): a commonly used time series method for smoothing data and identifying trends. It averages a series of consecutive data points, which reduces the volatility of the data and shows long-term trends more clearly. The smoothed value can also serve as a naive forecast of the next point.

The calculation is straightforward: for a given time series, select a fixed-size window (say n data points) and take the average of the data points in the window as the smoothed value. The window slides forward over time, and the average is recalculated at each step. This yields a series of smoothed values that can be used to analyze trends and periodicities.

Here is sample code that implements a simple moving average in Python:

import numpy as np
import matplotlib.pyplot as plt

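def simple_moving_average(data, window_size):
    # Equal weights of 1/window_size; 'valid' keeps only fully covered windows,
    # so the result has len(data) - window_size + 1 points.
    weights = np.repeat(1.0, window_size) / window_size
    sma = np.convolve(data, weights, 'valid')
    return sma

# Original data
data = np.array([3, 5, 7, 6, 9, 8, 7, 6, 7, 8, 10, 12])

# Moving-average window size
window_size = 3

# Compute the simple moving average
sma = simple_moving_average(data, window_size)

# Plot the original series
plt.plot(data, label='Original')
# Plot the smoothed series, aligned with the last point of each window
plt.plot(range(window_size-1, len(data)), sma, label='SMA')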
plt.xlabel('Time')
plt.ylabel('Value')
plt.legend()
plt.show()

Output: [figure: the original series plotted together with its simple moving average]

Conclusion: The simple moving average smooths the time series and makes the long-term trend easier to see. The choice of window size trades smoothness against responsiveness: smaller windows react faster to recent changes, while larger windows are smoother and better suited to analyzing long-term trends.
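The effect of the window size can be checked numerically. The sketch below reuses the sample data from this section and compares windows of 3 and 6 points. The `roughness` measure (mean absolute step between neighbouring points) is a hypothetical yardstick chosen only for this illustration, not a standard statistic.

```python
import numpy as np


def trailing_sma(values, window_size):
    """Trailing simple moving average over fully covered windows only."""
    kernel = np.ones(window_size) / window_size
    return np.convolve(values, kernel, mode="valid")


def roughness(series):
    """Mean absolute step between neighbouring points (illustrative smoothness measure)."""
    return float(np.mean(np.abs(np.diff(series))))


data = np.array([3, 5, 7, 6, 9, 8, 7, 6, 7, 8, 10, 12], dtype=float)

sma_short = trailing_sma(data, 3)   # reacts quickly, less smooth
sma_long = trailing_sma(data, 6)    # smoother, reacts slowly

for name, series in [("raw", data), ("window=3", sma_short), ("window=6", sma_long)]:
    print(f"{name:>9}: roughness={roughness(series):.3f}, last value={series[-1]:.3f}")
```

On this data the longer window gives the smoother curve, but its last value also sits further below the latest observation than the shorter window's does, which is the lag that comes with heavier smoothing.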

2-1-4. Weighted moving average method

Weighted Moving Average (WMA): a time series method based on weight distribution. Unlike the simple moving average, it assigns different weights to the data at different time points, to better reflect how much each time point should influence the result.

The weighted moving average is calculated as follows: for a given time series, select a fixed-size window (say n data points) and assign a weight to each data point within the window. Typically, newer data points get higher weights and older data points get lower weights. The value for the window is the weighted average, that is, the sum of each point multiplied by its weight, divided by the sum of the weights. Over time, the window slides forward and the weighted average is recalculated.
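To make the weighting concrete, here is a small sketch that works through a single window by hand. It assumes linearly increasing weights (1, 2, ..., n scaled to sum to 1), which is one common choice among many. The helper names `linear_weights` and `weighted_average` and the window values are invented for the illustration.

```python
def linear_weights(n):
    """Weights 1, 2, ..., n scaled to sum to 1, ordered oldest first, newest last."""
    total = n * (n + 1) / 2
    return [k / total for k in range(1, n + 1)]


def weighted_average(window, weights):
    """Weighted average of one window; dividing by the weight sum keeps it an average."""
    return sum(x * w for x, w in zip(window, weights)) / sum(weights)


window = [14, 16, 18, 17]            # oldest -> newest, made-up values
weights = linear_weights(len(window))

plain = sum(window) / len(window)
weighted = weighted_average(window, weights)

print("weights :", weights)
print("plain   :", plain)
print("weighted:", weighted)
```

Because the two most recent points (18 and 17) are above the plain mean and carry the largest weights, the weighted average comes out higher than the plain average for this window.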

The following sample code implements a weighted moving average in Python and plots the original series together with the weighted moving average:

import numpy as np
import matplotlib.pyplot as plt

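def weighted_moving_average(data, weights):
    """
    Compute the weighted moving average.

    Parameters:
    data: time series data (1-D array)
    weights: window weights (1-D array), ordered from the oldest point
             in the window to the newest

    Returns:
    the moving-average result (1-D array)
    """
    weights = np.asarray(weights, dtype=float)
    # np.convolve flips its second argument, so reverse the weights first;
    # otherwise the largest weight would land on the oldest point.
    # Dividing by weights.sum() keeps the result an average even if the
    # weights do not sum to 1.
    ma = np.convolve(data, weights[::-1], mode='valid') / weights.sum()
    return ma


# Sample data
data = [10, 12, 15, 14, 16, 18, 17, 19, 20, 22]
weights = [0.1, 0.2, 0.3, 0.4]  # oldest -> newest

# Compute the weighted moving average
wma = weighted_moving_average(data, weights)

# Plot the original series
plt.plot(data, label='Original')

# Plot the weighted moving average, aligned with the last point of each window
plt.plot(range(len(weights) - 1, len(data)), wma, label='Weighted moving average')

# Add legend and title
plt.legend()
plt.title('Weighted moving average')

# Show the figure
plt.show()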

Output: [figure: the original series plotted together with its weighted moving average]

2-1-5. Exponential Moving Average (EMA)

Exponential Moving Average (EMA): a commonly used time series method for smoothing data and capturing changes in trend. Unlike the simple and weighted moving averages, it has no fixed window: every past data point contributes, with a weight that shrinks exponentially as the point gets older, so recent data dominate the result.

The exponential moving average is calculated as follows: for a given time series, choose a smoothing coefficient α between 0 and 1 (the closer to 1, the faster the average follows new data). Then compute the exponential moving average recursively with the formula:

$$\mathrm{EMA}(t) = \alpha \cdot \mathrm{data}(t) + (1 - \alpha) \cdot \mathrm{EMA}(t-1)$$

Here, EMA(t) is the exponential moving average at the current time step, data(t) is the original data at the current time step, and EMA(t-1) is the exponential moving average at the previous time step. The recursion needs a starting value; the sample code in this section uses EMA(0) = data(0). Applying the update at every time step yields a series of smoothed values.
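Unrolling the recursion shows where the name "exponential" comes from: the point k steps back ends up with weight α(1 − α)^k, and the starting value keeps weight (1 − α)^t. The sketch below checks this numerically, assuming the recursion is seeded with the first data point. The helper names `ema_recursive` and `ema_weights` are invented for the illustration.

```python
def ema_recursive(data, alpha):
    """Last EMA value from the recursion, seeded with the first data point."""
    ema = data[0]
    for x in data[1:]:
        ema = alpha * x + (1 - alpha) * ema
    return ema


def ema_weights(t, alpha):
    """Weights on data[t], data[t-1], ..., data[1], plus the weight left on the seed data[0]."""
    recent = [alpha * (1 - alpha) ** k for k in range(t)]
    seed = (1 - alpha) ** t
    return recent, seed


data = [10, 12, 15, 14, 16, 18, 17, 19, 20, 22]
alpha = 0.3
t = len(data) - 1

recent, seed = ema_weights(t, alpha)
# Explicit weighted sum: newest point gets recent[0], the seed gets what is left over.
ema_explicit = sum(w * data[t - k] for k, w in enumerate(recent)) + seed * data[0]

print("recursive:", ema_recursive(data, alpha))
print("explicit :", ema_explicit)
print("weights sum to:", sum(recent) + seed)
```

The two results agree up to floating-point rounding, and the weights add up to 1, so the EMA is a genuine weighted average of the whole history.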

The following sample code implements an exponential moving average in Python and plots the original series together with the exponential moving average:

import numpy as np
import matplotlib.pyplot as plt

def exponential_moving_average(data, alpha):
    """
    Compute the exponential moving average.

    Parameters:
    data: time series data (1-D array)
    alpha: smoothing coefficient

    Returns:
    the moving-average result (list, same length as data)
    """
    # Seed the recursion with the first data point
    ema = [data[0]]
    for i in range(1, len(data)):
        ema.append(alpha * data[i] + (1 - alpha) * ema[i - 1])
    return ema


# Sample data
data = [10, 12, 15, 14, 16, 18, 17, 19, 20, 22]
alpha = 0.3

# Compute the exponential moving average
ema = exponential_moving_average(data, alpha)

# Plot the original series
plt.plot(data, label='Original')

# Plot the exponential moving average
plt.plot(range(len(data)), ema, label='Exponential moving average')

# Add legend and title
plt.legend()
plt.title('Exponential moving average')

# Show the figure
plt.show()


Output: [figure: the original series plotted together with its exponential moving average]

2-1-6. Cumulative Moving Average (CMA)

Cumulative Moving Average (CMA): a method for smoothing time series data by cumulative averaging. Unlike the simple and weighted moving averages, it does not require a fixed window size; instead, every average includes all data up to that point.

The cumulative moving average is calculated as follows: for a given time series, first initialize a counter and a running sum, then process the data points in order. The specific steps are:

1. Initialize the counter count to 0 and the running sum cumulative_sum to 0.
2. For each data point data[i], do the following:

  • Increment count by 1.
  • Add data[i] to cumulative_sum.
  • Compute the cumulative moving average cma as cumulative_sum divided by count.
  • Append cma to the result list.

By repeating these steps for every data point, a series of smoothed values is obtained.
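The same result can be obtained without keeping the running sum, by nudging the previous average toward each new point: new average = old average + (x − old average) / n. The sketch below is one possible way to write this update. The generator name `running_mean` and the sample values are assumptions made for the illustration.

```python
def running_mean(stream):
    """Yield the cumulative average after each value, updating it incrementally."""
    mean = 0.0
    for n, x in enumerate(stream, start=1):
        # Move the old mean toward x by 1/n of the gap between them.
        mean += (x - mean) / n
        yield mean


values = [10, 12, 15, 14]   # made-up values
incremental = list(running_mean(values))
# Reference: the sum-and-count definition described in the steps above.
direct = [sum(values[:i + 1]) / (i + 1) for i in range(len(values))]

print(incremental)
print(direct)
```

The factor 1/n in the update also shows why the cumulative average reacts less and less to new data: the n-th point can move the average by only one n-th of its distance from it.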

The following sample code implements the cumulative moving average in Python and plots the original series together with the cumulative moving average:

import numpy as np
import matplotlib.pyplot as plt

def cumulative_moving_average(data):
    """
    Compute the cumulative moving average.

    Parameters:
    data: time series data (1-D array)

    Returns:
    the moving-average result (list, same length as data)
    """
    cma = []
    cumulative_sum = 0

    for i in range(len(data)):
        cumulative_sum += data[i]
        # i + 1 points have been seen so far
        cma.append(cumulative_sum / (i + 1))

    return cma


# Sample data
data = [10, 12, 15, 14, 16, 18, 17, 19, 20, 22]

# Compute the cumulative moving average
cma = cumulative_moving_average(data)

# Plot the original series
plt.plot(data, label='Original')

# Plot the cumulative moving average
plt.plot(range(len(data)), cma, label='Cumulative moving average')

# Add legend and title
plt.legend()
plt.title('Cumulative moving average')

# Show the figure
plt.show()

Output: [figure: the original series plotted together with its cumulative moving average]

2-1-7. Comparison

Here are the advantages and disadvantages of the Simple Moving Average (SMA), Weighted Moving Average (WMA), Exponential Moving Average (EMA) and Cumulative Moving Average (CMA):

1. Simple Moving Average (SMA)

Advantages:

  • Easy to calculate and understand.
  • Effectively smooths data and helps identify long-term trends.

Disadvantages:

  • All data points in the window are weighted equally, so recent changes show up slowly.
  • Each time a new data point enters the window, the oldest one drops out completely; if that point was unusually large or small, the average jumps even though nothing changed in the recent data.

2. Weighted Moving Average (WMA)

Advantages:

  • More weight is given to recent data, so the average reflects the latest changes better than the SMA.

Disadvantages:

  • The weights must be chosen, which adds a tuning decision and some subjectivity.
  • It is still a smoothed value over a fixed window, so very sudden changes are still damped and delayed.

3. Exponential Moving Average (EMA)

Advantages:

  • It is sensitive to recent changes while still taking all past data points into account, and because no point ever drops out of a window abruptly, it avoids the jumps an SMA can show.
  • It is simpler to configure than a WMA: only one smoothing coefficient needs to be set.

Disadvantages:

  • Less intuitive than the SMA, and the result depends on the choice of α and of the starting value.
  • When the data fluctuate wildly and α is large, the EMA follows the fluctuations closely and passes much of the noise through.

4. Cumulative Moving Average (CMA)

Advantages:

  • It takes all historical data into account, which helps when the quantity of interest is the long-run average level.

Disadvantages:

  • Not suitable for data with a trend or seasonality.
  • Slow to respond to new data, and increasingly so as the series grows, because the n-th point carries a weight of only 1/n.
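The differences in responsiveness can be illustrated by feeding all four methods the same step series, which jumps from 0 to 10 halfway through. This is only a sketch under arbitrary assumptions: a window of 4, linear weights, α = 0.3, and helper names (`sma_at`, `wma_at`, `ema_at`, `cma_at`) that are invented for the example.

```python
def sma_at(data, t, n):
    """Simple moving average of the n points ending at index t."""
    return sum(data[t - n + 1:t + 1]) / n


def wma_at(data, t, weights):
    """Weighted moving average ending at index t; weights ordered oldest -> newest."""
    n = len(weights)
    window = data[t - n + 1:t + 1]
    return sum(x * w for x, w in zip(window, weights)) / sum(weights)


def ema_at(data, t, alpha):
    """Exponential moving average at index t, seeded with data[0]."""
    ema = data[0]
    for x in data[1:t + 1]:
        ema = alpha * x + (1 - alpha) * ema
    return ema


def cma_at(data, t):
    """Cumulative moving average of all points up to and including index t."""
    return sum(data[:t + 1]) / (t + 1)


step = [0.0] * 5 + [10.0] * 5        # jumps from 0 to 10 at index 5
weights = [0.1, 0.2, 0.3, 0.4]       # oldest -> newest
alpha = 0.3

results = {}
for t in (5, 9):                     # right after the jump, and five points later
    results[t] = {
        "SMA": sma_at(step, t, 4),
        "WMA": wma_at(step, t, weights),
        "EMA": ema_at(step, t, alpha),
        "CMA": cma_at(step, t),
    }
    print(t, {name: round(value, 3) for name, value in results[t].items()})
```

With these particular settings, the WMA reacts most strongly to the first point after the jump, followed by the EMA, the SMA and the CMA. Five points later the windowed averages have fully caught up, the EMA is still approaching the new level, and the CMA is only halfway there because the old zeros never leave its average. Different window sizes or values of α would change the ordering.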

Summary

That's all on the moving average method for now!

Origin blog.csdn.net/weixin_42475060/article/details/131289832