简单移动平均(Simple moving average)

数据 $p_1, p_2, ..., p_M$ 的窗口为n的简单移动平均：
$\overline{p}_{SM}=\frac{p_M+p_{M-1}+...+p_{M-(n-1)}}{n}=\frac{1}{n}\sum_{i=0}^{n-1}{p_{M-i}}$

迭代的形式：
$\overline{p}_{SM}=\overline{p}_{SM, prev}+\frac{p_M}{n}-\frac{p_{M-N}}{n}$

实现

简单实现

import pandas as pd
s = [1,2,3,5,6,10,12,14,12,30]
# 等价于data = pd.DataFrame(s)
data = pd.Series(s)
# 窗口为3的简单移动平均
smv_3=data.rolling(window=3).mean()

print('时间序列数据:')
print(data)
print('\n窗口为3的简单移动平均:')
print(smv_3)

结果：

时间序列数据:
0     1
1     2
2     3
3     5
4     6
5    10
6    12
7    14
8    12
9    30
dtype: int64

窗口为3的简单移动平均:
0          NaN
1          NaN
2     2.000000
3     3.333333
4     4.666667
5     7.000000
6     9.333333
7    12.000000
8    12.666667
9    18.666667
dtype: float64

平均结果的NaN和数据中的NaN怎么办？

min_periods字段

平均结果的NaN

import pandas as pd
s = [1,2,3,5,6,10,12,14,12,30]
data = pd.Series(s)
# data = pd.DataFrame(s)
# 窗口为3的简单移动平均，min_periods指的是如ugo没有3个数，有2个数都可以做简单移动平均
smv_3=data.rolling(window=3, min_periods=2).mean()

print('时间序列数据:')
print(data)
print('\n窗口为3的简单移动平均:')
print(smv_3)

结果：

时间序列数据:
0     1
1     2
2     3
3     5
4     6
5    10
6    12
7    14
8    12
9    30
dtype: int64

窗口为3的简单移动平均:
0          NaN
1     1.500000
2     2.000000
3     3.333333
4     4.666667
5     7.000000
6     9.333333
7    12.000000
8    12.666667
9    18.666667
dtype: float64

数据中的NaN

import pandas as pd
# 第三个数据是NaN
s = [1,2,None,5,6,10,12,14,12,30]
data = pd.Series(s)
# data = pd.DataFrame(s)
# 窗口为3的简单移动平均
smv_3=data.rolling(window=3, min_periods=2).mean()

print('时间序列数据:')
print(data)
print('\n窗口为3的简单移动平均:')
print(smv_3)

结果：

时间序列数据:
0     1.0
1     2.0
2     NaN
3     5.0
4     6.0
5    10.0
6    12.0
7    14.0
8    12.0
9    30.0
dtype: float64

窗口为3的简单移动平均:
0          NaN
1     1.500000
2     1.500000
3     3.500000
4     5.500000
5     7.000000
6     9.333333
7    12.000000
8    12.666667
9    18.666667
dtype: float64

指数移动平均（Exponential moving average）

数据 $p_1, p_2, ..., p_M$ 的指数移动平均的迭代形式：
$\overline{p}_{EM}=\left\{ \begin{array}{rcl} p_1 && {M=1}\\ \alpha p_M+(1-\alpha)\overline{p}_{SM, prev} && {M \neq 1} \end{array} \right.$

如何理解指数移动平均

把指数移动的公式展开：
$\overline{p}_{EM}=\alpha(p_M+(1-\alpha)p_{M-1}+(1-\alpha)^2p_{M-2}+...+(1-\alpha)^{M-1}p_1)$

当 $M>>1$ 时，他其实就是一个历史数据的指数加权平均：
$\overline{p}_{EM}=\sum_{i=0}^{+\infin}\alpha(1-\alpha)^{i}p_{M-i}$

时间 T	：近->远
值		： p_M->P_1
权重		： w_0->w_{M-1}

我们可以发现权重 $p_{M-i}$ 对应的权重 $w_i=\alpha(1-\alpha)^{i}$ ，我们计算所有权重之和看是否为1： $\sum_{i=0}^{\infin}{w_i}=\alpha(1+(1-\alpha)+(1-\alpha)^2+...)=\alpha\frac{1}{1-(1-\alpha)}=1$ 。

因此，我们用迭代的方式计算指数移动平均相当于把权 $w_i=\alpha(1-\alpha)^i$ 赋给过去的值。

而下面考察权重的变化，如果 $\alpha$ 足够小，则
$w_{1/\alpha}=\alpha(1-\alpha)^{1/\alpha}=\alpha\frac{1}{e}=36.8\%\alpha$

指数移动平均与简单移动平均的关系

可以通过指数移动平均与简单移动平均的关系来知道 $\alpha$ 的选取。

注：下面质心的推导的坐标系是这样的：

0------>1------>2------>...
	   	p_M     p_{M_1}

选取 $\alpha$ 的方法 × 两者的质心相同

简单移动平均的质心为 $\frac{1+N}{2}$
指数移动平均的质心为 $\alpha[1+2(1-\alpha)+3(1-\alpha)^2+...]=\frac{1}{\alpha}$

令二者的质心相同，则 $\frac{1+N}{2}=\frac{1}{\alpha}$ ，即 $\alpha=\frac{2}{N+1}$ 。当我想我的指数移动平均的历史加权的质心和N窗口简单移动平均质心是一样的。

选取 $\alpha$ 的方法 × 指定指数移动平均的质心

令质心为c，则 $c=\frac{1}{\alpha}$ ，即 $\alpha=\frac{1}{c}$ 。

例子，我令最新的数据为质心，则c=1，此时 $\alpha=\frac{1}{1}=1$ ;

注意如果质心选择的原点不一定会有影响

p_M对应坐标1是我们之前的推导方式

0------>1------>2------>...
	   	p_M     p_{M_1}

我们现在变为p_M对应坐标0

0------>1------>2------>...
p_M     p_{M_1}

例子，我令最新的数据为质心，则 $c'=0$ ，此时有 $c=c'+1$ ,则 $\alpha=\frac{1}{c}=\frac{1}{c'+1}=1$ 。

选取 $\alpha$ 的方法 × 最近的N个数所累计的权重和

拖尾的权重{ $w_N, w_{N+1}, ...$ }之和:
$w_{n, ...}=\alpha((1-\alpha)^N+(1-\alpha)^{N+1}, ...)=a(1-\alpha)^N(1+(1-\alpha)+...)=(1-\alpha)^N$

最近的N个数所累计的权重和:
$w_{0, .., n-1}=1-w_{n, ...}=1-(1-\alpha)^N$

如何选取 $\alpha$ ，我们可以说让最近N个数所累计的权重和为刚刚 $\frac{1}{2}$ ，此时 $\frac{1}{2}=1-(1-\alpha)^N$ ，所以 $\alpha=1-e^{\ln(0.5)/N}$ 。
当 $\alpha$ 足够小时, $w_{0, .., n-1}=1-(1-\alpha)^N=1-((1-\alpha)^{1/-\alpha})^{-\alpha N}=1-e^{-\alpha N}$
取 $\alpha$ （足够小），则意味着 $\frac{1}{\alpha}$ 个最近的数据包含了约 $1-e^{-1}=63.2\%$ 的权重

偏差修正

偏差修正的主要目的是为了提高指数加权平均的精确度，主要是针对前期的加权平均值的计算精度。

下面的例子中 $\alpha=0.1;\beta=1-\alpha=0.9$ 。观察到的序列为{40, 50}，而的计算的指数平均为{4, 7.1}，这样的结果精度是很差的。
在这里插入图片描述

下面我们进行修正偏差，得到修正指数平均 $v_t'$ ：
$v_t'=\frac{v_t}{1-\beta^t}$

因此，

原序列：{40, 50}
指数平均序列：{4, 7.1}
修正偏差后的指数平均序列：{40, 37.37}

实现

Series.ewm(com=None, span=None, halflife=None, alpha=None, min_periods=0, adjust=True, ignore_na=False, axis=0)

$\alpha$ 选择参数

span, com, halflife分别对应前面所说的选择方法

com : float. optional (注意，这里的com，第一个元素是对应0的)
Center of mass: \alpha = 1 / (1 + com),

span : float, optional
Specify decay in terms of span, \alpha = 2 / (span + 1)

halflife : float, optional
Specify decay in terms of halflife, \alpha = 1 - exp(log(0.5) / halflife)

基本实现

# adjust=True的情况
import pandas as pd
s = [40, 50]
data = pd.Series(s)
# data = pd.DataFrame(s)
# span=19对应a=0.1
emv=data.ewm(span=19).mean()
ans_1=0.1*50+0.9*40
print('时间序列数据:')
print(data)
print('\na=0.1指数移动平均:')
print(emv)
print('ans of time 1:', ans_1)

时间序列数据:
0    40
1    50
dtype: int64

a=0.1指数移动平均:
0    40.000000
1    45.263158
dtype: float64
ans of time 1: 41.0

我们发现计算结果，跟我们推到的结果ans_1不一致。原因在于，我们之前关于移动评价的公式是假设权重之和为1,但这只有是无穷级数求和的结果。

真正的公式(即adjust=True)应该是
$y_t = \frac{x_t+(1-\alpha)x_{t-1}+(1-\alpha)^2x_{t-2}+...+(1-\alpha)^tx_0}{1+(1-\alpha)+(1-\alpha)^2+...+(1-\alpha)^t}$
快速迭代的公式(即adjust=False)应该是
$y_0=x_0, \\ y_t=(1-\alpha)y_{t-1}+\alpha x_t$
只有在t足够大时，adjust=False的结果才会接近adjust=True的结果，而adjust=False计算得快，但有一定精度损失

import pandas as pd
s = [40, 50]
data = pd.Series(s)
# data = pd.DataFrame(s)
# span=19对应a=0.1
emv=data.ewm(span=19, adjust=False).mean()
ans_1=0.1*50+0.9*40
print('时间序列数据:')
print(data)
print('\na=0.1指数移动平均:')
print(emv)
print('ans of time 1:', ans_1)

结果：

时间序列数据:
0    40
1    50
dtype: int64

a=0.1指数移动平均:
0    40.0
1    41.0
dtype: float64
ans of time 1: 41.0

数据中有NaN怎么办？

参考前面的公式

When ignore_na is False (default), weights are based on absolute positions. For example, the weights of x and y used in calculating the final weighted average of [x, None, y] are (1-alpha)**2 and 1 (if adjust is True), and (1-alpha)**2 and alpha (if adjust is False).
When ignore_na is True (reproducing pre-0.15.0 behavior), weights are based on relative positions. For example, the weights of x and y used in calculating the final weighted average of [x, None, y] are 1-alpha and 1 (if adjust is True), and 1-alpha and alpha (if adjust is False).

个人认为，四种搭配中这两种是比较合理的：

1. ignore_na=True, adjust=False
2. ignore_na=False, adjust=True

下面的例子都考虑adjust=False的情况

# ignore_na=True，忽略NaN
import pandas as pd
s = [40, None, None, 50, None]
data = pd.Series(s)
# data = pd.DataFrame(s)
# span=19对应a=0.1
emv=data.ewm(span=19, adjust=False, ignore_na=True).mean()
ans_1=0.1*50+0.9*40
print('时间序列数据:')
print(data)
print('\na=0.1指数移动平均:')
print(emv)
print('ans of time 4:', ans_1)

结果：

时间序列数据:
0    40.0
1     NaN
2     NaN
3    50.0
4     NaN
dtype: float64

a=0.1指数移动平均:
0    40.0
1    40.0
2    40.0
3    41.0
4    41.0
dtype: float64
ans of time 4: 41.0

# ignore_na=False，不忽略NaN
import pandas as pd
s = [40, None, None, 50, None]
data = pd.Series(s)
# data = pd.DataFrame(s)
# span=19对应a=0.1
emv=data.ewm(span=19, adjust=False, ignore_na=False).mean()
ans_1=(0.1*50+(0.9**3)*40)/(0.1+0.9**3)
print('时间序列数据:')
print(data)
print('\na=0.1指数移动平均:')
print(emv)
print('ans of time 4:', ans_1)

结果:

时间序列数据:
0    40.0
1     NaN
2     NaN
3    50.0
4     NaN
dtype: float64

a=0.1指数移动平均:
0    40.000000
1    40.000000
2    40.000000
3    41.206273
4    41.206273
dtype: float64
ans of time 4: 41.20627261761158

二元移动窗口函数

一些统计计算符，比如相关性和协方差，需要在两个时间序列上进行计算。例如，经济分析通常喜欢比较一只股票与基础指数标普500之间的相关性。

简单移动平均 & 指数移动平均

简单移动平均(Simple moving average)

实现

简单实现

平均结果的NaN和数据中的NaN怎么办？

平均结果的NaN

数据中的NaN

指数移动平均（Exponential moving average）

如何理解指数移动平均

指数移动平均与简单移动平均的关系

选取 $\alpha$ 的方法 × 两者的质心相同

选取 $\alpha$ 的方法 × 指定指数移动平均的质心

注意如果质心选择的原点不一定会有影响

选取 $\alpha$ 的方法 × 最近的N个数所累计的权重和

偏差修正

实现

$\alpha$ 选择参数

基本实现

数据中有NaN怎么办？

二元移动窗口函数

更多

猜你喜欢

简单移动平均 & 指数移动平均

简单移动平均(Simple moving average)

实现

简单实现

平均结果的NaN和数据中的NaN怎么办？

平均结果的NaN

数据中的NaN

指数移动平均（Exponential moving average）

如何理解指数移动平均

指数移动平均与简单移动平均的关系

选取 α \alpha α的方法 × 两者的质心相同

选取 α \alpha α的方法 × 指定指数移动平均的质心

注意如果质心选择的原点不一定会有影响

选取 α \alpha α的方法 × 最近的N个数所累计的权重和

偏差修正

实现

α \alpha α选择参数

基本实现

数据中有NaN怎么办？

二元移动窗口函数

更多

猜你喜欢

选取 $\alpha$ 的方法 × 两者的质心相同

选取 $\alpha$ 的方法 × 指定指数移动平均的质心

选取 $\alpha$ 的方法 × 最近的N个数所累计的权重和

$\alpha$ 选择参数