Time series preprocessing
Stationarity test
Characteristic statistics
-
Probability distributions
-
Distribution function
-
Density function
-
-
Characteristic statistics
(low-order moments of the series)
- Mean
- Variance
- Autocovariance function
- Autocorrelation coefficient
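For reference, the four low-order statistics listed above are usually written as follows for a series {X_t}. The notation (mu_t, DX_t, gamma, rho) is a common textbook convention assumed here; the notes themselves do not fix one.

```latex
\begin{aligned}
\mu_t &= E[X_t] \\
DX_t &= E\big[(X_t - \mu_t)^2\big] \\
\gamma(t,s) &= E\big[(X_t - \mu_t)(X_s - \mu_s)\big] \\
\rho(t,s) &= \frac{\gamma(t,s)}{\sqrt{DX_t \cdot DX_s}}
\end{aligned}
```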
Definition of stationary time series
-
Strict stationarity
-
Weak (wide-sense) stationarity
- Defined by characteristic statistics
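As a sketch of what "defined by characteristic statistics" means, the standard textbook conditions for weak stationarity of {X_t, t in T} are given below. The symbols mu and gamma are assumed notation, not taken from these notes.

```latex
\begin{aligned}
&(1)\quad E[X_t^2] < \infty && \forall\, t \in T \\
&(2)\quad E[X_t] = \mu \ \text{(a constant)} && \forall\, t \in T \\
&(3)\quad \gamma(t,s) = \gamma(k,\,k+s-t) && \forall\, t, s, k \ \text{with}\ k+s-t \in T
\end{aligned}
```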
Statistical properties of stationary time series
-
Constant mean
- Each statistic can be estimated from a large number of sample observations
- Reduces the number of random variables and increases the sample size available for each quantity to be estimated
- Simplifies the statistical analysis and improves the estimation accuracy of the characteristic statistics
-
The autocovariance function and the autocorrelation function depend only on the time lag, not on the start or end points in time
-
Lag-k autocovariance function
-
Autocorrelation coefficient
- Normalization
- Symmetry
- Non-negative definiteness
- Non-uniqueness
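The lag-k quantities and the four properties above can be written compactly as follows. This is the usual textbook formulation, with gamma(k), rho_k and Gamma_m as assumed notation.

```latex
\begin{aligned}
\gamma(k) &= \gamma(t,\,t+k), \qquad \rho_k = \frac{\gamma(k)}{\gamma(0)} \\
\text{Normalization:}\quad & \rho_0 = 1, \quad |\rho_k| \le 1 \\
\text{Symmetry:}\quad & \rho_k = \rho_{-k} \\
\text{Non-negative definiteness:}\quad & \Gamma_m = \big(\rho_{|i-j|}\big)_{i,j=1}^{m} \ \text{is non-negative definite for every } m \ge 1 \\
\text{Non-uniqueness:}\quad & \text{one autocorrelation function may correspond to more than one stationary series}
\end{aligned}
```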
-
-
The significance of stationary time series
-
Data structure of traditional statistical analysis
- A limited number of variables, each with multiple observations
-
Time series data structure
- Many random variables (one per time point), each with only one sample observation
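Because each time point contributes only one observation, stationarity is what allows all observations to be pooled into a single estimate. The R sketch below illustrates this on simulated data (an assumption, not data from these notes). `sample_autocov` is a hypothetical helper written for this example, and its result is compared with R's built-in `acf()`.

```r
# Simulated stationary series, used only for illustration
set.seed(1)
x <- rnorm(200)

# Under stationarity every X_t shares one mean, so all observations estimate it
x_bar <- mean(x)

# Hypothetical helper: sample autocovariance at lag k,
# pooling every pair of observations that are k steps apart
sample_autocov <- function(x, k) {
  n <- length(x)
  m <- mean(x)
  sum((x[1:(n - k)] - m) * (x[(k + 1):n] - m)) / n
}

gamma_hat <- sapply(0:5, function(k) sample_autocov(x, k))
rho_hat <- gamma_hat / gamma_hat[1]   # sample autocorrelation coefficients

# Compare with the built-in estimator (same divisor n, same demeaning)
builtin <- acf(x, lag.max = 5, type = "covariance", plot = FALSE)$acf[, 1, 1]
print(cbind(lag = 0:5, by_hand = gamma_hat, builtin = builtin))
```

The divisor n (rather than n - k) matches what `acf()` uses.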
Stationarity test
-
Graphical test methods
-
Time series plot test
-
Two-dimensional plot: the x-axis is time, the y-axis is the series value
-
Clear increasing trend
- Not a stationary series
-
Fluctuates around a horizontal line
- A stationary series
-
-
-
Autocorrelation plot test
-
Two-dimensional plot of vertical lines: one axis is the autocorrelation coefficient, the other is the lag (R's acf() puts the lag on the x-axis)
-
Autocorrelations show a clear pattern or decay slowly
- Non-stationary series
-
Autocorrelations fluctuate near the zero axis
- Stationary series
-
-
Simple to apply and widely used, but the conclusion is subjective
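One way to make the graphical judgment a little less subjective is to look at the numbers behind the autocorrelation plot. The sketch below uses two simulated series (assumptions for illustration: a white noise series and a random walk) and counts how many sample autocorrelations fall outside the approximate 95% band that `acf()` draws.

```r
set.seed(7)
n <- 500
wn <- rnorm(n)           # stationary: white noise
rw <- cumsum(rnorm(n))   # non-stationary: random walk

# Sample autocorrelations at lags 1..20 (lag 0 dropped)
acf_wn <- acf(wn, lag.max = 20, plot = FALSE)$acf[-1, 1, 1]
acf_rw <- acf(rw, lag.max = 20, plot = FALSE)$acf[-1, 1, 1]

# Approximate 95% band under the white noise assumption
bound <- qnorm(0.975) / sqrt(n)

print(round(rbind(white_noise = acf_wn[1:5], random_walk = acf_rw[1:5]), 3))
cat("Lags outside the band, white noise:", sum(abs(acf_wn) > bound), "of 20\n")
cat("Lags outside the band, random walk:", sum(abs(acf_rw) > bound), "of 20\n")
```

A slowly decaying autocorrelation, as in the random walk, is the typical sign of non-stationarity. A formal test is still needed for a firm conclusion.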
-
-
Statistical test method
-
Unit root test
-
Hypothesis-testing approach
- Construct a test statistic
- Decide according to the value of the test statistic
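The notes do not say which unit root test is meant. As one possible illustration, the sketch below uses the Phillips-Perron test, available in base R as `stats::PP.test`; this is a swapped-in choice, not necessarily the test the notes have in mind. Both series are simulated, and the null hypothesis of `PP.test` is that the series has a unit root.

```r
set.seed(42)
stationary  <- ts(rnorm(300))           # white noise: no unit root
random_walk <- ts(cumsum(rnorm(300)))   # random walk: has a unit root

# H0: the series has a unit root (non-stationary)
pp_stationary  <- PP.test(stationary)
pp_random_walk <- PP.test(random_walk)

print(pp_stationary)
print(pp_random_walk)

# A small p-value rejects the unit root, which supports stationarity
cat("p-value, white noise:", pp_stationary$p.value, "\n")
cat("p-value, random walk:", pp_random_walk$p.value, "\n")
```

The p-values reported by `PP.test` are interpolated from a table, so they are bounded rather than exact.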
-
Pure randomness test
Purely random series (white noise series)
- Pure randomness
- Homogeneity of variance
- No value for further analysis
Non-purely random series
Pure randomness test
-
Assumptions
-
Null hypothesis H0
- White Noise
-
Alternative Hypothesis H1
- Non-white noise
-
-
Test statistics
-
Q statistics
- Q_BP statistic (Box-Pierce), also called the Q statistic
- Q_LB statistic (Ljung-Box), also called the LB statistic
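For reference, the standard forms of the two statistics are given below, where n is the number of observations, m the number of lags tested, and rho-hat_k the sample autocorrelation at lag k (assumed notation).

```latex
\begin{aligned}
Q_{BP} &= n \sum_{k=1}^{m} \hat{\rho}_k^{\,2} \\
Q_{LB} &= n(n+2) \sum_{k=1}^{m} \frac{\hat{\rho}_k^{\,2}}{n-k}
\end{aligned}
\qquad \text{both approximately } \chi^2(m) \text{ under } H_0
```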
-
-
Decision rule
-
P-value greater than the significance level (e.g. 0.05)
-
The null hypothesis cannot be rejected
- Treat the series as white noise
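The decision rule above can be sketched in R by computing the LB statistic by hand and checking it against `Box.test`. The series is simulated and the 0.05 threshold is the conventional level assumed here.

```r
set.seed(123)
x <- rnorm(500)   # simulated series, used only for illustration
n <- length(x)
m <- 6            # number of lags tested

# Sample autocorrelations at lags 1..m (lag 0 dropped)
rho_hat <- acf(x, lag.max = m, plot = FALSE)$acf[-1, 1, 1]

# Ljung-Box statistic and its p-value from the chi-square(m) distribution
q_lb <- n * (n + 2) * sum(rho_hat^2 / (n - 1:m))
p_value <- pchisq(q_lb, df = m, lower.tail = FALSE)

# Same test with the built-in function
builtin <- Box.test(x, lag = m, type = "Ljung-Box")
print(c(by_hand = q_lb, builtin = unname(builtin$statistic)))

if (p_value > 0.05) {
  cat("Cannot reject H0: treat the series as white noise\n")
} else {
  cat("Reject H0: the series is not white noise\n")
}
```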
-
-
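# Time series plots
# 2.1 Default output format
yield=c(15.2,16.9,15.3,14.9,15.7,15.1,16.7)
# Enter the 7 series values in one row and assign them to the vector yield
yield=ts(yield,start=1884)
# Declare yield as a time series starting in 1884; the default frequency of 1 means annual data
plot(yield)
# Draw the time series plot of yield with R's default settings
# Customizing graphical parameters
# Plot type parameter
# type="p" points                  type="o" lines drawn through the points
# type="l" lines                   type="h" vertical (hanging) lines
# type="b" points joined by lines  type="s" stair steps
# Scatter plot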
plot(yield,type="p")
# Point-and-line plot
plot(yield,type="o")
# Plotting symbol parameter
plot(yield,type="o",pch=17)
# Line type parameter
# lty=1 solid   lty=4 dot-dash
# lty=2 dashed  lty=5 long dash
# lty=3 dotted  lty=6 dot plus long dash
plot(yield,lty=2)
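# Line width parameter
# lwd=1 default width
# lwd=k k times the default width
# R documents lwd as a positive number, so use a fraction such as lwd=0.5 (not a negative value) for a thinner line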
plot(yield,lwd=2)
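# Color parameter
# col=1 col="black" black
# col=2 col="red" red
# col=3 col="green" green
# col=4 col="blue" blue
# Adding text
plot(yield,main="Average wheat yield per unit area in England and Wales, 1884-1890",xlab="Year",ylab="Yield per unit area")
# Setting axis ranges
# Set the range of the horizontal axis
plot(yield,xlim=c(1886,1890))
# Set the range of the vertical axis
plot(yield,ylim=c(15,16))
# Adding reference lines
# Add one vertical line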
plot(yield)
abline(v=1887,lty=2)
# Add several vertical reference lines
plot(yield)
abline(v=c(1885,1889),lty=2)
# Add horizontal lines
plot(yield)
abline(h=c(15.5,16.5),lty=2)
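# Plotting the autocorrelation of a series
# acf(x, lag.max=)
# - x: name of the series
# - lag.max: maximum lag (lag= also works through partial argument matching); if not specified, R chooses it from the series length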
acf(yield)
# 2.1 Time series plot test
sha=read.table("E:/data/file4.csv",sep=",",header=T)
output=ts(sha$output,start=1964)
plot(output)
# Clear increasing trend -> the constant mean and variance conditions are not met -> not a stationary time series
#2.2
a=read.table("E:/data/file5.csv",sep=",",header=T)
milk=ts(a$milk,start=c(1962,1),frequency=12)
plot(milk)
# Clear increasing trend and periodicity -> not a stationary time series
#2.3
b=read.table("E:/data/file6.csv",sep=",",header=T)
temp=ts(b$temp,start=1949)
plot(temp)
# The maximum temperature fluctuates around 37 degrees -> a stationary series
# Autocorrelation plot test
#2.1
acf(output,lag.max=25)
# Drawn as vertical lines in a two-dimensional plane
# Not a stationary time series
#2.2
acf(milk)
# Not a stationary time series
#2.3
acf(temp)
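# A stationary series
# 2.4
# rnorm(n=, mean=, sd=)
# n: number of random values
# mean: mean, default 0
# sd: standard deviation, default 1
# rnorm can also be called positionally as rnorm(n, mean, sd)
# To generate n standard normal random values, simply write rnorm(n)
# Time series plot of a standard normal white noise series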
white_noise<-rnorm(1000)
white_noise<-ts(white_noise)
plot(white_noise)
# Sample autocorrelation plot of the white noise series
acf(white_noise)
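# The Box.test function
# Box.test(x, type=, lag=)
# - x: name of the series
# - type: type of test statistic
# (1) type="Box-Pierce" outputs the Q statistic of the white noise test; this is the default.
# (2) type="Ljung-Box" outputs the LB statistic of the white noise test.
# - lag: number of lags. lag=n outputs the white noise test statistic for lags up to n; if omitted, the default is lag 1.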
Box.test(white_noise,lag=6)
Box.test(white_noise,lag=12)
# The for loop
# for (x in n1:n2) state
# - x: name of the loop variable
# - n1:n2: range of values the loop variable takes
# - state: the command to execute on each iteration
for(i in 1:2) print(Box.test(white_noise,lag=6*i))
# 2.3 (continued): white noise test on temp at lags 6 and 12
# Note: the lag must be written 6*i; with 6*1 the lag-6 test is simply run twice
for (i in 1:2) print(Box.test(temp,lag=6*i))
#2.5
c=read.table("E:/data/file7.csv",sep=",",header=T)
prop=ts(c$prop,start=1950)
plot(prop) # time series plot
acf(prop) # autocorrelation plot
for (i in 1:2) print(Box.test(prop,lag=6*i)) # randomness test (white noise test) at lags 6 and 12