Time series preprocessing

Stationarity test

Characteristic statistics

  • Probability distributions

    • Distribution function

    • Density function

  • Characteristic statistics
    (study the low-order moments of the sequence)

    • Mean
    • Variance
    • Autocovariance function
    • Autocorrelation coefficient
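
For reference, the four characteristic statistics above are commonly written as follows for a series {X_t}. The symbols (μ_t, σ_t², γ(t,s), ρ(t,s)) follow a widespread textbook convention and are an assumption here, since the original notes do not fix a notation.

```latex
\begin{aligned}
\mu_t      &= E[X_t] \\
\sigma_t^2 &= E\big[(X_t-\mu_t)^2\big] \\
\gamma(t,s) &= E\big[(X_t-\mu_t)(X_s-\mu_s)\big] \\
\rho(t,s)  &= \frac{\gamma(t,s)}{\sqrt{\sigma_t^2\,\sigma_s^2}}
\end{aligned}
```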

Definition of stationary time series

  • Strictly stationary

  • Weakly (wide-sense) stationary

    • Defined by characteristic statistics
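
Written out, the usual textbook conditions for weak stationarity look like the sketch below. The notation (μ for the constant mean, γ for the autocovariance function) is assumed, not taken from the original notes.

```latex
\begin{aligned}
& E[X_t^2] < \infty               && \text{for all } t \\
& E[X_t] = \mu                    && \text{for all } t \\
& \gamma(t,s) = \gamma(k,\,k+s-t) && \text{for all } t, s, k
\end{aligned}
```

The third condition says the autocovariance depends only on the lag s - t, which is what allows it to be written as a function γ(k) of a single argument.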

Statistical properties of stationary time series

  • Constant mean

    • Each statistic has a large number of sample observations
    • Reduces the number of random variables and increases the sample size available for each quantity to be estimated
    • Simplifies the statistical analysis and improves the estimation accuracy of the characteristic statistics
  • The autocovariance function and the autocorrelation function depend only on the time lag, not on the start and end points in time

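    • Lag-k autocovariance function

      • Autocorrelation coefficient

        • Normalization: ρ_0 = 1 and |ρ_k| ≤ 1
        • Symmetry: ρ_k = ρ_{-k}
        • Non-negative definiteness: the autocorrelation matrix is non-negative definite
        • Non-uniqueness: different series can share the same autocorrelation function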
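
The lag-k quantities above can be estimated from a single realization. The following is a minimal sketch on simulated data (the seed, the series length and the helper name `sample_acov` are illustrative choices, not part of the original notes). It computes the sample autocovariance with divisor n, which is the estimator R's `acf()` uses by default, and checks the normalization and boundedness properties.

```r
# Simulated series; the seed and length are arbitrary illustrative choices.
set.seed(1)
x <- rnorm(200)
n <- length(x)
x_bar <- mean(x)

# Hypothetical helper: sample lag-k autocovariance with divisor n.
sample_acov <- function(k) {
  sum((x[1:(n - k)] - x_bar) * (x[(1 + k):n] - x_bar)) / n
}

gamma_hat <- sapply(0:10, sample_acov)   # lags 0..10
rho_hat <- gamma_hat / gamma_hat[1]      # divide by the lag-0 value (variance)

# Compare with the built-in estimator.
rho_builtin <- as.vector(acf(x, lag.max = 10, plot = FALSE)$acf)
print(all.equal(rho_hat, rho_builtin))

# Normalization and boundedness of the sample autocorrelations.
print(rho_hat[1])
print(all(abs(rho_hat) <= 1))
```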

The meaning of stationary time series

  • Data structure of traditional statistical analysis

    • A limited number of variables, each with multiple observations
  • Time series data structure

    • Many random variables (one per time point), each with only one sample observation

Stationarity test

  • Graph verification method

    • Sequence diagram test

      • Two-dimensional plot: the x-axis is time, the y-axis is the series value

        • Obvious increasing trend

          • Not a stationary series
        • Fluctuates around a horizontal line

          • A stationary series
    • Autocorrelation graph test

      • Two-dimensional hanging-line plot: the x-axis is the lag, the y-axis is the autocorrelation coefficient

      • Autocorrelation decays slowly or shows an obvious pattern

        • Non-stationary series
      • Autocorrelation drops quickly and then fluctuates near zero

        • Stationary series
    • Simple to carry out and widely used, but the conclusion is subjective

  • Statistical test method

    • Unit root test

    • Hypothesis-testing approach

      • Construct test statistics
      • Judge according to the value of the test statistic
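
As one possible illustration of a unit root test, the sketch below uses `PP.test()` (the Phillips-Perron test shipped in R's `stats` package) on simulated data. The original notes do not say which unit root test they have in mind, so the choice of test, the seed and the variable names are all assumptions. The null hypothesis is that the series has a unit root, so a small p-value speaks for stationarity.

```r
# Simulated data; the seed and length are arbitrary illustrative choices.
set.seed(42)
n <- 500
stationary_x <- rnorm(n)          # white noise: stationary
random_walk  <- cumsum(rnorm(n))  # random walk: has a unit root

# Phillips-Perron test; H0: the series has a unit root.
pp_stationary <- PP.test(stationary_x)
pp_walk       <- PP.test(random_walk)

# A small p-value rejects the unit root; for a random walk the
# p-value is typically large, so H0 is usually not rejected.
print(pp_stationary$p.value)
print(pp_walk$p.value)
```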

Pure randomness test

Purely random sequence (white noise sequence)

  • Pure randomness: values at different times are uncorrelated
  • Homogeneity of variance
  • No value for further analysis

Non-purely random sequence

Pure randomness test

  • Assumptions

    • Null hypothesis H0

      • White Noise
    • Alternative Hypothesis H1

      • Non-white noise
  • Test statistics

    • Q statistics

      • Q_BP statistic (Box-Pierce), also called the Q statistic
      • Q_LB statistic (Ljung-Box), also called the LB statistic
  • Decision

    • P-value greater than the significance level (e.g. 0.05)

      • The null hypothesis cannot be rejected

        • The series is treated as a white noise sequence
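
To make the two Q statistics concrete, the sketch below computes them by hand from the sample autocorrelations and compares them with `Box.test()`. The formulas used are the standard ones, Q_BP = n Σ ρ̂_k² and Q_LB = n(n+2) Σ ρ̂_k²/(n-k) for k = 1..m, each compared against a chi-square distribution with m degrees of freedom. The simulated data, the seed and the variable names are illustrative assumptions.

```r
# Simulated white noise; seed, length and lag are arbitrary illustrative choices.
set.seed(123)
x <- rnorm(300)
n <- length(x)
m <- 6                                   # number of lags tested

# Sample autocorrelations at lags 1..m (drop lag 0).
rho_hat <- as.vector(acf(x, lag.max = m, plot = FALSE)$acf)[-1]

q_bp <- n * sum(rho_hat^2)                          # Box-Pierce
q_lb <- n * (n + 2) * sum(rho_hat^2 / (n - 1:m))    # Ljung-Box

# Upper-tail chi-square p-values with m degrees of freedom.
p_bp <- 1 - pchisq(q_bp, df = m)
p_lb <- 1 - pchisq(q_lb, df = m)

# Built-in versions for comparison.
bp <- Box.test(x, lag = m, type = "Box-Pierce")
lb <- Box.test(x, lag = m, type = "Ljung-Box")
print(c(manual = q_bp, builtin = unname(bp$statistic)))
print(c(manual = q_lb, builtin = unname(lb$statistic)))
```

The Ljung-Box version reweights each lag by (n+2)/(n-k), so it is always slightly larger than the Box-Pierce value on the same data.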

Plotting

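# Time series plot
# 2.1 Output in the default format
yield <- c(15.2,16.9,15.3,14.9,15.7,15.1,16.7)
  # Enter the 7 series values inline and assign them to the vector yield
yield <- ts(yield,start=1884)
  # Make yield a time series whose first observation is in 1884; the data are annual
plot(yield)
  # Draw the time series plot of yield in R's default format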

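# Custom graphics parameters
# Point/line type parameter
# type="p"  points                    type="o"  line drawn through the points
# type="l"  line                      type="h"  hanging (vertical) lines
# type="b"  points joined by lines    type="s"  stair steps

# Scatter plot
plot(yield,type="p")

# Point-and-line plot
plot(yield,type="o")

# Plotting symbol parameter
plot(yield,type="o",pch=17)

# Line type parameter
# lty=1  solid     lty=4  dot-dash
# lty=2  dashed    lty=5  long dash
# lty=3  dotted    lty=6  two-dash
plot(yield,lty=2)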

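# Line width parameter
# lwd=1   default width
# lwd=k   k times the default width (k must be positive;
#         widths below 1 are not supported on every graphics device)
plot(yield,lwd=2)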

# Color parameter
# col=1     col="black"     black
# col=2     col="red"       red
# col=3     col="green"     green
# col=4     col="blue"      blue

# Add text (title and axis labels)
plot(yield,main="Average wheat yield per mu in England and Wales, 1884-1890",xlab="Year",ylab="Yield per mu")

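# Set the axis ranges
# Set the range of the horizontal axis
plot(yield,xlim=c(1886,1890))

# Set the range of the vertical axis
plot(yield,ylim=c(15,16))

# Add reference lines
# Add one vertical line
plot(yield)
abline(v=1887,lty=2)

# Add several vertical reference lines
plot(yield)
abline(v=c(1885,1889),lty=2)

# Add horizontal lines
plot(yield)
abline(h=c(15.5,16.5),lty=2)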

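# Draw the autocorrelation plot of a series
# acf(x, lag.max=)
# -x: variable name
# -lag.max: maximum lag; if not specified, R chooses it automatically from the series length
#           (the abbreviation lag= also works through partial argument matching)

acf(yield)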

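# 2.1 Time series plot test
sha <- read.table("E:/data/file4.csv",sep=",",header=TRUE)
output <- ts(sha$output,start=1964)
plot(output)
# Clear increasing trend → the constant mean and variance condition is violated → not a stationary series

# 2.2
a <- read.table("E:/data/file5.csv",sep=",",header=TRUE)
milk <- ts(a$milk,start=c(1962,1),frequency=12)
plot(milk)
# Clear increasing trend and periodicity → not a stationary series

# 2.3
b <- read.table("E:/data/file6.csv",sep=",",header=TRUE)
temp <- ts(b$temp,start=1949)
plot(temp)
# The maximum temperature fluctuates around 37 degrees → a stationary series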

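# Autocorrelation plot test
# 2.1
acf(output,lag.max=25)
# Two-dimensional hanging-line plot
# Not a stationary series

# 2.2
acf(milk)
# Not a stationary series

# 2.3
acf(temp)
# A stationary series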

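# 2.4
# rnorm(n=,mean=,sd=)
# n: number of random values to generate
# mean: mean, 0 by default
# sd: standard deviation, 1 by default
# The call can also be written positionally as rnorm(n, mean, sd)
# To generate n standard normal random values, rnorm(n) is enough

# Time series plot of a standard normal white noise series
white_noise <- rnorm(1000)
white_noise <- ts(white_noise)
plot(white_noise)

# Sample autocorrelation plot of the white noise series
acf(white_noise)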

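# The Box.test function
# Box.test(x, type=, lag=)
# -x: variable name
# -type: type of test statistic
# (1) type="Box-Pierce": outputs the Q statistic of the white noise test; this is the default.
# (2) type="Ljung-Box": outputs the LB statistic of the white noise test.
# -lag: lag order. lag=n outputs the test statistic for lag n; if omitted, the statistic for lag 1 is output.

Box.test(white_noise,lag=6)

Box.test(white_noise,lag=12)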

# The for loop
# for(x in n1:n2) state
# -x: name of the loop variable
# -n1:n2: range of values the loop variable takes
# -state: command to run on each iteration

for(i in 1:2) print(Box.test(white_noise,lag=6*i))

# 2.3 continued: white noise test on temp at lags 6 and 12
for(i in 1:2) print(Box.test(temp,lag=6*i))

# 2.5
prop_data <- read.table("E:/data/file7.csv",sep=",",header=TRUE)
prop <- ts(prop_data$prop,start=1950)
plot(prop)  # time series plot
acf(prop)   # autocorrelation plot
for (i in 1:2) print(Box.test(prop,lag=6*i))  # randomness test (white noise test) at lags 6 and 12
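
The loops above print one test result per lag. As a possible convenience, the results can also be collected into a small table. The helper `box_table` below is hypothetical (it is not part of the original post), and it is shown on simulated data because the CSV files used above are not available here.

```r
# Hypothetical helper: tabulate Box.test results for several lags.
box_table <- function(x, lags = c(6, 12), type = "Ljung-Box") {
  rows <- lapply(lags, function(m) {
    res <- Box.test(x, lag = m, type = type)
    data.frame(lag = m,
               statistic = unname(res$statistic),
               p.value = res$p.value)
  })
  do.call(rbind, rows)
}

# Simulated examples; the seed, length and AR coefficient are arbitrary.
set.seed(2020)
noise <- ts(rnorm(100))                            # white noise
ar1 <- arima.sim(model = list(ar = 0.8), n = 100)  # strongly autocorrelated AR(1)

print(box_table(noise))  # p-values are typically large for white noise
print(box_table(ar1))    # small p-values: white noise is rejected
```

Returning a data frame rather than printing inside the loop makes it easier to compare several series or lags side by side.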

Origin blog.csdn.net/qq_43210525/article/details/106633215