(A) Combining data from different sources
Note that the Treasury yields are imported from Wind (as a data frame), while the stock data is fetched with the quantmod package (as zoo/xts objects), so the two data sets match neither in type nor in timestamps.
First, set the time zone to UTC to avoid time zone inconsistencies (the later merge is based on the index). Then import the daily Treasury yields, convert them to xts, and merge the series by index to obtain the two variables needed. The details are as follows:
    IR = read.table("clipboard", header = T)
    Sys.setenv(TZ = "UTC")          # set the time zone
    R = data.frame(IR[, 2])
    DATE = as.POSIXlt(IR[, 1])
    rownames(R) = DATE
    GR = as.xts(R)                  # convert to xts
    data1 = merge.xts(SR_daily, GR / 30, join = "inner")
    data2 = merge.xts(GR_daily, GR / 30, join = "inner")
    data = merge.xts(SR_daily / 100, GR / 30, join = "inner")
    # besides "inner", join also accepts "outer", "left", etc.
    # subtract column 2 from column 1 ([, 1] selects a column; [1] would select a row)
    x = data1[, 1] - data1[, 2]
    y = data2[, 1] - data2[, 2]
(B) Mean regression
    # Regression analysis
    setSymbolLookup(SZZZ = list(name = "000001.ss", src = "yahoo"))
    getSymbols("SZZZ", from = from, to = to)
    ZR = dailyReturn(na.approx(SZZZ[, 4]), type = "log")
    lr = lm(ZR ~ SR_daily)
    summary(lr)
    # Heteroskedasticity tests
    library(lmtest)
    sde = resid(lr)
    chartSeries(sde)
    bptest(lr, studentize = FALSE)  # Breusch-Pagan test
    gqtest(lr)                      # Goldfeld-Quandt test
    # Autocorrelation tests
    acf(sde)
    pacf(sde)
    dwtest(lr)                      # Durbin-Watson test
    bgtest(lr)                      # Breusch-Godfrey test
    Box.test(sde)                   # Box-Pierce test
    # Normality
    hist(sde, nclass = 200)
    shapiro.test(sde)               # Shapiro-Wilk test
(C) Quantile regression
Ordinary least squares minimizes the sum of squared residuals, which makes it highly sensitive to extreme values. It is a mean regression, so it cannot describe how the relationship between the variables differs across the distribution of the data. Quantile regression differs from mean regression in how the parameters are estimated: it assigns different weights to positive and negative residuals, giving a separate set of coefficients for each quantile. The estimate minimizes a weighted sum of absolute residuals, so fitting it is an optimization problem.
Quantile regression does not assume constant variance or normally distributed errors. It is resistant to outliers and can capture the behavior in the tails of the distribution, so its results are more robust.
    library(quantreg)
    r2 = rq(ZR ~ SR_daily, tau = c(0.05, 0.25, 0.5, 0.75, 0.95))
    summary(r2)
    Z = as.numeric(ZR)
    S = as.numeric(SR_daily)
    taus = c(0.05, 0.25, 0.5, 0.75, 0.95)
    plot(S, Z)  # predictor on the x-axis so the fitted lines match the scatter
    for (i in 1:length(taus)) {
      abline(rq(ZR ~ SR_daily, tau = taus[i]))
    }
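The asymmetric weighting of residuals described above is commonly written as the check loss, rho_tau(u) = u * (tau - 1{u < 0}). The following is a minimal base-R sketch, not part of the original post, that fits only a constant under this loss. The names `check_loss` and `fit_const` and the toy data are illustrative assumptions.

```r
# Check (pinball) loss: positive residuals get weight tau,
# negative residuals get weight (1 - tau).
check_loss <- function(u, tau) u * (tau - (u < 0))

# Toy data with one extreme value (illustrative only)
y <- c(1:10, 100)

# Fit a constant by searching over the observed values; for a
# constant-only model a minimizer can always be found among them.
fit_const <- function(y, tau) {
  loss <- sapply(y, function(q) sum(check_loss(y - q, tau)))
  y[which.min(loss)]
}

q50 <- fit_const(y, 0.5)  # minimizer of the tau = 0.5 loss (the median)
m   <- mean(y)            # least-squares fit of a constant (the mean)
```

Here `q50` stays in the bulk of the data while `m` is pulled toward the extreme value, which is the robustness property discussed above. `rq()` solves the same kind of problem with regressors via linear programming.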
(D) Ridge regression: least squares with an L2-norm penalty
In practice, if the sample is too small or the regressors are collinear, X'X is not invertible and OLS cannot be used. Adding a penalty term makes the matrix invertible; put differently, we accept a small bias (a biased estimate) in exchange for lower variance. Depending on the penalty function, the estimators include the elastic-net family (ridge regression, Lasso, ENet, etc.), non-convex penalties (SCAD, etc.), the minimax concave penalty (MCP), and others. How exactly the penalty enters the objective is left for self-study. A ridge regression implementation follows:
    library(MASS)
    S1 = SR_daily
    S2 = dailyReturn(na.approx(AB[, 4]), type = "log")
    r3 = lm.ridge(ZR ~ S1 + S2)
    plot(lm.ridge(ZR ~ S1 + S2, lambda = seq(0, 10, 0.5)))  # ridge trace plot
    select(lm.ridge(ZR ~ S1 + S2, lambda = seq(0, 10, 0.5)))
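To see why the penalty restores invertibility, here is a minimal base-R sketch, not from the original post, using the textbook closed form (X'X + lambda I)^(-1) X'y. It uses simulated, perfectly collinear data. The helper `ridge_coef` and all variable names are assumptions for illustration, and the sketch omits the intercept and the predictor scaling that `lm.ridge` applies, so its numbers are not comparable with `lm.ridge` output.

```r
set.seed(1)
n  <- 50
x1 <- rnorm(n)
x2 <- 2 * x1                      # perfectly collinear with x1
X  <- cbind(x1, x2)
y  <- x1 + rnorm(n, sd = 0.1)

# X'X has rank 1 with 2 columns, so it is singular and OLS is undefined
XtX  <- crossprod(X)
rank <- qr(XtX)$rank

# Ridge estimate: adding lambda to the diagonal makes the system solvable
ridge_coef <- function(X, y, lambda) {
  solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y))
}
b <- ridge_coef(X, y, lambda = 1)
```

With `lambda = 0` the same call would fail or be numerically meaningless because the matrix is singular; any positive `lambda` gives a unique, finite solution at the cost of some bias.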
(E) Lasso regression: addressing multicollinearity
    library(lars)
    data = matrix(nrow = length(S1), ncol = 2)
    data[, 1] = S1
    data[, 2] = S2
    r4 = lars(data, ZR)
    plot(r4)