Mean regression, quantile regression, ridge regression, Lasso regression

(A) Merging data from different sources

Note that the yields are imported from Wind (as a data frame), while the stock data is downloaded with the quantmod package (as zoo/xts objects), so the data types and time stamps do not match.

First, set the time zone to UTC (Coordinated Universal Time) to avoid time zone inconsistencies, since the merge is based on the index. Then import the daily Treasury yields, convert them to xts, and merge on the index to obtain the two variables we need. The details are as follows:

IR=read.table("clipboard",header=T)
Sys.setenv(TZ="UTC")# set the time zone
Gr=data.frame(IR[,2])
Date=as.POSIXlt(IR[,1])
rownames(Gr)=Date
GR=as.xts(Gr)# convert to xts
data1=merge.xts(SR_daily,GR/30,join="inner")
data2=merge.xts(GR_daily,GR/30,join="inner")
data=merge.xts(SR_daily/100,GR/30,join="inner")# besides "inner", join also accepts "outer", "left", etc.
x=data1[,1]-data1[,2]# difference of the two columns
y=data2[,1]-data2[,2]
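
To make the effect of the inner join concrete, here is a minimal sketch on made-up data. The objects `yield_xts` and `stock_xts` and all the numbers are hypothetical and only stand in for the Wind and quantmod series used above.

```r
library(xts)
Sys.setenv(TZ = "UTC")  # one time zone for every index

# Hypothetical yield table, as it might arrive from a spreadsheet
yield_df <- data.frame(date  = c("2020-01-02", "2020-01-03", "2020-01-06"),
                       yield = c(3.10, 3.12, 3.09))
yield_xts <- xts(yield_df$yield,
                 order.by = as.POSIXct(yield_df$date, tz = "UTC"))
colnames(yield_xts) <- "yield"

# Hypothetical stock returns on a partly different set of dates
stock_xts <- xts(c(0.004, -0.002, 0.001),
                 order.by = as.POSIXct(c("2020-01-03", "2020-01-06", "2020-01-07"),
                                       tz = "UTC"))
colnames(stock_xts) <- "ret"

# The inner join keeps only the dates that appear in both series
merged <- merge(stock_xts, yield_xts, join = "inner")
merged

# Column-wise difference, analogous to x and y above
spread <- merged[, 1] - merged[, 2]
```

Only the dates present in both inputs survive the merge, which is why the indexes must share a time zone before merging.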

 

(B) Mean regression

# Regression analysis
setSymbolLookup(SZZZ=list(name="000001.ss",src='yahoo'))
getSymbols("SZZZ",from=from,to=to)
ZR=dailyReturn(na.approx(SZZZ[,4]),type="log")
lr=lm(ZR~SR_daily)
summary(lr)

# Heteroskedasticity tests
library(lmtest)
sde=resid(lr)
chartSeries(sde)
bptest(lr,studentize=FALSE)# Breusch-Pagan test
gqtest(lr)# Goldfeld-Quandt test

# Autocorrelation tests
acf(sde)
pacf(sde)
dwtest(lr)# Durbin-Watson test
bgtest(lr)# Breusch-Godfrey test
Box.test(sde)# Box-Pierce test

# Normality
hist(sde,nclass=200)
shapiro.test(sde)# Shapiro-Wilk test

(C) Quantile regression

Ordinary least squares minimizes the sum of squared residuals, which makes it highly sensitive to extreme values. It is also a mean regression, so it cannot describe how the variables are related at different points of the distribution. Quantile regression differs from mean regression in how the parameters are estimated: it weights positive and negative residuals differently depending on the quantile, and estimates the coefficients by minimizing the weighted sum of absolute residuals, which is an optimization problem.

Quantile regression does not assume homoskedasticity or normally distributed errors, is resistant to outliers, and can capture the tails of the distribution, so its results are more robust.
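
The weighting idea can be illustrated with a minimal base-R sketch on simulated data. The helper names `check_loss` and `fit_constant` are made up for this illustration and are not part of quantreg. With no predictors, minimizing the weighted absolute residuals should recover the sample quantile.

```r
# Check (pinball) loss: residuals above the fit get weight tau,
# residuals below it get weight (1 - tau).
check_loss <- function(u, tau) sum(u * (tau - (u < 0)))

set.seed(1)      # simulated data, for illustration only
y <- rnorm(501)

# Minimise the check loss over a single constant q
fit_constant <- function(tau) {
  optimize(function(q) check_loss(y - q, tau), interval = range(y))$minimum
}

taus <- c(0.05, 0.5, 0.95)
est  <- sapply(taus, fit_constant)

# Compare the minimisers with the sample quantiles
cbind(tau = taus, minimiser = est,
      sample_quantile = quantile(y, taus, type = 1))
```

The two columns should agree up to the tolerance of `optimize`. `rq` solves the same kind of problem with the constant replaced by a linear function of the predictors.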

library(quantreg)
r2=rq(ZR~SR_daily,tau=c(0.05,0.25,0.5,0.75,0.95))
summary(r2)

Z=as.numeric(ZR)
S=as.numeric(SR_daily)
taus=c(0.05,0.25,0.5,0.75,0.95)
plot(S,Z)# predictor on the x-axis, response on the y-axis, to match the fitted lines
for(i in 1:length(taus)){
  abline(rq(ZR~SR_daily,tau=taus[i]))
}

 

(D) Ridge regression - least squares with an L2-norm penalty

In practice, if the sample is too small, so that X'X is not invertible, or if the predictors are collinear, OLS regression cannot be used. We can then add a penalty function to make the matrix invertible, accepting a small bias (a biased estimate) in exchange for lower variance. Depending on the penalty function, the estimators include the elastic net family (ridge regression, Lasso, ENet, etc.), non-convex penalties (SCAD, etc.), minimax concave penalties (MCP, etc.), and others. How the penalty function enters the objective is left for self-study. Ridge regression can be implemented as follows:

library(MASS)
S1=SR_daily
S2=dailyReturn(na.approx(AB[,4]),type="log")
r3=lm.ridge(ZR~S1+S2)
plot(lm.ridge(ZR~S1+S2,lambda=seq(0,10,0.5)))# ridge trace plot
select(lm.ridge(ZR~S1+S2,lambda=seq(0,10,0.5)))
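
To show what the penalty does to X'X, here is a minimal base-R sketch of the textbook closed form (X'X + lambda I)^(-1) X'y on simulated, nearly collinear data. The variable names, the data, and lambda = 5 are assumptions chosen for illustration, and the scaling is not meant to reproduce what lm.ridge does internally.

```r
set.seed(42)                        # simulated data, for illustration only
n  <- 50
x1 <- rnorm(n)
x2 <- x1 + rnorm(n, sd = 0.01)      # nearly collinear with x1
y  <- x1 + x2 + rnorm(n)

X  <- scale(cbind(x1, x2))          # standardise the predictors
yc <- y - mean(y)                   # centre the response, so no intercept is needed

# Ridge estimate (X'X + lambda * I)^(-1) X'y; lambda = 0 gives OLS
ridge_coef <- function(lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, yc)))
}

ols   <- ridge_coef(0)
ridge <- ridge_coef(5)

cond_ols   <- kappa(crossprod(X), exact = TRUE)                # condition number of X'X
cond_ridge <- kappa(crossprod(X) + 5 * diag(2), exact = TRUE)  # after adding the penalty

rbind(ols = ols, ridge = ridge)
c(cond_ols = cond_ols, cond_ridge = cond_ridge)
```

Adding lambda to the diagonal lowers the condition number, and the ridge coefficients are shrunk toward zero relative to OLS. That shrinkage is the bias accepted in exchange for lower variance.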

 

(E) Lasso regression - handling multicollinearity

library(lars)
data=cbind(as.numeric(S1),as.numeric(S2))# predictor matrix
r4=lars(data,as.numeric(ZR))# type="lasso" is the default
plot(r4)# coefficient paths

 

Origin www.cnblogs.com/amosding/p/12318015.html