R and the Money Game: Forecasting US Stocks with ARIMA

It seems sudden, yet somehow reasonable: together with Mr. Buffett we have witnessed, again and again, one Wall Street circuit breaker after another. Tiny figures in the tide of history, we cannot help asking: what comes next? Will there be more? Hence this post: predicting the trend of US stocks with an ARIMA model.

1. Get Train Dataset and Test Dataset


This example simply uses the closing prices of the Dow Jones Industrial Average for the first quarter of 2020 (data source: Yahoo! Finance). The first 95% of the data is used as the training set, and the remaining 5% as the test set.
library(quantmod)
# Dow closing prices for Q1 2020 from Yahoo! Finance
stock <- getSymbols("^DJI", from = "2020-01-01", to = "2020-03-31", auto.assign = FALSE)
names(stock) <- c("Open", "High", "Low", "Close", "Volume", "Adjusted")
stock <- stock$Close
stock <- na.omit(stock)
train.id <- 1:floor(0.95 * length(stock))   # first 95% for training
train <- stock[train.id]
test <- stock[-train.id]                    # remaining 5% for testing

2. Stationarity Test


ARIMA requires its input to be a stationary time series; if the input is non-stationary, it must first be transformed into a stationary one. This example identifies whether the data set is stationary in two ways: 1) direct observation of the plot; 2) a white-noise test.

In fact, for a Dow that has been halted by circuit breakers again and again, we already know without any observation or testing that it is a non-stationary series. The two methods below are for the cases where the answer is less obvious: what do we do when a time series is harder to judge?

2.1 Observational Method

The cliff-like downward curve indicates that the training set is a non-stationary time series.

library(ggplot2)
library(scales)
plot <- ggplot(data = train) +
  geom_line(aes(x = as.Date(Index), y = Close), size = 1, color = "#0072B2") +
  scale_x_date(labels = date_format("%m/%d/%Y"), breaks = date_breaks("2 weeks")) +
  ggtitle("Dow Jones Industrial Average") +
  xlab("") +
  theme_light()
print(plot)

[Figure: Dow Jones Industrial Average closing prices, training set]

2.2 Ljung-Box Test Statistics

The Ljung-Box test gives p-value = 2.2e-16 < 0.05, so the hypothesis that the series is white noise is rejected.

Box.test(train, lag=1, type = "Ljung-Box")

3. Differencing


The observation above shows that the training set is a non-stationary time series, so we difference it to make it stationary. After taking the first and second differences of the training set, it is actually hard to tell from the plots whether the first- or second-differenced series is stationary, so we run ADF tests. From the test results we can see: original series, p-value = 0.5336 > 0.05, the null hypothesis of non-stationarity cannot be rejected; first difference, p-value = 0.4495 > 0.05, still cannot be rejected; second difference, p-value = 0.01 < 0.05, the null is rejected and the series can be treated as stationary.

So we will use the second-differenced series for the ARIMA forecast.

library("tseries")
train.diff1 <- diff(train, lag = 1, differences = 1)
train.diff2 <- diff(train, lag = 1, differences = 2)
adf.test(train)
adf.test(na.exclude(train.diff1))
adf.test(na.exclude(train.diff2))

[Output: ADF test results for the original, first-differenced, and second-differenced series]

4. ARIMA Model


4.1 Choosing the order

Having decided to predict with the twice-differenced series, we need to choose the model orders. As the plots show, the ACF lies outside the two-standard-deviation band at lags 1-2, so q = 2; the PACF likewise exceeds the band at lags 1-2, so p = 2. We therefore choose the model ARIMA(2,2,2).

acf <- acf(na.omit(train.diff2), plot = TRUE)
pacf <- pacf(na.omit(train.diff2), plot = TRUE)

[Figures: ACF and PACF of the second-differenced training series]

To make sure the best model is selected, it is recommended to fit several candidate models and then pick the best one according to the AIC or BIC criterion. For example, using automatic order selection, we get the model ARIMA(1,1,0):

library(forecast)
auto.arima(train, trace = TRUE)   # best model: ARIMA(1,1,0)

Comparison shows that ARIMA(2,2,2) is still the better model:

data.autofit <- arima(train, order = c(1, 1, 0))
AIC(data.autofit)
BIC(data.autofit)
data.fit <- arima(train, order = c(2, 2, 2))
AIC(data.fit)
BIC(data.fit)
Model          AIC        BIC
ARIMA(1,1,0)   930.5894   934.6755
ARIMA(2,2,2)   919.8881   930.0149

4.2 Model Validation

A white-noise test on the fitted residuals gives p-value = 0.8221 > 0.05, and the ACF drops off quickly after lag = 1, so the residuals can be regarded as white noise.

forecast <- forecast(data.fit, h = 4, level = c(99.5))
forecast.data <- data.frame("Date" = index(train), "Input" = forecast$x,
                            "Fitted" = forecast$fitted, "Residuals" = forecast$residuals)
acf(forecast.data$Residuals)
Box.test(forecast.data$Residuals, lag = floor(sqrt(length(forecast.data$Residuals))), type = "Ljung-Box")

[Figures: residual ACF plot and Ljung-Box test output]

Plotting the fitted values and the training data on the same chart, we can see that the difference between the two is within an acceptable range.
[Figure: fitted values overlaid on the training data]
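Since the post does not show the code behind this chart, here is a minimal, self-contained sketch of the same overlay idea, using a simulated random-walk series in place of the Dow data (all names and numbers below are illustrative, not from the original):

```r
# Simulated stand-in for the closing-price series
set.seed(42)
y <- cumsum(rnorm(60, mean = -20, sd = 150)) + 29000

# Fit the same order as the chosen model
fit <- arima(y, order = c(2, 2, 2))

# Base R: fitted values = observed - residuals
fitted.vals <- y - residuals(fit)

# Overlay fitted values (dashed) on the observed series
plot(y, type = "l", col = "#0072B2", lwd = 2,
     main = "Fitted vs. observed (simulated)", xlab = "Day", ylab = "Close")
lines(fitted.vals, col = "#D55E00", lty = 2, lwd = 2)
```

With the real data, replacing `y` with the training-set closing prices reproduces the comparison described above.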

4.3 Forecast and Test Data

Comparing the predictions with the test set, the maximum relative error between the two is 0.056, showing that the model is adequate and the prediction results are good.

[Figures: predicted vs. actual closing prices on the test set]
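The post does not show how the relative error was computed; a sketch of one way to do it, with hypothetical closing prices standing in for the real forecast and test-set values:

```r
# Hypothetical forecast and test-set closing prices (illustrative only)
pred   <- c(21900, 21800, 21700, 21600)
actual <- c(22000, 21500, 21000, 22300)

# Element-wise relative error, then the worst case across the test days
rel.err <- abs(pred - actual) / actual
max(rel.err)
```

With the real series, `pred` would be `forecast$mean` and `actual` the values in `test`.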

5. Forecast


A suitable predictive model has been found above, so we can use this ARIMA(2,2,2) model to predict the Dow's trend over the next five days. The forecast has the Dow fluctuating slightly around a mean of 22500 (with a downward trend), within a range of roughly 16000-26000. Simply put, the model's outlook is not optimistic.
data.forecast <- arima(stock, order = c(2, 2, 2))   # refit on the full series
forecast(data.forecast, h = 5, level = c(99.5))     # five-day forecast

[Figures: five-day forecast of the Dow with 99.5% interval]
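The mean and interval numbers quoted above come from the components of the forecast object. A self-contained sketch (simulated series, illustrative names) of how they are read off:

```r
library(forecast)

# Simulated stand-in for the full Dow closing-price series
set.seed(1)
y <- cumsum(rnorm(60, mean = -50, sd = 300)) + 29000

# Fit the chosen order and forecast five steps ahead at the 99.5% level
fc <- forecast(Arima(y, order = c(2, 2, 2)), h = 5, level = 99.5)

fc$mean    # point forecasts for the next five trading days
fc$lower   # lower 99.5% interval bound
fc$upper   # upper 99.5% interval bound
```

On the real data, `fc$mean` gives the roughly-22500 central path and `fc$lower`/`fc$upper` the roughly 16000-26000 band reported above.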


Origin www.cnblogs.com/yukiwu/p/12620739.html