What is the essential difference between time series analysis and regression analysis?

 

 

Summary:

(1) The core difference between time series analysis and regression analysis lies in their assumptions about the data: regression analysis assumes that the sample data points are independent of one another, while time series analysis exploits the correlation between data points to make predictions. For example, a basic model in time series analysis is the AR (Auto-Regressive) model, which uses past data points to predict future ones.

(2) Although the AR (autoregressive) model looks very similar to linear regression, the lack of independence means that using linear regression to estimate the AR model's parameters yields biased estimates. However, because these estimates are consistent, linear regression is still used in practice to approximate the AR model.

(3) Ignoring the dependence in the data, or wrongly assuming independence, is likely to make the model fail. This deserves particular attention when modeling and forecasting financial markets.

This article first explains the specific difference between the two methods' data assumptions, then explains why the AR (autoregressive) model still differs from regression analysis despite the apparent similarity, and finally describes a problem that can arise in finance when the two are confused.

 

1. Regression analysis's assumption about the data: independence


 

In regression analysis, we assume that the data points are independent of one another. This independence shows up in two ways: on the one hand, the independent variables (X) are treated as fixed, already-observed values; on the other hand, the error terms attached to each dependent variable (y) are independent and identically distributed. In linear regression, the error terms are i.i.d. normal with zero mean and constant variance.
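As a concrete reference (a standard textbook formulation, not copied from the original post), the simple linear regression model with these assumptions can be written as:

$$
y_i = \beta_0 + \beta_1 x_i + \epsilon_i, \qquad \epsilon_i \overset{\text{i.i.d.}}{\sim} N(0, \sigma^2), \quad i = 1, \dots, n,
$$

where the $x_i$ are treated as fixed constants and the $\epsilon_i$ are mutually independent.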

A concrete manifestation of this independence is that, in regression analysis, the data can be reordered arbitrarily. When modeling, you can feed the data to the model in any random order, and you can randomly select part of the data to split off a training set and a validation set. Because of this, the prediction error on the validation set is roughly constant from point to point: errors do not accumulate, so prediction accuracy does not gradually degrade.
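For illustration, here is a minimal sketch (my own, not from the original post) of the kind of random split that the independence assumption permits, using scikit-learn's train_test_split:

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LinearRegression

# Toy i.i.d. data: y = 2x + noise (illustrative values)
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1))
y = 2 * X[:, 0] + rng.normal(scale=0.5, size=200)

# Because the points are independent, a purely random split is valid.
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.3, random_state=0)

model = LinearRegression().fit(X_train, y_train)
print("validation R^2:", model.score(X_val, y_val))
```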

 

2. Time series analysis's assumption about the data: correlation


 

For time series analysis, by contrast, we must assume and exploit correlation in the data. The core reason is that we have no external data with which to predict the future; we can only use the existing data. We therefore have to assume that the data points are correlated with one another, find the corresponding correlation structure through modeling, and use it to predict future trends. This is why classical time series analysis (ARIMA) begins by inspecting the ACF (autocorrelation function) and PACF (partial autocorrelation function) to observe the correlation between data points.
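As a small illustration (mine, not the original author's), the ACF and PACF of a series can be computed with statsmodels:

```python
import numpy as np
from statsmodels.tsa.stattools import acf, pacf

# Simulate an AR(1) series y_t = 0.8 * y_{t-1} + e_t (0.8 is an illustrative choice)
rng = np.random.default_rng(0)
T = 500
y = np.zeros(T)
for t in range(1, T):
    y[t] = 0.8 * y[t - 1] + rng.normal()

# For an AR(1), the ACF decays geometrically while the PACF cuts off after lag 1.
print("ACF :", np.round(acf(y, nlags=5), 3))
print("PACF:", np.round(pacf(y, nlags=5), 3))
```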

The correlation assumption of time series directly contradicts the independence assumption of regression analysis. In multi-step time series forecasting, on the one hand, the "independent variables" needed to predict the future may themselves be unobserved; on the other hand, as the forecast horizon grows, errors gradually accumulate: your forecasts of the distant future are more uncertain than your forecasts of the near future. Time series analysis therefore needs a completely different perspective and different models.
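To make the error accumulation concrete (my addition; a standard result for the centered AR(1) model introduced below): iterating an AR(1) forecast $h$ steps ahead compounds the noise terms,

$$
y_{t+h} = \phi^{h}\, y_t + \sum_{j=0}^{h-1} \phi^{j}\, \epsilon_{t+h-j},
\qquad
\operatorname{Var}(h\text{-step forecast error}) = \sigma^2 \sum_{j=0}^{h-1} \phi^{2j},
$$

so the forecast variance grows with the horizon $h$ instead of staying constant, as it would under independence.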

 

3. The "similarity" and difference between the AR (autoregressive) model and linear regression


 

A basic model in time series analysis is the AR (Auto-Regressive) model, which uses past data points to predict future ones. For example, the AR(1) model predicts the next value from the current one; their mathematical relationship can be expressed as:

$$
y_t = \mu + \phi\, y_{t-1} + \epsilon_t ,
$$

where $\phi$ is the autoregressive coefficient and $\epsilon_t$ is a white-noise error term.
Its expression is indeed very similar in form to a linear regression model; even the general AR(n) model is highly similar to multiple linear regression. The only difference is that the independent variables (X) on the right-hand side of the equation are replaced by past values of the dependent variable (y). Yet precisely because of this small difference, the solutions of the two are completely different. In the AR model, the independent variables are past values of the dependent variable, so the independent variables are correlated with past error terms. It is this correlation that makes the estimates obtained by fitting the AR (autoregressive) model with a linear model biased.
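To spell out this reasoning step (my addition): repeatedly substituting the AR(1) equation into itself shows that, in the stationary case, the regressor $y_{t-1}$ is built out of all past errors,

$$
y_{t-1} = \frac{\mu}{1-\phi} + \sum_{j=0}^{\infty} \phi^{j}\, \epsilon_{t-1-j} \qquad (|\phi| < 1),
$$

so the "independent variable" $y_{t-1}$ is correlated with every past error term, violating the fixed-regressor assumption of linear regression.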

A rigorous proof of these conclusions would require introducing too many concepts, so here we only analyze the special case of the AR(1) model. Without loss of generality, we can center (demean) the data and write the AR(1) model in the following form:

$$
y_t = \phi\, y_{t-1} + \epsilon_t .
$$

For this model, linear regression (ordinary least squares) would give the following estimate:

$$
\hat{\phi} = \frac{\sum_{t=2}^{T} y_{t-1}\, y_t}{\sum_{t=2}^{T} y_{t-1}^{2}} .
$$

For a general linear regression model, all the independent variables are regarded as observed, true values. So when we take the expectation, the denominator can be treated as a known constant, and unbiasedness follows from the independence between the observed $x_i$ and the error terms. For a similarly centered model $y_i = \beta x_i + \epsilon_i$:

$$
\hat{\beta} = \frac{\sum_i x_i\, y_i}{\sum_i x_i^{2}} = \beta + \frac{\sum_i x_i\, \epsilon_i}{\sum_i x_i^{2}},
\qquad
E\big[\hat{\beta}\big] = \beta + \frac{\sum_i x_i\, E[\epsilon_i]}{\sum_i x_i^{2}} = \beta .
$$

But in the time series case this unbiasedness cannot be obtained, because the numerator and denominator interfere with each other: the "independent variable" $y_{t-1}$ cannot be regarded as known, and the observations are tied to past error terms. Substituting $y_t = \phi\, y_{t-1} + \epsilon_t$ into the estimator gives

$$
\hat{\phi} = \phi + \frac{\sum_{t=2}^{T} y_{t-1}\, \epsilon_t}{\sum_{t=2}^{T} y_{t-1}^{2}},
$$

and since the random $y_{t-1}$ appears in both the numerator and the denominator, the expectation of the second term is no longer zero. Thus, because of this correlation, solving the AR (autoregressive) model with a linear model yields a biased estimate.

More intuitive simulated data can illustrate the problem [1]. In the figure from [1] (not reproduced here), the left panel shows the mean of the estimates obtained by simulation when the true parameter is 0.9: there is a visible gap between the true value (black line) and the simulated estimate (red line), but the gap gradually narrows as the amount of data increases. The right panel shows the magnitude of the bias for different true parameter values: the bias is always present, but it becomes smaller as the amount of data grows.

In fact, in practice we do use linear regression to approximately solve the AR model, because although the resulting estimate is biased, it is consistent: when the amount of data is large enough, the estimate converges to the true value. We will not derive this here.
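Here is a minimal simulation sketch (my own, loosely following the experiment described in [1]; the parameter values are illustrative assumptions) that checks both the bias and its shrinkage as the sample size grows:

```python
import numpy as np

rng = np.random.default_rng(0)
phi_true = 0.9          # true AR(1) coefficient (illustrative choice)
n_sims = 2000           # simulated series per sample size

def ols_ar1_estimate(T):
    """Simulate an AR(1) series of length T and estimate phi by OLS."""
    y = np.zeros(T)
    for t in range(1, T):
        y[t] = phi_true * y[t - 1] + rng.normal()
    # OLS estimator: sum(y_{t-1} * y_t) / sum(y_{t-1}^2)
    return np.sum(y[:-1] * y[1:]) / np.sum(y[:-1] ** 2)

for T in (20, 50, 100, 500):
    estimates = [ols_ar1_estimate(T) for _ in range(n_sims)]
    print(f"T={T:4d}  mean estimate={np.mean(estimates):.3f}  "
          f"bias={np.mean(estimates) - phi_true:+.3f}")
```

The mean estimate falls short of the true value at small T, and the gap shrinks as T grows: biased but consistent.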

 

4. The consequence of ignoring independence: a common mistake in finance


 

Hopefully, by this point you understand why a model's assumptions must not be confused, especially the assumptions of independence and correlation. Next, I will describe a mistake in the finance direction, caused by confusing these assumptions, that I have seen myself.

With the development of machine learning, many people hope to combine machine learning with financial markets and use data modeling to predict stock prices. They take the traditional machine learning approach of randomly splitting the data into a training set and a test set, and train a model on the training set to predict the probability that a stock rises or falls (a binary up/down classification problem). When they apply the model to the test set, they find that it performs remarkably well, reaching 80-90% accuracy. Yet in real-world use the performance is nowhere near as good.

The cause of this error is that they failed to recognize that the data are highly correlated. For time series, we cannot build the training and test sets by random splitting; otherwise we end up "using future data" to predict "past movements". In that case, even if your model performs brilliantly on your test set, it does not mean it can really predict the future direction of stock prices. One way to split correctly is sketched below.
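One way to avoid this pitfall (my illustration, not from the original post) is to split chronologically, for example with scikit-learn's TimeSeriesSplit, which always trains on the past and validates on the future:

```python
import numpy as np
from sklearn.model_selection import TimeSeriesSplit

# Pretend these are 10 daily observations ordered in time.
X = np.arange(10).reshape(-1, 1)

# Each fold trains only on indices that come strictly before the test indices.
for train_idx, test_idx in TimeSeriesSplit(n_splits=3).split(X):
    print("train:", train_idx, " test:", test_idx)
```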

 

[References]

[1] Zhihu: 时间序列和回归分析有什么本质区别? (What is the essential difference between time series and regression analysis?)

 
