[Data analysis] Predictive analysis using machine learning algorithms (2): Linear Regression (2021-01-14)

Machine learning methods in time series forecasting (2): Linear Regression

This is the second article in the series "Machine Learning Methods in Time Series Forecasting". If you are interested, you can read the previous article first:
[Data Analysis] Predictive analysis using machine learning algorithms (1): Moving Average

The linear regression model returns an equation that determines the relationship between the independent variable and the dependent variable.
y = θ1·x1 + θ2·x2 + ... + θn·xn
Here x1, ..., xn are the independent variables and θ1, ..., θn are their weights. For the stock price prediction problem in this article, we do not have a set of independent variables, only dates. So we extract features such as day, month, year, and whether the day is a Monday or Friday from the date column, and then fit a linear regression model.
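To make the role of the weights concrete, here is a minimal sketch of how they can be estimated by ordinary least squares. It uses synthetic data, not the stock data from this article; the names `true_theta`, `true_intercept` and `X_design` are assumptions made up for the illustration, and the intercept is included because scikit-learn's LinearRegression fits one by default.

```python
import numpy as np

# Synthetic data: 200 samples, 3 made-up features (not the stock data).
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
true_theta = np.array([2.0, -1.0, 0.5])   # weights chosen for this illustration
true_intercept = 4.0
y = X @ true_theta + true_intercept + rng.normal(scale=0.01, size=200)

# Append a column of ones so the intercept is estimated along with the weights.
X_design = np.column_stack([X, np.ones(len(X))])

# Ordinary least squares: minimise ||X_design @ params - y||^2.
params, *_ = np.linalg.lstsq(X_design, y, rcond=None)
theta_hat, intercept_hat = params[:-1], params[-1]

print(theta_hat, intercept_hat)
```

With so little noise, the estimated weights should land very close to the ones used to generate the data.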

The source data and code for this article are on my GitHub, and you are welcome to download them: https://github.com/Beracle/02-Stock-Price-Prediction.git

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

Load the data.

df = pd.read_csv('NSE-TATAGLOBAL11.csv')
df.head()

First we set the date as the index. In order not to destroy the original data, we define a new data set.

# setting the index as date
df['Date'] = pd.to_datetime(df.Date,format='%Y-%m-%d')
df.index = df['Date']

# create a dataframe with only the date and the target variable,
# sorted chronologically and re-indexed 0..n-1
data = df.sort_index(ascending=True, axis=0)
new_data = data[['Date', 'Close']].reset_index(drop=True)

We use the add_datepart() function from fastai to expand the date into features. If the fastai package is not installed, install it with pip install fastai; in a Jupyter environment, use !pip install fastai.

#create features
# note: this import path matches fastai 1.x; in fastai 2.x the function
# is in fastai.tabular.core
from fastai.tabular import add_datepart
add_datepart(new_data, 'Date')
new_data.drop('Elapsed', axis=1, inplace=True)  # Elapsed is the raw timestamp
new_data
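If fastai is not available, similar calendar features can be derived with pandas alone. The sketch below is an assumption-laden stand-in, not the fastai implementation: `add_date_features` is a hypothetical helper, it produces only a subset of the columns that add_datepart creates, and the closing prices in the sample are made up.

```python
import pandas as pd

def add_date_features(df, col="Date"):
    """Hypothetical helper: derive calendar features from a datetime column."""
    out = df.copy()
    dates = pd.to_datetime(out[col])
    out["Year"] = dates.dt.year
    out["Month"] = dates.dt.month
    out["Day"] = dates.dt.day
    out["Dayofweek"] = dates.dt.dayofweek   # Monday=0 ... Sunday=6
    out["Dayofyear"] = dates.dt.dayofyear
    out["Is_month_end"] = dates.dt.is_month_end
    out["Is_month_start"] = dates.dt.is_month_start
    # Drop the raw date so only numeric/boolean features remain.
    return out.drop(columns=[col])

# Made-up sample rows, only to show the resulting columns.
sample = pd.DataFrame({"Date": ["2018-10-08", "2018-10-31"],
                       "Close": [215.15, 220.0]})
features = add_date_features(sample)
print(features)
```

The helper returns a new dataframe instead of modifying its argument, which keeps the original data intact.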

In addition, we can add features that we think are relevant to the forecast. My hypothesis here is that the first and last days of the trading week may affect the closing price far more than the other days. So I create a feature that indicates whether a given day is Monday/Friday or Tuesday/Wednesday/Thursday.

If the day of the week equals 0 (Monday) or 4 (Friday), the column value is 1; otherwise it is 0. Other features can be created in the same way.

# 1 if the day is Monday (0) or Friday (4), otherwise 0
new_data['mon_fri'] = new_data['Dayofweek'].isin([0, 4]).astype(int)
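As a quick check of the Monday/Friday flag, and as one example of another feature that could be built the same way, here is a small sketch on a single trading week. The per-weekday indicator columns (`dow_0` to `dow_4`) are an assumption of this sketch and are not used elsewhere in the article.

```python
import pandas as pd

# Monday 1 October 2018 through Friday 5 October 2018.
dates = pd.date_range("2018-10-01", periods=5, freq="D")
demo = pd.DataFrame({"Dayofweek": dates.dayofweek})

# The article's feature: 1 for Monday (0) or Friday (4), else 0.
demo["mon_fri"] = demo["Dayofweek"].isin([0, 4]).astype(int)

# A more general alternative (not used in the article):
# one indicator column per weekday.
dow = pd.get_dummies(demo["Dayofweek"], prefix="dow").astype(int)
demo = pd.concat([demo, dow], axis=1)
print(demo)
```

Per-weekday indicators let the model learn a separate effect for each day instead of relying on the Monday/Friday hypothesis.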

Split the data into a training set and a validation set to check the performance of the model.

#split into train and validation
train = new_data[:987]
valid = new_data[987:]

x_train = train.drop('Close', axis=1)
y_train = train['Close']
x_valid = valid.drop('Close', axis=1)
y_valid = valid['Close']
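Because this is a time series, the split is by position and not random: the model is trained on the earlier rows and validated on the later ones. A minimal sketch of that idea, with a hypothetical helper `chronological_split` and a made-up ten-row dataframe:

```python
import pandas as pd

def chronological_split(df, n_train):
    """Hypothetical helper: first n_train rows for training, the rest for validation."""
    train = df.iloc[:n_train].copy()
    valid = df.iloc[n_train:].copy()
    return train, valid

# Made-up data, already sorted from oldest to newest.
demo = pd.DataFrame({"Close": range(10)})
train_demo, valid_demo = chronological_split(demo, n_train=8)
print(len(train_demo), len(valid_demo))
```

Returning copies means later column assignments on either part do not trigger pandas' chained-assignment warnings.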

Import the linear regression model. Install the scikit-learn package through pip or conda first (for example, pip install scikit-learn).

#implement linear regression
from sklearn.linear_model import LinearRegression
model = LinearRegression()
model.fit(x_train,y_train)
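After fitting, the learned equation can be read from the model's `coef_` and `intercept_` attributes. The sketch below shows this on toy data generated from y = 2x + 1; the data and the names `X_toy`, `y_toy` and `toy_model` are made up for illustration.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Toy data generated from y = 2*x + 1 (made up for this illustration).
X_toy = np.array([[0.0], [1.0], [2.0], [3.0]])
y_toy = np.array([1.0, 3.0, 5.0, 7.0])

toy_model = LinearRegression()
toy_model.fit(X_toy, y_toy)

slope = toy_model.coef_[0]          # weight for the single feature
intercept = toy_model.intercept_    # constant term, fitted by default
print(slope, intercept)
```

On the article's model, the same attributes give one weight per date feature, in the column order of x_train.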

We evaluate the predictions with the root mean square error (RMSE).

#make predictions and find the rmse
preds = model.predict(x_valid)
rmse = np.sqrt(np.mean(np.power((np.array(y_valid)-np.array(preds)),2)))
rmse
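RMSE is the square root of the mean of the squared differences between actual and predicted values. A small worked sketch with made-up numbers (the function name `rmse` and the variable `score` are illustrative choices):

```python
import numpy as np

def rmse(y_true, y_pred):
    """Root mean square error between two equal-length sequences."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.sqrt(np.mean((y_true - y_pred) ** 2)))

# Made-up numbers: errors are 1, 0 and -2, so squared errors are 1, 0 and 4.
score = rmse([3.0, 5.0, 7.0], [2.0, 5.0, 9.0])
print(score)  # sqrt(5/3), about 1.291
```

RMSE is in the same unit as the target, so on the stock data it can be read directly as a typical error in the closing price.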

The RMSE is higher than the value obtained with the moving average method in the previous article, which indicates that linear regression performs worse here. A plot makes this easier to see.

#plot
valid = valid.copy()  # work on a copy to avoid pandas' SettingWithCopyWarning
valid['Predictions'] = preds

plt.figure(figsize=(16,8))
plt.plot(train['Close'], label='Train')
plt.plot(valid['Close'], label='Actual')
plt.plot(valid['Predictions'], label='Predicted')
plt.legend()
plt.show()

Clearly, linear regression on date-derived features is not a good fit for the data in this article.


Origin blog.csdn.net/be_racle/article/details/112604437