[Data analysis] Predictive analysis using machine learning algorithms (1): Moving Average (2021-01-14)

Machine learning methods in time series forecasting (1): Moving Average

1. Background introduction

Everyone would like to be a prophet and know what the future holds, but prediction of this kind is very hard. Anyone who knew the direction of the market in advance could become a billionaire. People keep working toward this goal all the same, and with today's rapid progress in science and technology, predicting the future is no longer an empty fantasy. Machine learning algorithms give us new approaches to prediction, and forecasting and modeling based on time series play an important role in data mining and analysis.

We want to use machine learning algorithms built on time series models to predict trends such as stock prices, supermarket sales, and ticket bookings. Such forecasts are not blind guesses; they rest on historical data. For example, forecasting a supermarket's upcoming sales helps guide purchasing so that enough stock is on hand each day, and forecasting a stock's movement helps maximize gains and minimize losses. In both cases we have to analyze the historical data.

This article and the next five apply 6 methods to data prediction analysis. I will introduce the 6 machine learning and deep learning algorithms themselves in detail in future articles. The main purpose of this series is to show how to apply them to time series forecasting, so the focus is on solving practical problems.

2. Data set

The data set and the code in this article are on my GitHub; download them if you need them: https://github.com/Beracle/02-Stock-Price-Prediction.git

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

#setting figure size
from matplotlib.pylab import rcParams
rcParams['figure.figsize'] = 20,10

#for normalizing data
from sklearn.preprocessing import MinMaxScaler
scaler = MinMaxScaler(feature_range=(0, 1))

Import Data.

df = pd.read_csv('NSE-TATAGLOBAL11.csv')
df.head()

The data set contains several variables: date, opening price, highest price, lowest price, last traded price, closing price, total trade quantity, and turnover.

  • Open and Close are the stock's opening and closing prices on a given day.
  • High, Low and Last are the highest, lowest and last traded prices of the stock for the day.
  • Total Trade Quantity is the number of shares bought or sold that day, and Turnover is the total value of the shares traded that day.

Note that the market is closed on weekends and public holidays, so some dates are missing from the table above, such as 2018-10-02 (national holiday), 2018-10-06 (weekend), and 2018-10-07 (weekend).
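
To see how such gaps can be found programmatically, here is a minimal sketch. The five dates are hypothetical stand-ins for a few rows of the table, not the actual file contents, and the variable names are my own.

```python
import pandas as pd

# Hypothetical trading dates; 2018-10-02 and the weekend are absent.
dates = pd.to_datetime(
    ["2018-10-01", "2018-10-03", "2018-10-04", "2018-10-05", "2018-10-08"]
)

# Every calendar day between the first and the last observation.
all_days = pd.date_range(dates.min(), dates.max(), freq="D")

# Calendar days that have no row in the data.
missing = all_days.difference(dates)

# Separate weekend gaps from weekday closures (for example public holidays).
weekend_gaps = missing[missing.dayofweek >= 5]
weekday_gaps = missing[missing.dayofweek < 5]

print("Weekend gaps:", weekend_gaps.strftime("%Y-%m-%d").tolist())
print("Weekday gaps:", weekday_gaps.strftime("%Y-%m-%d").tolist())
```

Weekday gaps are the ones worth checking by hand, since they are either holidays or genuinely missing records.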

Profit and loss are usually calculated from the stock's closing price for the day, so we use the closing price as the target variable.

Use the date as the index. (This step is key for time series forecasting.)

#setting index as date
df['Date'] = pd.to_datetime(df.Date, format = '%Y-%m-%d')
df.index = df['Date']
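
As a rough illustration of why a date index is useful, the sketch below builds a tiny frame with made-up closing prices (not taken from the real file) and shows two operations that depend on a datetime index: slicing by date string and resampling to weekly means.

```python
import pandas as pd

# Hypothetical closing prices, newest first, for illustration only.
df = pd.DataFrame({
    "Date": ["2018-10-08", "2018-10-05", "2018-10-04", "2018-10-03", "2018-10-01"],
    "Close": [215.15, 209.20, 218.20, 227.60, 230.90],
})
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# Use the dates as the index and put the rows in chronological order.
df = df.set_index("Date").sort_index()

# Label-based slicing by date string (both endpoints are included).
first_week = df.loc["2018-10-01":"2018-10-05"]

# Resample to weekly bins (weeks ending on Sunday) and average the close.
weekly_mean = df["Close"].resample("W").mean()

print(first_week)
print(weekly_mean)
```

Neither operation is available on a plain integer index, which is one practical reason to set the date as the index early.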

Take a look at the time series trend graph of the original data.

plt.figure(figsize=(16,8))
plt.plot(df['Close'], label='Close Price history')
plt.legend()  # show the label defined above
plt.show()

3. Moving Average

Averages are among the most familiar calculations in daily life: we compute an average score to judge overall performance, or average the temperature over the past few days to get a sense of the current weather.

A moving average is used to gauge the direction of the current trend. It is the same idea as an ordinary average: a value computed by averaging past data. Moving averages are often used for forecasting in finance. Plotting the averaged values as a chart gives a smoothed view of the data, rather than the day-to-day price fluctuations inherent in all financial markets. A moving average filters out high-frequency noise and reflects the medium- to long-term, low-frequency trend, which helps investors make decisions. Unlike a simple average over the whole history, a moving average uses only the most recent set of values for each forecast. In other words, at each subsequent step the predicted value is added to the set while the oldest observation is removed. The window of data is constantly "moving".
This calculation ensures that only recent information is taken into account. Every moving average algorithm shows some lag: it buys smoothness at the cost of lag, so choosing a moving average is always a trade-off between the two.
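
The lag is easy to see on a toy series. The sketch below uses made-up, steadily rising prices and a 3-point window (both are assumptions chosen for readability, not values from this article) together with pandas' `rolling(...).mean()`.

```python
import pandas as pd

# Hypothetical prices that rise by 1 each day, for illustration only.
prices = pd.Series([10.0, 11.0, 12.0, 13.0, 14.0, 15.0])

# 3-point simple moving average: each value is the mean of the current
# observation and the two before it; the first two entries are NaN
# because the window is not yet full.
sma = prices.rolling(window=3).mean()

# On a steadily rising series the average trails the latest price.
lag = prices - sma

print(sma.tolist())
print(lag.tolist())
```

With a window of 3 the average sits one step behind the latest price on this series; a wider window would smooth more and trail further.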

To avoid modifying the original data set, we create a new DataFrame.

#creating dataframe with date and the target variable
data = df.sort_index(ascending=True, axis=0)
# copy the two columns and reset to a 0..n-1 integer index
# (avoids the chained assignment new_data['Date'][i] = ..., which recent pandas warns about)
new_data = data[['Date', 'Close']].reset_index(drop=True)

new_data

Split the data into a training set (the first 987 rows) and a validation set (the remaining rows).

# splitting into train and validation
train = new_data[:987]
valid = new_data[987:]

print('Shape of training set:')
print(train.shape)
print('Shape of validation set:')
print(valid.shape)

Moving average method. The moving window is set to 248, so the first prediction is the average of the last 248 values of the training set, that is, rows 739 to 986 (counting from 0) of the sorted data.

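# making predictions
preds = []  # predictions produced by the moving average
for i in range(0,valid.shape[0]):
    # sum the training values from row 739+i onward plus the predictions made so far
    a = train['Close'][len(train)-248+i:].sum() + sum(preds)
    b = a/248  # the moving window is set to 248
    preds.append(b)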
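
The loop above relies on the validation set being no longer than the window. As a hedged alternative, here is a small sketch of the same recursive idea written as a reusable function; `recursive_moving_average` and its parameters are names I made up for illustration, and the toy numbers are not from the stock data.

```python
from collections import deque


def recursive_moving_average(history, window, horizon):
    """Forecast `horizon` steps ahead.

    Each forecast is the mean of the last `window` values, where earlier
    forecasts stand in for observations that are not yet known.
    """
    values = list(history)
    if window <= 0 or len(values) < window:
        raise ValueError("window must be positive and no longer than history")

    # Fixed-size buffer: appending a new value drops the oldest one.
    buf = deque(values[-window:], maxlen=window)
    total = sum(buf)
    preds = []
    for _ in range(horizon):
        pred = total / window
        preds.append(pred)
        # Slide the window: add the new forecast, remove the oldest value.
        total += pred - buf[0]
        buf.append(pred)
    return preds


# Toy usage with hypothetical values.
forecast = recursive_moving_average([1.0, 2.0, 3.0, 4.0], window=2, horizon=3)
print(forecast)
```

Because the buffer always holds exactly `window` values, the function keeps working when the horizon is longer than the window.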

Evaluate the result using the Root Mean Square Error (RMSE).

# checking the results (RMSE value) 
# e.g. RMSE=10 means the predictions differ from the true values by roughly 10 on average
rms=np.sqrt(np.mean(np.power((np.array(valid['Close'])-preds),2)))
print('RMSE value on validation set:')
print(rms)
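
For reference, RMSE is the square root of the mean of the squared differences between the actual and predicted values. The sketch below works through it on made-up numbers; the helper name `rmse` and the values are assumptions for illustration, not results from this data set.

```python
import numpy as np


def rmse(actual, predicted):
    """Root mean square error between two equal-length sequences."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return float(np.sqrt(np.mean((actual - predicted) ** 2)))


# Hypothetical values: errors are -3, 4, 0, 0.
actual = [100.0, 102.0, 104.0, 106.0]
predicted = [103.0, 98.0, 104.0, 106.0]

# Squared errors 9, 16, 0, 0 -> mean 6.25 -> square root 2.5.
score = rmse(actual, predicted)
print(score)
```

The mean absolute error on the same numbers is 1.75, so RMSE weights the larger errors more heavily.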

Plot the predictions to see the result visually.

valid = valid.copy()  # work on a copy to avoid pandas' SettingWithCopyWarning
valid['Predictions'] = preds
plt.plot(train['Close'])
plt.plot(valid[['Close', 'Predictions']])

As the plot shows, the moving average algorithm does not predict the data used in this article well.

Origin blog.csdn.net/be_racle/article/details/112600268