LSTM prediction algorithms for computer science graduates - stock prediction, weather prediction, house price prediction

0 Introduction

Today we will introduce the basics of LSTM:

LSTM-based prediction algorithm - stock prediction, weather prediction, house price prediction

1 Using LSTM networks for time series prediction with Keras

Time series forecasting is a relatively difficult forecasting problem.

Unlike ordinary regression prediction problems, the sequential dependence between input variables adds complexity to time series problems.

Recurrent neural networks (RNNs) are a type of neural network specifically designed to handle sequence dependence. The Long Short-Term Memory network (LSTM) is a kind of RNN that is widely used in deep learning because of how well it can be trained on long sequences.

In this article, we will introduce how to use the Keras deep learning library in Python to build an LSTM neural network model for time series prediction.

  • How to build an LSTM network for a time series prediction problem framed as regression, using the window method and the time-step method (a minimal sketch follows this list).
  • For very long sequences, how to maintain the network's state (memory) across the sequence when building the LSTM network and when using it to make predictions.
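
Here is a minimal sketch of a window-method LSTM regressor in Keras. The toy series, window size, and layer sizes are illustrative assumptions, not the models used later in this article.

import numpy as np
import tensorflow as tf

# Toy univariate series (illustrative only)
series = np.sin(np.arange(200, dtype=np.float32) * 0.1)
window = 3   # window method: use the last 3 values to predict the next one

# Build (samples, time steps, features) windows and next-step labels
X = np.array([series[i:i + window] for i in range(len(series) - window)])
y = series[window:]
X = X.reshape((-1, window, 1))

model = tf.keras.Sequential([
    tf.keras.layers.LSTM(4, input_shape=(window, 1)),
    tf.keras.layers.Dense(1)
])
model.compile(optimizer='adam', loss='mse')
model.fit(X, y, epochs=10, batch_size=16, verbose=0)

next_value = model.predict(X[-1:])   # one-step-ahead forecast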

2 Long Short-Term Memory networks

Long Short-Term Memory networks, or LSTM networks, are a type of recurrent neural network (RNN) that is trained using backpropagation through time and is designed to overcome the vanishing gradient problem.

LSTM networks can be used to build large-scale recurrent neural networks to handle complex sequence problems in machine learning and achieve good results.

Instead of plain neurons, LSTM networks have memory modules that are connected between the layers of the network.

A memory module has a special structure that makes it "smarter" than a traditional neuron and gives it a memory for recent parts of the sequence. A module contains several "gates" that control the module's state and output. As an input sequence is processed, each gate in the module uses a sigmoid activation unit to control whether it is triggered, conditionally changing the module state and adding information (memory) to the module.

Each memory unit has three types of gates:

  • Forget Gate: decides what information to discard from the cell state.
  • Input Gate: decides which values from the input are used to update the memory state.
  • Output Gate: decides the output value based on the input and the memory state.

Each memory unit is like a mini state machine, and the weights of its gates are learned during training.

3 LSTM network structure and principle

Long Short-Term Memory, or LSTM, is specifically designed to solve the long-term dependency problem. All RNNs have the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.

[Figure: the repeating module in a standard RNN contains a single tanh layer]

LSTMs have the same chain-like structure, but the repeating module is different: instead of a single neural network layer, there are four, interacting in a very particular way.

[Figure: the repeating module in an LSTM contains four interacting layers]

Don't worry about the details here; we will walk through the LSTM diagram step by step. First, let's get familiar with the notation for the various elements used in the diagrams.

[Figure: notation used in the diagrams]

In the illustration above, each line carries an entire vector from the output of one node to the inputs of other nodes. The pink circles represent pointwise operations, such as vector addition, and the yellow boxes are learned neural network layers. Lines that merge denote concatenation of vectors, and lines that fork denote content being copied and sent to different locations.

3.1 Core idea of LSTM

The key to the LSTM is the cell state (shown below): the horizontal line running through the module.

The cell state is like a conveyor belt. It runs straight along the entire chain with only a few small linear interactions, so it is easy for information to simply flow along it unchanged.

[Figure: the cell state, the horizontal line running through the module]
A gate is a way to selectively let information through; it consists of a sigmoid neural network layer and a pointwise multiplication operation.

[Figure: a gate, consisting of a sigmoid layer and a pointwise multiplication]
Each element of the sigmoid layer's output (a vector) is a real number between 0 and 1, representing how much of the corresponding information should be let through: 0 means "let nothing through" and 1 means "let everything through".

The LSTM protects and controls the cell state through three such gates: the forget gate, the input gate, and the output gate.

3.2 Forget gate

The first step in our LSTM is to decide what information to discard from the cell state. This decision is made by a sigmoid layer called the forget gate. It reads h_{t-1} and x_t and outputs a number between 0 and 1 for each element in the cell state C_{t-1}: 1 means "keep this completely" and 0 means "discard this completely".

Let's go back to the language model example of predicting the next word based on what has come before. In this problem, the cell state may contain the gender of the current subject, so that the correct pronouns can be chosen. When we see a new subject, we want to forget the gender of the old subject.

f_t = σ(W_f · [h_{t-1}, x_t] + b_f)

where h_{t-1} is the output of the hidden layer at the previous time step, x_t is the input at the current time step, and σ is the sigmoid function.

3.3 Input gate

The next step is to decide how much new information to add to the cell state. This takes two parts: first, a sigmoid layer called the "input gate layer" decides which values to update; then a tanh layer creates a vector of candidate values that could be added to the state. In the next step, these two parts are combined to update the cell state.

i_t = σ(W_i · [h_{t-1}, x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1}, x_t] + b_C)

The cell state is then updated as C_t = f_t * C_{t-1} + i_t * C̃_t.

3.4 Output gate

Finally, we need to decide what to output. This output is based on the cell state, but is a filtered version of it. First, we run a sigmoid layer that decides which parts of the cell state to output. Then we pass the cell state through tanh (pushing the values to be between -1 and 1) and multiply it by the output of the sigmoid gate, so that we only output the parts we decided to.

In the language model example, since the network has just seen a subject, it may need to output information relevant to a verb. For example, it might output whether the subject is singular or plural, so that we know which form a following verb should take.

o_t = σ(W_o · [h_{t-1}, x_t] + b_o)
h_t = o_t * tanh(C_t)
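
To make the gate equations concrete, here is a small NumPy sketch of a single LSTM time step; the weight shapes, sizes, and helper names are illustrative and not from the original post.

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W_f, W_i, W_C, W_o, b_f, b_i, b_C, b_o):
    # One LSTM time step following the equations above.
    # Each W has shape (hidden, hidden + input); each b has shape (hidden,).
    z = np.concatenate([h_prev, x_t])       # [h_{t-1}, x_t]
    f_t = sigmoid(W_f @ z + b_f)            # forget gate: what to discard from the cell state
    i_t = sigmoid(W_i @ z + b_i)            # input gate: which values to update
    c_hat = np.tanh(W_C @ z + b_C)          # candidate values to add to the state
    c_t = f_t * c_prev + i_t * c_hat        # new cell state
    o_t = sigmoid(W_o @ z + b_o)            # output gate: which parts of the state to output
    h_t = o_t * np.tanh(c_t)                # new hidden state (the module's output)
    return h_t, c_t

# Tiny usage example with random weights (hidden size 4, input size 3)
rng = np.random.default_rng(0)
H, D = 4, 3
Ws = [rng.standard_normal((H, H + D)) * 0.1 for _ in range(4)]
bs = [np.zeros(H) for _ in range(4)]
h, c = np.zeros(H), np.zeros(H)
for x_t in rng.standard_normal((5, D)):     # run five time steps
    h, c = lstm_step(x_t, h, c, *Ws, *bs)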

4 Weather prediction based on LSTM

4.1 Dataset

[Figure: sample rows of the weather dataset, recorded every 10 minutes]

As shown above, observations are recorded every 10 minutes, there are 6 observations in an hour, and 144 (6x24) observations in a day.

Suppose that, given a specific time, you want to predict the temperature 6 hours into the future. To make this prediction we use a 5-day observation window, i.e. a window containing the last 720 (5 x 144) observations, to train the model.

The function below returns the time windows described above for model training. The parameter history_size is the size of the sliding window of past information, and target_size is how far into the future the model must learn to predict; it also serves as the label to be predicted.
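
The multivariate_data helper itself is not shown in this post. A minimal sketch of what it might look like, with the signature inferred from the calls below (the single_step flag is an assumption):

def multivariate_data(dataset, target, start_index, end_index,
                      history_size, target_size, step, single_step=False):
    # Slide a window of `history_size` past rows (subsampled every `step` rows)
    # over the data and pair each window with the next `target_size` target values.
    data, labels = [], []
    start_index = start_index + history_size
    if end_index is None:
        end_index = len(dataset) - target_size
    for i in range(start_index, end_index):
        indices = range(i - history_size, i, step)
        data.append(dataset[indices])
        if single_step:
            labels.append(target[i + target_size])
        else:
            labels.append(target[i:i + target_size])
    return np.array(data), np.array(labels)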

Below, the first 300,000 rows of the data are used as the training set and the rest as the validation set, which amounts to about 2,100 days of training data.
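
The constants used in the code below are not defined in this excerpt; the following values are a plausible configuration consistent with the description above (the batching and training values are assumptions):

TRAIN_SPLIT = 300000        # first 300,000 rows (~2,083 days) used for training
past_history = 720          # 5 days x 144 observations per day
STEP = 6                    # subsample the window to one observation per hour
BATCH_SIZE = 256            # assumed value
BUFFER_SIZE = 10000         # assumed value
EPOCHS = 10                 # assumed value
EVALUATION_INTERVAL = 200   # assumed value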

4.2 Forecast example

In the multi-step prediction model, given past sample values, the model predicts a sequence of future values. For the multi-step model, the training data again consists of the past five days of records, subsampled to one observation per hour. Here, however, the model needs to learn to predict the temperature for the next 12 hours. Since the data is recorded every 10 minutes, the output consists of 72 predicted values.

future_target = 72
x_train_multi, y_train_multi = multivariate_data(dataset, dataset[:, 1], 0,
                                                 TRAIN_SPLIT, past_history,
                                                 future_target, STEP)
x_val_multi, y_val_multi = multivariate_data(dataset, dataset[:, 1],
                                             TRAIN_SPLIT, None, past_history,
                                             future_target, STEP)

Partition the data set

train_data_multi = tf.data.Dataset.from_tensor_slices((x_train_multi, y_train_multi))
train_data_multi = train_data_multi.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()

val_data_multi = tf.data.Dataset.from_tensor_slices((x_val_multi, y_val_multi))
val_data_multi = val_data_multi.batch(BATCH_SIZE).repeat()

Plot sample point data

def multi_step_plot(history, true_future, prediction):
    plt.figure(figsize=(12, 6))
    num_in = create_time_steps(len(history))
    num_out = len(true_future)

    plt.plot(num_in, np.array(history[:, 1]), label='History')
    plt.plot(np.arange(num_out)/STEP, np.array(true_future), 'bo',
             label='True Future')
    if prediction.any():
        plt.plot(np.arange(num_out)/STEP, np.array(prediction), 'ro',
                 label='Predicted Future')
    plt.legend(loc='upper left')
    plt.show()

for x, y in train_data_multi.take(1):
    multi_step_plot(x[0], y[0], np.array([0]))
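
The plotting function above relies on a create_time_steps helper that is not defined in this excerpt. A minimal sketch consistent with its usage (negative time offsets for the history window):

def create_time_steps(length):
    # Return negative time offsets for the `length` past observations, e.g. [-720, ..., -1]
    return list(range(-length, 0))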

[Figure: a sample window showing the history and the true future values]

The task here is a little more complex than the previous one, so the model now consists of two LSTM layers. Finally, since we need to predict the temperature for the next 12 hours at 10-minute resolution, the Dense layer outputs 72 values.

multi_step_model = tf.keras.models.Sequential()
multi_step_model.add(tf.keras.layers.LSTM(32,
                                          return_sequences=True,
                                          input_shape=x_train_multi.shape[-2:]))
multi_step_model.add(tf.keras.layers.LSTM(16, activation='relu'))
multi_step_model.add(tf.keras.layers.Dense(72))

multi_step_model.compile(optimizer=tf.keras.optimizers.RMSprop(clipvalue=1.0), loss='mae')

Train the model:

multi_step_history = multi_step_model.fit(train_data_multi, epochs=EPOCHS,
                                          steps_per_epoch=EVALUATION_INTERVAL,
                                          validation_data=val_data_multi,
                                          validation_steps=50)
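
The training curves shown below can be plotted from the History object returned by fit; the plotting helper is not included in this excerpt, but a simple version might look like this:

def plot_train_history(history, title):
    loss = history.history['loss']
    val_loss = history.history['val_loss']
    epochs = range(len(loss))

    plt.figure()
    plt.plot(epochs, loss, 'b', label='Training loss')
    plt.plot(epochs, val_loss, 'r', label='Validation loss')
    plt.title(title)
    plt.legend()
    plt.show()

plot_train_history(multi_step_history, 'Multi-step training and validation loss')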

[Figures: training and validation loss curves, and an example of the model's multi-step predictions]

5 Stock price prediction based on LSTM

5.1 Dataset

The stock data has nine dimensions in total, shown below:

[Figure: the nine fields of the stock dataset]

5.2 Implementation code

import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

# Note: this code uses the TensorFlow 1.x API (tf.contrib, placeholders, sessions)
plt.rcParams['font.sans-serif'] = ['SimHei']   # font that can display Chinese characters
plt.rcParams['axes.unicode_minus'] = False     # display minus signs correctly

def load_data():
    test_x_batch = np.load(r'test_x_batch.npy', allow_pickle=True)
    test_y_batch = np.load(r'test_y_batch.npy', allow_pickle=True)
    return (test_x_batch, test_y_batch)

# Define an LSTM cell
def lstm_cell(units):
    cell = tf.contrib.rnn.BasicLSTMCell(num_units=units, forget_bias=0.0)  # activation defaults to tanh
    return cell

# Define the LSTM network
def lstm_net(x, w, b, num_neurons):
    # Unstack the input into a list whose length equals the number of time steps
    inputs = tf.unstack(x, 8, 1)
    cells = [lstm_cell(units=n) for n in num_neurons]
    stacked_lstm_cells = tf.contrib.rnn.MultiRNNCell(cells)
    outputs, _ = tf.contrib.rnn.static_rnn(stacked_lstm_cells, inputs, dtype=tf.float32)
    return tf.matmul(outputs[-1], w) + b

# Hyperparameters
num_neurons = [32, 32, 64, 64, 128, 128]

# Weight and bias of the output layer
w = tf.Variable(tf.random_normal([num_neurons[-1], 1]))
b = tf.Variable(tf.random_normal([1]))

# Input placeholder
x = tf.placeholder(shape=(None, 8, 8), dtype=tf.float32)

# Prediction op and saver
pred = lstm_net(x, w, b, num_neurons)
saver = tf.train.Saver(tf.global_variables())

if __name__ == '__main__':

    # Start an interactive session and restore the trained model
    sess = tf.InteractiveSession()
    saver.restore(sess, r'D:\股票预测\model_data\my_model.ckpt')

    # Load the test data
    test_x, test_y = load_data()

    # Predict
    predicts = sess.run(pred, feed_dict={x: test_x})
    predicts = (predicts.max() - predicts) / (predicts.max() - predicts.min())  # rescale to [0, 1]

    # Visualize
    plt.plot(predicts, 'r', label='Predicted curve')
    plt.plot(test_y, 'g', label='Actual curve')
    plt.xlabel('Day')
    plt.ylabel('Opening price (normalized)')
    plt.title('Stock opening price prediction (test set)')
    plt.legend()
    plt.show()

    # Close the session
    sess.close()

[Figure: predicted vs. actual opening price curves on the test set]

6 Predicting the number of airline passengers with LSTM

Dataset

Airline passengers dataset download address:

https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv

This dataset contains the monthly number of airline passengers from 1949 to 1960, a total of 12*12 = 144 values.

In the program below, we train the model by using the data from 1949-1952 to predict 1953, the data from 1950-1953 to predict 1954, and so on.

Prediction code

import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler
import os
 
# Hyperparameters
EPOCH = 400
learning_rate = 0.01
seq_length = 4   # sequence length: each input window covers 4 years
n_feature = 12   # features per sequence element; one element is one year of passengers, i.e. 12 monthly values

# Data
data = pd.read_csv('airline-passengers.csv')   # 12 years * 12 months = 144 values in total
data = data.iloc[:, 1:5].values        # keep the passenger column, shape (144, 1)
data = np.array(data).astype(np.float32)
sc = MinMaxScaler()
data = sc.fit_transform(data)          # normalize to [0, 1]
data = data.reshape(-1, n_feature)     # shape (12, 12): one row per year
 
trainData_x = []
trainData_y = []
for i in range(data.shape[0]-seq_length):
    tmp_x = data[i:i+seq_length, :]
    tmp_y = data[i+seq_length, :]
    trainData_x.append(tmp_x)
    trainData_y.append(tmp_y)
 
# model
class Net(nn.Module):
    def __init__(self, in_dim=12, hidden_dim=10, output_dim=12, n_layer=1):
        super(Net, self).__init__()
        self.in_dim = in_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.n_layer = n_layer
        self.lstm = nn.LSTM(input_size=in_dim, hidden_size=hidden_dim, num_layers=n_layer, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)
 
    def forward(self, x):
        _, (h_out, _) = self.lstm(x)  # h_out is the hidden state after the last element of the sequence
                                      # h_out's shape: (n_layer*n_direction, batchsize, hidden_dim), i.e. (1, 1, 10)
                                      # n_direction is 1 or 2 depending on whether the LSTM is bidirectional
        h_out = h_out.view(h_out.shape[0], -1)   # h_out's shape (batchsize, n_layer * n_direction * hidden_dim), i.e. (1, 10)
        h_out = self.linear(h_out)    # h_out's shape (batchsize, output_dim), (1, 12)
        return h_out
 
train = True
if train:
    model = Net()
    loss_func = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    # train
    for epoch in range(EPOCH):
        total_loss = 0
        for iteration, X in enumerate(trainData_x):  # X's shape (seq_length, n_feature)
            X = torch.tensor(X).float()
            X = torch.unsqueeze(X, 0)                # X's shape (1, seq_length, n_feature), 1 is batchsize
            output = model(X)       # output's shape (1,12)
            output = torch.squeeze(output)
            loss = loss_func(output, torch.tensor(trainData_y[iteration]))
            optimizer.zero_grad()   # clear gradients for this training iteration
            loss.backward()         # computing gradients
            optimizer.step()        # update weights
            total_loss += loss
 
        if (epoch+1) % 20 == 0:
            print('epoch:{:3d}, loss:{:6.4f}'.format(epoch+1, total_loss.data.numpy()))
    # Saving the whole model with torch.save(model, 'flight_model.pkl') raises a UserWarning;
    # saving only the state_dict is recommended, see https://zhuanlan.zhihu.com/p/129948825
    torch.save({'state_dict': model.state_dict()}, 'checkpoint.pth.tar')
 
else:
    # model = torch.load('flight_model.pth')
    model = Net()
    checkpoint = torch.load('checkpoint.pth.tar')
    model.load_state_dict(checkpoint['state_dict'])
 
# predict
model.eval()
predict = []
for X in trainData_x:             # X's shape (seq_length, n_feature)
    X = torch.tensor(X).float()
    X = torch.unsqueeze(X, 0)     # X's shape (1, seq_length, n_feature), 1 is batchsize
    output = model(X)             # output's shape (1,12)
    output = torch.squeeze(output)
    predict.append(output.data.numpy())
 
# plot
plt.figure()
predict = np.array(predict)
predict = predict.reshape(-1, 1).squeeze()
x_tick = np.arange(len(predict)) + (seq_length*n_feature)
plt.plot(list(x_tick), predict, label='predict data')
 
data_original = data.reshape(-1, 1).squeeze()
plt.plot(range(len(data_original)), data_original, label='original data')
 
plt.legend(loc='best')
plt.show()
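
Since the series was scaled with MinMaxScaler, the normalized predictions can be mapped back to actual passenger counts with the scaler fitted above; a small sketch:

# Map the normalized predictions back to passenger counts (uses `predict` and `sc` from above)
predict_counts = sc.inverse_transform(predict.reshape(-1, 1)).squeeze()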

Run result

[Figure: predicted data vs. original data]

7 Finally


Origin blog.csdn.net/HUXINY/article/details/133024003