0 Introduction
Today I will introduce the basics of LSTM, followed by LSTM-based prediction algorithms for stock prediction, weather prediction, and house price prediction.
1 Time series prediction with LSTM networks in Keras
Time series forecasting is a relatively difficult prediction problem.
Unlike ordinary regression problems, time series problems are complicated by the sequence dependence among the input variables.
Recurrent neural networks (RNNs) are a class of neural networks designed specifically to handle sequence dependence. The Long Short-Term Memory network (LSTM) is a type of RNN that is widely used in deep learning because even very large architectures can be trained successfully.
In this article, we will show how to use the Keras deep learning library in Python to build LSTM neural network models for time series prediction, including:
- How to develop LSTM networks for time series prediction problems framed as regression, with the window method, and with time steps.
- How to maintain the network's state (memory) across very long sequences when building an LSTM network and using it to make predictions.
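To make the window method concrete: a univariate series can be reframed as a supervised regression problem by using the previous `look_back` values as inputs and the next value as the target. The helper `make_windows` below is a hypothetical name used only for illustration (it is not from the article or from Keras), and the toy series is made up.

```python
from typing import List, Sequence, Tuple


def make_windows(series: Sequence[float], look_back: int) -> Tuple[List[List[float]], List[float]]:
    """Hypothetical helper: frame a series as (previous look_back values -> next value) pairs."""
    if look_back < 1:
        raise ValueError("look_back must be at least 1")
    inputs: List[List[float]] = []
    targets: List[float] = []
    for i in range(len(series) - look_back):
        # Window series[i : i + look_back]; the target is the value right after it.
        inputs.append(list(series[i:i + look_back]))
        targets.append(series[i + look_back])
    return inputs, targets


if __name__ == "__main__":
    X, y = make_windows([10, 20, 30, 40, 50, 60], look_back=3)
    for window, target in zip(X, y):
        print(window, "->", target)
```

In the window framing, the `look_back` values of each sample are fed to the LSTM as features of a single time step (input shape `(samples, 1, look_back)`); in the time-step framing, the same values are fed as `look_back` time steps with one feature each (shape `(samples, look_back, 1)`).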
2 Long short-term memory networks
The long short-term memory network, or LSTM network, is a type of recurrent neural network (RNN) that is trained using backpropagation through time and overcomes the vanishing gradient problem.
LSTM networks can be used to build large-scale recurrent neural networks to handle complex sequence problems in machine learning and achieve good results.
Instead of plain neurons, LSTM networks have memory blocks that are connected across layers.
A memory block has components that make it "smarter" than a classical neuron, as well as a memory for recent parts of the sequence. Each block contains gates that manage its state and output. As the block processes an input sequence, each gate uses a sigmoid activation unit to control whether it is triggered, which makes changes to the block's state and the addition of information (memory) flowing through it conditional.
Each unit has three types of gates:
- Forget Gate: Decides what information to discard from the unit.
- Input Gate: Decides which values from the input are used to update the memory state.
- Output Gate: Decides what to output based on the input and the memory state.
Each unit is like a mini state machine in which the weights of the gates are learned during training.
3 LSTM network structure and principle
Long short-term memory networks (LSTMs) are explicitly designed to avoid the long-term dependency problem. All RNNs take the form of a chain of repeating neural network modules. In a standard RNN, this repeating module has a very simple structure, such as a single tanh layer.
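The repeating module of a standard RNN can be sketched as follows, assuming a scalar input and a scalar hidden state so that the arithmetic stays visible: each step computes $h_t = \tanh(w_x x_t + w_h h_{t-1} + b)$. The function name `rnn_forward` and the weights are hypothetical illustrative values, not trained ones.

```python
import math
from typing import List, Sequence


def rnn_forward(xs: Sequence[float], w_x: float, w_h: float, b: float, h0: float = 0.0) -> List[float]:
    """Hypothetical sketch: unroll a scalar vanilla RNN and return every hidden state."""
    h = h0
    hidden_states: List[float] = []
    for x in xs:
        # The whole repeating module is one tanh layer over the input and the previous state.
        h = math.tanh(w_x * x + w_h * h + b)
        hidden_states.append(h)
    return hidden_states


if __name__ == "__main__":
    # A single impulse followed by zeros: watch how its trace fades step by step.
    print(rnn_forward([1.0, 0.0, 0.0, 0.0], w_x=1.0, w_h=0.5, b=0.0))
```

As a rough illustration only, the contribution of the first input shrinks at every step here, which hints at why plain RNNs have trouble carrying information over long spans.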
LSTMs also have this chain structure, but the repeating module is different: instead of a single neural network layer there are four, interacting in a very specific way.
Don't worry about the details yet; we will walk through the LSTM diagram step by step. First, let's get familiar with the notation used in the diagram.
In the illustration above, each black line carries an entire vector from the output of one node to the inputs of others. The pink circles represent pointwise operations, such as vector addition, and the yellow boxes are learned neural network layers. Lines merging denote the concatenation of vectors, while a line forking denotes its content being copied and the copies going to different locations.
3.1 Core idea of LSTM
The key to LSTMs is the cell state, the horizontal line running through the cell (shown below).
The cell state is like a conveyor belt. It runs straight down the entire chain with only a few minor linear interactions, so it is easy for information to flow along it unchanged.
Gates let information through selectively. Each gate consists of a sigmoid neural network layer followed by a pointwise multiplication.
Each element of the sigmoid layer's output vector is a real number between 0 and 1 describing how much of the corresponding information to let through: 0 means "let nothing through" and 1 means "let everything through".
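A gate, as just described, is a sigmoid followed by pointwise multiplication. The minimal sketch below assumes the gate's pre-activation values (logits) are already given; `apply_gate` is a hypothetical helper name and the numbers are illustrative.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    """Logistic function, squashing any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def apply_gate(values: Sequence[float], gate_logits: Sequence[float]) -> List[float]:
    """Hypothetical helper: pass each value in proportion to sigmoid(logit)."""
    if len(values) != len(gate_logits):
        raise ValueError("values and gate_logits must have the same length")
    # Pointwise multiplication: ~0 blocks the value, ~1 lets it through unchanged.
    return [sigmoid(z) * v for v, z in zip(values, gate_logits)]


if __name__ == "__main__":
    # A strongly negative logit closes the gate; a strongly positive one opens it.
    print(apply_gate([5.0, 5.0, 5.0], [-10.0, 0.0, 10.0]))
```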
An LSTM has three of these gates to protect and control the cell state: the input gate, the forget gate and the output gate.
3.2 Forget gate
The first step in an LSTM is to decide what information to discard from the cell state. This decision is made by a sigmoid layer called the forget gate. The gate looks at $h_{t-1}$ and $x_t$ and outputs a number between 0 and 1 for each number in the cell state $C_{t-1}$: 1 means "keep this completely" and 0 means "discard this completely".
Let's go back to the example of a language model that predicts the next word based on all the words it has seen so far. In this problem, the cell state might contain the gender of the current subject, so that the correct pronouns can be used. When we see a new subject, we want to forget the gender of the old one.
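$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$
where $h_{t-1}$ is the output of the hidden layer at the previous time step, $x_t$ is the input of the current cell, $\sigma$ is the sigmoid function, and $W_f$ and $b_f$ are the forget gate's learned weights and bias.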
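The forget gate equation can be sketched in plain Python as a matrix-vector product over the concatenation $[h_{t-1}, x_t]$, followed by an element-wise sigmoid. The function `forget_gate` is a hypothetical helper, and `W_f`/`b_f` below are made-up numbers for a 2-unit cell with one input feature; in a real network they are learned.

```python
import math
from typing import List, Sequence


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def forget_gate(W_f: Sequence[Sequence[float]], b_f: Sequence[float],
                h_prev: Sequence[float], x_t: Sequence[float]) -> List[float]:
    """Hypothetical sketch of f_t = sigmoid(W_f . [h_prev, x_t] + b_f)."""
    concat = list(h_prev) + list(x_t)  # [h_{t-1}, x_t]
    f_t = []
    for row, bias in zip(W_f, b_f):
        if len(row) != len(concat):
            raise ValueError("each row of W_f must have len(h_prev) + len(x_t) entries")
        z = sum(w * v for w, v in zip(row, concat)) + bias
        f_t.append(sigmoid(z))  # one value in (0, 1) per cell-state entry
    return f_t


if __name__ == "__main__":
    # 2 hidden units, 1 input feature -> W_f has shape (2, 2 + 1). Illustrative values.
    W_f = [[0.5, -0.5, 2.0],
           [0.0, 0.0, -3.0]]
    b_f = [0.0, 1.0]
    f_t = forget_gate(W_f, b_f, h_prev=[0.1, 0.2], x_t=[1.0])
    c_prev = [4.0, 4.0]
    # Element-wise product: how much of each old cell-state entry survives.
    print([f * c for f, c in zip(f_t, c_prev)])
```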
3.3 Input gate
The next step is to decide what new information to store in the cell state. This has two parts. First, a sigmoid layer called the "input gate layer" decides which values will be updated; then a tanh layer creates a vector of new candidate values, $\tilde{C}_t$, that could be added to the state. Next, the two are combined to update the cell state.
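Combining the two parts is the standard cell-state update $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$, where $f_t$ is the forget gate output, $i_t$ the input gate output, $\tilde{C}_t$ the tanh candidate vector, and $\odot$ element-wise multiplication. The sketch below assumes those vectors are already computed; `update_cell_state` is a hypothetical helper and the numbers are illustrative.

```python
from typing import List, Sequence


def update_cell_state(f_t: Sequence[float], c_prev: Sequence[float],
                      i_t: Sequence[float], c_tilde: Sequence[float]) -> List[float]:
    """Hypothetical sketch of C_t = f_t * C_{t-1} + i_t * C~_t, element-wise."""
    if not (len(f_t) == len(c_prev) == len(i_t) == len(c_tilde)):
        raise ValueError("all vectors must have the same length")
    # Forget part of the old state, then add the gated candidate values.
    return [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]


if __name__ == "__main__":
    # Unit 0 keeps its old value and ignores the candidate;
    # unit 1 forgets its old value and takes the candidate instead.
    print(update_cell_state(f_t=[1.0, 0.0], c_prev=[2.0, 3.0],
                            i_t=[0.0, 1.0], c_tilde=[0.5, -0.7]))
```

In the language-model example, unit 1 behaves like the subject's gender being overwritten when a new subject appears, while unit 0 carries its old value forward.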
3.4 Output gate
Finally, we need to decide what to output. The output is based on the cell state, but is a filtered version of it. First, a sigmoid layer decides which parts of the cell state to output. Then the cell state is passed through tanh (pushing the values to between -1 and 1) and multiplied by the output of the sigmoid gate, so that only the parts we decided on are output.
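Putting the forget, input, and output gates together, one LSTM time step can be sketched in plain Python: $f_t$, $i_t$, $o_t$ are sigmoid gates, $\tilde{C}_t$ is the tanh candidate, $C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t$ and $h_t = o_t \odot \tanh(C_t)$. This is the textbook formulation written for readability, with hand-picked illustrative weights; `lstm_step` and `dense` are hypothetical helper names, not a library API.

```python
import math
from typing import Dict, List, Sequence, Tuple

Vector = List[float]
Params = Dict[str, Tuple[List[List[float]], List[float]]]


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def dense(W: Sequence[Sequence[float]], b: Sequence[float], v: Sequence[float]) -> Vector:
    """Affine map W . v + b, with W given as a list of rows."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def lstm_step(params: Params, h_prev: Sequence[float], c_prev: Sequence[float],
              x_t: Sequence[float]) -> Tuple[Vector, Vector]:
    """Hypothetical sketch of one LSTM step; params maps 'f', 'i', 'c', 'o' to (W, b)."""
    concat = list(h_prev) + list(x_t)                               # [h_{t-1}, x_t]
    f_t = [sigmoid(z) for z in dense(*params["f"], concat)]         # forget gate
    i_t = [sigmoid(z) for z in dense(*params["i"], concat)]         # input gate
    c_tilde = [math.tanh(z) for z in dense(*params["c"], concat)]   # candidate values
    o_t = [sigmoid(z) for z in dense(*params["o"], concat)]         # output gate
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_tilde)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]              # filtered cell state
    return h_t, c_t


if __name__ == "__main__":
    # 1 hidden unit, 1 input feature: each W has shape (1, 1 + 1).
    # Biases keep the forget gate almost fully open and the input gate almost shut.
    params: Params = {
        "f": ([[0.0, 0.0]], [10.0]),
        "i": ([[0.0, 0.0]], [-10.0]),
        "c": ([[0.0, 1.0]], [0.0]),
        "o": ([[0.0, 0.0]], [10.0]),
    }
    h, c = [0.0], [0.5]
    for x in [1.0, -1.0, 2.0]:
        h, c = lstm_step(params, h, c, [x])
        print(f"x={x:+.1f}  c={c[0]:.4f}  h={h[0]:.4f}")
```

With these biases the cell state stays close to 0.5 whatever the input, which is the conveyor-belt behavior described in 3.1; flipping the forget and input biases would instead overwrite the memory with the new candidate.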
In the language model example, since it has just seen a subject, the model may want to output information relevant to a verb, in case a verb comes next. For example, it might output whether the subject is singular or plural, so that if a verb follows, we know what form the verb should take.
4 Weather prediction based on LSTM
4.1 Dataset
As shown above, an observation is recorded every 10 minutes, so there are 6 observations per hour and 144 (6x24) per day.
Suppose that, at a given time, you want to predict the temperature 6 hours into the future. To make this prediction, you choose to use 5 days of observations, so you create a window containing the last 720 (5x144) observations to train the model.
The function below returns such time windows for model training. The parameter history_size is the size of the sliding window of past observations, and target_size is how many steps into the future the model must learn to predict; the target values over that horizon serve as the labels.
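The calls in 4.2 rely on a windowing function named `multivariate_data`. As a hedged sketch only (a plain-Python reconstruction under assumptions, not necessarily the author's code), such a helper could take, for every end position `i`, the `history_size` past rows sampled every `step` rows as the input and the next `target_size` target values as the label.

```python
from typing import List, Optional, Sequence, Tuple


def multivariate_data(dataset: Sequence[Sequence[float]], target: Sequence[float],
                      start_index: int, end_index: Optional[int],
                      history_size: int, target_size: int, step: int,
                      single_step: bool = False) -> Tuple[list, list]:
    """Assumed sliding-window sketch: (past rows every `step`) -> future target value(s)."""
    data: List[List[List[float]]] = []
    labels: list = []
    start_index = start_index + history_size  # need a full history before the first label
    if end_index is None:
        end_index = len(dataset) - target_size  # leave room for the future labels
    for i in range(start_index, end_index):
        # Input rows: i - history_size, i - history_size + step, ..., up to (not including) i.
        data.append([list(dataset[j]) for j in range(i - history_size, i, step)])
        if single_step:
            labels.append(target[i + target_size])  # one value target_size steps ahead
        else:
            labels.append(list(target[i:i + target_size]))  # the next target_size values
    return data, labels


if __name__ == "__main__":
    # Toy data: 20 rows with 2 features; column 1 stands in for the temperature.
    rows = [[float(k), 10.0 * k] for k in range(20)]
    temps = [r[1] for r in rows]
    x, y = multivariate_data(rows, temps, 0, None, history_size=6, target_size=2, step=2)
    print(len(x), x[0], y[0])
```

With the values from this section (history_size = 720, step = 6, target_size = 72), each input would hold 720 / 6 = 120 hourly rows and each label the next 72 ten-minute temperatures. The sketch returns plain lists, so wrapping the results in `np.array` before handing them to `tf.data` is assumed.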
Below, the first 300,000 rows of the data are used as the training set and the remaining rows as the validation set. This amounts to roughly 2,100 days of training data (300,000 / 144 ≈ 2,083).
4.2 Forecast example
A multi-step model predicts a sequence of future values from past observations. For the multi-step model, the training data again consists of the past five days of records, sampled hourly. This time, however, the model must learn to predict the temperature for the next 12 hours. Since an observation is taken every 10 minutes, the output consists of 72 predicted values.
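```python
import tensorflow as tf

past_history = 720     # 5 days x 144 ten-minute observations
STEP = 6               # sample hourly (every 6th observation)
TRAIN_SPLIT = 300000   # first 300,000 rows for training
future_target = 72     # next 12 hours at 10-minute resolution
# `dataset`: normalized feature array prepared earlier; column 1 is the temperature
x_train_multi, y_train_multi = multivariate_data(dataset, dataset[:, 1], 0,
                                                 TRAIN_SPLIT, past_history,
                                                 future_target, STEP)
x_val_multi, y_val_multi = multivariate_data(dataset, dataset[:, 1],
                                             TRAIN_SPLIT, None, past_history,
                                             future_target, STEP)
```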
Build the training and validation datasets (shuffled and batched):
```python
BUFFER_SIZE = 10000  # assumed shuffle buffer size (not given in the article)
BATCH_SIZE = 256     # assumed batch size (not given in the article)
train_data_multi = tf.data.Dataset.from_tensor_slices((x_train_multi, y_train_multi))
train_data_multi = train_data_multi.cache().shuffle(BUFFER_SIZE).batch(BATCH_SIZE).repeat()
val_data_multi = tf.data.Dataset.from_tensor_slices((x_val_multi, y_val_multi))
val_data_multi = val_data_multi.batch(BATCH_SIZE).repeat()
```
Plot a sample: its history, its true future and, if given, the predicted future:
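```python
import matplotlib.pyplot as plt
import numpy as np

def create_time_steps(length):
    # Assumed definition (called but not defined in the article): x positions -length .. -1
    return list(range(-length, 0))

def multi_step_plot(history, true_future, prediction):
    plt.figure(figsize=(12, 6))
    num_in = create_time_steps(len(history))
    num_out = len(true_future)
    # x axis in hours: one history point per hour; future points are 10 minutes apart, hence / STEP
    plt.plot(num_in, np.array(history[:, 1]), label='History')
    plt.plot(np.arange(num_out) / STEP, np.array(true_future), 'bo',
             label='True Future')
    if prediction.any():
        plt.plot(np.arange(num_out) / STEP, np.array(prediction), 'ro',
                 label='Predicted Future')
    plt.legend(loc='upper left')
    plt.show()

# Plot one training sample with an all-zero (i.e. no) prediction
for x, y in train_data_multi.take(1):
    multi_step_plot(x[0], y[0], np.array([0]))
```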
This task is a bit more complex than the previous one, so the model now consists of two LSTM layers. Since it must predict the next 12 hours, the final Dense layer outputs 72 values.
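```python
# Two stacked LSTM layers; the first returns its full output sequence so the second can consume it
multi_step_model = tf.keras.models.Sequential()
multi_step_model.add(tf.keras.layers.LSTM(32,
                                          return_sequences=True,
                                          input_shape=x_train_multi.shape[-2:]))
multi_step_model.add(tf.keras.layers.LSTM(16, activation='relu'))
multi_step_model.add(tf.keras.layers.Dense(72))  # one output per future 10-minute step
# clipvalue=1.0 clips each gradient element to [-1, 1]; loss is mean absolute error
multi_step_model.compile(optimizer=tf.keras.optimizers.RMSprop(clipvalue=1.0), loss='mae')
```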
Train the model:
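```python
EVALUATION_INTERVAL = 200  # assumed steps per epoch (not given in the article)
EPOCHS = 10                # assumed number of epochs (not given in the article)
# Both datasets repeat indefinitely, so the number of steps per epoch must be set explicitly
multi_step_history = multi_step_model.fit(train_data_multi, epochs=EPOCHS,
                                          steps_per_epoch=EVALUATION_INTERVAL,
                                          validation_data=val_data_multi,
                                          validation_steps=50)
```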
5 Stock price prediction based on LSTM
5.1 Dataset
Stock data has a total of nine dimensions, namely
5.2 Implementation code
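```python
# Requires TensorFlow 1.x: tf.contrib, placeholders and sessions were removed in TensorFlow 2.
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import tensorflow as tf

plt.rcParams['font.sans-serif'] = ['SimHei']  # font that can render Chinese characters
plt.rcParams['axes.unicode_minus'] = False    # render minus signs correctly with that font

def load_data():
    test_x_batch = np.load(r'test_x_batch.npy', allow_pickle=True)
    test_y_batch = np.load(r'test_y_batch.npy', allow_pickle=True)
    return (test_x_batch, test_y_batch)

# Define an LSTM cell
def lstm_cell(units):
    cell = tf.contrib.rnn.BasicLSTMCell(num_units=units, forget_bias=0.0)  # activation defaults to tanh
    return cell

# Define the LSTM network
def lstm_net(x, w, b, num_neurons):
    # Split the input into a list whose length equals the number of time steps (8)
    inputs = tf.unstack(x, 8, 1)
    cells = [lstm_cell(units=n) for n in num_neurons]
    stacked_lstm_cells = tf.contrib.rnn.MultiRNNCell(cells)
    outputs, _ = tf.contrib.rnn.static_rnn(stacked_lstm_cells, inputs, dtype=tf.float32)
    # Linear output layer applied to the output of the last time step
    return tf.matmul(outputs[-1], w) + b

# Hyperparameters: units in each of the six stacked LSTM layers
num_neurons = [32, 32, 64, 64, 128, 128]
# Weight and bias of the output layer
w = tf.Variable(tf.random_normal([num_neurons[-1], 1]))
b = tf.Variable(tf.random_normal([1]))
# Input placeholder: (batch, 8 time steps, 8 features)
x = tf.placeholder(shape=(None, 8, 8), dtype=tf.float32)
# Prediction op and saver
pred = lstm_net(x, w, b, num_neurons)
saver = tf.train.Saver(tf.global_variables())

if __name__ == '__main__':
    # Open an interactive session and restore the trained weights
    sess = tf.InteractiveSession()
    saver.restore(sess, r'D:\股票预测\model_data\my_model.ckpt')
    # Load the test data
    test_x, test_y = load_data()
    # Predict
    predicts = sess.run(pred, feed_dict={x: test_x})
    # "Mathematical calibration": rescale to [0, 1]. Note that (max - p) / (max - min)
    # maps the maximum to 0 and the minimum to 1, i.e. it also flips the curve.
    predicts = ((predicts.max() - predicts) / (predicts.max() - predicts.min()))
    # Visualization
    plt.plot(predicts, 'r', label='Predicted curve')
    plt.plot(test_y, 'g', label='True curve')
    plt.xlabel('Day')
    plt.ylabel('Opening price (normalized)')
    plt.title('Stock opening price prediction (test set)')
    plt.legend()
    plt.show()
    # Close the session
    sess.close()
```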
6 Predicting the number of air passengers with LSTM
Dataset
The airline passengers dataset can be downloaded from:
https://raw.githubusercontent.com/jbrownlee/Datasets/master/airline-passengers.csv
This dataset contains the monthly number of airline passengers from 1949 to 1960, 12*12 = 144 values in total.
In the program below, we train the model by using the data from 1949-1952 to predict 1953, the data from 1950-1953 to predict 1954, and so on.
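The yearly sliding window can be sketched without PyTorch: split the 144 monthly values into 12 rows of 12 months, then pair every 4 consecutive years with the year that follows. `year_windows` is a hypothetical helper name, and the stand-in data (0 to 143) replaces the real passenger counts.

```python
from typing import List, Sequence, Tuple


def year_windows(monthly: Sequence[float], seq_length: int = 4, months_per_year: int = 12
                 ) -> List[Tuple[List[List[float]], List[float]]]:
    """Hypothetical helper: pair each run of `seq_length` years with the following year."""
    if len(monthly) % months_per_year != 0:
        raise ValueError("series length must be a whole number of years")
    years = [list(monthly[k:k + months_per_year])
             for k in range(0, len(monthly), months_per_year)]
    return [(years[i:i + seq_length], years[i + seq_length])
            for i in range(len(years) - seq_length)]


if __name__ == "__main__":
    pairs = year_windows([float(m) for m in range(144)])
    print(len(pairs))  # one pair per target year, 1953 through 1960
    first_x, first_y = pairs[0]
    print(len(first_x), first_y[0], first_y[-1])
```

This mirrors the `data.reshape(-1, n_feature)` and the loop over `data.shape[0] - seq_length` in the code below, which yields 12 - 4 = 8 training pairs.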
Prediction code
```python
import numpy as np
import matplotlib.pyplot as plt
import pandas as pd
import torch
import torch.nn as nn
from sklearn.preprocessing import MinMaxScaler

# Hyperparameters
EPOCH = 400
learning_rate = 0.01
seq_length = 4  # sequence length: 4 years of input per sample
n_feature = 12  # features per sequence element: each element is one year of passengers, i.e. 12 months = 12 features

# Data
data = pd.read_csv('airline-passengers.csv')  # 12 years x 12 months = 144 values in total
data = data.iloc[:, 1:2].values               # passenger column as a 2-D array, shape (144, 1)
data = np.array(data).astype(np.float32)
sc = MinMaxScaler()
data = sc.fit_transform(data)                 # normalize to [0, 1]
data = data.reshape(-1, n_feature)            # shape (12, 12): one row per year

# Each sample: seq_length consecutive years as input, the following year as target
trainData_x = []
trainData_y = []
for i in range(data.shape[0] - seq_length):
    tmp_x = data[i:i + seq_length, :]
    tmp_y = data[i + seq_length, :]
    trainData_x.append(tmp_x)
    trainData_y.append(tmp_y)

# Model
class Net(nn.Module):
    def __init__(self, in_dim=12, hidden_dim=10, output_dim=12, n_layer=1):
        super(Net, self).__init__()
        self.in_dim = in_dim
        self.hidden_dim = hidden_dim
        self.output_dim = output_dim
        self.n_layer = n_layer
        self.lstm = nn.LSTM(input_size=in_dim, hidden_size=hidden_dim, num_layers=n_layer, batch_first=True)
        self.linear = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        _, (h_out, _) = self.lstm(x)  # h_out is the hidden state after the last element of the sequence
        # h_out's shape is (n_layer * n_direction, batchsize, hidden_dim), i.e. (1, 1, 10);
        # batch_first does not apply to h_out. n_direction is 1, or 2 for a bidirectional LSTM.
        h_out = h_out[-1]           # last layer's hidden state (unidirectional), shape (batchsize, hidden_dim), i.e. (1, 10)
        h_out = self.linear(h_out)  # shape (batchsize, output_dim), i.e. (1, 12)
        return h_out

train = True  # set to False to load the saved checkpoint instead of training
if train:
    model = Net()
    loss_func = torch.nn.MSELoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=learning_rate)
    # Train
    for epoch in range(EPOCH):
        total_loss = 0.0
        for iteration, X in enumerate(trainData_x):  # X's shape (seq_length, n_feature)
            X = torch.tensor(X).float()
            X = torch.unsqueeze(X, 0)  # X's shape (1, seq_length, n_feature), 1 is batchsize
            output = model(X)          # output's shape (1, 12)
            output = torch.squeeze(output)
            loss = loss_func(output, torch.tensor(trainData_y[iteration]).float())
            optimizer.zero_grad()  # clear gradients for this training iteration
            loss.backward()        # compute gradients
            optimizer.step()       # update weights
            total_loss += loss.item()  # .item() avoids keeping every iteration's graph alive
        if (epoch + 1) % 20 == 0:
            print('epoch:{:3d}, loss:{:6.4f}'.format(epoch + 1, total_loss))
    # torch.save(model, 'flight_model.pkl')  # saving the whole model triggers a UserWarning; saving the state_dict (below) is recommended, see https://zhuanlan.zhihu.com/p/129948825
    torch.save({'state_dict': model.state_dict()}, 'checkpoint.pth.tar')
else:
    # model = torch.load('flight_model.pth')
    model = Net()
    checkpoint = torch.load('checkpoint.pth.tar')
    model.load_state_dict(checkpoint['state_dict'])

# Predict
model.eval()
predict = []
with torch.no_grad():  # no gradients are needed for inference
    for X in trainData_x:  # X's shape (seq_length, n_feature)
        X = torch.tensor(X).float()
        X = torch.unsqueeze(X, 0)  # X's shape (1, seq_length, n_feature), 1 is batchsize
        output = model(X)          # output's shape (1, 12)
        output = torch.squeeze(output)
        predict.append(output.numpy())

# Plot
plt.figure()
predict = np.array(predict)
predict = predict.reshape(-1, 1).squeeze()
x_tick = np.arange(len(predict)) + (seq_length * n_feature)  # predictions start after the first 4 years (48 months)
plt.plot(list(x_tick), predict, label='predict data')
data_original = data.reshape(-1, 1).squeeze()
plt.plot(range(len(data_original)), data_original, label='original data')
plt.legend(loc='best')
plt.show()
```
Results