[PyTorch deep learning] Using LSTM to predict the time series of passenger flow in Beijing Metro Xizhimen Station (with source code)

If you need the source code and data set, please like, follow, bookmark, and leave a private message in the comment area~~~

Before getting to the time series forecasting, let's briefly introduce the RNN recurrent neural network and the LSTM neural network.

1. RNN recurrent neural network

In an ordinary neural network such as a multi-layer perceptron, one input is completely unrelated to the next, which makes it difficult to capture the temporal relationships in a time series. To handle time series better, researchers proposed the Recurrent Neural Network (RNN) structure. The most basic recurrent neural network consists of an input layer, a hidden layer, and an output layer.

If the arrowed connection labeled W is removed, what remains is an ordinary fully connected neural network. Here, X is a vector representing the values of the input layer; S is a vector representing the values of the hidden layer, which depends not only on the current input X but also on the hidden layer's values at the previous moment; O is also a vector, representing the values of the output layer; U is the weight matrix from the input layer to the hidden layer; V is the weight matrix from the hidden layer to the output layer; and W is the weight matrix that feeds the hidden layer's values at the previous moment back in as input at the current moment.

Expanding the RNN structure along the time axis:

After the network receives the input X_t at time t, the value of the hidden layer is S_t and the output value is O_t. The value of S_t depends not only on X_t but also on S_(t-1). Therefore, under the RNN structure, the information at the current moment is also fed into the network at the next moment, so the information in the network forms a temporal correlation, which solves the problem of processing time series. What the neural network model "learns" is implicit in the weights W. A basic neural network only establishes full connections between layers; the biggest difference in an RNN is that neurons in the same layer are also fully connected across time steps, i.e., W relates the network to time.
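Written as formulas, this standard RNN update (using the symbols above) is:

S_t = f(U · X_t + W · S_(t-1))
O_t = g(V · S_t)

where f and g are activation functions, commonly tanh for the hidden layer and, for classification tasks, softmax for the output layer.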

Four structures of RNN:

one-to-one structure


The most typical neural network is the one-to-one structure: it predicts one output value for a given input value.

one-to-many and many-to-one structures

When there is one input value and multiple output values, for example inputting a keyword and having the network output a poem on that theme, we have a one-to-many scenario. When there are multiple input values and one output value, for example inputting a piece of speech to judge its emotion, or inputting past stock information to judge whether the price will rise or fall, such classification problems are many-to-one scenarios.

many-to-many structure

The first many-to-many structure is suitable for scenarios such as machine translation and automatic question answering, for example inputting a sentence in English and outputting a sentence in Chinese; both the input and the output are sequences. The second many-to-many structure is suitable for classifying or applying named entity labeling to each frame of a video, for example inputting a video and classifying every one of its frames.

2. LSTM

The biggest difference between an RNN and an LSTM lies in the neuron structure in the hidden layer. A traditional RNN neuron is simple, containing for example only one activation function layer. The LSTM memory unit (Block) is more complicated: the LSTM model adds a state c, called the cell state, which is used to store the long-term state. The key to LSTM is how to control this long-term state c.

LSTM uses three control switches: the first controls how the long-term state c continues to be saved, the second controls feeding the immediate state into the long-term state c, and the third controls whether the long-term state c is used as the output of the current LSTM. These three switches are implemented with three gates; a gate, which removes or adds information to the cell state, is a method of selectively letting information through.

LSTM uses two gates to control the content of the cell state c. One is the forget gate, which determines how much of the cell state c_(t-1) at the previous moment is retained in the current state c_t. The other is the input gate, which determines how much of the network's input x_t at the current moment is saved into the cell state c_t. LSTM also uses an output gate to control how much of the cell state c_t is emitted as the LSTM's current output value h_t. The three gates are introduced one by one below. In the figure, a yellow rectangle is a learned neural network layer, a pink circle represents a pointwise operation, and an arrow represents vector transmission.

f_t = σ(W_f · [h_(t-1), x_t] + b_f)

where W_f is the weight matrix of the forget gate, [h_(t-1), x_t] denotes the concatenation of the two vectors into one longer vector, b_f is the bias term of the forget gate, and σ is the sigmoid function.

i_t = σ(W_i · [h_(t-1), x_t] + b_i)
C̃_t = tanh(W_C · [h_(t-1), x_t] + b_C)

A sigmoid layer, called the input gate, determines which values will be updated, and a tanh layer creates a vector of new candidate values, C̃_t, to be added to the state.

The new cell state combines the two: c_t = f_t ⊙ c_(t-1) + i_t ⊙ C̃_t, where ⊙ denotes element-wise multiplication. Under the control of the forget gate, the network can retain information from long ago, and under the control of the input gate, useless information is kept out of the network.

o_t = σ(W_o · [h_(t-1), x_t] + b_o)
h_t = o_t ⊙ tanh(c_t)

The output gate controls how much influence the long-term memory has on the current output; the output h_t is determined jointly by the output gate and the cell state.

The core of LSTM is the cell state. The cell state is passed along like a conveyor belt that runs straight down the entire time chain, with only a few minor linear interactions along the way, which makes it easy to preserve relevant information.

3. PyTorch implementation of LSTM time series prediction

The following example uses the 2016 inbound passenger flow data of Xizhimen Station on the Beijing Subway, at a time granularity of 15 minutes, and builds an LSTM network in PyTorch to predict the inbound passenger flow.

After importing the data, the visualization is as follows

From the passenger flow plot, it can be seen that the flow at Xizhimen subway station from morning to night is quite regular, and the morning and evening peaks are clearly visible during the day. The processed data is then fed into the LSTM model for training, in the hope of predicting the passenger flow with the LSTM model.
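A minimal loading-and-plotting sketch, assuming (as in the full script at the end) that the 15-minute inbound counts are stored in a single-column file named LSTM.csv:

import pandas as pd
import matplotlib.pyplot as plt

data_csv = pd.read_csv("./LSTM.csv")  # one column of 15-minute inbound counts
plt.plot(data_csv)                    # visualize the raw passenger flow series
plt.show()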

Next comes data preprocessing: remove invalid data and normalize the values to [0, 1]. Normalizing the data improves the convergence speed and accuracy of deep learning models. Use the dropna() function to drop rows containing null values, use astype to change the array type, and rescale the data values into [0, 1] by hand (min-max normalization).

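In code, these steps look as follows (a vectorized equivalent of the normalization in the full script below):

import numpy as np

data_csv = data_csv.dropna()                 # drop rows containing null values
dataset = data_csv.values.astype('float32')  # change the array type
max_value = np.max(dataset)
min_value = np.min(dataset)
scalar = max_value - min_value               # range used for min-max scaling
dataset = (dataset - min_value) / scalar     # rescale values into [0, 1]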

Divide the data into a training set and a test set: 70% of the samples are used for training and 30% for testing.
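Using the create_dataset sliding-window helper from the full script below (window length step=8), the split looks like this:

data_X, data_Y = create_dataset(dataset)   # defined in the full script below
train_size = int(len(data_X) * 0.7)        # 70% of the samples for training
train_X, test_X = data_X[:train_size], data_X[train_size:]
train_Y, test_Y = data_Y[:train_size], data_Y[train_size:]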

Change the data dimensions. Each sample contains only one sequence, so batch_size=1, and we predict the next value from the previous eight time steps (the window length step=8, i.e., two hours at 15-minute granularity), so feature=8.
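Reshaped to the (seq_len, batch, feature) layout that PyTorch's nn.LSTM expects by default, and converted to tensors:

import torch

train_X = train_X.reshape(-1, 1, 8)  # batch_size=1, feature=8
train_Y = train_Y.reshape(-1, 1, 1)
test_X = test_X.reshape(-1, 1, 8)

train_x = torch.from_numpy(train_X)
train_y = torch.from_numpy(train_Y)
test_x = torch.from_numpy(test_X)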

Define the model. The first part of the model is a two-layer LSTM (the recurrent part); the second part is a linear layer that regresses the output values to the final traffic prediction.

The input dimension is 8 and the hidden layer dimension is 50, where the hidden dimension can be chosen freely. The mean square loss function is used.
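Instantiating the model and loss (the lstm_linear class is defined in the full script below; the Adam optimizer and the 1e-2 learning rate are assumptions, since the original post does not show this part):

from torch import nn

model = lstm_linear(8, 50)   # input dimension 8, hidden dimension 50
criterion = nn.MSELoss()     # mean square loss
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)  # assumed optimizer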

Start training the model. Here we train for 100 epochs and print the training result, i.e., the loss value, every 5 epochs. The training output is as follows.
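A training loop matching this printout (100 epochs, loss printed every 5 epochs):

for e in range(100):
    out = model(train_x)            # forward pass over the training sequence
    loss = criterion(out, train_y)  # mean square error against the targets
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (e + 1) % 5 == 0:
        print('Epoch: {}, Loss: {:.5f}'.format(e + 1, loss.item()))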

From the loss values printed during training, we can see that the loss decreases steadily, so the model is training effectively. After training finishes, switch to test mode, predict the passenger flow, and output the prediction results.
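Switching to evaluation mode and predicting on the test set, as a sketch (the predictions stay in the normalized [0, 1] scale; multiplying by scalar and adding min_value would map them back to raw counts):

model = model.eval()                         # switch to test mode
pred_test = model(test_x)                    # predict the test passenger flow
pred_test = pred_test.view(-1).data.numpy()  # flatten to a 1-D numpy array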

Use the matplotlib package to plot the actual results against the predicted results, with the real data shown in blue and the predictions in orange.
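The corresponding plotting calls (test2 is the flattened real test series, as in the full script below; blue and orange are matplotlib's first two default colors):

test2 = test_Y.reshape(-1)               # real data (plotted in blue)
plt.plot(test2, label='real')
plt.plot(pred_test, label='prediction')  # predictions (plotted in orange)
plt.legend(loc='best')
plt.show()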


The passenger flow predicted by the trained LSTM model fits the real passenger flow data fairly accurately, indicating that the LSTM model's time series prediction ability is quite good. However, it is also clear that the LSTM predictions deviate at the maxima and minima, especially at the maxima, where the gap is considerable; this is a target direction for improving the model.

The full code is as follows:

import torch
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# %matplotlib inline
from torch import nn

# Load the data and drop rows containing null values
data_csv = pd.read_csv("./LSTM.csv")
data_csv = data_csv.dropna()
dataset = data_csv.values
dataset = dataset.astype('float32')

# Min-max normalization to [0, 1]
max_value = np.max(dataset)
min_value = np.min(dataset)
scalar = max_value - min_value
dataset = list(map(lambda x: (x - min_value) / scalar, dataset))

# Sliding window: eight consecutive values as input, the ninth as the target
def create_dataset(dataset, step=8):
    dataX, dataY = [], []
    for i in range(len(dataset) - step):
        a = dataset[i:(i + step)]
        dataX.append(a)
        dataY.append(dataset[i + step])
    return np.array(dataX), np.array(dataY)

data_X, data_Y = create_dataset(dataset)

# 70% of the samples for training, 30% for testing
train_size = int(len(data_X) * 0.7)
test_size = len(data_X) - train_size
train_X = data_X[:train_size]
train_Y = data_Y[:train_size]
test_X = data_X[train_size:]
test_Y = data_Y[train_size:]

# Reshape to the (seq_len, batch, feature) layout expected by nn.LSTM
train_X = train_X.reshape(-1, 1, 8)
train_Y = train_Y.reshape(-1, 1, 1)
test_X = test_X.reshape(-1, 1, 8)

train_x = torch.from_numpy(train_X)
train_y = torch.from_numpy(train_Y)
test_x = torch.from_numpy(test_X)

class lstm_linear(nn.Module):
    def __init__(self, input_size, hidden_size, output_size=1, num_layers=2):
        super(lstm_linear, self).__init__()
        self.lstm = nn.LSTM(input_size, hidden_size, num_layers)  # recurrent part
        self.linear = nn.Linear(hidden_size, output_size)  # regression part

    def forward(self, x):
        x, _ = self.lstm(x)  # (seq, batch, hidden)
        s, b, h = x.shape
        x = x.view(s * b, h)  # flatten to the linear layer's input format
        x = self.linear(x)
        x = x.view(s, b, -1)
        return x

model = lstm_linear(8, 50)
criterion = nn.MSELoss()
# Optimizer and learning rate are assumptions; the original post omits this part
optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)

# Start training
for e in range(100):
    out = model(train_x)
    loss = criterion(out, train_y)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if (e + 1) % 5 == 0:
        print('Epoch: {}, Loss: {:.5f}'.format(e + 1, loss.item()))

# Plot the actual results and the predictions
model = model.eval()
pred_test = model(test_x)
pred_test = pred_test.view(-1).data.numpy()
test2 = test_Y.reshape(-1)
plt.plot(test2, label='real')
plt.plot(pred_test, label='prediction')
plt.legend(loc='best')
plt.show()

Creating content is not easy; if you found this helpful, please like, follow, and bookmark~~~

Origin blog.csdn.net/jiebaoshayebuhui/article/details/130446565