**Table of contents**

Understand the structure of LSTM

LSTM unit with only forget gate

Independent Recurrent (IndRNN) unit

Bidirectional RNN structure (LSTM)

## Introduction

LSTM (Long Short-Term Memory) is a widely used recurrent neural network (RNN) model for sequence data that can retain both long-term and short-term information. **In time series prediction, LSTM can be used for both multivariate and univariate forecasting.**

As a multivariate prediction mechanism, LSTM can process the historical data of several related variables in order to predict their future values. Specifically, the historical values of the variables form the input of the LSTM and their future values form its output. During training, the error backpropagation algorithm updates the parameters of the LSTM to improve the prediction performance of the model.
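As a rough illustration of the multivariate setup described above, the sketch below slices a (time, variables) array into input windows and multi-variable targets. The function name `make_multivariate_windows`, the window length, and the horizon are illustrative assumptions and are not part of the code at the end of this article.

```python
import numpy as np


def make_multivariate_windows(data, window, horizon):
    """Slice a (time, n_vars) array into (inputs, targets).

    inputs  has shape (n_samples, window,  n_vars)
    targets has shape (n_samples, horizon, n_vars)
    """
    data = np.asarray(data, dtype=float)
    n_samples = len(data) - window - horizon + 1
    if n_samples <= 0:
        raise ValueError("series is too short for this window and horizon")
    inputs = np.stack([data[i:i + window] for i in range(n_samples)])
    targets = np.stack(
        [data[i + window:i + window + horizon] for i in range(n_samples)]
    )
    return inputs, targets


# Toy data: 10 time steps of 3 related variables.
toy = np.arange(30, dtype=float).reshape(10, 3)
X, Y = make_multivariate_windows(toy, window=4, horizon=2)
print(X.shape, Y.shape)  # (5, 4, 3) (5, 2, 3)
```

Each sample pairs 4 past time steps of all 3 variables with the 2 time steps that follow them.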

As a univariate prediction mechanism, LSTM can predict the future value of a single variable, such as a stock price or a sales volume. In univariate time series forecasting, we need to analyze the historical data, identify factors such as trend, seasonality, and cyclicality, and use them to predict future values. LSTM predicts future values by learning these patterns and regularities from the historical data.

The advantage of LSTM in both the multivariate and the univariate setting is that it can handle long-term dependencies in sequence data and thereby capture complex patterns. It learns and adjusts its parameters adaptively to improve prediction performance and generalization ability.

In general, LSTM is widely used for both multivariate and univariate forecasting, and it can be applied to data from many fields, such as stock prices, meteorological data, and traffic flow.

**(There is code at the end of the article that you can copy and paste to run)**

## LSTM prediction results

Here I first show the forecasting results of the LSTM (the forecast here is for unseen future data, not a prediction on the test set or validation set). The MAE is 0.15 and the ME is -0.03.
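For readers unfamiliar with the two metrics quoted above, here is a minimal sketch of how MAE (mean absolute error) and ME (mean error) can be computed. The sign convention for ME (prediction minus truth) is an assumption, since the article does not state which one it uses, and the numbers below are made up for illustration.

```python
import numpy as np


def mean_absolute_error(y_true, y_pred):
    # Average magnitude of the errors, ignoring their sign.
    return float(np.mean(np.abs(np.asarray(y_pred) - np.asarray(y_true))))


def mean_error(y_true, y_pred):
    # Signed average error (assumed convention: prediction minus truth).
    return float(np.mean(np.asarray(y_pred) - np.asarray(y_true)))


y_true = np.array([1.0, 2.0, 3.0, 4.0])
y_pred = np.array([1.5, 1.5, 3.5, 3.5])
print(mean_absolute_error(y_true, y_pred))  # 0.5
print(mean_error(y_true, y_pred))           # 0.0
```

An ME close to zero with a larger MAE, as in this toy case, means the errors are not systematically biased in one direction even though individual predictions are off.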

The error curve is shown below; the error plotted is the MAE.

## LSTM mechanism

LSTM (Long Short-Term Memory) is a **deep learning model** for processing sequence data. It uses an RNN unit with a bypass-like structure. Compared with an ordinary RNN, LSTM introduces a gating mechanism that handles long-term dependencies and short-term memory more effectively. It is one of the most commonly used cells in RNN networks and **is a variant of the recurrent neural network (RNN)**.

### Understand the structure of LSTM

LSTM is deliberately designed to learn sequence relationships while avoiding long-term dependency problems. Its structural diagram is shown below.

In the structural diagram of LSTM, each black line carries an entire vector from the output of one node to the inputs of other nodes. **The "+" sign represents an element-wise operation (such as vector addition), and each rectangle represents a learned neural network layer.** Converging lines represent the concatenation of vectors, and branching lines represent content being copied and sent to different locations.

If the LSTM structure diagram above is hard to follow, keep in mind that **the essence of LSTM is a simple RNN with a tanh activation function**, as shown in the figure below.

The key idea of the LSTM structure is to introduce a connection called the cell state. The cell state stores the memory worth keeping (it corresponds to h in the simple structure above, except that **it no longer stores only the last state but the states that the network has learned to be useful**). Three gates are added at the same time:

Forget gate: decides when to forget the previous state.

Input gate: decides when new state is added.

Output gate: decides when the state and the input are output together.

As the names suggest, with these three gates both the update of the LSTM state and the decision whether the state is used are left to the training of the neural network.

Let's introduce the structure and function of the three gates in turn.

#### Forget gate

The figure below shows the operation of the forget gate. **The forget gate determines what information the model discards from the cell state.**

The forget gate reads the output of the previous time step and the input of the current time step, and uses them to control how much of each number in the cell state is retained.

**For example, in a language model, assume that the cell state contains the gender of the current subject, so that the correct pronoun can be selected. When we see a new subject, the memory should be updated with it. The role of the forget gate is to first locate the old subject in the memory (it does not perform the forgetting itself, it only locates what should be forgotten).**

The forget gate of the LSTM in the figure above computes

$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f)$$

where $f_t$ is the output of the forget gate, $\sigma$ is the activation function, $W_f$ is the weight of the forget gate, $x_t$ is the input of the current model, $h_{t-1}$ is the output of the previous sequence model, and $b_f$ is the bias of the forget gate.
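To make the forget gate concrete, here is a minimal NumPy sketch of the standard formulation, a sigmoid applied to a linear map of the concatenated previous output and current input. The sizes and the random weights are illustrative assumptions, not values from a trained model.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(0)
hidden_size, input_size = 3, 2  # assumed toy dimensions

W_f = rng.normal(size=(hidden_size, hidden_size + input_size))  # forget-gate weight
b_f = np.zeros(hidden_size)                                     # forget-gate bias

h_prev = rng.normal(size=hidden_size)  # output of the previous time step
x_t = rng.normal(size=input_size)      # input of the current time step

# One value per cell-state entry: near 1 keeps the entry, near 0 forgets it.
f_t = sigmoid(W_f @ np.concatenate([h_prev, x_t]) + b_f)
print(f_t)
```

Because of the sigmoid, every entry of `f_t` lies strictly between 0 and 1, which is what lets it act as a soft keep/forget mask.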

#### Input gate

**The input gate can be divided into two parts. One part identifies which entries of the cell state need to be updated. The other part writes the new information into the cell state.**

The input gate in the structure above computes

$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i)$$

$$\tilde{C}_t = \tanh(W_C \cdot [h_{t-1}, x_t] + b_C)$$

where $i_t$ marks the cell state entries to be updated, $\sigma$ is the activation function, $x_t$ is the input of the current model, $h_{t-1}$ is the output of the previous sequence model, $W_i$ and $b_i$ are the weight and bias used to compute $i_t$, $\tilde{C}_t$ is the new candidate cell state created using tanh, and $W_C$ and $b_C$ are the weight and bias used to compute $\tilde{C}_t$.

After the forget gate has identified the information to be forgotten, its output is multiplied with the old state, which discards that information (a position that must be discarded gets a weight of 0). Then the product of the input gate output $i_t$ and the candidate state $\tilde{C}_t$ is added to the result to bring new information into the cell state. This completes the update of the cell state, as shown in the update diagram of the input gate below.

The update shown in the diagram above is

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$

where $f_t$ is the output of the forget gate, $C_{t-1}$ is the cell state of the previous sequence model, $i_t$ marks the cell state entries to be updated, $\tilde{C}_t$ is the new candidate cell state created using tanh, and $C_t$ is the updated cell state.
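The cell state update described above can be sketched in a few lines of NumPy. The forget-gate output `f_t` is hand-picked here so the effect of each value is easy to see; the sizes and random weights are illustrative assumptions.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


rng = np.random.default_rng(1)
hidden_size, input_size = 3, 2  # assumed toy dimensions
concat_size = hidden_size + input_size

W_i, b_i = rng.normal(size=(hidden_size, concat_size)), np.zeros(hidden_size)
W_C, b_C = rng.normal(size=(hidden_size, concat_size)), np.zeros(hidden_size)

h_prev = rng.normal(size=hidden_size)  # output of the previous time step
x_t = rng.normal(size=input_size)      # input of the current time step
C_prev = rng.normal(size=hidden_size)  # previous cell state

# Hand-picked forget-gate output: keep entry 0, halve entry 1, erase entry 2.
f_t = np.array([1.0, 0.5, 0.0])

hx = np.concatenate([h_prev, x_t])
i_t = sigmoid(W_i @ hx + b_i)      # which entries to update, and by how much
C_tilde = np.tanh(W_C @ hx + b_C)  # candidate values in (-1, 1)

C_t = f_t * C_prev + i_t * C_tilde  # forget part of the old state, add the new
print(C_t)
```

Entry 2 of `C_t` contains only new information because its forget weight is 0, while entry 0 keeps the whole old value and adds the new contribution on top.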

#### Output gate

As shown in the output gate structure diagram of LSTM below, **the output gate uses an activation layer (in practice the sigmoid function) to decide which part of the information is output**. The cell state is then passed through tanh (giving values between -1 and 1) and multiplied by the output of the sigmoid gate to obtain the final output. For example, in a language model, once a pronoun has been read, the gate works out which piece of information (word vector) related to that pronoun needs to be output.

The output gate in the structure diagram computes

$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o)$$

$$h_t = o_t * \tanh(C_t)$$

where $o_t$ is the information to be output, $\sigma$ is the activation function, $W_o$ and $b_o$ are the weight and bias used to compute $o_t$, $C_t$ is the updated cell state, and $h_t$ is the output of the current sequence model.
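Putting the three gates together, one LSTM time step can be sketched as below. This is a plain NumPy illustration of the standard formulation with randomly initialized weights; the `params` dictionary layout and the helper name `lstm_step` are assumptions made for this example and do not mirror how PyTorch stores its weights.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def lstm_step(x_t, h_prev, C_prev, params):
    """One LSTM time step: returns the new output h_t and cell state C_t."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(params["W_f"] @ hx + params["b_f"])      # forget gate
    i_t = sigmoid(params["W_i"] @ hx + params["b_i"])      # input gate
    C_tilde = np.tanh(params["W_C"] @ hx + params["b_C"])  # candidate state
    C_t = f_t * C_prev + i_t * C_tilde                     # cell state update
    o_t = sigmoid(params["W_o"] @ hx + params["b_o"])      # output gate
    h_t = o_t * np.tanh(C_t)                               # final output
    return h_t, C_t


rng = np.random.default_rng(2)
hidden_size, input_size = 4, 1  # assumed toy dimensions
params = {}
for gate in ("f", "i", "C", "o"):
    params[f"W_{gate}"] = rng.normal(
        scale=0.5, size=(hidden_size, hidden_size + input_size)
    )
    params[f"b_{gate}"] = np.zeros(hidden_size)

# Run a short univariate sequence through the cell, one value per time step.
h, C = np.zeros(hidden_size), np.zeros(hidden_size)
for value in [0.1, 0.2, 0.3]:
    h, C = lstm_step(np.array([value]), h, C, params)
print(h.shape, C.shape)  # (4,) (4,)
```

The same weights are reused at every time step; only `h` and `C` carry information forward.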

### Variations of LSTM

#### LSTM unit with only forget gate

The forget-gate-only JANET unit is also a variant of the LSTM unit and was published in 2018. Its structure comes from a bold question: what happens when an LSTM keeps only its forget gate?

Experiments show that a network with only the forget gate can actually perform better than the standard LSTM unit. The same simplification can also be applied to other RNN structures.
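To give a feel for what "only a forget gate" can mean, here is a simplified sketch. It assumes that the input gate is tied to the forget gate as `1 - f_t` and that the output equals the cell state. The published JANET unit differs in some details, so treat this as an illustration of the idea rather than a reference implementation; all names and sizes are made up for the example.

```python
import numpy as np


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def forget_gate_only_step(x_t, h_prev, W_f, b_f, W_c, b_c):
    """One step of a simplified forget-gate-only recurrent unit."""
    hx = np.concatenate([h_prev, x_t])
    f_t = sigmoid(W_f @ hx + b_f)      # the only gate
    c_tilde = np.tanh(W_c @ hx + b_c)  # candidate state
    # What is not kept from the old state is filled with the candidate.
    return f_t * h_prev + (1.0 - f_t) * c_tilde


rng = np.random.default_rng(3)
hidden_size, input_size = 4, 1  # assumed toy dimensions
W_f = rng.normal(scale=0.5, size=(hidden_size, hidden_size + input_size))
W_c = rng.normal(scale=0.5, size=(hidden_size, hidden_size + input_size))
b_f = np.ones(hidden_size)   # assumed positive bias: remember by default
b_c = np.zeros(hidden_size)

h = np.zeros(hidden_size)
for value in [0.1, 0.2, 0.3]:
    h = forget_gate_only_step(np.array([value]), h, W_f, b_f, W_c, b_c)
print(h.shape)  # (4,)
```

Compared with a full LSTM step, this unit needs two weight matrices instead of four, which is where the savings in parameters come from.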

If you want to know more variations of LSTM, you can check out other related information such as this paper.

#### Independent Recurrent (IndRNN) unit

The independently recurrent unit (IndRNN) is a newer type of recurrent neural network unit. **It was published in 2018, and its accuracy and speed are reported to be better than those of the LSTM unit.**

The IndRNN unit not only mitigates the exploding and vanishing gradient problems of traditional RNN models, but also learns long-term dependencies in the samples better.

When building the model:

You can combine IndRNN units by stacking, residual connections, and full connections to build a deeper network structure;

Using the IndRNN unit with non-saturating activation functions such as ReLU makes the model more robust.
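The defining feature of the IndRNN unit is that the recurrent weight is a vector applied element-wise, so each hidden neuron only sees its own previous value. A minimal NumPy sketch with ReLU follows; the sizes and random weights are illustrative assumptions.

```python
import numpy as np


def indrnn_step(x_t, h_prev, W, u, b):
    """One IndRNN step: h_t = relu(W x_t + u * h_prev + b).

    u is a vector, so the recurrence is element-wise and each neuron is
    independent of the other neurons' previous states.
    """
    return np.maximum(0.0, W @ x_t + u * h_prev + b)


rng = np.random.default_rng(4)
hidden_size, input_size = 3, 2  # assumed toy dimensions
W = rng.normal(size=(hidden_size, input_size))  # input weight (a matrix)
u = rng.uniform(-1.0, 1.0, size=hidden_size)    # recurrent weight (a vector)
b = np.zeros(hidden_size)
x_t = rng.normal(size=input_size)

h_prev_a = np.array([0.5, 0.5, 0.5])
h_prev_b = np.array([2.0, 0.5, 0.5])  # only neuron 0 differs

h_a = indrnn_step(x_t, h_prev_a, W, u, b)
h_b = indrnn_step(x_t, h_prev_b, W, u, b)
print(np.allclose(h_a[1:], h_b[1:]))  # True: neurons 1 and 2 are unaffected
```

Interaction between neurons happens only across layers (through `W` of the next layer), which is why stacking several IndRNN layers is the usual way to build a model from this unit.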

#### Bidirectional RNN structure (LSTM)

A bidirectional RNN, also known as Bi-RNN, is an RNN model that processes the sequence in two directions.

RNN models are good at processing sequential data. Because the data is sequential, the model can learn not only its forward features **but also its reverse features. A structure that combines the forward and reverse directions fits the data better than a one-way recurrent network.**

A bidirectional RNN runs a backward pass over the sequence in addition to the forward pass (this is a recurrence in reversed time order, not the backpropagation used for training). Both the forward and the backward hidden layers are connected to one output layer. This structure provides the output layer with complete past and future context for each point in the input sequence. The figure shows a bidirectional recurrent neural network unfolded along time (we use LSTM as an example, but it can be any other RNN structure).

A bidirectional RNN has one more hidden layer than a unidirectional RNN. Six unique weights are reused at every time step: the input to the forward and backward hidden layers (two weights), each hidden layer to itself (two weights), and the forward and backward hidden layers to the output layer (two weights).
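The six weights listed above can be seen directly in a small NumPy sketch. It uses a plain tanh RNN cell instead of an LSTM to keep the example short; the weight names and sizes are assumptions chosen for illustration.

```python
import numpy as np


def bidirectional_rnn(xs, W_xf, W_ff, W_fy, W_xb, W_bb, W_by):
    """Simple bidirectional tanh RNN built from six weight matrices."""
    T, hidden = len(xs), W_ff.shape[0]
    h_fwd = np.zeros((T, hidden))
    h_bwd = np.zeros((T, hidden))

    h = np.zeros(hidden)
    for t in range(T):  # forward direction: past -> future
        h = np.tanh(W_xf @ xs[t] + W_ff @ h)
        h_fwd[t] = h

    h = np.zeros(hidden)
    for t in reversed(range(T)):  # backward direction: future -> past
        h = np.tanh(W_xb @ xs[t] + W_bb @ h)
        h_bwd[t] = h

    # Each output combines the forward and backward state of the same step.
    ys = h_fwd @ W_fy.T + h_bwd @ W_by.T
    return ys, h_fwd, h_bwd


rng = np.random.default_rng(5)
T, input_size, hidden, output_size = 5, 2, 3, 1  # assumed toy dimensions
xs = rng.normal(size=(T, input_size))
W_xf = rng.normal(scale=0.5, size=(hidden, input_size))   # input  -> forward
W_ff = rng.normal(scale=0.5, size=(hidden, hidden))       # forward -> forward
W_fy = rng.normal(scale=0.5, size=(output_size, hidden))  # forward -> output
W_xb = rng.normal(scale=0.5, size=(hidden, input_size))   # input  -> backward
W_bb = rng.normal(scale=0.5, size=(hidden, hidden))       # backward -> backward
W_by = rng.normal(scale=0.5, size=(output_size, hidden))  # backward -> output

ys, h_fwd, h_bwd = bidirectional_rnn(xs, W_xf, W_ff, W_fy, W_xb, W_bb, W_by)
print(ys.shape)  # (5, 1)
```

The forward state at step `t` depends only on inputs up to `t`, while the backward state depends on inputs from `t` onward, so together they give the output layer the full context.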

I will cover bidirectional RNNs (LSTM) in later articles; this is only a general introduction.

**(In most applications, time-series analysis and some automatic question-answering tasks in NLP are implemented with bidirectional LSTMs combined with unidirectional LSTMs or RNNs, and the results are very good.)**

## Run the code

You can copy the following code into a .py file, point it at your own data file, and run it. **This is only a simple LSTM model; it can be combined with many other modules, such as an attention mechanism (TPA), other network structures (GRU), or a bidirectional RNN structure.**

```
import time

import numpy as np
import pandas as pd
import torch
import torch.nn as nn
from matplotlib import pyplot as plt
from sklearn.preprocessing import MinMaxScaler

np.random.seed(0)


def calculate_mae(y_true, y_pred):
    # Mean absolute error
    mae = np.mean(np.abs(y_true - y_pred))
    return mae


true_data = pd.read_csv('ETTh1-Test.csv')  # fill in the path to your own data
target = 'OT'

# Add any preprocessing here; the result should be a single column (pd.Series)
true_data = np.array(true_data[target])

# Window size (not used below; the look-back length is train_window)
test_data_size = 32

# Train/test split fractions
test_size = 0.15
train_size = 0.85

# Min-max scaling: fit on the training set only and reuse it for the test set
scaler_train = MinMaxScaler(feature_range=(0, 1))

train_data = true_data[:int(train_size * len(true_data))]
test_data = true_data[-int(test_size * len(true_data)):]
print("Training set size:", len(train_data))
print("Test set size:", len(test_data))

train_data_normalized = scaler_train.fit_transform(train_data.reshape(-1, 1))
test_data_normalized = scaler_train.transform(test_data.reshape(-1, 1))

# Convert to the tensor type the model expects
train_data_normalized = torch.FloatTensor(train_data_normalized).view(-1)
test_data_normalized = torch.FloatTensor(test_data_normalized).view(-1)


def create_inout_sequences(input_data, tw, pre_len):
    # Build (input window, label) pairs with a sliding window
    inout_seq = []
    L = len(input_data)
    for i in range(L - tw):
        train_seq = input_data[i:i + tw]
        if (i + tw + pre_len) > len(input_data):
            break
        train_label = input_data[i + tw:i + tw + pre_len]
        inout_seq.append((train_seq, train_label))
    return inout_seq


pre_len = 4        # number of future steps to predict
train_window = 16  # number of past steps fed to the model

# Training input
train_inout_seq = create_inout_sequences(train_data_normalized, train_window, pre_len)


class LSTM(nn.Module):
    def __init__(self, input_dim=1, hidden_dim=350, output_dim=1):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = x.unsqueeze(1)  # (seq_len,) -> (seq_len, 1), an unbatched sequence
        h0_lstm = torch.zeros(1, self.hidden_dim).to(x.device)
        c0_lstm = torch.zeros(1, self.hidden_dim).to(x.device)
        out, _ = self.lstm(x, (h0_lstm, c0_lstm))
        out = out[-1]  # hidden state of the last time step
        out = self.fc(out)
        return out


lstm_model = LSTM(input_dim=1, output_dim=pre_len, hidden_dim=train_window)
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(lstm_model.parameters(), lr=0.001)
epochs = 10
Train = False  # True: train and save the model; False: load it and predict

if Train:
    losss = []
    lstm_model.train()  # training mode
    for i in range(epochs):
        start_time = time.time()  # start time of this epoch
        for seq, labels in train_inout_seq:
            optimizer.zero_grad()
            y_pred = lstm_model(seq)
            single_loss = loss_function(y_pred, labels)
            single_loss.backward()
            optimizer.step()
        # Log the loss of the last sample of this epoch
        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')
        losss.append(single_loss.item())
    torch.save(lstm_model.state_dict(), 'save_model.pth')
    print(f"Model saved, last epoch took {(time.time() - start_time) / 60:.4f} min")
    plt.plot(losss)
    # Chart title and axis labels
    plt.title('Training Error')
    plt.xlabel('Epoch')
    plt.ylabel('Error')
    # Save the chart locally
    plt.savefig('training_error.png')
else:
    # Load the saved model and predict on the test set
    lstm_model.load_state_dict(torch.load('save_model.pth'))
    lstm_model.eval()  # evaluation mode
    results = []
    reals = []
    losss = []
    test_inout_seq = create_inout_sequences(test_data_normalized, train_window, pre_len)
    with torch.no_grad():
        for seq, labels in test_inout_seq:
            pred = lstm_model(seq)[0].item()  # keep only the first predicted step
            real = labels[0].item()           # true value for that step
            results.append(pred)
            reals.append(real)
            losss.append(calculate_mae(real, pred))  # |true value - prediction|
    print("Model predictions:", results)
    print("Prediction error (MAE):", losss)
    plt.style.use('ggplot')
    # Line chart of true values against forecasts
    plt.plot(reals, label='real', color='blue')
    plt.plot(results, label='forecast', color='red', linestyle='--')
    plt.grid(True)
    plt.title('real vs forecast')
    plt.xlabel('time')
    plt.ylabel('value')
    plt.legend()
    plt.savefig('test_results.png')
```

## Data Format

To run the code above you only need one column of data, which should end up as a single `pd.Series` (the script then converts it to a NumPy array). You can add your own preprocessing steps after reading the file.

**(Note that your data must stay in chronological order; if the time order is shuffled, the predictions may be inaccurate.)**
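As a small illustration of keeping the time order intact, the sketch below splits a series chronologically instead of shuffling it. The helper name `chronological_split` is an assumption for this example; the 0.85 default mirrors the `train_size` used in the listing.

```python
import numpy as np


def chronological_split(values, train_fraction=0.85):
    """Split a time-ordered series without shuffling: earlier part for
    training, later part for testing."""
    values = np.asarray(values)
    split = int(train_fraction * len(values))
    return values[:split], values[split:]


# Toy series whose values equal their time index.
series = np.arange(20)
train, test = chronological_split(series)
print(train[-1] < test[0])  # True: every training point precedes the test set
```

A random shuffle before the split would let the model train on points that lie after some test points, which makes the reported error look better than it would be on truly unseen future data.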

## Code explanation

In the code above, the main thing you need to fill in is the path to your data:

`true_data = pd.read_csv('') # fill in the path to your own data`

`test_data_size` is defined here as the number of past data points used to predict the future values. Note that in the full listing the window length actually passed to the data loader is `train_window`.

`test_size` and `train_size` are the fractions of the data assigned to the test set and the training set.

```
# Window size
test_data_size = 32
# Train/test split fractions
test_size = 0.15
train_size = 0.85
```

The data loader, which turns the series into (input window, label) pairs, is defined here:

```
def create_inout_sequences(input_data, tw, pre_len):
    # Build (input window, label) pairs with a sliding window
    inout_seq = []
    L = len(input_data)
    for i in range(L - tw):
        train_seq = input_data[i:i + tw]
        if (i + tw + pre_len) > len(input_data):
            break
        train_label = input_data[i + tw:i + tw + pre_len]
        inout_seq.append((train_seq, train_label))
    return inout_seq


pre_len = 4        # number of future steps to predict
train_window = 16  # number of past steps fed to the model

# Training input
train_inout_seq = create_inout_sequences(train_data_normalized, train_window, pre_len)
```
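To see what the sliding-window loader produces, here is a self-contained toy run on the integers 0 to 9. The function is restated in a compact but equivalent form (the loop bound replaces the `break`), and the window of 4 and horizon of 2 are chosen only to keep the output short.

```python
def create_inout_sequences(input_data, tw, pre_len):
    """Pair every window of `tw` values with the `pre_len` values after it."""
    inout_seq = []
    for i in range(len(input_data) - tw - pre_len + 1):
        train_seq = input_data[i:i + tw]
        train_label = input_data[i + tw:i + tw + pre_len]
        inout_seq.append((train_seq, train_label))
    return inout_seq


toy_series = list(range(10))
pairs = create_inout_sequences(toy_series, tw=4, pre_len=2)
for seq, label in pairs[:2]:
    print(seq, "->", label)
# [0, 1, 2, 3] -> [4, 5]
# [1, 2, 3, 4] -> [5, 6]
print(len(pairs))  # 5
```

A series of length `L` yields `L - tw - pre_len + 1` pairs, so very short test sets combined with long windows leave few samples to evaluate on.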

The model itself contains an LSTM layer and a fully connected layer to output the results.

```
class LSTM(nn.Module):
    def __init__(self, input_dim=1, hidden_dim=350, output_dim=1):
        super(LSTM, self).__init__()
        self.hidden_dim = hidden_dim
        self.lstm = nn.LSTM(input_dim, hidden_dim, batch_first=True)
        self.fc = nn.Linear(hidden_dim, output_dim)

    def forward(self, x):
        x = x.unsqueeze(1)  # (seq_len,) -> (seq_len, 1), an unbatched sequence
        h0_lstm = torch.zeros(1, self.hidden_dim).to(x.device)
        c0_lstm = torch.zeros(1, self.hidden_dim).to(x.device)
        out, _ = self.lstm(x, (h0_lstm, c0_lstm))
        out = out[-1]  # hidden state of the last time step
        out = self.fc(out)
        return out
```
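The tensor shapes inside `forward` are easy to get wrong, so here is a short shape trace. It assumes a PyTorch version that accepts unbatched 2-D input to `nn.LSTM` (1.11 or later), and it deliberately uses a hidden size (8) different from the window length (16) so the two dimensions cannot be confused. The sizes are illustrative only.

```python
import torch
import torch.nn as nn

train_window, hidden_dim, pre_len = 16, 8, 4  # assumed toy sizes

lstm = nn.LSTM(input_size=1, hidden_size=hidden_dim, batch_first=True)
fc = nn.Linear(hidden_dim, pre_len)

seq = torch.rand(train_window)       # one window of scaled values: (16,)
x = seq.unsqueeze(1)                 # (16, 1): 16 time steps, 1 feature
h0 = torch.zeros(1, hidden_dim)      # (num_layers, hidden) for unbatched input
c0 = torch.zeros(1, hidden_dim)

out, (h_n, c_n) = lstm(x, (h0, c0))  # out: (16, 8), one row per time step
last_step = out[-1]                  # (8,): hidden state of the last time step
forecast = fc(last_step)             # (4,): one value per predicted step

print(out.shape, last_step.shape, forecast.shape)
```

With a 2-D input, the last time step is `out[-1]`. Indexing `out[:, -1]` would instead pick the last hidden feature at every time step, which only happens to have a compatible shape when the hidden size equals the window length.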

The model loss function and optimizer are defined here, where Train represents whether to perform training.

```
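lstm_model = LSTM(input_dim=1, output_dim=pre_len, hidden_dim=train_window)
loss_function = nn.MSELoss()
optimizer = torch.optim.Adam(lstm_model.parameters(), lr=0.001)
epochs = 10
Train = False  # True: train and save the model; False: load it and predict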
```

When Train is True, training starts and the model is saved locally at the end. After training, set Train to False to load the saved model and run it in evaluation mode.

```
if Train:
    losss = []
    lstm_model.train()  # training mode
    for i in range(epochs):
        start_time = time.time()  # start time of this epoch
        for seq, labels in train_inout_seq:
            optimizer.zero_grad()
            y_pred = lstm_model(seq)
            single_loss = loss_function(y_pred, labels)
            single_loss.backward()
            optimizer.step()
        # Log the loss of the last sample of this epoch
        print(f'epoch: {i:3} loss: {single_loss.item():10.8f}')
        losss.append(single_loss.item())
    torch.save(lstm_model.state_dict(), 'save_model.pth')
    print(f"Model saved, last epoch took {(time.time() - start_time) / 60:.4f} min")
    plt.plot(losss)
    # Chart title and axis labels
    plt.title('Training Error')
    plt.xlabel('Epoch')
    plt.ylabel('Error')
    # Save the chart locally
    plt.savefig('training_error.png')
else:
    # Load the saved model and predict on the test set
    lstm_model.load_state_dict(torch.load('save_model.pth'))
    lstm_model.eval()  # evaluation mode
    results = []
    losss = []
    test_inout_seq = create_inout_sequences(test_data_normalized, train_window, pre_len)
    with torch.no_grad():
        for seq, labels in test_inout_seq:
            # lstm_model(seq)[0].item() keeps only the first predicted value;
            # to keep more predicted steps, replace the index 0 with a slice
            pred = lstm_model(seq)[0].item()
            results.append(pred)
            mae = calculate_mae(labels[0].item(), pred)  # |true value - prediction|
            losss.append(mae)
    print("Model predictions:", results)
    print("Prediction error (MAE):", losss)
```
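The errors printed by the evaluation loop are measured on the min-max scaled data. If you want them in the original units, the predictions have to be mapped back first. The sketch below does this by hand with the min-max formula; the numbers and the helper names `scale` and `unscale` are made up for illustration. With scikit-learn, `MinMaxScaler.inverse_transform` performs the same mapping.

```python
import numpy as np

# Statistics taken from the training data only (toy values).
train = np.array([10.0, 12.0, 15.0, 20.0])
data_min, data_max = train.min(), train.max()


def scale(x):
    # Min-max scaling to the range [0, 1]
    return (x - data_min) / (data_max - data_min)


def unscale(x):
    # Inverse of scale(): back to the original units
    return x * (data_max - data_min) + data_min


y_true = np.array([14.0, 18.0])       # true values in original units
y_pred_scaled = np.array([0.5, 0.7])  # model output in scaled units

mae_scaled = np.mean(np.abs(scale(y_true) - y_pred_scaled))
mae_original = np.mean(np.abs(y_true - unscale(y_pred_scaled)))
print(mae_scaled, mae_original)  # about 0.1 in scaled units, about 1.0 in original units
```

The two errors differ by exactly the factor `data_max - data_min`, so an MAE quoted on scaled data should always be read together with the range of the original series.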

Later I will also cover more time series forecasting models, including Informer, TPA-LSTM, ARIMA, XGBoost, Holt-Winters, and the moving average method, spanning both deep learning and classical machine learning approaches, so that you can choose a model that suits your needs. Each of those articles will include my own code for the model, and if you need it I will also share a Baidu Netdisk download link.


**If anything is unclear, or if you run into an error, leave a message in the comment area. We can discuss it and I will help you solve it!**