[Numerical prediction case] (7) CNN-LSTM hybrid neural network for temperature prediction, with complete TensorFlow code

Hello everyone, today I will share how to use TensorFlow to build a hybrid neural network model that combines a CNN (convolutional neural network) and an LSTM (recurrent neural network) to perform multi-feature time-series prediction.


The prediction model in this article is built from CNN and LSTM networks. The temperature feature data is spatially dependent, so a CNN convolutional neural network is placed at the front of the model to extract the spatial relationships between features. At the same time, the temperature data has an obvious time dependence, so an LSTM (long short-term memory) network is added after the convolutional layers for time-series processing.


1. Get the dataset

Download the dataset here: https://download.csdn.net/download/dgvv4/49801464

This article uses a GPU to accelerate computation. Readers without a GPU can delete the GPU-related code below.

The last 5 features in the dataset are temperature data, the first three columns are time information, and the 'actual' column is used as the label. The task: given the temperature data of 10 consecutive days, predict the actual temperature 5 days later.

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers
import pandas as pd

# Use the GPU to accelerate computation
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

# --------------------------------------- #
# (1) Load the dataset
# --------------------------------------- #
filepath = 'temps.csv'  # path to the dataset
data = pd.read_csv(filepath)
print(data.head())  # inspect the first five rows

The dataset information is as follows:


2. Processing time data

As shown in the figure above, the first three columns are time information. The year, month, and day need to be combined and converted from string type to datetime type. The time columns are not used as features, so they are processed first.

import datetime  # combine the time fields into datetime objects

# Extract the year, month, and day columns
years = data['year']
months = data['month']
days = data['day']

dates = []  # holds the combined date strings

# Iterate over the corresponding year/month/day values
for year, month, day in zip(years, months, days):
    # Join year, month, and day into a string
    date = str(year) + '-' + str(month) + '-' + str(day)
    # Save each date string
    dates.append(date)

# Convert the date strings to datetime objects
times = []

# Iterate over all the date strings
for date in dates:
    # Parse into a datetime object
    time = datetime.datetime.strptime(date, '%Y-%m-%d')
    # Save each converted datetime
    times.append(time)

# Inspect the converted time data
print(times[:5])

The processed time data looks as follows:
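
As an aside, pandas can build the same datetime data in a single call; a minimal sketch of the equivalent:

# pd.to_datetime parses a DataFrame with 'year', 'month' and 'day'
# columns directly into timestamps
times = pd.to_datetime(data[['year', 'month', 'day']])
print(times[:5])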


3. Feature data visualization

Plot the distribution curve of each feature to get an intuitive feel for the data.

import matplotlib.pyplot as plt
# Set the plotting style
plt.style.use('fivethirtyeight')
# Create a 2x2 grid of subplots: ax1 and ax2 on the first row, ax3 and ax4 on the second
fig, ((ax1,ax2), (ax3,ax4)) = plt.subplots(2, 2, figsize=(20, 10))

# ==1== the 'actual' feature column
ax1.plot(times, data['actual'])
# Set the x/y axis labels and the title
ax1.set_xlabel(''); ax1.set_ylabel('Temperature'); ax1.set_title('actual temp')
# ==2== temperature one day earlier
ax2.plot(times, data['temp_1'])
# Set the x/y axis labels and the title
ax2.set_xlabel(''); ax2.set_ylabel('Temperature'); ax2.set_title('temp_1')
# ==3== temperature two days earlier
ax3.plot(times, data['temp_2'])
# Set the x/y axis labels and the title
ax3.set_xlabel('Date'); ax3.set_ylabel('Temperature'); ax3.set_title('temp_2')
# ==4== the 'friend' column
ax4.plot(times, data['friend'])
# Set the x/y axis labels and the title
ax4.set_xlabel('Date'); ax4.set_ylabel('Temperature'); ax4.set_title('friend')
# Tighten the layout
plt.tight_layout(pad=2)
plt.show()

The figure above simply shows how the four features are distributed over time.


4. Data preprocessing

First of all, the data contains one categorical column, the day of the week, which needs to be one-hot encoded: a feature column is added for each category. If a row falls on Monday, the corresponding Monday column gets the value 1 and the other weekday columns get 0.
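
As a quick illustration of one-hot encoding (a toy example, not the article's data):

import pandas as pd

toy = pd.DataFrame({'week': ['Mon', 'Tues', 'Mon']})
print(pd.get_dummies(toy, dtype=int))
#    week_Mon  week_Tues
# 0         1          0
# 1         0          1
# 2         1          0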

Then build the label values: the goal is to predict the temperature 5 days ahead. The labels are taken from the 'actual' column of the feature data, shifted up by 5 rows as a whole; after the shift, the last 5 rows of the label series are NaN. The last 5 rows of feature data then have no corresponding labels, so the last 5 rows are deleted from both the features and the labels.
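
A toy example of shift() producing the trailing NaNs (illustrative values only):

s = pd.Series([70, 71, 72, 73, 74, 75, 76])
print(s.shift(-2).tolist())
# [72.0, 73.0, 74.0, 75.0, 76.0, nan, nan]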

Next, standardize all the numeric temperature data; the one-hot encoded weekday columns are left unstandardized.

# Select the features: 6 columns in total
feats = data.iloc[:,3:]
# One-hot encode the categorical day-of-week column
feats = pd.get_dummies(feats)
# The feature columns grow to 12
print(feats.shape)  # (348, 12)

# Build the labels: each time series predicts the actual temperature 5 days later
pre_days = 5
# Take the real temperature 'actual' from the features, shifted up by 5 days
targets = feats['actual'].shift(-pre_days)
# Inspect the labels
print(targets.shape)  # (348,)

# The labels of the last 5 feature rows are NaN, so drop the last 5 rows of both
feats = feats[:-pre_days]
targets = targets[:-pre_days]
# Inspect the data
print('feats.shape:', feats.shape, 'targets.shape:', targets.shape)  # (343, 12) (343,)


# Standardize the feature data
from sklearn.preprocessing import StandardScaler
# Instantiate the standardization method
scaler = StandardScaler()
# Standardize all the numeric columns of the feature data
feats.iloc[:,:5] = scaler.fit_transform(feats.iloc[:,:5])
# Inspect the standardized data
print(feats)

The preprocessed data is as follows:
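Note that any new data fed to the model later should be transformed with this same fitted scaler, not re-fitted; a minimal sketch (new_feats is a hypothetical DataFrame with the same 12 columns):

# Reuse the fitted scaler on new rows; never fit again on inference data
new_feats.iloc[:, :5] = scaler.transform(new_feats.iloc[:, :5])
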


5. Time series sliding window

A deque (first-in, first-out queue) is used here. The maximum length of the queue is set to 10, i.e. the time-series window length is 10: the temperature 5 days ahead is predicted from 10 consecutive days of feature data. If the queue length exceeds 10, the queue automatically deletes the element at the head and appends the new element at the tail, forming a new sequence.
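
A toy demonstration of the maxlen behavior:

from collections import deque

d = deque(maxlen=3)
for i in range(5):
    d.append(i)
    print(list(d))
# [0]
# [0, 1]
# [0, 1, 2]
# [1, 2, 3]
# [2, 3, 4]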

For the label data: for example, the features of days range(0,10) predict the temperature on the 15th day. Since the labels were already shifted up by 5 rows, the label for the 15th day sits at index 9, i.e. max_series_days-1. (A quick sanity check after the code below verifies this alignment.)

import numpy as np
from collections import deque  # a queue that supports adding/removing at both ends

# Convert the features from a DataFrame to a numpy array
feats = np.array(feats)

# The time-series window is 10 consecutive days of feature data
max_series_days = 10
# Create a queue with a fixed maximum length of 10
deq = deque(maxlen=max_series_days)  # if the length exceeds 10, elements are dropped from the head

# A list to hold the processed feature sequences
x = []
# Iterate over every row of data, each containing 12 features
for i in feats:
    # Append each row to the queue, converting from numpy to list
    deq.append(list(i))
    # Once the queue reaches the specified sequence length, save the sequence
    # If the queue grows past that length, it drops the head and appends at the tail automatically
    if len(deq) == max_series_days:
        # Save each time series, converting the deque to a list
        x.append(list(deq))

# Keep the label values corresponding to the saved sequences
y = targets[max_series_days-1:].values

# The number of sequences and the number of labels must match
print(len(x))  # 334
print(len(y))  # 334

# Convert the lists to numpy arrays
x, y = np.array(x), np.array(y)

In the left image, x holds the 10 rows of feature data contained in one time-series sliding window; in the right image, y is the single temperature label corresponding to each sliding window.
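
An optional sanity check of the window/label alignment, using the variables defined above:

# The first window covers feature rows 0..9, so its label should be the
# (already shifted) target at index max_series_days-1
assert y[0] == targets.values[max_series_days - 1]
print(x.shape, y.shape)  # (334, 10, 12) (334,)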


6. Divide the dataset

The first 80% of the sequences are used for training, and the remaining 20% is split evenly between validation and testing (10% each). The training data is shuffled randomly to avoid ordering effects. Construct an iterator with iter(), combine it with next() to take one batch from the training set, and inspect the dataset information.

total_num = len(x)  # total number of sequences
train_num = int(total_num*0.8)  # the first 80% of the data is used for training
val_num = int(total_num*0.9)  # the data between 80% and 90% is used for validation
# the remaining data is used for testing

x_train, y_train = x[:train_num], y[:train_num]  # training set
x_val, y_val = x[train_num: val_num], y[train_num: val_num]  # validation set
x_test, y_test = x[val_num:], y[val_num:]  # test set

# Build the datasets
batch_size = 128  # process 128 sequences per iteration
# Training set (shuffle before batching so individual sequences, not batches, are shuffled)
train_ds = tf.data.Dataset.from_tensor_slices((x_train, y_train))
train_ds = train_ds.shuffle(10000).batch(batch_size)
# Validation set
val_ds = tf.data.Dataset.from_tensor_slices((x_val, y_val))
val_ds = val_ds.batch(batch_size)
# Test set
test_ds = tf.data.Dataset.from_tensor_slices((x_test, y_test))
test_ds = test_ds.batch(batch_size)

# Inspect the dataset
sample = next(iter(train_ds))  # take one batch of data
print('x_train.shape:', sample[0].shape)  # (128, 10, 12)
print('y_train.shape:', sample[1].shape)  # (128,)

7. Network Construction

The shape of the input layer is [None, 10, 12], where None is the batch size (it does not need to be written out), 10 is the size of the time-series window, and 12 is the number of features.

Since two-dimensional convolution (Conv2D) is used here, a channel dimension must be added to the input data, giving shape [None, 10, 12, 1]. As in image processing, 3*3 convolutions with a stride of 1 are used to extract features.

Downsampling is done with a pooling layer; a convolution with stride 2 would of course also work. The pooling kernel size chosen here is [1,2], i.e. downsampling only along the feature dimension while the sequence window stays unchanged; the choice of downsampling method should be decided case by case.
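For reference, a minimal sketch of that strided-convolution alternative, which could replace the MaxPool2D line in the model code below (not the model actually used in this article):

# Downsample with stride (1,2) instead of pooling: halves the feature
# dimension [None,10,12,8]==>[None,10,6,8] while keeping the time window
x = layers.Conv2D(8, kernel_size=(1,3), strides=(1,2), padding='same', use_bias=False)(x)
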

Before feeding the CNN features into the LSTM, the number of channels must be adjusted: the channel information is fused and the channel count is reduced to 1, then the channel dimension is squeezed out, turning the shape from four-dimensional to three-dimensional: [None,10,6,1] becomes [None,10,6].

Next, the LSTM processes the time-series information, and finally a fully connected layer outputs the prediction. The number of neurons in the fully connected layer must equal the number of predicted label values; here only one future time point is predicted, so the number of neurons is 1.

# The input layer must match the shape of x_train, minus the batch dimension
input_shape = sample[0].shape[1:]  # [10,12]

# Build the input layer
inputs = keras.Input(shape=(input_shape))  # [None, 10, 12]

# Add the channel dimension [None,10,12]==>[None,10,12,1]
x = layers.Reshape(target_shape=(inputs.shape[1], inputs.shape[2], 1))(inputs)

# Conv + BN + ReLU  [None,10,12,1]==>[None,10,12,8]
x = layers.Conv2D(8, kernel_size=(3,3), strides=1, padding='same', use_bias=False,
                  kernel_regularizer=keras.regularizers.l2(0.01))(x)

x = layers.BatchNormalization()(x)  # batch normalization
x = layers.Activation('relu')(x)  # ReLU activation

# Pooling downsample [None,10,12,8]==>[None,10,6,8]
x = layers.MaxPool2D(pool_size=(1,2))(x)

# 3*3 convolution to reduce the channel count to 1 [None,10,6,8]==>[None,10,6,1]
x = layers.Conv2D(1, kernel_size=(3,3), strides=1, padding='same', use_bias=False,
                  kernel_regularizer=keras.regularizers.l2(0.01))(x)

# Squeeze out the last dimension [None,10,6,1]==>[None,10,6]
x = tf.squeeze(x, axis=-1)

# [None,10,6] ==> [None,16]
# LSTM layer; if the next layer were another LSTM, return_sequences=True would be needed
x = layers.LSTM(16, activation='relu', kernel_regularizer=keras.regularizers.l2(0.01))(x)
x = layers.Dropout(0.2)(x)  # randomly drop neurons to prevent overfitting

# Output layer [None,16]==>[None,1]
outputs = layers.Dense(1)(x)

# Build the model
model = keras.Model(inputs, outputs)

# Inspect the model architecture
model.summary()

The network architecture is as follows:

Model: "model_12"
_________________________________________________________________
Layer (type)                 Output Shape              Param #   
=================================================================
input_13 (InputLayer)        [(None, 10, 12)]          0         
_________________________________________________________________
reshape_12 (Reshape)         (None, 10, 12, 1)         0         
_________________________________________________________________
conv2d_25 (Conv2D)           (None, 10, 12, 8)         72        
_________________________________________________________________
batch_normalization_13 (Batc (None, 10, 12, 8)         32        
_________________________________________________________________
activation_13 (Activation)   (None, 10, 12, 8)         0         
_________________________________________________________________
max_pooling2d_13 (MaxPooling (None, 10, 6, 8)          0         
_________________________________________________________________
conv2d_26 (Conv2D)           (None, 10, 6, 1)          72        
_________________________________________________________________
tf.compat.v1.squeeze_12 (TFO (None, 10, 6)             0         
_________________________________________________________________
lstm_11 (LSTM)               (None, 16)                1472      
_________________________________________________________________
dropout_14 (Dropout)         (None, 16)                0         
_________________________________________________________________
dense_13 (Dense)             (None, 1)                 17        
=================================================================
Total params: 1,665
Trainable params: 1,649
Non-trainable params: 16
_________________________________________________________________

8. Model training

The mean absolute error (MAE) between the predicted and true values is used as the regression loss function, and the mean squared logarithmic error (MSLE) between the predicted and true values is used as the monitored metric during training.
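For reference, MSLE is the mean of the squared differences between log-transformed values; a minimal sketch of the computation, matching Keras' definition:

import numpy as np

def msle(y_true, y_pred):
    # mean squared logarithmic error:
    # mean((log(1 + y_true) - log(1 + y_pred))^2)
    return np.mean((np.log1p(y_true) - np.log1p(y_pred)) ** 2)
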

# Compile the network
model.compile(optimizer = keras.optimizers.Adam(0.001),  # Adam optimizer, learning rate 0.001
              loss = tf.keras.losses.MeanAbsoluteError(),  # mean absolute difference between labels and predictions
              metrics = [tf.keras.metrics.MeanSquaredLogarithmicError()])  # mean squared logarithmic error between labels and predictions

epochs = 300  # train for 300 epochs

# Train the network; history stores the per-epoch training information
history = model.fit(train_ds, epochs=epochs, validation_data=val_ds)

Since history records the loss and metric values of every epoch during training, visualize them:

history_dict = history.history  # dictionary of training records
train_loss = history_dict['loss']  # training loss
val_loss = history_dict['val_loss']  # validation loss
train_msle = history_dict['mean_squared_logarithmic_error']  # training MSLE
val_msle = history_dict['val_mean_squared_logarithmic_error']  # validation MSLE

# (11) Plot the training and validation loss
plt.figure()
plt.plot(range(epochs), train_loss, label='train_loss')  # training loss
plt.plot(range(epochs), val_loss, label='val_loss')  # validation loss
plt.legend()  # show the legend
plt.xlabel('epochs')
plt.ylabel('loss')
plt.show()

# (12) Plot the training and validation MSLE
plt.figure()
plt.plot(range(epochs), train_msle, label='train_msle')  # training metric
plt.plot(range(epochs), val_msle, label='val_msle')  # validation metric
plt.legend()  # show the legend
plt.xlabel('epochs')
plt.ylabel('msle')
plt.show()


9. Prediction Phase

First, use evaluate() to compute the loss and the monitored metric over the entire test set, then use predict() to compute the predicted temperatures from the test-set features. Finally, plot a comparison between the predicted and true values.

# Evaluate on the whole test set
model.evaluate(test_ds)

# Predict
y_pred = model.predict(x_test)

# Get the timestamps corresponding to the test labels
df_time = times[-len(y_test):]

# Plot the comparison curves
fig = plt.figure(figsize=(10,5))  # figure size
ax = fig.add_subplot(111)  # add one subplot to the figure
# Plot the true-value curve
ax.plot(df_time, y_test, 'b-', label='actual')
# Plot the predicted-value curve
ax.plot(df_time, y_pred, 'r--', label='predict')
# Set the x-axis ticks
ax.set_xticks(df_time[::7])

# Set the x/y axis labels and the title
ax.set_xlabel('Date')
ax.set_ylabel('Temperature')
ax.set_title('result')
plt.legend()
plt.show()

The predicted and actual curves are compared below:
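
To put a number on the fit beyond the plot, the test MAE can also be computed directly from the predictions; a minimal sketch:

# Mean absolute error between the predicted and actual test temperatures
mae = np.mean(np.abs(y_pred.flatten() - y_test))
print('test MAE:', mae)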


Original article: blog.csdn.net/dgvv4/article/details/124406664