A Hands-On Walkthrough of ConvLSTM Spatio-Temporal Prediction Code

Foreword

Spatio-temporal prediction arises in many fields. Unlike plain time-series forecasting, it must capture not only how things change over time but also how they change across space. Many prediction problems focus one-sidedly on time, such as predicting the probability that someone develops a certain disease within the next three years or the number of people eating in a cafeteria, and ignore the spatial dimension. A decision maker, for example, wants to know not only how many people will be infected with COVID-19 tomorrow but also where those cases will occur, so that management can be targeted: policymakers tend to care about population-level counts, while the staff carrying out intervention and control care about locations. The spatial component answers the last question many people have about a prediction: if event A may happen, where will it happen?
In recent years, spatio-temporal prediction has continued to attract researchers in machine learning. Compared with traditional statistical spatio-temporal models, machine learning and deep learning offer clear advantages: nonlinear fitting, the ability to handle high-dimensional data, and less concern about collinearity among variables. It is now practical to perform spatio-temporal prediction by extracting temporal features with long short-term memory (LSTM) networks and spatial features with convolutional neural networks (CNNs). This article follows the foundational paper "Convolutional LSTM Network: A Machine Learning Approach for Precipitation Nowcasting", released on arXiv in 2015, and works through a classic spatio-temporal prediction task: next-frame video prediction. Other machine learning models and tasks for spatio-temporal prediction are beyond the scope of this article; interested readers can explore them on their own.

1. Spatio-temporal prediction

Spatio-temporal prediction can be attempted for many problems, such as COVID-19 outbreaks, traffic flow, weather forecasting, and regional risk prediction for chronic diseases. Note that, unlike the input of an ordinary LSTM, spatio-temporal data also varies along spatial dimensions (position), not only along time; this is where the "spatial" part comes in.
The formulas behind ConvLSTM have been discussed in detail on many blogs, so I will not go into them at length here. Interested readers can look them up online or read the original paper.
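For readers who only want a quick reference, the core equations from the paper are reproduced below. They are the standard LSTM gates with the matrix multiplications replaced by convolutions, where * denotes convolution and \circ the Hadamard product, and the inputs X_t, cell states C_t, hidden states H_t, and gates i_t, f_t, o_t are all 3-D tensors:

\begin{aligned}
i_t &= \sigma\left(W_{xi} * X_t + W_{hi} * H_{t-1} + W_{ci} \circ C_{t-1} + b_i\right) \\
f_t &= \sigma\left(W_{xf} * X_t + W_{hf} * H_{t-1} + W_{cf} \circ C_{t-1} + b_f\right) \\
C_t &= f_t \circ C_{t-1} + i_t \circ \tanh\left(W_{xc} * X_t + W_{hc} * H_{t-1} + b_c\right) \\
o_t &= \sigma\left(W_{xo} * X_t + W_{ho} * H_{t-1} + W_{co} \circ C_t + b_o\right) \\
H_t &= o_t \circ \tanh(C_t)
\end{aligned}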

2. Dataset selection and download

As in the paper, we use the Moving MNIST dataset, i.e. the moving version of MNIST. This public dataset can be downloaded from the site provided by the University of Toronto and is one of the standard benchmarks for spatio-temporal prediction. The download code is as follows:

import numpy as np
from tensorflow import keras

# Download the Moving MNIST file from the University of Toronto server
# (keras.utils.get_file caches it locally) and load it as a NumPy array
fpath = keras.utils.get_file(
    "moving_mnist.npy",
    "http://www.cs.toronto.edu/~nitish/unsupervised_video/mnist_test_seq.npy",
)
dataset = np.load(fpath)
print(dataset.shape)

After downloading the dataset, we print its shape and get (20, 10000, 64, 64): there are 10000 sequences, each sequence has 20 frames, and each frame is a 64 x 64 grayscale image. In the standard setup from the paper, the first ten frames serve as input and the last ten as the target.
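To get an intuitive feel for the data, here is a minimal sketch (assuming matplotlib is installed; not part of the original pipeline) that displays the 20 frames of one randomly chosen sequence. Note that at this point the sample axis is still the second axis:

import matplotlib.pyplot as plt

# Pick one of the 10000 sequences and show its 20 frames (4 rows x 5 columns)
seq = dataset[:, np.random.randint(dataset.shape[1])]  # shape (20, 64, 64)
fig, axes = plt.subplots(4, 5, figsize=(10, 8))
for i, ax in enumerate(axes.flat):
    ax.imshow(seq[i], cmap="gray")
    ax.set_title(f"frame {i}")
    ax.axis("off")
plt.tight_layout()
plt.show()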

3. Data set preprocessing and data set division

Since our ConvLSTM expects input of shape (samples, time, height, width, channels), we need to transform the dataset to meet the model's input requirements.

# Swap the seq and samples axes so the layout matches the model's expected input
dataset = np.swapaxes(dataset, 0, 1)
# 10000 samples are more than we need; keep only the first 1000
dataset = dataset[:1000, ...]
# The frames are 2-D grayscale images, so add a channel axis (it would be 3 for RGB)
dataset = np.expand_dims(dataset, axis=-1)
print(dataset.shape)

The converted dataset has shape (1000, 20, 64, 64, 1), which meets the model's input requirements. The next step is to split the dataset; the indices are shuffled so that the training and validation sets are drawn at random.

indexes = np.arange(dataset.shape[0])
np.random.shuffle(indexes)  # shuffle the index order
# training set : validation set = 9 : 1
train_index = indexes[: int(0.9 * dataset.shape[0])]
val_index = indexes[int(0.9 * dataset.shape[0]):]
train_dataset = dataset[train_index]
val_dataset = dataset[val_index]
print(train_dataset.shape)
print(val_dataset.shape)

After splitting the dataset, we divide by 255 to normalize the pixel values into [0, 1]. As a general habit, normalization should be done after the split: any scaling whose statistics are estimated from the data must be fitted on the training split only, otherwise it leads to data leakage (dividing by the constant 255 is harmless either way).

# Normalize: dividing by 255 maps the 0-255 grayscale values into the [0, 1] range
train_dataset = train_dataset / 255
val_dataset = val_dataset / 255

Next we separate x and y. Each target frame is simply the next frame of its input frame, so x contains frames 0-18 of each sequence and y contains frames 1-19 (a one-frame shift). The code is as follows:

# Separate x and y: y is x shifted forward by one frame,
# i.e. x holds frames 0-18 and y holds frames 1-19 of each 20-frame sequence
def create_shifted_frames(data):
    x = data[:, 0: data.shape[1] - 1, :, :]
    y = data[:, 1: data.shape[1], :, :]
    return x, y
x_train, y_train = create_shifted_frames(train_dataset)
x_val, y_val = create_shifted_frames(val_dataset)
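A quick sanity check of the shapes and of the one-frame shift (my own addition; the exact numbers assume the 900/100 split above):

# x and y each hold 19 frames; every target frame equals the following input frame
print(x_train.shape, y_train.shape)  # expected: (900, 19, 64, 64, 1) for both
print(x_val.shape, y_val.shape)      # expected: (100, 19, 64, 64, 1) for both
assert np.array_equal(x_train[:, 1:], y_train[:, :-1])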

4. Model Construction

# Core model-building code; the hyperparameters are adjusted to match the official Keras example
model = keras.Sequential([
    keras.layers.ConvLSTM2D(filters=64, kernel_size=(5, 5),
                            input_shape=(None, 64, 64, 1),
                            padding='same', return_sequences=True),
    keras.layers.BatchNormalization(),
    keras.layers.ConvLSTM2D(filters=64, kernel_size=(3, 3),
                            padding='same', return_sequences=True),
    keras.layers.BatchNormalization(),
    keras.layers.ConvLSTM2D(filters=64, kernel_size=(1, 1),
                            padding='same', return_sequences=True),
    keras.layers.Conv3D(filters=1, kernel_size=(3, 3, 3),
                        activation='sigmoid',
                        padding='same', data_format='channels_last')
])
model.compile(loss='binary_crossentropy', optimizer='adadelta')
model.summary()

model.summary() prints the model structure: three ConvLSTM2D layers with 64 filters each, interleaved with batch normalization, followed by a Conv3D output layer that maps back to a single channel.

5. Model Training

After building the model, we start training. My graphics card is a GTX 1660 Ti with 6 GB of memory, so memory is limited; I therefore reduce the batch size and increase the number of epochs somewhat. If your compute allows, both the epoch count and the batch size can be raised. Early-stopping and learning-rate-reduction callbacks are defined here, so there is no need to worry about wasting time on unproductive training.

# Define callbacks
early_stopping = keras.callbacks.EarlyStopping(monitor="val_loss", patience=10)
reduce_lr = keras.callbacks.ReduceLROnPlateau(monitor="val_loss", patience=5)

# Training parameters
epochs = 50
batch_size = 2

# Fit the model
model.fit(
    x_train,
    y_train,
    batch_size=batch_size,
    epochs=epochs,
    validation_data=(x_val, y_val),
    callbacks=[early_stopping, reduce_lr],
)
model.save('model.h5')

6. View the results


Unlike in natural language processing, spatio-temporal prediction cannot simply be judged by metrics such as accuracy or F1-score; we have to compare the predicted frames with the real ones. Judging from the results, our predictions are rather blurry. I therefore switched to the adam optimizer, reduced the number of convolution filters to 30, and set the epochs to 15; with this setup the validation loss reached 0.0244. Continuing to increase the epochs, I found that although the validation loss kept decreasing, the images actually looked worse. As in the paper, the motion is still recognizable, but the quality remains poor. Honestly, this task is hard to do well: the digits move in a highly nonlinear way within the frame, so it is difficult to learn all the rules of the motion, and even when the motion is learned, the spatial detail of an individual digit may be lost, which is why the quality degrades the further ahead you predict.
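For reference, a minimal sketch of how such comparison images can be produced, using one validation sequence (the frame indices and plotting layout here are my own choice, not from the original post):

import matplotlib.pyplot as plt

# Predict the shifted sequence for one validation example
example = x_val[0:1]              # shape (1, 19, 64, 64, 1)
pred = model.predict(example)     # shape (1, 19, 64, 64, 1)

# Top row: the last five ground-truth frames; bottom row: the model's predictions
fig, axes = plt.subplots(2, 5, figsize=(12, 5))
for i in range(5):
    axes[0, i].imshow(y_val[0, 14 + i, ..., 0], cmap="gray")
    axes[0, i].set_title("truth")
    axes[0, i].axis("off")
    axes[1, i].imshow(pred[0, 14 + i, ..., 0], cmap="gray")
    axes[1, i].set_title("predicted")
    axes[1, i].axis("off")
plt.tight_layout()
plt.show()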
I have looked at what some other bloggers suggest: using an SSIM (structural similarity) loss, lowering the learning rate, using deconvolution, and so on. Deep learning is, after all, something of an "alchemy" process, and the tweaks mostly amount to adding a few convolution filters or adding and removing a few layers. In this task ConvLSTM is essentially doing image generation, and image generation is quite sensitive to hyperparameters. Since I have not installed the tensorflow_contrib library, I was not able to try the SSIM loss; if there is any progress later I will post it as soon as possible.
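For what it is worth, in TensorFlow 2.x an SSIM-based loss no longer requires tf.contrib; a minimal, untested sketch using tf.image.ssim (my own adaptation, not something verified on this model) could look like this:

import tensorflow as tf

def ssim_loss(y_true, y_pred):
    # SSIM is a similarity score (1 = identical images), so turn it into a loss to minimize.
    # Pixel values were already scaled to [0, 1], hence max_val=1.0.
    return 1.0 - tf.reduce_mean(tf.image.ssim(y_true, y_pred, max_val=1.0))

# Recompile the model with the SSIM loss instead of binary cross-entropy
model.compile(loss=ssim_loss, optimizer='adam')

tf.image.ssim operates over the trailing (height, width, channels) axes, so here each frame of each sequence is scored independently and the scores are averaged.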

7. Summary

At present, spatial statistics are widely used in medicine, especially in studies of infectious diseases and environmental exposure factors, and papers applying spatio-temporal statistical models are easy to find. The arrival of ConvLSTM and related machine-learning spatio-temporal models could further improve prediction in infectious disease and environmental exposure research. Beyond that, ConvLSTM could even be applied to analyzing patient video data, such as gait, or to producing regional risk maps for chronic diseases. These areas are still largely blank, and I believe AI can help them develop vigorously.
