- For time series data (or images), different filters, kernel_size, and strides settings produce different high-dimensional feature outputs (a quick output-length check follows this list).
- Given the same data, how can we strengthen the network's expressive power? Can a wider network achieve better results than a deeper one? That is the question this article sets out to answer.
- For a description of the data, see the previous article: https://www.jianshu.com/p/21b96d597367
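Before the model itself, here is a minimal sketch of how each Conv1D setting used below changes the temporal resolution of its output; the window length ts = 60 is an assumed example value, not taken from the article. With Keras's default 'valid' padding, the output length is floor((ts - kernel_size) / strides) + 1.

# Hedged sketch: output lengths of a 'valid'-padded Conv1D for the three
# branch settings used below; ts = 60 is an assumed example value.
def conv1d_out_len(ts, kernel_size, strides):
    return (ts - kernel_size) // strides + 1

ts = 60
print(conv1d_out_len(ts, kernel_size=6, strides=3))   # branch 1 -> 19 steps
print(conv1d_out_len(ts, kernel_size=12, strides=3))  # branch 2 -> 17 steps
print(conv1d_out_len(ts, kernel_size=6, strides=2))   # branch 3 -> 28 steps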
Network structure
# Imports assume standalone Keras 2.x (matching the lr=/decay= arguments below)
from keras.layers import Input, Conv1D, LeakyReLU, LSTM, Concatenate, Dense
from keras.models import Model
from keras.optimizers import Adam
from keras.initializers import he_normal

# Define the multi-channel feature-combination model
def build_multi_cr_lstm_model(ts, fea_dim):
    # Define the input
    inputs = Input(shape=(ts, fea_dim))

    # ########################################
    # CNN layer & LSTM layer 1
    cnn_left_out1 = Conv1D(filters=50, kernel_size=6, strides=3,
                           kernel_initializer=he_normal(seed=3))(inputs)
    act_left_out1 = LeakyReLU()(cnn_left_out1)
    lstm_left_out1 = LSTM(64, activation='sigmoid', dropout=0.1, return_sequences=False,
                          kernel_initializer=he_normal(seed=10))(act_left_out1)

    # #########################################
    # CNN layer & LSTM layer 2
    cnn_right_out1 = Conv1D(filters=50, kernel_size=12, strides=3,
                            kernel_initializer=he_normal(seed=3))(inputs)
    act_right_out1 = LeakyReLU()(cnn_right_out1)
    lstm_right_out1 = LSTM(64, activation='sigmoid', dropout=0.1, return_sequences=False,
                           kernel_initializer=he_normal(seed=10))(act_right_out1)

    # #########################################
    # CNN layer & LSTM layer 3
    cnn_mid_out1 = Conv1D(filters=50, kernel_size=6, strides=2,
                          kernel_initializer=he_normal(seed=3))(inputs)
    act_mid_out1 = LeakyReLU()(cnn_mid_out1)
    lstm_mid_out1 = LSTM(64, activation='sigmoid', dropout=0.1, return_sequences=False,
                         kernel_initializer=he_normal(seed=10))(act_mid_out1)

    # ############################################
    # Concatenate the three branch outputs and stack a dense layer on top
    concat_output = Concatenate(axis=1)([lstm_left_out1, lstm_mid_out1, lstm_right_out1])
    outputs = Dense(1)(concat_output)

    model_func = Model(inputs=inputs, outputs=outputs)
    model_func.compile(loss='mse', optimizer=Adam(lr=0.002, decay=0.01), metrics=['mse'])
    return model_func
[Figure: multi-channel CNN+LSTM network structure]
- It can be seen that the same inputs feed all three branches: CNN1+LSTM1, CNN2+LSTM2, and CNN3+LSTM3 each produce a high-dimensional feature vector, and the three are concatenated into a dense layer that outputs the prediction. A usage sketch follows.
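A minimal usage sketch with random placeholder data (the real windowing follows the data article linked above); ts = 60, fea_dim = 1, and the sample count are assumed example values, not from the article.

import numpy as np

# Hedged usage sketch: random placeholder data, not the dataset from the
# linked article; ts = 60 and fea_dim = 1 are assumed example values.
ts, fea_dim = 60, 1
X = np.random.rand(500, ts, fea_dim)
y = np.random.rand(500, 1)

model = build_multi_cr_lstm_model(ts, fea_dim)
model.summary()  # three Conv1D+LSTM branches sharing one Input
model.fit(X, y, epochs=2, batch_size=32, validation_split=0.1)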
Comparison of prediction results with the previous article
- Baseline: normalized MSE 0.00167, original-scale MSE 1606
- Complex network, CNN+LSTM single-channel: normalized MSE 0.00090666, original-scale MSE 869
- Complex network, CNN+LSTM multi-channel: normalized MSE 0.0008297, original-scale MSE 795
It can be seen that the complex networks bring a large improvement: original-scale MSE drops by roughly 50% (1606 → 795). The gain from multi-channel over single-channel is comparatively small, however, so if runtime performance matters, a single-channel CNN+LSTM+Dense model may already meet the need in most cases.
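A quick consistency check on the two MSE scales, assuming the targets were min-max scaled to [0, 1] (the article does not state the scaler; the value range of roughly 980 is inferred from the numbers above, not given): normalized MSE converts to the original scale by the squared value range.

# Hedged sketch: if y was min-max scaled, original MSE ~= normalized MSE
# * (y_max - y_min) ** 2. The range ~980 is inferred, not from the article.
value_range = 980.0  # assumed y_max - y_min
for mse_norm in (0.00167, 0.00090666, 0.0008297):
    print(round(mse_norm * value_range ** 2))  # ~1604, ~871, ~797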
Conjecture on the reason for the improvement
- CNNs with different parameter settings extract different high-dimensional feature representations, which enriches the feature input to the model and thus yields better predictions.
Comparison of fitted curves
- The first figure is the deep LSTM fit, the second the CNN+LSTM single-channel fit, and the third the CNN+LSTM multi-channel fit.
[Figure: deep LSTM fit]
[Figure: CNN+LSTM single-channel fit]
[Figure: CNN+LSTM multi-channel fit]
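A minimal sketch of this kind of fitted-curve plot, assuming y_true and the three prediction arrays come from model.predict(); all of these names are placeholders, not from the article.

import matplotlib.pyplot as plt

# Hedged plotting sketch; y_true and the y_pred_* names are placeholders.
def plot_fit(y_true, y_pred, title):
    plt.figure(figsize=(10, 3))
    plt.plot(y_true, label='actual')
    plt.plot(y_pred, label='predicted')
    plt.title(title)
    plt.legend()
    plt.show()

# plot_fit(y_true, y_pred_lstm, 'deep LSTM')
# plot_fit(y_true, y_pred_single, 'CNN+LSTM single-channel')
# plot_fit(y_true, y_pred_multi, 'CNN+LSTM multi-channel')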
Thoughts
How to apply the same complex network to different kinds of data is the question to consider next, along with introducing attention.
Author: yangy_fly
Link: https://www.jianshu.com/p/c428efc6966e
Source: Jianshu
Copyright belongs to the author. For commercial reprints, please contact the author for authorization; for non-commercial reprints, please indicate the source.