Fit LSTM Network
Next, we need to fit an LSTM network model to the training data.
This first requires that the training dataset be transformed from a 2D array [samples, features] to a 3D array [samples, timesteps, features]. We will fix time steps at 1, so this change is straightforward.
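To make the 2D-to-3D conversion concrete, here is a small NumPy sketch (the array values are invented for illustration); with time steps fixed at 1, the reshape simply inserts a middle axis of length 1:

```python
import numpy as np

# a toy training array: 3 samples, 2 features
X = np.array([[0.1, 0.2],
              [0.3, 0.4],
              [0.5, 0.6]])

# reshape [samples, features] -> [samples, timesteps, features] with 1 time step
X3d = X.reshape(X.shape[0], 1, X.shape[1])
print(X3d.shape)  # (3, 1, 2)
```

No values are moved or changed by the reshape; each sample simply becomes a one-step sequence.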
Next, we need to design an LSTM network. We will use a simple structure with 1 hidden layer with 1 LSTM unit, then an output layer with linear activation and 3 output values. The network will use a mean squared error loss function and the efficient ADAM optimization algorithm.
The LSTM is stateful; this means that we have to manually reset the state of the network at the end of each training epoch. The network will be fit for 1500 epochs.
The same batch size must be used for training and prediction, and we require a prediction to be made at each time step of the test dataset. This means that a batch size of 1 must be used. A batch size of 1 is also called online learning, as the network weights will be updated after each training pattern during training (as opposed to mini-batch or batch updates).
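To make the online-learning idea concrete, here is a toy sketch (not the tutorial's code; a one-weight linear model with a made-up learning rate) contrasting one weight update per training pattern against a single averaged full-batch update:

```python
import numpy as np

# toy data: y = 2 * x, four training patterns
X = np.array([1.0, 2.0, 3.0, 4.0])
y = 2.0 * X

def online_sgd(w, lr=0.01):
    # batch size 1: one weight update per training pattern
    updates = 0
    for xi, yi in zip(X, y):
        grad = 2 * (w * xi - yi) * xi  # d/dw of squared error
        w -= lr * grad
        updates += 1
    return w, updates

def batch_gd(w, lr=0.01):
    # full batch: gradients averaged, one update per pass
    grad = np.mean(2 * (w * X - y) * X)
    w -= lr * grad
    return w, 1

w_online, n_online = online_sgd(0.0)
w_batch, n_batch_updates = batch_gd(0.0)
print(n_online, n_batch_updates)  # 4 1
```

Both variants move the weight toward the true value of 2; the difference is only in how often the update is applied.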
We can put all of this together in a function called fit_lstm(). The function takes a number of key parameters that can be used to tune the network later and the function returns a fit LSTM model ready for forecasting.
from keras.models import Sequential
from keras.layers import Dense, LSTM

# fit an LSTM network to training data
def fit_lstm(train, n_lag, n_seq, n_batch, nb_epoch, n_neurons):
    # reshape training into [samples, timesteps, features]
    X, y = train[:, 0:n_lag], train[:, n_lag:]
    X = X.reshape(X.shape[0], 1, X.shape[1])
    # design network
    model = Sequential()
    model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
    model.add(Dense(y.shape[1]))
    model.compile(loss='mean_squared_error', optimizer='adam')
    # fit network
    for i in range(nb_epoch):
        model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
        model.reset_states()
    return model
The function can be called as follows:
# fit model
model = fit_lstm(train, 1, 3, 1, 1500, 1)
The configuration of the network was not tuned; try different parameters if you like.
Report your findings in the comments below. I’d love to see what you can get.
Machine Learning: A Full Analysis of Keras's Stateful LSTM, with Example Tests
The stateful LSTM in Keras could be called every learner's nightmare: a confusing mechanism, documentation that explains too little, and a scarcity of Chinese-language material. Note that "state" here means the c and h of the original paper's equations, i.e. the LSTM-specific memory parameters, not the weights w.

In stateless mode, being a long short-term memory network does not mean your LSTM will remember the contents of previous batches. In the default stateless mode, Keras resets the memory-state parameters of the LSTM network (c and h, not the weights w) at the start of training on each small sequence (= sample), i.e. it calls model.reset_states().

Why does a stateless LSTM reinitialize the memory parameters for every sequence? Because Keras shuffles the samples by default during training, any dependency between sequences disappears: there is no temporal relationship between one sample and the next, and the order is scrambled, so passing the memory parameters across batches and sequences would be meaningless. Keras therefore reinitializes them.

Whether stateful or stateless, once the model receives a batch it computes the output for each sequence, averages their gradients, and back-propagates to update all the parameters.
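To illustrate that the state really is c and h rather than the weights, here is a hand-rolled single LSTM cell in NumPy with fixed random weights (a sketch, not Keras internals): carrying (h, c) over from a previous sequence changes the output, while resetting the state to zeros reproduces the stateless behaviour.

```python
import numpy as np

rng = np.random.default_rng(0)
n_in, n_units = 2, 3

# one fixed set of random weights for the four LSTM gates (i, f, o, g);
# these never change below -- only the state (h, c) does
W = rng.normal(size=(4, n_units, n_in))
U = rng.normal(size=(4, n_units, n_units))
b = rng.normal(size=(4, n_units))

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h, c):
    # standard LSTM cell equations; (h, c) is the memory state
    i = sigmoid(W[0] @ x + U[0] @ h + b[0])
    f = sigmoid(W[1] @ x + U[1] @ h + b[1])
    o = sigmoid(W[2] @ x + U[2] @ h + b[2])
    g = np.tanh(W[3] @ x + U[3] @ h + b[3])
    c = f * c + i * g
    h = o * np.tanh(c)
    return h, c

x1 = np.array([0.5, -0.3])
x2 = np.array([0.1, 0.7])
zeros = np.zeros(n_units)

# "stateful": (h, c) from the first sequence feeds the second
h, c = lstm_step(x1, zeros, zeros)
h_stateful, _ = lstm_step(x2, h, c)

# "stateless": state is reset to zeros before the second sequence
h_stateless, _ = lstm_step(x2, zeros, zeros)

print(np.allclose(h_stateful, h_stateless))  # False
```

The weights are identical in both runs; the outputs differ only because the carried-over (h, c) differ, which is exactly what stateful=True preserves between batches in Keras.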