Understanding Stateful LSTM Recurrent Neural Networks in Python with Keras

    This article is a translation of a blog post by [Jason Brownlee PhD]; the original can be found here.

    

      After reading this article, you'll know:

How to develop a naive LSTM network for a sequence prediction problem.
How to carefully manage state with an LSTM network across batches and features.
How to manually manage state in an LSTM network for stateful prediction.
In my new book, explore how to develop deep learning models for a range of predictive modeling problems with just a few lines of code, through 18 step-by-step tutorials and 9 projects.

 

Problem Description: Learning the Alphabet

       In this tutorial, we will develop and compare a number of different LSTM recurrent neural network models.

       The context for these comparisons will be a simple sequence prediction problem: learning the alphabet. That is, given a letter of the alphabet, predict the next letter. This is a simple sequence prediction problem that, once understood, can be generalized to other sequence prediction problems such as time series prediction and sequence classification. Let's prepare the problem with some Python code that we can reuse from example to example. First, let's import all of the classes and functions we plan to use in this tutorial.

import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils

       Next, we can seed the random number generator to ensure that the results are the same each time the code is executed.

# fix random seed for reproducibility
numpy.random.seed(7)

      Now we can define our dataset, the alphabet. We define the alphabet in uppercase characters for readability. Neural networks model numbers, so we need to map the letters of the alphabet to integer values. We can do this easily by creating a dictionary (map) from characters to indices. We can also create a reverse lookup for converting predictions back into characters for later use.

# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))

      Now we need to create our inputs and outputs on which to train the neural network. We can do this by defining an input sequence length and then reading sequences from the input alphabet. For example, with an input length of 1, starting at the beginning of the raw input data, we can read off the first letter "A" and the next letter "B" as the prediction. We then move along one character and repeat until we reach "Z".

# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)

       We also print the input pairs as a sanity check. Running the code to this point produces the following output, summarizing input sequences of length 1 and a single output character.

A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z

       We need to reshape the NumPy array into the format expected by the LSTM network, namely: [samples, time steps, features].

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))

      Once reshaped, we can normalize the integer inputs to the range 0 to 1, the range of the sigmoid activation functions used by the LSTM network.

# normalize
X = X / float(len(alphabet))

      Finally, we can treat this problem as a sequence classification task, where each of the 26 letters represents a different class. As such, we can convert the output (y) to a one-hot encoding using the built-in Keras function to_categorical().

# one hot encode the output variable
y = np_utils.to_categorical(dataY)

Naive LSTM for Learning a One-Char to One-Char Mapping

         Let's start by designing a simple LSTM to learn how to predict the next character in the alphabet given the context of just one character. We frame the problem as a random collection of one-letter input to one-letter output pairs. As we will see, this is a difficult framing for the LSTM to learn from. We define an LSTM network with 32 units and an output layer with a softmax activation function for making predictions. Because this is a multi-class classification problem, we can use the log loss function (called "categorical_crossentropy" in Keras) and optimize the network with the ADAM optimizer. The model is fit for 500 epochs with a batch size of 1.

# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)

       After we fit the model, we can evaluate and summarize its performance on the entire training dataset.

# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))

        We can then re-run the training data through the network and generate predictions, converting both the input and output pairs back into their original character format to get a visual idea of how well the network has learned the problem.

# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

      The complete code implementation is as follows:

# Naive LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

      Running this example produces the following output:

Model Accuracy: 84.00%
['A'] -> B
['B'] -> C
['C'] -> D
['D'] -> E
['E'] -> F
['F'] -> G
['G'] -> H
['H'] -> I
['I'] -> J
['J'] -> K
['K'] -> L
['L'] -> M
['M'] -> N
['N'] -> O
['O'] -> P
['P'] -> Q
['Q'] -> R
['R'] -> S
['S'] -> T
['T'] -> U
['U'] -> W
['V'] -> Y
['W'] -> Z
['X'] -> Z
['Y'] -> Z

       We can see that this problem is indeed difficult for the network to learn. The reason is that the poor LSTM units have no context to work with. Each input-output pattern is shown to the network in a random order, and the state of the network is reset after each pattern (each batch, where each batch contains one pattern). This is an abuse of the LSTM network architecture, treating it like a standard multilayer perceptron. Next, let's try a different framing of the problem to provide more sequence for the network to learn from.
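
       To make that comparison concrete, here is a minimal sketch (my own illustration, not code from the original article) of the kind of multilayer perceptron this framing effectively reduces the LSTM to; it assumes the numpy module and the dataX, y and alphabet variables prepared above.

# Hypothetical MLP baseline: with one-character inputs and a per-pattern state
# reset, the LSTM above has no more context than this plain feed-forward model.
from keras.models import Sequential
from keras.layers import Dense
X_flat = numpy.reshape(dataX, (len(dataX), 1)) / float(len(alphabet))
mlp = Sequential()
mlp.add(Dense(32, input_dim=1, activation='relu'))
mlp.add(Dense(y.shape[1], activation='softmax'))
mlp.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
mlp.fit(X_flat, y, epochs=500, batch_size=1, verbose=0)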

Naive LSTM for a Three-Char Feature Window to One-Char Mapping

       A popular approach to adding more context to data for multilayer perceptrons is to use the window method, where previous steps in the sequence are provided to the network as additional input features. We can try the same trick to provide more context to the LSTM network. Here, we increase the sequence length from 1 to 3, for example:

# prepare the dataset of input to output pairs encoded as integers
seq_length = 3

      This creates training data like the following:

ABC -> D
BCD -> E
CDE -> F

       Each element in the sequence is then provided to the network as a new input feature. This requires modifying how the input sequence is reshaped in the data preparation step:

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))

        It also requires modifying how the sample patterns are reshaped when demonstrating predictions from the model.

x = numpy.reshape(pattern, (1, 1, len(pattern)))

       The complete code implementation is as follows:

# Naive LSTM to learn three-char window to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), 1, seq_length))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, 1, len(pattern)))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

        Running this example produces the following output:

Model Accuracy: 86.96%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> Y
['U', 'V', 'W'] -> Z
['V', 'W', 'X'] -> Z
['W', 'X', 'Y'] -> Z

       We can see a small, probably insignificant, lift in performance. This is a simple problem that we were still unable to learn with the LSTM, even with the window method. Again, this is a poor framing of the problem that misuses the LSTM network. In fact, the sequence of letters is time steps of one feature, rather than one time step of multiple features. We have given more context to the network, but not more sequence, as it expected.

      In the next section, we will give more context to the network in the form of time steps.

Naive LSTM for a Three-Char Time Step Window to One-Char Mapping

        In Keras, the intended use of LSTMs is to provide context in the form of time steps, rather than windowed features as with other network types. We can take our first example and simply change the sequence length from 1 to 3.

seq_length = 3

      The input-output pairs then look like this:

ABC -> D
BCD -> E
CDE -> F
DEF -> G

     The difference is that the reshaping of the input data takes the sequence as time steps of one feature, rather than a single time step of multiple features (a brief comparison of the two reshapes follows the line of code below).

# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
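
      As a quick illustrative aside (my own addition, not part of the original tutorial), the two framings arrange the same three integers differently: the earlier window framing uses one time step of three features, while this framing uses three time steps of one feature.

# Illustrative comparison of the two reshapes, assuming the dataX and
# seq_length variables from the listings above (23 samples, seq_length = 3)
X_window = numpy.reshape(dataX, (len(dataX), 1, seq_length))  # shape (23, 1, 3)
X_steps = numpy.reshape(dataX, (len(dataX), seq_length, 1))   # shape (23, 3, 1)
print(X_window.shape, X_steps.shape)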

      The complete code implementation is as follows:

# Naive LSTM to learn three-char time steps to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 3
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=1, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

       Running this example produces the following output:

Model Accuracy: 100.00%
['A', 'B', 'C'] -> D
['B', 'C', 'D'] -> E
['C', 'D', 'E'] -> F
['D', 'E', 'F'] -> G
['E', 'F', 'G'] -> H
['F', 'G', 'H'] -> I
['G', 'H', 'I'] -> J
['H', 'I', 'J'] -> K
['I', 'J', 'K'] -> L
['J', 'K', 'L'] -> M
['K', 'L', 'M'] -> N
['L', 'M', 'N'] -> O
['M', 'N', 'O'] -> P
['N', 'O', 'P'] -> Q
['O', 'P', 'Q'] -> R
['P', 'Q', 'R'] -> S
['Q', 'R', 'S'] -> T
['R', 'S', 'T'] -> U
['S', 'T', 'U'] -> V
['T', 'U', 'V'] -> W
['U', 'V', 'W'] -> X
['V', 'W', 'X'] -> Y
['W', 'X', 'Y'] -> Z

      We can see from both the model evaluation and the example predictions that the model learns the problem perfectly. But it has learned a simpler problem. Specifically, it has learned to predict the next letter from a sequence of three letters in the alphabet. It can be shown any random sequence of three letters from the alphabet and predict the next letter, but it cannot actually enumerate the alphabet. I expect that a large enough multilayer perceptron network could learn the same mapping using the window method. LSTM networks are stateful; they should be able to learn the entire alphabet sequence, but by default the Keras implementation resets the network state after each training batch.

LSTM State Within a Batch

      The Keras implementation of the LSTM resets the state of the network after each batch. This suggests that if the batch size were large enough to hold all of the input patterns, and all of the input patterns were ordered sequentially, the LSTM could use the context of the sequence within the batch to better learn the sequence. We can easily demonstrate this by modifying the first example for learning a one-to-one mapping and increasing the batch size from 1 to the size of the training dataset. Additionally, Keras shuffles the training dataset before each training epoch. To ensure the training data patterns remain sequential, we can disable this shuffling.

model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)

      The network will learn the mapping of characters using the within-batch sequence, but this context will not be available to the network when making predictions. We can evaluate the network's ability to make predictions both at random and in sequence.

    The complete code implementation is as follows:

# Naive LSTM to learn one-char to one-char mapping with all data in each batch
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=seq_length, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
model = Sequential()
model.add(LSTM(16, input_shape=(X.shape[1], X.shape[2])))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=5000, batch_size=len(dataX), verbose=2, shuffle=False)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for pattern in dataX:
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)
# demonstrate predicting random patterns
print("Test a Random Pattern:")
for i in range(0,20):
	pattern_index = numpy.random.randint(len(dataX))
	pattern = dataX[pattern_index]
	x = numpy.reshape(pattern, (1, len(pattern), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

       Running this example produces the following output:

Model Accuracy: 100.00%
['A'] -> B
['B'] -> C
['C'] -> D
['D'] -> E
['E'] -> F
['F'] -> G
['G'] -> H
['H'] -> I
['I'] -> J
['J'] -> K
['K'] -> L
['L'] -> M
['M'] -> N
['N'] -> O
['O'] -> P
['P'] -> Q
['Q'] -> R
['R'] -> S
['S'] -> T
['T'] -> U
['U'] -> V
['V'] -> W
['W'] -> X
['X'] -> Y
['Y'] -> Z
Test a Random Pattern:
['T'] -> U
['V'] -> W
['M'] -> N
['Q'] -> R
['D'] -> E
['V'] -> W
['T'] -> U
['U'] -> V
['J'] -> K
['F'] -> G
['N'] -> O
['B'] -> C
['M'] -> N
['F'] -> G
['F'] -> G
['P'] -> Q
['A'] -> B
['K'] -> L
['W'] -> X
['E'] -> F

        As we expected, the network is able to use the within-sequence context of the letters to learn the alphabet, achieving 100% accuracy on the training data. Importantly, the network can also make accurate predictions of the next letter of the alphabet for randomly selected characters. Very impressive.

Stateful LSTM for a One-Char to One-Char Mapping

     We have seen that we can break the raw data into fixed-size sequences and that an LSTM can learn this representation, but only to map random sequences of three characters to one character. We have also seen that we can abuse the batch size to offer more sequence to the network, but only during training. Ideally, we want to expose the network to the entire sequence and let it learn the interdependencies, rather than defining those dependencies explicitly in the framing of the problem. We can do this in Keras by making the LSTM layers stateful and manually resetting the state of the network at the end of each epoch, which is also the end of the training sequence.

      This is truly how the LSTM network is intended to be used. We must first define the LSTM layer as stateful. In doing so, we must explicitly specify the batch size as part of the input shape. This also means that when we evaluate the network or make predictions, we must specify and adhere to the same batch size. This is not a problem now, because we are using a batch size of 1. It could present difficulties when the batch size is not 1, since predictions would then need to be made in batches and in order.

batch_size = 1
model.add(LSTM(50, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))

       An important difference when training a stateful LSTM is that we train it manually one epoch at a time and reset the state after each epoch. We can do this in a for loop. Again, we do not shuffle the input, so that the order in which the training data was created is preserved.

for i in range(300):
	model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
	model.reset_states()

       As mentioned above, we specify the batch size when evaluating the performance of the network on the entire training dataset.

# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))

       Finally, we can demonstrate that the network has indeed learned the entire alphabet. We can seed it with the first letter "A", request a prediction, feed the prediction back in as an input, and repeat the process all the way to "Z".

# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
	x = numpy.reshape(seed, (1, len(seed), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	print(int_to_char[seed[0]], "->", int_to_char[index])
	seed = [index]
model.reset_states()

        We can also check whether the network can make predictions starting from an arbitrary letter.

# demonstrate a random starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
	x = numpy.reshape(seed, (1, len(seed), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	print(int_to_char[seed[0]], "->", int_to_char[index])
	seed = [index]
model.reset_states()

       The complete code implementation is as follows:

# Stateful LSTM to learn one-char to one-char mapping
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
seq_length = 1
dataX = []
dataY = []
for i in range(0, len(alphabet) - seq_length, 1):
	seq_in = alphabet[i:i + seq_length]
	seq_out = alphabet[i + seq_length]
	dataX.append([char_to_int[char] for char in seq_in])
	dataY.append(char_to_int[seq_out])
	print(seq_in, '->', seq_out)
# reshape X to be [samples, time steps, features]
X = numpy.reshape(dataX, (len(dataX), seq_length, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(50, batch_input_shape=(batch_size, X.shape[1], X.shape[2]), stateful=True))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
for i in range(300):
	model.fit(X, y, epochs=1, batch_size=batch_size, verbose=2, shuffle=False)
	model.reset_states()
# summarize performance of the model
scores = model.evaluate(X, y, batch_size=batch_size, verbose=0)
model.reset_states()
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
seed = [char_to_int[alphabet[0]]]
for i in range(0, len(alphabet)-1):
	x = numpy.reshape(seed, (1, len(seed), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	print(int_to_char[seed[0]], "->", int_to_char[index])
	seed = [index]
model.reset_states()
# demonstrate a random starting point
letter = "K"
seed = [char_to_int[letter]]
print("New start: ", letter)
for i in range(0, 5):
	x = numpy.reshape(seed, (1, len(seed), 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	print(int_to_char[seed[0]], "->", int_to_char[index])
	seed = [index]
model.reset_states()

      Running this example produces the following output:

Model Accuracy: 100.00%
A -> B
B -> C
C -> D
D -> E
E -> F
F -> G
G -> H
H -> I
I -> J
J -> K
K -> L
L -> M
M -> N
N -> O
O -> P
P -> Q
Q -> R
R -> S
S -> T
T -> U
U -> V
V -> W
W -> X
X -> Y
Y -> Z
New start:  K
K -> B
B -> C
C -> D
D -> E
E -> F

      We can see that the network has memorized the entire alphabet perfectly. It used the context of the samples themselves and learned whatever dependency it needed in order to predict the next character in the sequence. We can also see that if we seed the network with the first letter, it can correctly rattle off the rest of the alphabet. However, it has only learned the complete alphabet sequence, and only from a cold start. When asked to predict the next letter after "K", it predicts "B" and falls back into regurgitating the entire alphabet. To truly predict "K", the state of the network would need to be warmed up iteratively by feeding it the letters from "A" to "J". This tells us that we could achieve the same effect with a "stateless" LSTM by preparing training data like:

---a -> b
--ab -> c
-abc -> d
abcd -> e

     Here the input sequence would be fixed at 25 characters (a-to-y to predict z) and the patterns would be prefixed with zero padding, as sketched below. Finally, this raises the question of training an LSTM network to predict one character from variable-length input sequences.
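
     As a rough sketch of that idea (my own illustration, not code from the original article), such prefix-padded training data could be prepared as follows, reusing the alphabet and char_to_int mapping from above and the Keras pad_sequences() helper used elsewhere in this tutorial.

# Hypothetical data preparation for the padded "stateless" framing described above
from keras.preprocessing.sequence import pad_sequences
max_len = 25
padX = []
padY = []
for i in range(len(alphabet) - 1):
	seq_in = alphabet[0:i + 1]      # growing prefixes: 'A', 'AB', ..., 'AB...Y'
	seq_out = alphabet[i + 1]
	padX.append([char_to_int[char] for char in seq_in])
	padY.append(char_to_int[seq_out])
# left-pad every sequence with zeros to the fixed length of 25
X_padded = pad_sequences(padX, maxlen=max_len, dtype='float32')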

LSTM with Variable-Length Input to One-Char Output

      In the previous section, we discovered that the Keras "stateful" LSTM was really only a shortcut to replaying the first n sequences, and did not really help us learn a generic model of the alphabet.

In this section, we explore a variation of the "stateless" LSTM that learns random subsequences of the alphabet, with the aim of building a model that can be given arbitrary letters or subsequences of letters and predict the next letter of the alphabet. First, we change the framing of the problem. To keep things simple, we define a maximum input sequence length and set it to a small value such as 5 to speed up training. This defines the maximum length of the letter subsequences used for training. In extensions, this could be set to the full alphabet (26) or longer if we allowed loops back to the beginning of the sequence. We also need to define the number of random sequences to create, in this case 1000. This could be more or fewer; I suspect fewer patterns are actually needed.

# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
	start = numpy.random.randint(len(alphabet)-2)
	end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
	sequence_in = alphabet[start:end+1]
	sequence_out = alphabet[end + 1]
	dataX.append([char_to_int[char] for char in sequence_in])
	dataY.append(char_to_int[sequence_out])
	print(sequence_in, '->', sequence_out)

       The input looks like this:

PQRST -> U
W -> X
O -> P
OPQ -> R
IJKLM -> N
QRSTU -> V
ABCD -> E
X -> Y
GHIJ -> K

       The input sequences vary in length between 1 and max_len, so zero padding is required. Here, we use left-hand-side (prefix) padding with the built-in Keras pad_sequences() function (a small illustration follows the line below).

X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
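
       As a small illustrative check (my own addition, not from the original article), this is what the prefix padding produces for a short sequence such as [2, 3, 4] with a maximum length of 5.

# pad_sequences pads on the left by default, so the most recent letters stay at the end
from keras.preprocessing.sequence import pad_sequences
print(pad_sequences([[2, 3, 4]], maxlen=5, dtype='float32'))
# expected output: [[0. 0. 2. 3. 4.]]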

       The trained model is evaluated on randomly selected input patterns. These could just as easily be newly generated random sequences of characters. I also believe this could be run as a linear sequence seeded with "A", with the outputs fed back in as single-character inputs.
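
      As a sketch of that last idea (my own illustration, not code from the original article), such a feedback loop over the variable-length model might look like this; it assumes the trained model, the max_len value, the numpy and pad_sequences imports, and the char_to_int/int_to_char dictionaries from the full listing that follows.

# Hypothetical generation loop: seed with 'A' and feed each prediction back in
seed = [char_to_int['A']]
for i in range(len(alphabet) - 1):
	x = pad_sequences([seed], maxlen=max_len, dtype='float32')
	x = numpy.reshape(x, (1, max_len, 1)) / float(len(alphabet))
	index = numpy.argmax(model.predict(x, verbose=0))
	print(int_to_char[seed[-1]], "->", int_to_char[index])
	seed = [index]  # the prediction becomes the next one-character input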

      The complete code implementation is as follows:

# LSTM with Variable Length Input Sequences to One Character Output
import numpy
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from keras.utils import np_utils
from keras.preprocessing.sequence import pad_sequences
# fix random seed for reproducibility
numpy.random.seed(7)
# define the raw dataset
alphabet = "ABCDEFGHIJKLMNOPQRSTUVWXYZ"
# create mapping of characters to integers (0-25) and the reverse
char_to_int = dict((c, i) for i, c in enumerate(alphabet))
int_to_char = dict((i, c) for i, c in enumerate(alphabet))
# prepare the dataset of input to output pairs encoded as integers
num_inputs = 1000
max_len = 5
dataX = []
dataY = []
for i in range(num_inputs):
	start = numpy.random.randint(len(alphabet)-2)
	end = numpy.random.randint(start, min(start+max_len,len(alphabet)-1))
	sequence_in = alphabet[start:end+1]
	sequence_out = alphabet[end + 1]
	dataX.append([char_to_int[char] for char in sequence_in])
	dataY.append(char_to_int[sequence_out])
	print(sequence_in, '->', sequence_out)
# convert list of lists to array and pad sequences if needed
X = pad_sequences(dataX, maxlen=max_len, dtype='float32')
# reshape X to be [samples, time steps, features]
X = numpy.reshape(X, (X.shape[0], max_len, 1))
# normalize
X = X / float(len(alphabet))
# one hot encode the output variable
y = np_utils.to_categorical(dataY)
# create and fit the model
batch_size = 1
model = Sequential()
model.add(LSTM(32, input_shape=(X.shape[1], 1)))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=500, batch_size=batch_size, verbose=2)
# summarize performance of the model
scores = model.evaluate(X, y, verbose=0)
print("Model Accuracy: %.2f%%" % (scores[1]*100))
# demonstrate some model predictions
for i in range(20):
	pattern_index = numpy.random.randint(len(dataX))
	pattern = dataX[pattern_index]
	x = pad_sequences([pattern], maxlen=max_len, dtype='float32')
	x = numpy.reshape(x, (1, max_len, 1))
	x = x / float(len(alphabet))
	prediction = model.predict(x, verbose=0)
	index = numpy.argmax(prediction)
	result = int_to_char[index]
	seq_in = [int_to_char[value] for value in pattern]
	print(seq_in, "->", result)

        Running this example produces the following output:

Model Accuracy: 98.90%
['Q', 'R'] -> S
['W', 'X'] -> Y
['W', 'X'] -> Y
['C', 'D'] -> E
['E'] -> F
['S', 'T', 'U'] -> V
['G', 'H', 'I', 'J', 'K'] -> L
['O', 'P', 'Q', 'R', 'S'] -> T
['C', 'D'] -> E
['O'] -> P
['N', 'O', 'P'] -> Q
['D', 'E', 'F', 'G', 'H'] -> I
['X'] -> Y
['K'] -> L
['M'] -> N
['R'] -> T
['K'] -> L
['E', 'F', 'G'] -> H
['Q'] -> R
['Q', 'R', 'S'] -> T

       We can see that although the model did not learn the alphabet perfectly from the randomly generated subsequences, it did very well. The model was not tuned and may require more training, a larger network, or both (an exercise left to the reader). This is a good natural extension of the "all sequential input examples in each batch" alphabet model learned above, in that it can handle ad hoc queries, and this time for arbitrary sequence lengths (up to the maximum length).
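
       As one possible direction for that exercise (my own suggestion, untested), the tuning could start from a larger LSTM layer trained for more epochs, for example:

# Hypothetical tuning sketch: a larger LSTM layer and longer training,
# reusing the X, y, max_len and batch_size variables from the listing above
model = Sequential()
model.add(LSTM(64, input_shape=(max_len, 1)))
model.add(Dense(y.shape[1], activation='softmax'))
model.compile(loss='categorical_crossentropy', optimizer='adam', metrics=['accuracy'])
model.fit(X, y, epochs=1000, batch_size=batch_size, verbose=2)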

Summary

       In this article, you discovered LSTM recurrent neural networks in Keras and how they manage state. Specifically, you learned:

How to develop a naive LSTM network for one-character to one-character prediction.
How to configure a naive LSTM to learn a sequence across time steps within a sample.
How to configure an LSTM to learn a sequence across samples by manually managing state.

 


Source: blog.csdn.net/Together_CZ/article/details/104873595