Python Basic Tutorial: Implementation of Time Series Prediction Analysis Using LSTM Model in Python


This article shows how to implement time series prediction and analysis in Python using an LSTM model. The sample code is fairly detailed and can serve as a reference for study or work.

Time series model

Time series forecasting uses the characteristics of an event over a past period to predict its characteristics over a future period. This is a relatively complex class of predictive modeling problem. Unlike a regression model, a time series model depends on the order of events: feeding the model the same values in a different order produces a different result.
For example: given the daily price of a stock over the past two years, predict its price movement over the following week; or given the weekly number of customers in a store over the past two years, predict how many customers will come next week.

RNN and LSTM models

The recurrent neural network (RNN) is one of the most commonly used and most powerful tools for time series modeling. In an ordinary neural network each result is computed independently, whereas each hidden-layer result in an RNN depends on both the current input and the previous hidden-layer result. This gives the RNN a memory of earlier results.

A typical RNN network structure is as follows:
[Figure: typical RNN network structure]
The structure on the right of the figure is unrolled to make the memory in the calculation easier to follow. Simply put, x is the input layer, o is the output layer, s is the hidden layer, and t is the calculation step; V, W, U are the weights, and S_t = f(U X_t + W S_{t-1}), which links the current input to the result of the previous calculation.

Limitation of RNN: for an RNN to achieve long-term memory, the current hidden state has to be tied to the previous n calculations, that is, S_t = f(U X_t + W_1 S_{t-1} + W_2 S_{t-2} + ... + W_n S_{t-n}). The amount of calculation then grows exponentially and model training time increases significantly, so the plain RNN model is generally not used directly for long-term memory calculations.
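
The recurrence S_t = f(U X_t + W S_{t-1}) can be sketched in a few lines of NumPy. This is only an illustration: f is assumed to be tanh, and the weight shapes, the random weights and the scaled input values are all hypothetical. It also shows the order dependence mentioned earlier, since the same three values fed in reverse order end in a different hidden state.

```python
import numpy as np

def rnn_forward(xs, U, W, s0):
    """Apply S_t = tanh(U @ X_t + W @ S_{t-1}) along a sequence; return every state."""
    s = s0
    states = []
    for x in xs:
        s = np.tanh(U @ x + W @ s)  # current input combined with the previous state
        states.append(s)
    return np.stack(states)

rng = np.random.default_rng(0)
U = rng.normal(size=(4, 1))  # input-to-hidden weights (hypothetical size)
W = rng.normal(size=(4, 4))  # hidden-to-hidden weights (hypothetical size)
s0 = np.zeros(4)             # initial hidden state

xs = np.array([[0.0], [0.44], [0.18]])  # three scaled input values
states = rnn_forward(xs, U, W, s0)
final_forward = states[-1]
final_reversed = rnn_forward(xs[::-1], U, W, s0)[-1]
print(np.allclose(final_forward, final_reversed))
```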

LSTM model
LSTM (Long Short-Term Memory) is a variant of the RNN, first proposed by Sepp Hochreiter and Jürgen Schmidhuber. The structure of a classic LSTM model is as follows:
[Figure: classic LSTM model structure]
What distinguishes the LSTM is that it adds gate nodes (the "valves") to each layer on top of the RNN structure. There are three types of gate: the forget gate, the input gate and the output gate. These gates can open or close, and they decide whether the memory state of the network (the state of the previous step) is strong enough at this layer to be added to the current calculation. As shown in the figure, a gate node applies the sigmoid function to the memory state of the network as its input; if the result reaches the threshold, the gate output is multiplied by the result of the current layer and passed on as input to the next layer (multiplication here means element-wise multiplication of matrices); if the threshold is not reached, the output is forgotten. The weights of every layer, including the gate nodes, are updated during each back-propagation pass of training. The more detailed LSTM calculation process is shown below:
[Figure: detailed LSTM calculation process]
The memory of the LSTM model is implemented by these gate nodes. When a gate is open, earlier results are linked into the current calculation; when it is closed, earlier results no longer affect the current calculation. By adjusting the opening and closing of the gates, we can therefore control how much the early part of a sequence affects the final result. When you do not want earlier results to affect later ones, for example at the start of a new paragraph or chapter in natural language processing, simply close the gate.
The figure below illustrates how the gates work: through gate control, the first input of the sequence affects the results for the fourth and sixth elements of the sequence.
[Figure: gate control along a sequence]
A solid black circle means that the node's result is passed on to the next layer or the next calculation; an open circle means that the node's result is not fed into the network or that no signal was received from the previous step.
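
To make the gates concrete, here is a minimal NumPy sketch of a single LSTM step in the commonly used formulation. Note that in this formulation the gates are soft values between 0 and 1 produced by a sigmoid rather than hard on/off switches. The function name `lstm_step`, the stacked weight layout and the sizes are assumptions made for this illustration and are not how keras stores its weights.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x, h_prev, c_prev, W, b):
    """One LSTM step; W maps [h_prev, x] to four stacked pre-activations."""
    n = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x]) + b
    f = sigmoid(z[0:n])          # forget gate: how much old memory to keep
    i = sigmoid(z[n:2 * n])      # input gate: how much new information to write
    o = sigmoid(z[2 * n:3 * n])  # output gate: how much memory to expose
    g = np.tanh(z[3 * n:4 * n])  # candidate values for the memory cell
    c = f * c_prev + i * g       # element-wise products, as described above
    h = o * np.tanh(c)
    return h, c

hidden, features = 3, 1          # hypothetical sizes
rng = np.random.default_rng(0)
W = rng.normal(size=(4 * hidden, hidden + features))
b = np.zeros(4 * hidden)

h, c = np.zeros(hidden), np.zeros(hidden)
for x in np.array([[0.0], [0.44], [0.18]]):
    h, c = lstm_step(x, h, c, W, b)
print(h, c)
```

Pushing the forget-gate bias strongly negative closes that gate, so a large old memory value no longer carries over into the new cell state.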

Implementation of LSTM model building in Python

There are many Python packages that can be called directly to build LSTM models, such as pybrain, keras, tensorflow, scikit-neuralnetwork, etc.
Here we use keras. (PS: if your operating system is Linux or macOS, TensorFlow is recommended!)

Because the training of the LSTM neural network model can be optimized by adjusting many parameters, such as the activation function, the number of LSTM layers, the input and output variable dimensions, etc., the adjustment process is quite complicated. Here is just a simple application example to describe the construction process of LSTM.

Applications

Based on the historical consumption times of a customer at a certain store, predict the time of the customer's next visit to the store. The data look like this:

Consumption time
2015-05-15 14:03:51
2015-05-15 15:32:46
2015-06-28 18:00:17
2015-07-16 21:27:18
2015-07-16 22:04:51
2015-09-08 14:59:56
..
..

Specific operation:

1. Convert the original data
First, the timestamps need to be turned into numbers. A common approach is to convert each timestamp into the time interval between two adjacent purchases by the user, and then feed these intervals to the model for training. The converted data is as follows:
Consumption interval (days)
0
44
18
0
54
..
..
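
As a small sketch of this conversion, the six sample timestamps above can be turned into day intervals with the standard library. It assumes the interval is counted in calendar days (the time of day is dropped before subtracting), which reproduces the values in the table; the helper name `to_day_intervals` is made up for this example.

```python
from datetime import datetime

timestamps = [
    "2015-05-15 14:03:51",
    "2015-05-15 15:32:46",
    "2015-06-28 18:00:17",
    "2015-07-16 21:27:18",
    "2015-07-16 22:04:51",
    "2015-09-08 14:59:56",
]

def to_day_intervals(stamps):
    """Return the number of calendar days between each pair of adjacent timestamps."""
    dates = [datetime.strptime(s, "%Y-%m-%d %H:%M:%S").date() for s in stamps]
    return [(later - earlier).days for earlier, later in zip(dates, dates[1:])]

intervals = to_day_intervals(timestamps)
print(intervals)  # [0, 44, 18, 0, 54]
```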

2. Generate the model training data set (choose the window length of the training set)
The window here is the number of past consumption intervals used to predict the next one. We start with a window length of 3, that is, the intervals at t-2, t-1 and t are used as model input, and the interval at t+1 is used to check the result. The data set format is as follows, where X is the input data and Y is the target value.
PS: "choose" is not quite the right word here, because the window length has to be tuned according to the model validation results.

X1  X2  X3  Y
0  44  18  0
44  18  0  54
..
..
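
The table above can be reproduced with a short NumPy sketch. This variant uses `sliding_window_view` (available in NumPy 1.20 and later), which is just one possible way to build the windows; the variable names are chosen for illustration.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

intervals = np.array([0, 44, 18, 0, 54])
look_back = 3  # window length

# Each row holds look_back inputs followed by the value to predict.
windows = sliding_window_view(intervals, look_back + 1)
X, Y = windows[:, :look_back], windows[:, look_back]
print(X.tolist(), Y.tolist())  # [[0, 44, 18], [44, 18, 0]] [0, 54]
```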

Note: predicting the raw value directly generally gives poor accuracy. Instead, you can bin the target Y into several classes by value and train on one-hot labels. For example, if Y is divided into five classes by value range (1: 0-20, 2: 20-40, 3: 40-60, 4: 60-80, 5: 80-100), the data set above becomes:

X1  X2  X3  Y
0  44  18  1
44  18  0  3
...

After Y is converted to one-hot, it is

1  0  0  0  0
0  0  1  0  0
...
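
A minimal sketch of the binning and one-hot step, assuming bins 20 days wide, that a boundary value such as 20 belongs to the upper bin, and that anything at or above 100 is clipped into the last class. The function name `to_one_hot_classes` is hypothetical, and the returned labels are zero-based (0 to 4) rather than the 1 to 5 numbering used in the text.

```python
import numpy as np

def to_one_hot_classes(y, bin_width=20, n_classes=5):
    """Bin values into classes of equal width and return (labels, one-hot matrix)."""
    y = np.asarray(y)
    labels = np.minimum(y // bin_width, n_classes - 1).astype(int)  # zero-based class index
    one_hot = np.eye(n_classes, dtype=int)[labels]                  # one row per sample
    return labels, one_hot

labels, one_hot = to_one_hot_classes([0, 54])
print(labels.tolist())   # [0, 2]
print(one_hot.tolist())  # [[1, 0, 0, 0, 0], [0, 0, 1, 0, 0]]
```
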
3. Determine and tune the network model structure
Here we use Python's keras library (Java users can refer to the deeplearning4j library). Training the network involves tuning many parameters, for example:

Determine the activation function of the LSTM module (the default in keras is tanh);

Determine the activation function of the fully connected layer that receives the LSTM output (the default in keras is linear);

Determine the dropout rate of the nodes in each layer (to prevent overfitting); here we set the default to 0.2;

Determine how the error is calculated; here we use mean squared error;

Determine the iterative update method for the weight parameters; here we use the RMSprop algorithm, which is commonly used for RNNs;

Determine the number of epochs and the batch size for model training.

Generally speaking, the more LSTM layers there are, the stronger the ability to learn higher-level temporal representations (generally no more than 3 layers are used, because deeper stacks are harder to get to converge). Finally, an ordinary neural network layer is added to reduce the dimensionality of the output. The typical structure is as follows:
[Figure: typical stacked LSTM structure]
If multiple sequences need to be trained in the same model, each sequence can be fed to its own LSTM module, and the outputs are then merged and fed to the ordinary layer. The structure is as follows:
[Figure: independent LSTM modules merged into a common layer]
4. Model training and result prediction
The data set above is randomly split into a training set and a validation set at a ratio of 4:1, in order to guard against overfitting. Train the model, then pass the X columns of the data to the model to obtain predicted values, and compare them with the actual Y values.
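
The random 4:1 split can be sketched as follows. The function name `split_train_validation`, the fixed seed and the toy arrays are assumptions made for this example; X and Y are shuffled with the same permutation so that each input window stays paired with its target.

```python
import numpy as np

def split_train_validation(X, Y, validation_fraction=0.2, seed=0):
    """Shuffle X and Y together and hold out a fraction of the rows for validation."""
    X, Y = np.asarray(X), np.asarray(Y)
    order = np.random.default_rng(seed).permutation(len(X))
    n_val = int(round(len(X) * validation_fraction))
    val_idx, train_idx = order[:n_val], order[n_val:]
    return X[train_idx], Y[train_idx], X[val_idx], Y[val_idx]

# Toy data: 10 samples, each row of X starts with 3 * (its Y value).
X = np.arange(30).reshape(10, 3)
Y = np.arange(10)
trainX, trainY, valX, valY = split_train_validation(X, Y)
print(len(trainX), len(valX))  # 8 2
```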

Implementation code

The time interval sequence is formatted into the required training set format

import pandas as pd
import numpy as np
 
def create_interval_dataset(dataset, look_back):
  """
  Convert an array of values into a dataset matrix.
  :param dataset: input array of time intervals
  :param look_back: number of past intervals in each training sample
  :return: (dataX, dataY): the input windows and the value that follows each window
  """
  dataX, dataY = [], []
  for i in range(len(dataset) - look_back):
    dataX.append(dataset[i:i+look_back])  # window of look_back intervals
    dataY.append(dataset[i+look_back])    # the interval right after the window
  return np.asarray(dataX), np.asarray(dataY)
 
df = pd.read_csv("path-to-your-time-interval-file")
dataset = np.asarray(df)  # if only 1 column
dataX, dataY = create_interval_dataset(dataset, look_back=3)  # look_back is the training sequence length

The input data source here is a CSV file. If the input data comes from a database, read it with the corresponding database connector instead.

LSTM network structure

import os
import numpy as np
from keras.models import Sequential
from keras.layers import Input, Dense, LSTM
 
# Uses Keras 2+ argument names (units, recurrent_dropout, epochs).
class NeuralNetwork():
  def __init__(self, **kwargs):
    """
    :param **kwargs: output_dim=8: output dimension of LSTM layer; activation_lstm='relu': activation function for LSTM layers; activation_dense='relu': activation function for Dense layer; activation_last='softmax': activation function for last layer; dense_layer=2: number of Dense layers; lstm_layer=2: number of LSTM layers; drop_out=0.2: fraction of units to drop; nb_epoch=10: the number of epochs to train the model. An epoch is one forward pass and one backward pass of all the training examples; batch_size=100: number of samples per gradient update. The higher the batch size, the more memory space you'll need; loss='categorical_crossentropy': loss function; optimizer='rmsprop'
    """
    self.output_dim = kwargs.get('output_dim', 8)
    self.activation_lstm = kwargs.get('activation_lstm', 'relu')
    self.activation_dense = kwargs.get('activation_dense', 'relu')
    self.activation_last = kwargs.get('activation_last', 'softmax')  # softmax for multiple output
    self.dense_layer = kwargs.get('dense_layer', 2)   # at least 2 layers
    self.lstm_layer = kwargs.get('lstm_layer', 2)     # at least 2 layers
    self.drop_out = kwargs.get('drop_out', 0.2)
    self.nb_epoch = kwargs.get('nb_epoch', 10)
    self.batch_size = kwargs.get('batch_size', 100)
    self.loss = kwargs.get('loss', 'categorical_crossentropy')
    self.optimizer = kwargs.get('optimizer', 'rmsprop')
 
  def NN_model(self, trainX, trainY, testX, testY, model_path=None, model_weight_path=None):
    """
    :param trainX: training data set, shape (samples, timesteps, features)
    :param trainY: expected value of training data (one-hot)
    :param testX: test data set
    :param testY: expected value of test data
    :param model_path: optional path of the json file for the model structure
    :param model_weight_path: optional path of the hdf5 file for the weights
    :return: model after training
    """
    print("Training model is LSTM network!")
    timesteps, input_dim = trainX.shape[1], trainX.shape[2]
    output_dim = trainY.shape[1] # one-hot label
    model = Sequential()
    model.add(Input(shape=(timesteps, input_dim)))
    # first LSTM layer; recurrent dropout is used to avoid overfitting
    model.add(LSTM(units=self.output_dim,
            activation=self.activation_lstm,
            recurrent_dropout=self.drop_out,
            return_sequences=True))
    for i in range(self.lstm_layer-2):
      model.add(LSTM(units=self.output_dim,
            activation=self.activation_lstm,
            recurrent_dropout=self.drop_out,
            return_sequences=True))
    # argument return_sequences should be false in last lstm layer to avoid input dimension incompatibility with dense layer
    model.add(LSTM(units=self.output_dim,
            activation=self.activation_lstm,
            recurrent_dropout=self.drop_out))
    # hidden Dense layers use activation_dense; only the last layer uses activation_last
    for i in range(self.dense_layer-1):
      model.add(Dense(units=self.output_dim,
            activation=self.activation_dense))
    model.add(Dense(units=output_dim,
            activation=self.activation_last))
    # configure the learning process
    model.compile(loss=self.loss, optimizer=self.optimizer, metrics=['accuracy'])
    # train the model with fixed number of epochs
    model.fit(x=trainX, y=trainY, epochs=self.nb_epoch, batch_size=self.batch_size, validation_data=(testX, testY))
    # store model to json file
    if model_path:
      with open(model_path, "w") as json_file:
        json_file.write(model.to_json())
    # store model weights to hdf5 file
    if model_weight_path:
      if os.path.exists(model_weight_path):
        os.remove(model_weight_path)
      model.save_weights(model_weight_path) # eg: model_weight.h5 (Keras 3 expects a name ending in .weights.h5)
    return model

Only the structure of the LSTM network is covered here. How to normalize and reshape the data into the structure the network requires, and how to visualize and compare the model's predictions with the actual values, has to be adapted to your actual situation.
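
As one possible starting point for the normalization step, the sketch below applies min-max scaling to the sample intervals and reshapes the windows into the (samples, timesteps, features) layout that an LSTM layer expects. Min-max scaling is an assumption here, not something the article prescribes, and the helper names are made up for this example; it also assumes the data contain at least two distinct values, otherwise the division would fail.

```python
import numpy as np

def fit_min_max(values):
    """Return the (min, max) pair used for scaling."""
    values = np.asarray(values, dtype=float)
    return values.min(), values.max()

def scale(values, lo, hi):
    """Map values linearly so that lo becomes 0 and hi becomes 1."""
    return (np.asarray(values, dtype=float) - lo) / (hi - lo)

def unscale(scaled, lo, hi):
    """Invert scale(), e.g. to read predictions back in days."""
    return np.asarray(scaled, dtype=float) * (hi - lo) + lo

intervals = [0, 44, 18, 0, 54]
lo, hi = fit_min_max(intervals)
scaled = scale(intervals, lo, hi)
restored = unscale(scaled, lo, hi)

# Windows of length 3 reshaped to (samples, timesteps, features) with 1 feature.
windows = np.array([[0, 44, 18], [44, 18, 0]])
lstm_input = scale(windows, lo, hi).reshape(len(windows), 3, 1)
print(lstm_input.shape)  # (2, 3, 1)
```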


Origin blog.csdn.net/chengxun03/article/details/105498216