Reprinted from: https://blog.csdn.net/mylove0414/article/details/55805974

RNN and LSTM

This part briefly covers the theory of recurrent neural networks.

What is RNN

RNN stands for Recurrent Neural Network, a type of network used to process sequence data. In a traditional neural network model, data flows from the input layer to the hidden layer to the output layer; adjacent layers are fully connected, while the nodes within each layer are not connected to each other. Such an ordinary network is poorly suited to many time-series problems. For example, to predict the next word of a sentence you generally need the preceding words, because the words in a sentence are not independent of each other. An RNN is called recurrent because the current output for a sequence also depends on the previous outputs. Concretely, the network memorizes the information from the previous moment and applies it to the calculation of the current output: the nodes of the hidden layer are no longer unconnected but connected across time steps, and the input of the hidden layer includes not only the output of the input layer but also the output of the hidden layer at the previous moment.
The figure below illustrates this.

[Figure: structure of a recurrent neural network]
In a traditional neural network, data enters at the input layer, is processed in the hidden layer, and leaves through the output layer. An RNN differs in how the hidden layer processes data: each node is affected not only by the input from the input layer but also by the previous node.
Unrolled over time, the network looks like this:
[Figure: RNN unrolled over time]

In the figure, $x_{t-1}$, $x_t$ and $x_{t+1}$ are the inputs at different moments. Each $x$ carries the n-dimensional features of the input layer. As the inputs enter the recurrent neural network in turn, the hidden-layer output $s_t$ is influenced by both the hidden-layer output $s_{t-1}$ of the previous moment and the input-layer input $x_t$ at the current moment.
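
The recurrence described above can be sketched in a few lines of NumPy. This is only an illustration under assumptions that are not in the original text: the weight names `U` and `W`, the bias `b`, the tanh activation, and the layer sizes are all chosen here for the example.

```python
import numpy as np

def rnn_step(x_t, s_prev, U, W, b):
    """One recurrence step: s_t depends on the input x_t and the previous state s_{t-1}."""
    return np.tanh(U @ x_t + W @ s_prev + b)

rng = np.random.default_rng(0)
n, hidden = 3, 4                        # assumed sizes: n-dimensional input, 4 hidden units
U = rng.normal(size=(hidden, n))        # input-to-hidden weights (hypothetical)
W = rng.normal(size=(hidden, hidden))   # hidden-to-hidden weights (hypothetical)
b = np.zeros(hidden)

xs = rng.normal(size=(5, n))            # a sequence of 5 inputs
s = np.zeros(hidden)                    # initial hidden state
states = []
for x_t in xs:
    s = rnn_step(x_t, s, U, W, b)       # the previous state is fed back in
    states.append(s)
states = np.array(states)
print(states.shape)                     # (5, 4): one hidden state per time step
```

The same weights are reused at every time step; only the state `s` changes as the sequence is consumed.
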
To learn more about how TensorFlow explains RNNs, see the official TensorFlow RNN documentation.
WildML is also recommended as additional learning material.

What is LSTM

LSTM stands for Long Short-Term Memory, a variant of the RNN. For example, suppose we try to predict the blank in "I grew up in France ... (many words in between) ... I speak fluent __". We would immediately guess that the word is French. For an RNN, the nearby information suggests that the next word is probably the name of a language, but to figure out which language, it needs the word "France", which is far from the position of the blank. The gap between the relevant information and the position being predicted becomes quite large, and as the gap grows, the RNN loses its ability to learn to connect such distant information.
This is where LSTM comes in: an LSTM can control which information is discarded and which is stored.
I will not go into the theory in detail here. I recommend the blog post Understanding LSTM Networks, which gives a detailed introduction to LSTM. A community translation is available under the title "[Translation] Understanding LSTM Networks".
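
As a rough illustration of how an LSTM decides what to discard and what to store, here is a minimal NumPy sketch of one LSTM step in its commonly described form (forget, input and output gates). The function name `lstm_step`, the stacked weight matrix `W`, the gate ordering and the sizes are assumptions made for this example; they do not reflect the internal layout of TensorFlow's `BasicLSTMCell`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def lstm_step(x_t, h_prev, c_prev, W, b):
    """One LSTM step. W maps [h_prev, x_t] to four stacked gate pre-activations."""
    hidden = h_prev.shape[0]
    z = W @ np.concatenate([h_prev, x_t]) + b
    f = sigmoid(z[0 * hidden:1 * hidden])   # forget gate: how much of c_prev to keep
    i = sigmoid(z[1 * hidden:2 * hidden])   # input gate: how much new information to store
    o = sigmoid(z[2 * hidden:3 * hidden])   # output gate: how much of the cell to expose
    g = np.tanh(z[3 * hidden:4 * hidden])   # candidate values for the cell state
    c_t = f * c_prev + i * g                # discard part of the old state, add new content
    h_t = o * np.tanh(c_t)
    return h_t, c_t

rng = np.random.default_rng(0)
n, hidden = 1, 10                           # assumed sizes, matching input_size and rnn_unit
W = rng.normal(scale=0.1, size=(4 * hidden, hidden + n))
b = np.zeros(4 * hidden)
h, c = np.zeros(hidden), np.zeros(hidden)
for x_t in rng.normal(size=(20, n)):        # run over a sequence of 20 time steps
    h, c = lstm_step(x_t, h, c, W, b)
```

The cell state `c` is the long-term memory: the forget gate scales it down and the input gate adds to it, which is what lets information survive across many time steps.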

Stock prediction

With the theory in hand, we use an LSTM to predict the daily highest price of a stock. This example uses only a one-dimensional feature.
The data format is as follows:
[Figure: data format]

In this example, the daily highest price is used as the input feature [x], and the highest price of the next day is the label [y].
To obtain the data, download stock_dataset.csv (password: md9l).

Import Data:

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import tensorflow as tf #The code below uses the TensorFlow 1.x graph API (tf.placeholder, tf.Session)
f=open('stock_dataset.csv')  
df=pd.read_csv(f) #Read in stock data
data=np.array(df['highest price']) #Get the highest price sequence
data=data[::-1] #Reverse, so that the data are arranged in the order of the date
#Display data as a line chart
plt.figure()
plt.plot(data)
plt.show()
normalize_data=(data-np.mean(data))/np.std(data)  #Standardize
normalize_data=normalize_data[:,np.newaxis] #Increase dimension
#————————————————————Form training set —————————————————————
#set constants
time_step=20 #time step
rnn_unit=10       #hidden layer units
batch_size=60 #How many examples are trained in each batch
input_size=1 #input layer dimension
output_size=1 #output layer dimension
lr=0.0006 #learning rate
train_x,train_y=[],[] #training set
for i in range(len(normalize_data)-time_step-1):
    x=normalize_data[i:i+time_step]
    y=normalize_data[i+1:i+time_step+1]
    train_x.append(x.tolist())
    train_y.append(y.tolist())

The resulting train_x looks like this:

[[[-1.59618],...18 more..., [-1.56340]]
  ……
 [[-1.59202] [-1.58244]]]

train_x is an array of shape [-1, time_step, input_size].
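
To check the shapes without the real dataset, the windowing loop can be run on a stand-in series. In this sketch `series` is a made-up sequence 0, 1, 2, ... that takes the place of `normalize_data`; everything else follows the loop above.

```python
import numpy as np

time_step = 20
# Stand-in for normalize_data: 100 values with shape (100, 1)
series = np.arange(100, dtype=np.float32)[:, np.newaxis]

train_x, train_y = [], []
for i in range(len(series) - time_step - 1):
    train_x.append(series[i:i + time_step].tolist())
    train_y.append(series[i + 1:i + time_step + 1].tolist())

x = np.array(train_x)
y = np.array(train_y)
print(x.shape, y.shape)   # (79, 20, 1) (79, 20, 1)
```

Each label window is the input window shifted forward by one day, so every time step in a sample has the next day's value as its target.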

Defining Neural Network Variables

X=tf.placeholder(tf.float32, [None,time_step,input_size]) #Input tensor for each batch
Y=tf.placeholder(tf.float32, [None,time_step,output_size]) #Labels corresponding to each batch

#input layer, output layer weight, bias
weights={
         'in':tf.Variable(tf.random_normal([input_size,rnn_unit])),
         'out':tf.Variable(tf.random_normal([rnn_unit,1]))
         }
biases={
        'in':tf.Variable(tf.constant(0.1,shape=[rnn_unit,])),
        'out':tf.Variable(tf.constant(0.1,shape=[1,]))
        }

Define the LSTM network

def lstm(batch): #Parameter: number of samples per batch fed to the network
    w_in=weights['in']
    b_in=biases['in']
    input=tf.reshape(X,[-1,input_size]) #The tensor needs to be converted into 2 dimensions for calculation, and the calculated result is used as the input of the hidden layer
    input_rnn=tf.matmul(input,w_in)+b_in
    input_rnn=tf.reshape(input_rnn,[-1,time_step,rnn_unit]) #Convert the tensor to 3 dimensions as the input of the lstm cell
    cell=tf.nn.rnn_cell.BasicLSTMCell(rnn_unit)
    init_state=cell.zero_state(batch,dtype=tf.float32)
    output_rnn,final_states=tf.nn.dynamic_rnn(cell, input_rnn,initial_state=init_state, dtype=tf.float32) #output_rnn holds the LSTM output at every time step, final_states is the state after the last time step
    output=tf.reshape(output_rnn,[-1,rnn_unit]) #As the input of the output layer
    w_out=weights['out']
    b_out=biases['out']
    pred=tf.matmul(output,w_out)+b_out
    return pred,final_states
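
The reshapes in `lstm()` are easier to follow with concrete shapes. The NumPy sketch below mirrors only the shape flow, using the constants defined earlier; the LSTM itself is skipped (its output has the same shape as its projected input here), and the weight values are placeholders chosen for the example.

```python
import numpy as np

batch, time_step, input_size, rnn_unit = 60, 20, 1, 10
X = np.zeros((batch, time_step, input_size), dtype=np.float32)
w_in = np.ones((input_size, rnn_unit), dtype=np.float32)    # placeholder values
b_in = np.full((rnn_unit,), 0.1, dtype=np.float32)

flat = X.reshape(-1, input_size)                        # (1200, 1): 2-D for the matmul
input_rnn = flat @ w_in + b_in                          # (1200, 10)
input_rnn = input_rnn.reshape(-1, time_step, rnn_unit)  # (60, 20, 10): 3-D for the LSTM

# Stand-in for output_rnn, which has shape [batch, time_step, rnn_unit]
output = input_rnn.reshape(-1, rnn_unit)                # (1200, 10)
w_out = np.ones((rnn_unit, 1), dtype=np.float32)        # placeholder values
pred = output @ w_out + 0.1                             # (1200, 1)
print(flat.shape, input_rnn.shape, pred.shape)
```

Note that `pred` contains one value per time step of every sample, not one value per sample, which is why the loss reshapes both `pred` and `Y` to a flat vector.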

Train the model

def train_lstm():
    global batch_size
    pred,_=lstm(batch_size)
    #loss function
    loss=tf.reduce_mean(tf.square(tf.reshape(pred,[-1])-tf.reshape(Y, [-1])))
    train_op=tf.train.AdamOptimizer(lr).minimize(loss)
    saver=tf.train.Saver(tf.global_variables())
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        #Repeat training 10000 times
        for i in range(10000):
            step=0
            start=0
            end=start+batch_size
            while(end<len(train_x)):
                _,loss_=sess.run([train_op,loss],feed_dict={X:train_x[start:end],Y:train_y[start:end]})
                start+=batch_size
                end=start+batch_size
                #Save parameters every 10 steps
                if step%10==0:
                    print(i,step,loss_)
                    print("Save model: ",saver.save(sess,'stock.model'))
                step+=1
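
The inner `while` loop only ever feeds full batches, which matters because the initial state is created for a fixed batch size. The index arithmetic can be isolated in a small pure-Python sketch; the helper name `iter_batches` is hypothetical and exists only for this illustration.

```python
def iter_batches(n_samples, batch_size):
    """Yield (start, end) index pairs the same way the training loop does."""
    start, end = 0, batch_size
    while end < n_samples:
        yield start, end
        start += batch_size
        end = start + batch_size

print(list(iter_batches(130, 60)))   # [(0, 60), (60, 120)]
print(list(iter_batches(120, 60)))   # [(0, 60)]
```

Because the condition is a strict `<`, the trailing samples are skipped, and when the number of samples is an exact multiple of the batch size the last full batch is skipped as well.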

Predict with the model

def prediction():
    pred,_=lstm(1) #When predicting, feed a single test sample of shape [1,time_step,input_size]
    saver=tf.train.Saver(tf.global_variables())
    with tf.Session() as sess:
        #Restore the parameters; train_lstm() saves 'stock.model' in the current directory
        module_file = tf.train.latest_checkpoint('.')
        saver.restore(sess, module_file)
        #Take the last sample of the training set as the test sample. shape=[1,time_step,input_size]
        prev_seq=train_x[-1]
        predict=[]
        #Get the next 100 prediction results
        for i in range(100):
            next_seq=sess.run(pred,feed_dict={X:[prev_seq]})
            predict.append(next_seq[-1])
            #Take the prediction for the last time step and append it to the previous data (dropping the oldest value) to form the next test sample
            prev_seq=np.vstack((prev_seq[1:],next_seq[-1]))
        #Represent the result as a line graph
        plt.figure()
        plt.plot(list(range(len(normalize_data))), normalize_data, color='b')
        plt.plot(list(range(len(normalize_data), len(normalize_data) + len(predict))), predict, color='r')
        plt.show()
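
The prediction loop feeds each new prediction back into the input window, and its outputs are still on the standardized scale. The sketch below shows both ideas without TensorFlow. `rollout` and `fake_predict` are hypothetical names: `fake_predict` stands in for `sess.run(pred, ...)` and simply adds 1 to every value, and the prices are made up.

```python
import numpy as np

def rollout(predict_fn, seed_window, n_steps):
    """Feed each prediction back in as the newest element of the window."""
    window = np.array(seed_window, dtype=np.float64)    # shape (time_step, 1)
    outputs = []
    for _ in range(n_steps):
        next_seq = predict_fn(window)                   # shape (time_step, 1)
        outputs.append(next_seq[-1])                    # keep the last time step only
        window = np.vstack((window[1:], next_seq[-1]))  # drop oldest, append newest
    return np.array(outputs)

def fake_predict(window):
    # Hypothetical stand-in for the trained network
    return window + 1.0

data = np.array([10.0, 11.0, 12.0, 13.0, 14.0])         # made-up prices
mean, std = data.mean(), data.std()
normalized = ((data - mean) / std)[:, np.newaxis]

pred_norm = rollout(fake_predict, normalized[-3:], n_steps=4)
pred_price = pred_norm * std + mean                     # undo the standardization
print(pred_price.ravel())
```

Multiplying by the standard deviation and adding the mean converts the plotted values back into prices. Since every step builds on earlier predictions, errors can accumulate over a long rollout.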

Code

Full code

In this part, only the highest price is used as a feature to predict the future trend of the highest price. In the next part, the input feature dimension will be increased: the lowest price, opening price, closing price, trading volume and so on will be used as input features to predict the subsequent highest price.


Tensorflow Example: Using LSTM to Predict the Daily Highest Price of Stocks (1)