Reprinted from: https://blog.csdn.net/mylove0414/article/details/55805974
RNN and LSTM
This part covers the theory of recurrent neural networks, and only briefly.
What is RNN
RNN stands for Recurrent Neural Network, a type of network designed to process sequence data. In a traditional neural network, data flows from the input layer through the hidden layer to the output layer; adjacent layers are fully connected, but the nodes within a layer are not connected to each other. Such a network handles many sequence problems poorly. For example, to predict the next word in a sentence you generally need the preceding words, because the words in a sentence are not independent of each other. An RNN is called recurrent because the current output for a sequence also depends on earlier outputs. Concretely, the network remembers information from the previous time step and uses it when computing the current output: the hidden-layer nodes are now connected across time steps, so the input to the hidden layer includes not only the output of the input layer but also the hidden layer's own output from the previous time step.
A picture shows this more directly.
In a traditional neural network, data enters at the input layer, is processed in the hidden layer, and leaves from the output layer. An RNN differs in how the hidden layer processes data: each hidden node is affected not only by the input layer but also by the hidden node of the previous time step.
Unrolled over time, the network looks like this:
In the figure, $x_{t-1}$, $x_t$, and $x_{t+1}$ are the inputs at different time steps. Each $x$ is an n-dimensional feature vector for the input layer. The inputs enter the recurrent network in turn, and the hidden-layer output $s_t$ is influenced both by the hidden-layer output of the previous time step, $s_{t-1}$, and by the input $x_t$ at the current time step.
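As a rough illustration of the recurrence described above, here is a minimal sketch in plain Python. The update rule s_t = tanh(U·x_t + W·s_{t-1}) is the common textbook form and is an assumption here, since the article gives no explicit formula; the names `rnn_step`, `rnn_forward`, `U`, and `W` and all weight values are made up for illustration.

```python
import math

def rnn_step(x_t, s_prev, U, W):
    """One recurrent step: new hidden state from current input and previous state.

    x_t    : list of n input features
    s_prev : list of h hidden values from the previous time step
    U      : h x n input-to-hidden weights (hypothetical values)
    W      : h x h hidden-to-hidden weights (hypothetical values)
    """
    s_t = []
    for j in range(len(s_prev)):
        total = sum(U[j][k] * x_t[k] for k in range(len(x_t)))
        total += sum(W[j][k] * s_prev[k] for k in range(len(s_prev)))
        s_t.append(math.tanh(total))
    return s_t

def rnn_forward(xs, U, W):
    """Run the step over a whole sequence, starting from an all-zero state."""
    s = [0.0] * len(W)
    states = []
    for x_t in xs:
        s = rnn_step(x_t, s, U, W)
        states.append(s)
    return states

# Toy sizes: 1 input feature, 2 hidden units.
U = [[0.5], [-0.3]]
W = [[0.1, 0.2], [0.0, 0.4]]
states = rnn_forward([[1.0], [0.0], [0.0]], U, W)
```

Even though the second and third inputs are zero, the hidden state stays non-zero, because it still carries information from the first input.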
To learn more about how TensorFlow explains RNNs, see the official TensorFlow RNN documentation.
Additional recommended learning material: WildML
What is LSTM
LSTM stands for Long Short-Term Memory and is a variant of the RNN. Suppose we try to predict the blank in "I grew up in France ... (many words in between) ... I speak fluent __". A moment's thought tells us the word should be "French". For an RNN, the nearby words suggest that the next word is probably the name of a language, but to work out which language, it needs the word "France", which is far from the blank. The gap between the relevant information and the position being predicted is large, and as the gap grows, an RNN loses the ability to learn to connect information that far apart.
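One common way to see why this happens (a simplification that is not part of the original article): the training signal that travels back across many time steps is, roughly, a product of one factor per step. If those factors are below 1, the product shrinks exponentially with the gap. The function name `signal_after` and the factor 0.9 below are arbitrary illustrative choices.

```python
def signal_after(gap, per_step_factor=0.9):
    """Size of a signal that is scaled by the same factor at every time step."""
    signal = 1.0
    for _ in range(gap):
        signal *= per_step_factor
    return signal

short_gap = signal_after(5)     # a word 5 steps back
long_gap = signal_after(100)    # a word 100 steps back
```

With these numbers about 59% of the signal survives a gap of 5 steps, but less than 0.01% survives a gap of 100.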
This is where LSTM comes in. An LSTM can control which information is discarded and which is stored.
I won't go into the theory in detail here. I recommend the blog post Understanding LSTM Networks, which gives a detailed introduction to LSTM. A community translation is also available: [Translation] Understanding LSTM Networks
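To make "discard" and "store" a little more concrete, here is a minimal sketch of one LSTM step using the standard gate formulation (forget, input, and output gates). It uses scalars instead of vectors for readability; the function names and every weight value are hypothetical and chosen only to make the effect visible. It is not the article's implementation, which relies on TensorFlow's built-in cell.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def lstm_step(x_t, h_prev, c_prev, p):
    """One LSTM step with scalar input, hidden state, and cell state.

    p maps each gate name to (input weight, recurrent weight, bias);
    all values used below are hypothetical.
    """
    def pre(name):
        w_x, w_h, b = p[name]
        return w_x * x_t + w_h * h_prev + b

    f = sigmoid(pre("forget"))        # how much of the old cell state to keep
    i = sigmoid(pre("input"))         # how much new information to store
    g = math.tanh(pre("candidate"))   # the new information itself
    o = sigmoid(pre("output"))        # how much of the cell state to expose
    c_t = f * c_prev + i * g
    h_t = o * math.tanh(c_t)
    return h_t, c_t

params = {
    "forget": (0.0, 0.0, 10.0),     # large bias: keep almost everything
    "input": (0.0, 0.0, -10.0),     # large negative bias: store almost nothing
    "candidate": (1.0, 0.0, 0.0),
    "output": (0.0, 0.0, 10.0),
}

h, c = 0.0, 1.0   # start with something already stored in the cell
for x in [0.3, -0.8, 0.5]:
    h, c = lstm_step(x, h, c, params)
```

With the forget gate held open and the input gate held shut, the cell state stays close to its initial value of 1.0 regardless of the inputs. This is the mechanism that lets an LSTM carry information across long gaps.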
Stock forecast
With the theory in hand, we now use an LSTM to predict a stock's daily highest price. This example uses only a one-dimensional feature.
The data format is as follows:
In this example, each day's highest price is the input feature [x], and the next day's highest price is the label [y].
To obtain the data, download stock_dataset.csv (password: md9l).
Import Data:
    import pandas as pd
    import numpy as np
    import matplotlib.pyplot as plt
    import tensorflow as tf  #the code below uses the TensorFlow 1.x API (tf.placeholder, tf.Session)

    df=pd.read_csv('stock_dataset.csv')   #Read in stock data
    data=np.array(df['highest price'])    #Get the highest price sequence
    data=data[::-1]    #Reverse, so that the data are arranged in date order

    #Display data as a line chart
    plt.figure()
    plt.plot(data)
    plt.show()

    normalize_data=(data-np.mean(data))/np.std(data)   #Standardize
    normalize_data=normalize_data[:,np.newaxis]        #Add a dimension

    #---------- Form training set ----------
    #Set constants
    time_step=20     #time step
    rnn_unit=10      #hidden layer units
    batch_size=60    #How many examples are trained in each batch
    input_size=1     #input layer dimension
    output_size=1    #output layer dimension
    lr=0.0006        #learning rate
    train_x,train_y=[],[]   #training set
    for i in range(len(normalize_data)-time_step-1):
        x=normalize_data[i:i+time_step]
        y=normalize_data[i+1:i+time_step+1]   #label: the same window shifted one day ahead
        train_x.append(x.tolist())
        train_y.append(y.tolist())
The resulting train_x looks like this:
[[[-1.59618],...18 more..., [-1.56340]] …… [[-1.59202] [-1.58244]]]
It is an array of shape [-1, time_step, input_size].
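To see how the windows line up, here is a small sketch of the same windowing applied to a made-up six-value series with a window of 3. The helper name `make_windows` and the toy prices are illustrative only; the loop bounds mirror the article's loop.

```python
def make_windows(series, time_step):
    """Build (input, label) windows; each label window is the input shifted by one."""
    xs, ys = [], []
    for i in range(len(series) - time_step - 1):
        xs.append([[v] for v in series[i:i + time_step]])
        ys.append([[v] for v in series[i + 1:i + time_step + 1]])
    return xs, ys

toy = [10, 11, 12, 13, 14, 15]          # made-up prices
toy_x, toy_y = make_windows(toy, time_step=3)
# toy_x[0] == [[10], [11], [12]]
# toy_y[0] == [[11], [12], [13]]
```

Each sample has shape [time_step, 1], so the whole list has shape [number of windows, time_step, input_size]. Note that the `- 1` in the range leaves the last possible window unused (here, the one whose label would end at 15).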
Defining Neural Network Variables
    X=tf.placeholder(tf.float32, [None,time_step,input_size])    #The input tensor for each batch
    Y=tf.placeholder(tf.float32, [None,time_step,output_size])   #The labels corresponding to each batch
    #Input layer and output layer weights and biases
    weights={
        'in':tf.Variable(tf.random_normal([input_size,rnn_unit])),
        'out':tf.Variable(tf.random_normal([rnn_unit,1]))
    }
    biases={
        'in':tf.Variable(tf.constant(0.1,shape=[rnn_unit,])),
        'out':tf.Variable(tf.constant(0.1,shape=[1,]))
    }
Define the LSTM network
    def lstm(batch):    #Parameter: number of samples in the input batch
        w_in=weights['in']
        b_in=biases['in']
        input=tf.reshape(X,[-1,input_size])   #Flatten to 2 dimensions for the matrix multiplication; the result feeds the hidden layer
        input_rnn=tf.matmul(input,w_in)+b_in
        input_rnn=tf.reshape(input_rnn,[-1,time_step,rnn_unit])   #Back to 3 dimensions as the input of the lstm cell
        cell=tf.nn.rnn_cell.BasicLSTMCell(rnn_unit)
        init_state=cell.zero_state(batch,dtype=tf.float32)
        #output_rnn records the output at every time step; final_states is the state after the last step
        output_rnn,final_states=tf.nn.dynamic_rnn(cell, input_rnn,initial_state=init_state, dtype=tf.float32)
        output=tf.reshape(output_rnn,[-1,rnn_unit])   #Input of the output layer
        w_out=weights['out']
        b_out=biases['out']
        pred=tf.matmul(output,w_out)+b_out
        return pred,final_states
Train the model
    def train_lstm():
        global batch_size
        pred,_=lstm(batch_size)   #build the network defined above
        #Loss function: mean squared error between predictions and labels
        loss=tf.reduce_mean(tf.square(tf.reshape(pred,[-1])-tf.reshape(Y, [-1])))
        train_op=tf.train.AdamOptimizer(lr).minimize(loss)
        saver=tf.train.Saver(tf.global_variables())
        with tf.Session() as sess:
            sess.run(tf.global_variables_initializer())
            #Repeat training 10000 times
            for i in range(10000):
                step=0
                start=0
                end=start+batch_size
                while(end<len(train_x)):
                    _,loss_=sess.run([train_op,loss],feed_dict={X:train_x[start:end],Y:train_y[start:end]})
                    start+=batch_size
                    end=start+batch_size
                    #Save parameters every 10 steps
                    if step%10==0:
                        print(i,step,loss_)
                        print("Save model: ",saver.save(sess,'stock.model'))
                    step+=1
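The inner `while` loop above walks through the training set in slices of exactly `batch_size` samples, which matters because the LSTM's initial state is created for a fixed batch size. The sketch below reproduces only that index arithmetic so the behavior at the end of the data is easy to inspect; the generator name `batch_bounds` and the sample counts are made up for illustration.

```python
def batch_bounds(n_samples, batch_size):
    """Yield (start, end) index pairs the way the training loop walks the data."""
    start, end = 0, batch_size
    while end < n_samples:
        yield start, end
        start += batch_size
        end = start + batch_size

bounds = list(batch_bounds(n_samples=200, batch_size=60))
exact_fit = list(batch_bounds(n_samples=180, batch_size=60))
```

With 200 samples the loop produces three full batches and never uses the remaining 20. Because the condition is a strict `<`, a final batch that ends exactly at the end of the data is skipped as well, as the 180-sample case shows (only two batches).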
Predict with the model
    def prediction():
        pred,_=lstm(1)    #Only test data of shape [1,time_step,input_size] is fed in when predicting
        saver=tf.train.Saver(tf.global_variables())
        with tf.Session() as sess:
            #Restore parameters; train_lstm() saved 'stock.model' in the current directory
            module_file = tf.train.latest_checkpoint('.')
            saver.restore(sess, module_file)
            #Take the last row of the training set as the test sample. shape=[1,time_step,input_size]
            prev_seq=train_x[-1]
            predict=[]
            #Get the next 100 prediction results
            for i in range(100):
                next_seq=sess.run(pred,feed_dict={X:[prev_seq]})
                predict.append(next_seq[-1])
                #Take the prediction for the last time step and append it to the previous data to form a new test sample
                prev_seq=np.vstack((prev_seq[1:],next_seq[-1]))
            #Show the result as a line chart
            plt.figure()
            plt.plot(list(range(len(normalize_data))), normalize_data, color='b')
            plt.plot(list(range(len(normalize_data), len(normalize_data) + len(predict))), predict, color='r')
            plt.show()
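The prediction loop above is a rolling forecast: each predicted value is appended to the window and the oldest value is dropped, so later predictions are based on earlier predictions. The sketch below isolates that pattern in plain Python. `rolling_forecast` and `toy_model` are hypothetical names, and `toy_model` is only a stand-in for the trained network (it simply repeats the last step's change).

```python
def rolling_forecast(last_window, predict_next, n_steps):
    """Forecast n_steps values by feeding each prediction back in as input."""
    window = list(last_window)
    forecast = []
    for _ in range(n_steps):
        nxt = predict_next(window)
        forecast.append(nxt)
        window = window[1:] + [nxt]   # drop the oldest value, append the newest
    return forecast

def toy_model(window):
    """Hypothetical stand-in for the trained network: repeats the last step's change."""
    return window[-1] + (window[-1] - window[-2])

future = rolling_forecast([1.0, 2.0, 3.0], toy_model, n_steps=3)
```

Because each step builds on earlier predictions rather than on real data, errors can accumulate the further ahead the forecast goes. Also note that the article's predictions are in standardized units; multiplying by `np.std(data)` and adding `np.mean(data)` converts them back to prices.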
code
This installment uses only the highest price as a feature to predict the future trend of the highest price. The next installment will increase the input feature dimension, using the lowest price, opening price, closing price, trading volume, and so on as input features to predict the subsequent highest price.
Note: This article is an introduction to RNN and LSTM. If the source raises copyright issues or the original link is wrong, please point it out and it will be corrected immediately.