Introduction to NLP (4) RNN

NLP Introductory Tutorial Series

Chapter 1 Distributed Representation of Natural Language and Words

Chapter 2 Improvements to count-based methods

Chapter 3 A brief introduction to word2vec



Foreword

The RNN is a classic model in NLP. Its defining feature is a loop (recurrent connection) that lets data circulate through the network. By cycling data in this way, the RNN remembers past information while updating with the latest input.


1. Basic introduction to RNN structure

The network structure of an RNN is actually simpler than that of a CNN, as shown in the figure below.

RNN structure diagram

There is one input x, and the output after passing through the network is h; a copy of h is then fed back into the RNN as input. Unrolling the figure above gives the following figure.

RNN structure diagram (unrolled)
As can be seen, the output at time 0 is used as part of the input at time 1, fed into the network together with the x at time 1. The hidden state is computed as

$h_t = \tanh(h_{t-1} W_h + x_t W_x + b)$

The RNN has two weight matrices: the weight $W_x$ that converts the input $x$ into the output $h$, and the weight $W_h$ that converts the previous RNN layer's output into the output at the current time step. In addition, there is a bias $b$. Here, $h_{t-1}$ and $x_t$ are row vectors.

In much of the literature, the output $h_t$ of the RNN is called the hidden state or hidden state vector.
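To make the formula above concrete, here is a minimal NumPy sketch of one forward step of an RNN layer. The class name `RNNCell`, the shapes, and the toy usage at the end are illustrative assumptions, not code from this series.

```python
import numpy as np

class RNNCell:
    """One RNN step: h_t = tanh(h_{t-1} Wh + x_t Wx + b)  (illustrative sketch)."""

    def __init__(self, Wx, Wh, b):
        self.Wx = Wx  # (D, H): converts the input x_t into the hidden state
        self.Wh = Wh  # (H, H): converts the previous hidden state h_{t-1}
        self.b = b    # (H,):  bias

    def forward(self, x, h_prev):
        # x: (N, D) batch of inputs; h_prev: (N, H) previous hidden states (row vectors)
        t = np.dot(h_prev, self.Wh) + np.dot(x, self.Wx) + self.b
        return np.tanh(t)

# Toy usage: N=2 samples, D=3 input dimensions, H=4 hidden units
N, D, H = 2, 3, 4
cell = RNNCell(np.random.randn(D, H), np.random.randn(H, H), np.zeros(H))
h = np.zeros((N, H))                 # initial hidden state
for x in np.random.randn(5, N, D):   # 5 time steps
    h = cell.forward(x, h)           # h carries information from all previous steps
```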

2. Backpropagation

1. Backpropagation Through Time (BPTT)

After the RNN layer is unrolled in time, it can be regarded as a neural network extending in the horizontal direction, so an RNN can be trained in the same way as an ordinary neural network, and backpropagation can likewise be performed in the usual way; applying error backpropagation to the unrolled network is what is called Backpropagation Through Time (BPTT). However, when the time span of the time series data grows, learning consumes a great deal of computation and memory, and the gradient also becomes unstable.
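Because the unrolled RNN is an ordinary feedforward network, the gradient at each time step can be computed with the usual chain rule. Below is a hedged sketch of the backward pass through a single step; the function name and the `cache` layout are assumptions made for this example, not code from the article.

```python
import numpy as np

def rnn_step_backward(dh_next, cache):
    """Backward pass through one RNN step: h_t = tanh(h_{t-1} Wh + x_t Wx + b).

    dh_next: (N, H) gradient of the loss w.r.t. h_t, flowing in from later steps.
    cache:   (x, h_prev, h_next, Wx, Wh) saved during the forward pass.
    Returns the gradients for x, h_prev, Wx, Wh and b.
    """
    x, h_prev, h_next, Wx, Wh = cache
    dt = dh_next * (1.0 - h_next ** 2)   # back through tanh: 1 - tanh(t)^2
    db = dt.sum(axis=0)
    dWh = np.dot(h_prev.T, dt)
    dh_prev = np.dot(dt, Wh.T)           # passed on to the step at time t-1
    dWx = np.dot(x.T, dt)
    dx = np.dot(dt, Wx.T)
    return dx, dh_prev, dWx, dWh, db
```

In full BPTT this backward step is applied at every time step from the last back to the first, with `dh_prev` of one step becoming `dh_next` of the step before it, which is why memory and computation grow with the length of the sequence.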

2. Truncated BPTT

When dealing with long time series data, the usual practice is to truncate the network connections at an appropriate length. Specifically, a network that has grown too long along the time axis is cut at suitable positions to create multiple small networks, and error backpropagation is then performed on each of these small networks. This method is called Truncated BPTT.

Consider time series data of length 1000. If the RNN layer is unrolled, it becomes a network with 1000 layers arranged in the horizontal direction. In principle the gradient can be computed by error backpropagation no matter how many layers are lined up, but if the sequence is too long there are problems with computation time and memory usage. In addition, as the network grows longer, the gradient becomes smaller and can no longer propagate back to the earliest layers. Therefore, as shown in the figure below, we truncate the backpropagation connections of the network at an appropriate length in the horizontal direction.

Truncated BPTT diagram

In this case, the backpropagation connections are cut so that learning is performed in units of 10 RNN layers. Once the backpropagation connections are truncated, a block no longer needs to consider data outside its own range, so error backpropagation can be completed block by block, independently of the other blocks. Note that only the backward connections are cut; the forward connections are kept, so the hidden state still flows from one block to the next.
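The following is a minimal, self-contained sketch of this block-by-block procedure on toy data. The shapes, the squared-error loss, and the plain SGD update are illustrative assumptions rather than the article's code; the point is that the hidden state h is carried forward across blocks while backpropagation runs only inside each block of 10 steps.

```python
import numpy as np

np.random.seed(0)
T, D, H = 1000, 3, 4        # sequence length, input size, hidden size
block_len = 10              # backpropagate only inside blocks of 10 time steps

xs = np.random.randn(T, D)  # toy input sequence
ts = np.random.randn(T, H)  # toy target sequence
Wx = np.random.randn(D, H) * 0.1
Wh = np.random.randn(H, H) * 0.1
b = np.zeros(H)
lr = 0.01

h = np.zeros(H)             # hidden state is carried across blocks (forward links kept)
for start in range(0, T, block_len):
    # forward through one block, caching what the backward pass needs
    cache = []
    for t in range(start, start + block_len):
        h_prev = h
        h = np.tanh(h_prev @ Wh + xs[t] @ Wx + b)
        cache.append((xs[t], h_prev, h))

    # backward only inside this block: connections to earlier blocks are cut
    dWx, dWh, db = np.zeros_like(Wx), np.zeros_like(Wh), np.zeros_like(b)
    dh = np.zeros(H)
    for i in reversed(range(block_len)):
        x_t, h_prev, h_t = cache[i]
        dh = dh + 2 * (h_t - ts[start + i])   # squared-error gradient for this step
        dt = dh * (1 - h_t ** 2)              # back through tanh
        dWx += np.outer(x_t, dt)
        dWh += np.outer(h_prev, dt)
        db += dt
        dh = dt @ Wh.T                        # flows to the previous step inside the block
    # the gradient toward the step before this block is simply discarded: that is the truncation

    for p, g in ((Wx, dWx), (Wh, dWh), (b, db)):
        p -= lr * g                           # plain SGD update after each block
```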

3. Mini-batch learning with Truncated BPTT

The discussion of Truncated BPTT so far did not consider mini-batch learning; in other words, it corresponds to a batch size of 1. To use mini-batches, the starting position of the input data must be shifted (offset) for each sample in the batch. For example, take time series data of length 1000 and truncate it in units of 10 time steps. How should learning proceed with a batch size of 2? In this case, the first sample of the RNN layer's input data is fed in order from the beginning of the sequence, and the second sample is fed in order starting from the 500th element. That is, the starting position of the second sample is shifted by 500.
mini-batch offset diagram
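The indexing below sketches how the two samples' starting positions are offset; the variable names are illustrative, not taken from the article.

```python
corpus_len = 1000   # length of the time series data
batch_size = 2      # two samples per mini-batch
time_size = 10      # each truncated block covers 10 time steps

# Each sample in the batch starts reading the sequence at a different offset:
# the first at position 0, the second at position 500.
offsets = [i * (corpus_len // batch_size) for i in range(batch_size)]
print(offsets)  # [0, 500]

time_idx = 0
for iteration in range(3):  # a few blocks, just to show the indexing
    # indices read by each batch sample for this block, wrapping around at the end
    batch_indices = [
        [(offset + time_idx + t) % corpus_len for t in range(time_size)]
        for offset in offsets
    ]
    time_idx += time_size
    print(batch_indices[0][:3], batch_indices[1][:3])  # [0, 1, 2] [500, 501, 502] on the first pass
```

This way the corpus is read in parallel from two different positions, and each sample still feeds the RNN a contiguous sequence within every block.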


Summary

The above is a brief summary of today's content. This article only gives a short introduction to the RNN; concrete sample code will follow in a later article.
