2021 Li Hongyi Machine Learning Course Notes - Recurrent Neural Network

Note: These are brief notes the author made for final review, so they are not comprehensive or detailed. If you have any questions, please discuss them in the comments.

I. Basic Idea

First, an introductory example: the slot filling problem.

  • Input: I would like to arrive Guangzhou on November 2nd.
  • Output: Destination => Guangzhou | time of arrival => November 2nd

That is, we need to extract the pieces of information we care about from the input sentence (above there are two types: destination and arrival time). We can treat this as a classification problem: for each word, predict whether it belongs to one of the target categories.

A deep network only accepts vectors (matrices) as input, so each word must first be mapped to a vector. Common schemes include one-hot encoding, word hashing, and learned word embeddings.
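As a minimal sketch of the simplest option, one-hot encoding (the toy vocabulary below is made up for illustration):

```python
import numpy as np

# Toy vocabulary; in practice it is built from the training corpus.
vocab = ["arrive", "leave", "Guangzhou", "on", "November", "2nd"]
word_to_index = {w: i for i, w in enumerate(vocab)}

def one_hot(word: str) -> np.ndarray:
    """Map a word to a vector with a single 1 at its vocabulary index."""
    vec = np.zeros(len(vocab))
    vec[word_to_index[word]] = 1.0
    return vec

print(one_hot("Guangzhou"))  # [0. 0. 1. 0. 0. 0.]
```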

Problem: The network needs contextual information to classify words accurately, for example:

  • arrive Guangzhou on November 2nd
  • leave Guangzhou on November 2nd

In both sentences, Guangzhou is a location. But unless the network has seen the preceding word arrive or leave, it cannot tell whether Guangzhou is the place of departure or the destination. In other words, the network needs memory.

II. RNN

A Recurrent Neural Network (RNN) is a neural network with memory; the memory stores the output of the network's hidden layer:
[Figure: RNN with memory cells that store the hidden-layer outputs]
A more common way of drawing an RNN is shown below:
[Figure: the RNN unrolled across time steps]
That is, the content of the previous hidden layer (a1) is stored and fed as an additional input to the next hidden layer (a2). Returning to the example above: the hidden state (a1) produced while processing arrive (x1) is fed in when Guangzhou (x2) is encoded into (a2), and thus influences how the category of Guangzhou is decided.
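To make the recurrence concrete, here is a minimal numpy sketch of one vanilla RNN step (the weights, sizes, and tanh activation are illustrative choices, not the course's exact notation):

```python
import numpy as np

rng = np.random.default_rng(0)
input_dim, hidden_dim = 6, 4  # arbitrary sizes for illustration

W_x = rng.normal(size=(hidden_dim, input_dim))   # input-to-hidden weights
W_h = rng.normal(size=(hidden_dim, hidden_dim))  # hidden-to-hidden weights
b = np.zeros(hidden_dim)

def rnn_step(x_t, h_prev):
    """One time step: the stored hidden state h_prev acts as the memory."""
    return np.tanh(W_x @ x_t + W_h @ h_prev + b)

# Process a 3-word sentence: the state from word 1 feeds into word 2, etc.
sequence = rng.normal(size=(3, input_dim))
h = np.zeros(hidden_dim)
for x_t in sequence:
    h = rnn_step(x_t, h)  # h now carries information from earlier words
print(h)
```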

In addition, there are many RNN variants, such as feeding the previous time step's output (instead of its hidden layer) into the next hidden layer:
[Figure: variant feeding the previous output back into the hidden layer]
or a bidirectional RNN, sketched in code after the figure below:
[Figure: bidirectional RNN]
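Sketching the bidirectional idea with the same kind of step function (a simplification for illustration): one RNN reads the sentence left to right, another right to left, and each word's representation concatenates both hidden states, so it sees context from both sides:

```python
import numpy as np

rng = np.random.default_rng(1)
input_dim, hidden_dim = 6, 4

def make_step():
    W_x = rng.normal(size=(hidden_dim, input_dim))
    W_h = rng.normal(size=(hidden_dim, hidden_dim))
    return lambda x, h: np.tanh(W_x @ x + W_h @ h)

forward_step, backward_step = make_step(), make_step()

def bidirectional(sequence):
    """Return one vector per word, combining left and right context."""
    h, forward = np.zeros(hidden_dim), []
    for x in sequence:                 # left-to-right pass
        h = forward_step(x, h)
        forward.append(h)
    h, backward = np.zeros(hidden_dim), []
    for x in reversed(sequence):       # right-to-left pass
        h = backward_step(x, h)
        backward.append(h)
    backward.reverse()
    return [np.concatenate(pair) for pair in zip(forward, backward)]

outputs = bidirectional(list(rng.normal(size=(3, input_dim))))
print(outputs[1].shape)  # (8,): each word sees both directions
```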

III. LSTM

LSTM (Long Short-Term Memory) is also a kind of RNN. Compared with a vanilla RNN, its "memory" is more complicated, as shown below:
[Figure: LSTM memory cell and its gates]
The block in the red box plays the same role as a1 in the plain RNN above, but while a1 only stores and passes the hidden state along, the LSTM module here is more complex in both function and structure. Drawn in the more common form, an LSTM looks like this:
[Figure: LSTM drawn in the common gated form]
This structure... is actually quite complicated, and you are not required to memorize it. In practice you can simply call a library implementation.
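For example, with PyTorch (a sketch; the sizes are arbitrary):

```python
import torch
import torch.nn as nn

# input_size = word-vector length, hidden_size = LSTM state size (arbitrary here).
lstm = nn.LSTM(input_size=6, hidden_size=4, batch_first=True)

x = torch.randn(1, 3, 6)      # (batch, sequence length, input_size)
output, (h_n, c_n) = lstm(x)  # output holds the hidden state at every step
print(output.shape)           # torch.Size([1, 3, 4])
print(h_n.shape, c_n.shape)   # final hidden state and final memory cell
```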

Regarding LSTM, one more property is worth understanding: it can mitigate the vanishing-gradient problem (though it does not prevent gradient explosion), because the input is added onto the memory rather than overwriting it.
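The usual way to see this is the memory-cell update, written here in common LSTM notation (which the note itself does not spell out):

$$c_t = f_t \odot c_{t-1} + i_t \odot \tilde{c}_t$$

Because the new input $i_t \odot \tilde{c}_t$ is added to the old memory instead of replacing it, the gradient flowing back through $c_{t-1}$ is scaled only by the forget gate $f_t$; as long as the forget gate stays close to 1, past influence is not multiplied away at every step the way it is in a vanilla RNN.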
