Recurrent Neural Network (4)

1. Introduction to Recurrent Neural Networks and Natural Language Processing

Target

  • Know the concepts of token and tokenization
  • Know the concept and function of N-gram
  • Know the method of text vectorization representation

1.1 Tokenization of text

1.1.1 Introduction to concepts and tools

Tokenization is commonly referred to as word segmentation; each resulting unit (a word, sub-word, or character) is called a token.
There are many word segmentation tools in common use.

1.1.2 Method of Chinese and English word segmentation

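As a very rough sketch of the two approaches: English text can be split on whitespace and punctuation, while Chinese text can be segmented character by character (dedicated tools such as jieba are normally used for proper Chinese word segmentation):

```python
import re

def tokenize_en(text):
    # a very simple English tokenizer: lower-case, then split on non-word characters
    return [tok for tok in re.split(r"\W+", text.lower()) if tok]

def tokenize_zh(text):
    # character-level segmentation: treat every Chinese character as one token
    return [ch for ch in text if ch.strip()]

print(tokenize_en("An apple a day keeps the doctor away."))
# ['an', 'apple', 'a', 'day', 'keeps', 'the', 'doctor', 'away']
print(tokenize_zh("深度学习"))
# ['深', '度', '学', '习']
```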

1.2 N-gram representation

We said earlier that a sentence can be represented by its individual words or characters. Sometimes, though, we can group 2, 3, or more consecutive words together; such a group is called an N-gram, where N is the number of words used together.
For example:
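
A minimal sketch of how 2-grams can be built by sliding a window of size 2 over the token list (illustrative code):

```python
def n_grams(tokens, n=2):
    # group every n consecutive tokens into one unit
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

print(n_grams(["deep", "learning", "is", "fun"], 2))
# [('deep', 'learning'), ('learning', 'is'), ('is', 'fun')]
```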

1.3 Vectorization


1.3.1 one-hot encoding

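A minimal sketch of one-hot encoding over a toy dictionary: each word becomes a vector whose length equals the dictionary size, with a single 1 at the word's index:

```python
vocab = {"deep": 0, "learning": 1, "is": 2, "fun": 3}

def one_hot(word):
    vec = [0] * len(vocab)   # one slot per word in the dictionary
    vec[vocab[word]] = 1     # mark the position of this word
    return vec

print(one_hot("learning"))   # [0, 1, 0, 0]
```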

1.3.2 word embedding

Word embedding is a commonly used way of representing text in deep learning. Unlike one-hot encoding, word embedding represents each token with a dense vector of floating-point numbers. Depending on the size of the dictionary, the vectors typically use different dimensions, such as 100, 256, or 300. Each value in the vector is a parameter: it is initialized randomly and then learned during training.
If the text contains 20,000 words and we use one-hot encoding, we get a 20,000 × 20,000 matrix, most of whose entries are 0. If we use word embedding instead, each word only needs a short dense vector, for example 300 dimensions, so the whole vocabulary is represented by a 20,000 × 300 matrix.

1.3.3 word embedding API

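In PyTorch, the layer that implements this is torch.nn.Embedding(num_embeddings, embedding_dim). A minimal sketch with a dictionary of 20,000 words and 300-dimensional vectors:

```python
import torch
import torch.nn as nn

# 20,000 words in the dictionary, each mapped to a 300-dimensional dense vector
embedding = nn.Embedding(num_embeddings=20000, embedding_dim=300)

# a batch of 2 sentences, each already converted to 5 word indices
word_ids = torch.tensor([[0, 15, 3, 210, 7],
                         [4, 4, 19999, 2, 8]])
vectors = embedding(word_ids)
print(vectors.shape)  # torch.Size([2, 5, 300])
```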

1.3.4 Data shape change

Thinking: each sentence in a batch contains 10 words. After passing through a word embedding of shape [20, 4] (a dictionary of 20 words, each mapped to a 4-dimensional vector), what shape does the batch become? Each word is represented by a vector of length 4, so the batch becomes shape [batch_size, 10, 4]. A new dimension is added, and that dimension is the embedding dim.
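
A small sketch of this shape change (dictionary size 20, embedding dim 4, sentences of 10 words; the batch size of 3 is just an example):

```python
import torch
import torch.nn as nn

embedding = nn.Embedding(20, 4)         # dictionary of 20 words, 4-dimensional vectors
batch = torch.randint(0, 20, (3, 10))   # batch_size=3, each sentence has 10 word indices
out = embedding(batch)
print(batch.shape, "->", out.shape)     # torch.Size([3, 10]) -> torch.Size([3, 10, 4])
```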

2. Text sentiment classification

Target

  1. Know the basic methods of text processing
  2. Be able to implement sentiment classification on real data

2.1 Case introduction


2.2 Thought Analysis

First of all, the above problem can be treated as a classification problem: the sentiment score ranges from 1 to 10, giving 10 classes (it could also be treated as a regression problem; here we treat it as classification). Based on previous experience, the general workflow is as follows:

  1. Prepare the dataset
  2. Build the model
  3. Train the model
  4. Evaluate the model
    With this outline in mind, let's complete the steps one by one.

2.3 Prepare the dataset


2.3.1 Preparation of basic Dataset

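As a rough sketch only: one common way to wrap such data is a torch.utils.data.Dataset. The data/train/pos and data/train/neg folder layout below is an assumption for illustration, not necessarily the original setup:

```python
import os
from torch.utils.data import Dataset

class ImdbDataset(Dataset):
    """Assumed layout: data/train/pos/*.txt and data/train/neg/*.txt (hypothetical paths)."""

    def __init__(self, root="data/train"):
        self.samples = []
        for label_name, label in (("pos", 1), ("neg", 0)):
            folder = os.path.join(root, label_name)
            for fname in os.listdir(folder):
                self.samples.append((os.path.join(folder, fname), label))

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        with open(path, encoding="utf-8") as f:
            return f.read(), label

    def __len__(self):
        return len(self.samples)
```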

2.3.2 Text serialization

When we introduced word embedding, we said that text is not converted into vectors directly: it is first converted into numbers, and the numbers are then converted into vectors. How is this process implemented? We can store each word of the text together with its corresponding number in a dictionary, and then implement a method that uses this dictionary to map a sentence to a list of numbers.
Before implementing text serialization, consider the following points:

  1. How to use a dictionary to map words to numbers
  2. Different words occur with different frequencies: do we need to filter out high-frequency or low-frequency words, and do we need to limit the total vocabulary size?
  3. Once we have the dictionary, how do we convert a sentence into a sequence of numbers, and a sequence of numbers back into a sentence?
  4. Sentences have different lengths: how do we make all sentences in a batch the same length (short sentences can be padded with a special token)?
  5. What do we do with a new word that does not appear in the dictionary (a special token can stand in for it)?
    Idea analysis (a minimal sketch follows this list):
  1. Tokenize all sentences
  2. Store the words in a dictionary, count how often each word occurs, and filter words by frequency
  3. Implement a method that converts text to a sequence of numbers
  4. Implement a method that converts a sequence of numbers back to text
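
A minimal sketch of such a serializer (the class name Word2Sequence and the UNK/PAD tokens are illustrative choices, not necessarily the original code):

```python
class Word2Sequence:
    UNK_TAG, PAD_TAG = "UNK", "PAD"   # unknown-word and padding tokens
    UNK, PAD = 0, 1

    def __init__(self):
        self.dict = {self.UNK_TAG: self.UNK, self.PAD_TAG: self.PAD}
        self.count = {}

    def fit(self, sentence):
        # count word frequencies over one tokenized sentence
        for word in sentence:
            self.count[word] = self.count.get(word, 0) + 1

    def build_vocab(self, min_count=5, max_features=None):
        # drop low-frequency words and optionally cap the vocabulary size
        count = {w: c for w, c in self.count.items() if c >= min_count}
        if max_features is not None:
            count = dict(sorted(count.items(), key=lambda x: -x[1])[:max_features])
        for word in count:
            self.dict[word] = len(self.dict)
        self.inverse_dict = {idx: word for word, idx in self.dict.items()}

    def transform(self, sentence, max_len=None):
        # tokenized sentence -> list of numbers, padded/truncated to max_len
        if max_len is not None:
            sentence = sentence[:max_len] + [self.PAD_TAG] * (max_len - len(sentence))
        return [self.dict.get(word, self.UNK) for word in sentence]

    def inverse_transform(self, indices):
        # list of numbers -> list of words
        return [self.inverse_dict.get(i, self.UNK_TAG) for i in indices]
```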

2.4 Build the model

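As an illustrative sketch only (all layer sizes are assumptions), the model at this stage could be an embedding layer followed by a fully connected layer:

```python
import torch.nn as nn

class SentimentModel(nn.Module):
    def __init__(self, vocab_size=10000, embedding_dim=100, max_len=200, num_classes=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.fc = nn.Linear(max_len * embedding_dim, num_classes)

    def forward(self, x):                # x: [batch_size, max_len]
        x = self.embedding(x)            # [batch_size, max_len, embedding_dim]
        x = x.view(x.size(0), -1)        # flatten to [batch_size, max_len * embedding_dim]
        return self.fc(x)                # [batch_size, num_classes]
```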

2.5 Model training and evaluation


3. Recurrent neural network

Target

  1. Be able to explain the concept and function of recurrent neural networks
  2. Be able to tell the types and application scenarios of recurrent neural networks
  3. Be able to tell the function and principle of LSTM
  4. Be able to tell the function and principle of GRU

3.1 Introduction to Recurrent Neural Networks

Why do we need a recurrent neural network when we already have feedforward neural networks?
In an ordinary neural network, information flows in only one direction. Although this restriction makes the network easier to learn, it also limits the model's capability to some extent. In many real-world tasks, the output of the network depends not only on the input at the current moment but also on the network's outputs over some past period of time. In addition, ordinary networks have difficulty handling time-series data such as video, speech, and text: the length of such data is generally not fixed, whereas a feedforward neural network requires input and output dimensions that are fixed and cannot be changed arbitrarily. Therefore, a more capable model is needed when dealing with such sequence-related problems.
A Recurrent Neural Network (RNN) is a type of neural network with short-term memory. In an RNN, a neuron can receive information not only from other neurons but also from itself, forming a network structure that contains loops. In other words, a neuron's output can act directly on itself (as an input) at the next time step.
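
This idea is usually written as the standard simple-RNN update rule: the hidden state at time t mixes the current input with the previous hidden state, and the output is read from the hidden state:

$$h_t = \tanh(W_{xh} x_t + W_{hh} h_{t-1} + b_h), \qquad y_t = W_{hy} h_t + b_y$$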

3.2 LSTMs and GRUs

3.2.1 Basic introduction of LSTM

Suppose we need to predict the next word from the text we already have. For a sentence like "the clouds are floating in the ...", the nearby words are enough to predict that the missing word is "sky". For other sentences, however, the word to be predicted may depend on words that appeared 100 or more words earlier. Because the gap is so large, the influence of that earlier information on the result can become very small as the interval grows, and the prediction can no longer be made well. This is the long-term dependency problem (Long-Term Dependencies) in RNNs.
To address this problem, the LSTM (Long Short-Term Memory network) was proposed.
LSTM is a special kind of RNN that can learn long-term dependencies. It has achieved considerable success on many problems and is widely used.

3.2.2 The core of LSTM


3.2.3 Gradually understand LSTM

3.2.3.1 Forget Gate

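Following the notation of the Colah article referenced below, the standard forget gate is a sigmoid over the previous hidden state and the current input; it decides how much of the old cell state $C_{t-1}$ to keep:

$$f_t = \sigma\left(W_f \cdot [h_{t-1}, x_t] + b_f\right)$$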

3.2.3.2 Input gate

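The standard input gate decides how much new information to write into the cell state: a sigmoid gate $i_t$ scales a tanh candidate $\tilde{C}_t$, and the cell state is then updated:

$$i_t = \sigma\left(W_i \cdot [h_{t-1}, x_t] + b_i\right), \qquad \tilde{C}_t = \tanh\left(W_C \cdot [h_{t-1}, x_t] + b_C\right)$$

$$C_t = f_t * C_{t-1} + i_t * \tilde{C}_t$$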

3.2.3.3 Output Gate

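The standard output gate decides how much of the (squashed) cell state is exposed as the new hidden state:

$$o_t = \sigma\left(W_o \cdot [h_{t-1}, x_t] + b_o\right), \qquad h_t = o_t * \tanh(C_t)$$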

3.2.4 GRU, a variant of LSTM

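For reference, the standard GRU equations (again following the Colah article): the GRU merges the cell state and hidden state and uses an update gate $z_t$ and a reset gate $r_t$:

$$z_t = \sigma\left(W_z \cdot [h_{t-1}, x_t]\right), \qquad r_t = \sigma\left(W_r \cdot [h_{t-1}, x_t]\right)$$

$$\tilde{h}_t = \tanh\left(W \cdot [r_t * h_{t-1}, x_t]\right), \qquad h_t = (1 - z_t) * h_{t-1} + z_t * \tilde{h}_t$$
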
LSTM content reference: https://colah.github.io/posts/2015-08-Understanding-LSTMs/

3.3 Bidirectional LSTM

A unidirectional RNN infers later content only from earlier information, but sometimes looking only at the preceding words is not enough: the word to be predicted may also be related to the content that follows it. We therefore need a mechanism that lets the model remember not only from front to back but also from back to front. A bidirectional LSTM solves this problem.
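
A minimal sketch of a bidirectional LSTM in PyTorch (all sizes are illustrative). Note that the output feature dimension doubles, because the forward and backward hidden states are concatenated:

```python
import torch
import torch.nn as nn

lstm = nn.LSTM(input_size=30, hidden_size=64, num_layers=1,
               batch_first=True, bidirectional=True)
x = torch.randn(10, 20, 30)   # [batch_size, seq_len, input_size]
output, (h_n, c_n) = lstm(x)
print(output.shape)           # torch.Size([10, 20, 128]) -> 2 * hidden_size
print(h_n.shape)              # torch.Size([2, 10, 64])   -> num_layers * num_directions
```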

4. Recurrent Neural Network Realizes Sentiment Classification

Target

  1. Know how to use LSTM and GRU and the format of their inputs and outputs
  2. Be able to apply LSTM and GRU to text sentiment classification

4.1 Use of LSTM and GRU modules in Pytorch

4.1.1 Introduction to LSTM

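For reference, the main constructor arguments of torch.nn.LSTM (the values here are just examples):

```python
import torch.nn as nn

lstm = nn.LSTM(
    input_size=30,        # number of features of each input element (e.g. embedding_dim)
    hidden_size=18,       # number of features of the hidden state
    num_layers=1,         # number of stacked LSTM layers
    batch_first=True,     # tensors are [batch, seq, feature] instead of [seq, batch, feature]
    bidirectional=False,  # set True for a bidirectional LSTM
)
```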

4.1.2 Example of LSTM usage

Assume the input is `input` with shape [10, 20] (batch_size = 10, seq_len = 20), and that the embedding matrix has shape [100, 30] (a dictionary of 100 words, each represented by a 30-dimensional vector).
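
Based on those shapes, a minimal sketch (hidden_size=18 and num_layers=1 are arbitrary example values):

```python
import torch
import torch.nn as nn

batch_size, seq_len = 10, 20
vocab_size, embedding_dim = 100, 30
hidden_size, num_layers = 18, 1

embedding = nn.Embedding(vocab_size, embedding_dim)
lstm = nn.LSTM(input_size=embedding_dim, hidden_size=hidden_size,
               num_layers=num_layers, batch_first=True)

input = torch.randint(0, vocab_size, (batch_size, seq_len))  # [10, 20]
embedded = embedding(input)                                  # [10, 20, 30]
output, (h_n, c_n) = lstm(embedded)

print(output.shape)  # torch.Size([10, 20, 18]) -> [batch, seq_len, hidden_size]
print(h_n.shape)     # torch.Size([1, 10, 18])  -> [num_layers * num_directions, batch, hidden_size]
print(c_n.shape)     # torch.Size([1, 10, 18])
```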

4.1.3 Notes on the use of LSTM and GRU


4.2 Use LSTM to complete text sentiment classification

Earlier we used word embedding alone to implement a toy-level text sentiment classifier; now we add an LSTM layer to that model and observe the classification results.
To achieve better results, the previous model is modified as follows (a sketch follows this list):

  1. MAX_LEN = 200
  2. When building the dataset, turn the task into a binary classification problem, where pos is 1 and neg is 0; otherwise 25,000 samples are not enough to split the data into 10 classes
  3. When instantiating the LSTM, use dropout=0.5; when model.eval() is called, dropout is automatically disabled
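
A hedged sketch of what the modified model might look like (the layer sizes, the bidirectional setting, and the use of the final hidden states are illustrative assumptions, not the author's exact code):

```python
import torch
import torch.nn as nn

class LSTMSentimentModel(nn.Module):
    def __init__(self, vocab_size=10000, embedding_dim=100,
                 hidden_size=128, num_layers=2):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embedding_dim)
        self.lstm = nn.LSTM(embedding_dim, hidden_size, num_layers,
                            batch_first=True, bidirectional=True, dropout=0.5)
        self.fc = nn.Linear(hidden_size * 2, 2)   # 2 classes: pos / neg

    def forward(self, x):                          # x: [batch_size, MAX_LEN]
        x = self.embedding(x)                      # [batch_size, MAX_LEN, embedding_dim]
        _, (h_n, _) = self.lstm(x)
        # concatenate the last forward and last backward hidden states
        feature = torch.cat([h_n[-2], h_n[-1]], dim=-1)
        return self.fc(feature)                    # [batch_size, 2]
```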

5. Serialization container in Pytorch

Target

  1. Know the causes of and remedies for vanishing and exploding gradients
  2. Be able to use nn.Sequential to build a model
  3. Know how to use nn.BatchNorm1d
  4. Know how to use nn.Dropout

5.1 Vanishing and exploding gradients

Before using the serialization containers in PyTorch, let's first look at the common problems of vanishing and exploding gradients.

5.1.1 Vanishing gradient

Suppose we have a minimal neural network with four layers, each containing only one neuron.
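
For that four-layer chain with one neuron per layer (writing $z_i = w_i a_{i-1}$ and $a_i = \sigma(z_i)$, with $a_0 = x$), the chain rule gives the gradient of the output with respect to the first weight as a product of per-layer factors:

$$\frac{\partial a_4}{\partial w_1} = \sigma'(z_4)\, w_4 \cdot \sigma'(z_3)\, w_3 \cdot \sigma'(z_2)\, w_2 \cdot \sigma'(z_1)\, x$$

Since $\sigma'(z) \le 1/4$ for the sigmoid, when the weights are small each factor is less than 1 and the product shrinks toward 0 as depth grows (vanishing gradient); when the weights are large the product grows layer by layer (exploding gradient).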

5.1.2 Exploding gradient


5.1.3 Practical tips for dealing with vanishing or exploding gradients


5.2 nn.Sequential

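A minimal nn.Sequential sketch (the layer sizes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(14, 28),   # input features -> hidden units
    nn.ReLU(),
    nn.Linear(28, 2),    # hidden units -> output classes
)
x = torch.randn(32, 14)  # batch_size = 32
print(model(x).shape)    # torch.Size([32, 2])
```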

5.3 nn.BatchNorm1d

Batch normalization (批标准化) normalizes the data within each training batch, which speeds up training.
Take the sigmoid activation function as an example. During backpropagation, when the activation saturates near 0 or 1 the gradient is close to 0, so the parameter updates are tiny and training is slow. If the data is normalized, the values are kept in the region where the sigmoid gradient is large, so the parameter updates are larger and training is faster.
batchNorm is generally placed after the activation function, i.e., the input is passed through the activation before entering batchNorm.
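
A minimal sketch that places nn.BatchNorm1d after the activation, as described above (sizes are illustrative):

```python
import torch
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(14, 28),
    nn.ReLU(),
    nn.BatchNorm1d(28),   # normalizes the 28 activation features over the batch
    nn.Linear(28, 2),
)
x = torch.randn(32, 14)   # batch_size = 32
print(model(x).shape)     # torch.Size([32, 2])
```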

5.4 nn.Dropout

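A minimal sketch of nn.Dropout: during training each element is zeroed with probability p and the remaining ones are scaled by 1/(1-p); in eval mode it changes nothing:

```python
import torch
import torch.nn as nn

dropout = nn.Dropout(p=0.5)
x = torch.ones(2, 6)

dropout.train()
print(dropout(x))   # roughly half the entries are 0, the rest are 2.0

dropout.eval()
print(dropout(x))   # identical to x
```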

Origin blog.csdn.net/weixin_45529272/article/details/128129284