The attention mechanism in natural language processing

1. Introduction

        Attention! As the name suggests, it is an imitation of the human attention mechanism. When we look at an image or a piece of text, we instinctively pay attention to the more important parts; we call those parts eye-catching. Computer engineers are always doing their best to make computers behave more like humans, so how do we add an attention mechanism to a computer so that it learns to focus on the key points? Follow me!

2. Encoder-Decoder

         Because many attention models are now built on top of the Encoder-Decoder model, let's talk about that first. For example, suppose a beautiful woman says to me: "I think you look a lot like my future husband." My brain processes this information and forms its own understanding of it; that is the encoder. Then I have to respond to the sentence, which is the decoding process. When decoding, I organize my reply based on my understanding of what she said and on what I have already said. Basing it on my understanding of her keeps my answer relevant to her words; basing it on what I said before keeps my language fluent, so that the reply actually sounds like something a person would say. In the end I replied: "I'm sorry, I am the father you will never get!"

        Based on the explanation above, we can summarize the training principle. We want the model to learn the relationship between, on one side, the input sentence plus the historical output, and on the other side, the output at the current moment. The trained model then knows what it wants to say at every moment; let it say this quickly enough and the result is a continuous sentence.
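        To make the idea more concrete, here is a minimal sketch of an encoder-decoder (seq2seq) model in PyTorch. The vocabulary size, hidden size, and the toy src/tgt tensors are illustrative assumptions rather than values from the text, and the sketch leaves out details such as start/end tokens and the training loop.

import torch
import torch.nn as nn

class Seq2Seq(nn.Module):
    def __init__(self, vocab_size=1000, hidden_size=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, hidden_size)
        self.encoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.decoder = nn.GRU(hidden_size, hidden_size, batch_first=True)
        self.out = nn.Linear(hidden_size, vocab_size)

    def forward(self, src_ids, tgt_ids):
        # Encoder: compress the input sentence into a final hidden state (the "understanding").
        _, state = self.encoder(self.embed(src_ids))
        # Decoder: generate the reply from that state and the previously produced
        # words (supplied here as tgt_ids, i.e. teacher forcing during training).
        dec_out, _ = self.decoder(self.embed(tgt_ids), state)
        return self.out(dec_out)  # a score for every vocabulary word at every step

model = Seq2Seq()
src = torch.randint(0, 1000, (1, 7))  # the sentence we heard
tgt = torch.randint(0, 1000, (1, 5))  # the reply produced so far
print(model(src, tgt).shape)          # torch.Size([1, 5, 1000])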

        This model fits any application where, given one piece of information, we want to output another piece of information. It can therefore be applied to chatbots, machine translation, speech recognition, text summarization, and many other tasks. In fact, artificial intelligence is the only feasible technology that could completely overturn the world. It has been sleeping for too long, and we are trying to wake this behemoth up little by little.

     

3. The attention mechanism

       Following the description in the introduction, Attention adds this "focus" factor to what the computer does. Concretely, we attach corresponding weights to the context information. As shown in the figure below, the whole process is still one of encoding and decoding, with the input encoded on the left and the output decoded on the right. The only difference is that we add a weight-calculation step when decoding the output.

       These weights are calculated as follows: each encoder hidden state h_i is scored against the decoder hidden state from the step before the current output time t. Passing the scores through a softmax gives the weight of each input h_i for computing the output at time t, and the weighted sum of the encoder states is the context vector. Combining this context vector with the input of the decoding layer and passing the result through the output layer gives the output at time t. Connecting all the moments together gives the output text.
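       As a small sketch of this weight calculation in PyTorch, using a plain dot product as the score: the tensor names and sizes (enc_hiddens, dec_hidden, a 7-word input, a 128-dimensional hidden layer) are illustrative assumptions, not values from the text.

import torch

hidden_size = 128
enc_hiddens = torch.randn(1, 7, hidden_size)  # encoder hidden states h_1 ... h_7
dec_hidden = torch.randn(1, hidden_size)      # decoder hidden state from the previous step

# 1. Score every encoder state against the decoder state (here a simple dot product).
scores = torch.bmm(enc_hiddens, dec_hidden.unsqueeze(2)).squeeze(2)  # shape (1, 7)

# 2. Softmax turns the scores into attention weights that sum to 1.
weights = torch.softmax(scores, dim=1)                               # shape (1, 7)

# 3. The context vector is the weighted sum of the encoder hidden states.
context = torch.bmm(weights.unsqueeze(1), enc_hiddens).squeeze(1)    # shape (1, hidden_size)

# The context vector is then combined with the decoder input (e.g. concatenated)
# before the output layer predicts the word at the current time step.
print(weights.sum().item(), context.shape)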

     

      Regarding the similarity (score) calculation in the figure above, there are two common mechanisms:

               1. Bahdanau Attention: the decoder state from the previous step, h_{t-1}, and the encoder hidden-layer outputs are each multiplied by their own weight matrix (two separate Ws). The two results are added and passed through tanh, and the result is multiplied by a learned vector v to obtain the score, roughly score = v^T tanh(W1*h_{t-1} + W2*h_s). softmax(score) then gives the weights, which are multiplied with the encoder outputs (the context) to produce the context vector.

                          

               2. Luong Attention: the processing flow is the same as in Bahdanau Attention, but when calculating the score the decoder state at the current step, h_t, is multiplied by a weight matrix W and then multiplied by the encoder hidden-layer output, roughly score = h_t^T * W * h_s. A small code sketch of both score functions is given right below.
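       Here is a sketch of the two score functions just described; the parameter names W1, W2, Wa, v and all shapes are illustrative assumptions, and in a real model these parameters would be learned during training.

import torch
import torch.nn as nn

hidden_size = 128
enc_hiddens = torch.randn(1, 7, hidden_size)  # encoder hidden states h_s
dec_hidden = torch.randn(1, hidden_size)      # decoder state (h_{t-1} for Bahdanau, h_t for Luong)

# Bahdanau (additive): score = v^T tanh(W1*h_dec + W2*h_s)
W1 = nn.Linear(hidden_size, hidden_size, bias=False)
W2 = nn.Linear(hidden_size, hidden_size, bias=False)
v = nn.Linear(hidden_size, 1, bias=False)
bahdanau_scores = v(torch.tanh(W1(dec_hidden).unsqueeze(1) + W2(enc_hiddens))).squeeze(2)  # (1, 7)

# Luong (general): score = h_dec^T * Wa * h_s
Wa = nn.Linear(hidden_size, hidden_size, bias=False)
luong_scores = torch.bmm(Wa(enc_hiddens), dec_hidden.unsqueeze(2)).squeeze(2)              # (1, 7)

# Either set of scores then goes through a softmax to give the attention weights.
print(torch.softmax(bahdanau_scores, dim=1).shape, torch.softmax(luong_scores, dim=1).shape)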

                                   

4. Summary

            Today we introduced the encoder-decoder pattern and the attention mechanism in natural language processing. In the encoder-decoder pattern, the left-hand network encodes the information and the right-hand network decodes it. Decoding is not a restoration of the original input; it leads to another form of output. In this way we can establish a connection between one piece of information and another, which can be used for question-answering systems, speech recognition, and so on. The attention mechanism sits inside the encoder-decoder pattern and adds a weight vector over the context information; the weights represent how much each word in the context contributes to the prediction at the current moment. In this way our model learns to grasp the key points. The weights themselves are calculated either with a similarity measure such as cosine similarity or with a small learned network.

 5. Nonsense

           Everything here feels too heavy; I don't think I can carry it any more. All kinds of annoying things, all kinds of invisible pressure. With no one by my side, in desperation I can only talk to the universe. I asked the layers of dark clouds in the sky: what are you hiding? Is the world really as complicated as we think? Or is trying to express the infinite universe with finite thoughts inherently wrong? The greatest elites of countless generations of mankind have compiled a large-scale guessing game for society, and everyone plays along. I want to step out of all this and see the world in a different light. Not to make myself different, but because I know they see the world the wrong way. The motivation for their research is to use the world, out of arrogant selfishness. I want instead to start by embracing the world, because I feel that love for life matters most, and it seems to be the most primitive power of the universe. And those things are not something I can actively demand; they come from what the world gives after I open my arms.

The live version of BIGBANG's "LOSER": the atmosphere is so good, it's even more enjoyable than listening to the CD!

 
