Seq2Seq - Attention Mechanism

Encoder

Common deep learning models include CNNs, RNNs, LSTMs, autoencoders (AE), and so on, but many of them can be cast into one general framework: the Encoder-Decoder.
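To make the framework concrete, here is a minimal encoder-decoder sketch in PyTorch. The module names, the choice of GRU cells, and all dimensions are illustrative assumptions, not something prescribed by the post:

```python
import torch
import torch.nn as nn

class Encoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)

    def forward(self, src):
        # src: (batch, src_len) -> outputs: (batch, src_len, hid_dim)
        outputs, hidden = self.rnn(self.embed(src))
        return outputs, hidden  # hidden acts as the fixed-size context vector

class Decoder(nn.Module):
    def __init__(self, vocab_size, emb_dim=64, hid_dim=128):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)
        self.rnn = nn.GRU(emb_dim, hid_dim, batch_first=True)
        self.out = nn.Linear(hid_dim, vocab_size)

    def forward(self, tgt, hidden):
        # tgt: (batch, tgt_len); hidden: context carried over from the encoder
        outputs, hidden = self.rnn(self.embed(tgt), hidden)
        return self.out(outputs), hidden

# Usage: encode the source once, then decode conditioned on its context.
enc, dec = Encoder(1000), Decoder(1000)
src = torch.randint(0, 1000, (2, 7))   # batch of 2, source length 7
tgt = torch.randint(0, 1000, (2, 5))   # target length 5
_, context = enc(src)
logits, _ = dec(tgt, context)          # (2, 5, 1000)
```

Note the bottleneck: everything the decoder knows about the source must pass through the single `context` vector, which is exactly what attention relaxes.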

Attention

Definition:

In simple terms, attention is a vector of weights that represents importance.

The query vector Q is compared against the keys K to compute similarity scores; the scores are normalized into weights, which are then used to take a weighted sum over the values V, producing the output vector.

Q, K, V: $\text{Attention}(Q, K, V) = \text{softmax}\left(\frac{QK^\top}{\sqrt{d_k}}\right)V$
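Written out in code, this is scaled dot-product attention: similarity, softmax, then a weighted sum. The following PyTorch sketch assumes the standard formulation; the shapes are chosen purely for illustration:

```python
import torch
import torch.nn.functional as F

def attention(Q, K, V):
    # Q: (batch, n_q, d_k), K: (batch, n_k, d_k), V: (batch, n_k, d_v)
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # similarity of Q with each K
    weights = F.softmax(scores, dim=-1)            # importance weights; rows sum to 1
    return weights @ V, weights                    # weighted sum over the values V

Q = torch.randn(1, 3, 8)
K = torch.randn(1, 5, 8)
V = torch.randn(1, 5, 16)
out, w = attention(Q, K, V)  # out: (1, 3, 16), w: (1, 3, 5)
```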

Self-Attention

In fact, although the Attention mechanism sounds sophisticated, the key idea is simply to learn a weight distribution and then apply it to the features.

  • The weights can retain all components, which is called Soft Attention, or select only some components according to a sampling strategy, which is called Hard Attention.
  • The weights can be applied to the original image, as in object detection, or to the feature map, as in image captioning (Image-Caption).
  • The weights can act at the spatial scale or at the channel scale, weighting the features of different channels (a channel-attention sketch follows this list).
  • The weights can act across different time steps, as in machine translation.
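As one concrete instance of channel-scale weighting, here is a minimal sketch in the squeeze-and-excitation style; the reduction ratio and layer structure are assumptions for illustration, not the only way to weight channels:

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction),
            nn.ReLU(),
            nn.Linear(channels // reduction, channels),
            nn.Sigmoid(),  # one importance weight per channel, in [0, 1]
        )

    def forward(self, x):
        # x: (batch, channels, H, W)
        s = x.mean(dim=(2, 3))          # squeeze: global average pool per channel
        w = self.fc(s)                  # excitation: learn the channel weights
        return x * w[:, :, None, None]  # reweight each channel's feature map

x = torch.randn(2, 32, 8, 8)
out = ChannelAttention(32)(x)  # same shape; channels rescaled by learned weights
```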

Original post: blog.csdn.net/wanghan0526/article/details/131049973