Study Notes: The Attention Mechanism in the Transformer


Transformer framework

I never fully understood the framework in "Attention Is All You Need"; I had read many detailed write-ups and still stumbled. Today I went over the material again, focusing mainly on the Q, K, V parts of the various attention mechanisms, and also read through the relevant TensorFlow implementation code. Things feel somewhat clearer than before.

[Figure: Transformer framework diagram]
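
To anchor the Q, K, V terminology before going into the code, here is a minimal NumPy sketch of scaled dot-product attention, softmax(QK^T / sqrt(d_k)) V, as defined in the paper. The function and variable names are my own for illustration and are not taken from the paper's or Google's reference code.

import numpy as np

def scaled_dot_product_attention(Q, K, V, mask=None):
    """Q: (seq_q, d_k), K: (seq_k, d_k), V: (seq_k, d_v)."""
    d_k = Q.shape[-1]
    # Attention scores: similarity of each query with every key,
    # scaled by sqrt(d_k) to keep the softmax in a reasonable range.
    scores = Q @ K.T / np.sqrt(d_k)
    if mask is not None:
        # Positions where the mask is False are excluded
        # (e.g. future tokens in the decoder's masked self-attention).
        scores = np.where(mask, scores, -1e9)
    # Softmax over the key dimension gives the attention weights.
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights = weights / weights.sum(axis=-1, keepdims=True)
    # Output is a weighted sum of the values.
    return weights @ V, weights

# Toy usage: 3 query positions, 4 key/value positions, d_k = d_v = 8.
rng = np.random.default_rng(0)
Q = rng.normal(size=(3, 8))
K = rng.normal(size=(4, 8))
V = rng.normal(size=(4, 8))
out, attn = scaled_dot_product_attention(Q, K, V)
print(out.shape, attn.shape)   # (3, 8) (3, 4)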

Related Links

  • Google's official code (models/mtf_transformer.py, the _layer_stack function) includes several kinds of attention: Self-attention, Encoder-Decoder attention, Local attention, and Compressed attention. The core pieces are wrapped up in mesh-tensorflow, so you need to install mesh-tensorflow to inspect the related functions (a small sketch contrasting how these variants differ appears after this list):
import mesh_tensorflow as mtf

# Self-attention layer
y, new_k, new_v = mtf.layers.multihead_self_attention_incremental(some_argvs)

# Encoder-Decoder attention layer
y, new_k, new_v = mtf.layers.multihead_encdec_attention_incremental(some_argvs)

# Local attention
y, new_k, new_v = mtf.layers.masked_local_attention_1d_incremental(some_argvs)

# Compressed attention
mtf.layers.multihead_self_attention_memory_compressed(some_argvs)
  • Before Google's official code came out, many people implemented the Transformer logic themselves. I think the code here is clearly written, and a blogger has also published a walkthrough of that code.
  • I also read some blog posts that build an intuitive understanding of the process; this animated figure explains the decoding process very clearly.
  • While reading the code I also noticed that there are many kinds of attention, so I plan to study them further later. This survey covers the various types of attention. Marking it for now ~
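
To make concrete how the attention variants listed above differ, here is a rough sketch of where Q, K, and V come from and what masks are applied in each case, reusing the scaled_dot_product_attention function from the earlier sketch. This is my own simplification (no learned projection matrices W_Q, W_K, W_V and no multi-head split), not the mesh-tensorflow implementation.

import numpy as np

seq_len, d_model = 6, 8
rng = np.random.default_rng(1)
x_enc = rng.normal(size=(seq_len, d_model))   # encoder hidden states
x_dec = rng.normal(size=(seq_len, d_model))   # decoder hidden states

# 1) Encoder self-attention: Q, K, V all come from the same sequence.
self_out, _ = scaled_dot_product_attention(x_enc, x_enc, x_enc)

# 2) Decoder masked self-attention: same inputs, but a lower-triangular mask
#    keeps each position from attending to future positions.
causal_mask = np.tril(np.ones((seq_len, seq_len), dtype=bool))
masked_out, _ = scaled_dot_product_attention(x_dec, x_dec, x_dec, mask=causal_mask)

# 3) Encoder-decoder attention: Q comes from the decoder, K and V from the encoder.
encdec_out, _ = scaled_dot_product_attention(x_dec, x_enc, x_enc)

# 4) Local attention: each position attends only to a fixed window around itself
#    (radius 1 here, a rough analogue of masked_local_attention_1d).
idx = np.arange(seq_len)
local_mask = np.abs(idx[:, None] - idx[None, :]) <= 1
local_out, _ = scaled_dot_product_attention(x_enc, x_enc, x_enc, mask=local_mask)

# In the real model, each variant first applies learned linear projections to
# produce Q, K, V and splits them into multiple heads; that is omitted here.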