Transformer framework
When I first read the paper "Attention Is All You Need" I didn't really understand it; I went through many detailed write-ups and still stumbled. Today I revisited the related material, mainly the Q, K, V parts of the various attention mechanisms, and looked at the relevant TensorFlow implementation code. Things feel somewhat clearer than before.
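To make the Q, K, V part concrete, here is a minimal NumPy sketch of scaled dot-product attention (my own toy illustration of the paper's formula softmax(QKᵀ/√d_k)V, not code from any official repository):

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.swapaxes(-1, -2) / np.sqrt(d_k)  # similarity of each query to each key
    weights = softmax(scores, axis=-1)              # rows sum to 1
    return weights @ V, weights

# toy example: 4 query positions, 4 key/value positions, d_k = 8
rng = np.random.default_rng(0)
Q = rng.standard_normal((4, 8))
K = rng.standard_normal((4, 8))
V = rng.standard_normal((4, 8))
out, w = scaled_dot_product_attention(Q, K, V)
```

Each output row is a weighted mixture of the value rows, with weights given by how well the query matches each key.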
Related Links
- Google's official code, models/mtf_transformer.py, implements several kinds of attention in the _layer_stack function: self-attention, encoder-decoder attention, local attention, and compressed attention. The core parts are packaged up, though, so you need to install mesh-tensorflow to view the related functions:
import mesh_tensorflow as mtf
# Self attention layer
y, new_k, new_v = mtf.layers.multihead_self_attention_incremental(some_argvs)
# Encoder-Decoder attention layer
y, new_k, new_v = mtf.layers.multihead_encdec_attention_incremental(some_argvs)
# Local attention
y, new_k, new_v = mtf.layers.masked_local_attention_1d_incremental(some_argvs)
# Compressed attention
mtf.layers.multihead_self_attention_memory_compressed(some_argvs)
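For intuition about what such multi-head helpers do internally, here is a plain NumPy sketch of multi-head self-attention. The function name, weight shapes, and head-splitting layout are my own assumptions for illustration, not the mesh-tensorflow API:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def multihead_self_attention(x, Wq, Wk, Wv, Wo, num_heads):
    """x: (seq_len, d_model); Wq/Wk/Wv/Wo: (d_model, d_model)."""
    seq_len, d_model = x.shape
    d_head = d_model // num_heads

    def split(t):
        # (seq, d_model) -> (heads, seq, d_head)
        return t.reshape(seq_len, num_heads, d_head).transpose(1, 0, 2)

    q, k, v = split(x @ Wq), split(x @ Wk), split(x @ Wv)
    scores = q @ k.transpose(0, 2, 1) / np.sqrt(d_head)  # (heads, seq, seq)
    ctx = softmax(scores) @ v                            # (heads, seq, d_head)
    # merge heads back: (heads, seq, d_head) -> (seq, d_model)
    ctx = ctx.transpose(1, 0, 2).reshape(seq_len, d_model)
    return ctx @ Wo

# toy usage: sequence of 5 tokens, model width 16, 4 heads
rng = np.random.default_rng(1)
x = rng.standard_normal((5, 16))
Wq, Wk, Wv, Wo = (rng.standard_normal((16, 16)) for _ in range(4))
y = multihead_self_attention(x, Wq, Wk, Wv, Wo, num_heads=4)
```

The key idea is that each head attends over the same sequence in its own lower-dimensional subspace, and the results are concatenated and projected back to d_model.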
- Before Google's official code came out, many people implemented the Transformer logic themselves. I think one repository's code is clearly written, and some bloggers have also posted walkthroughs of that code.
- I also read some blog posts with intuitive explanations of the process; one animated figure explains the decoder process very clearly.
- While reading the code I also found that there are many attention variants, so I plan to learn more about them later. There is a survey covering the various kinds of attention. Marking it here ~