https://zhuanlan.zhihu.com/p/49271699
BERT represents significant recent progress in NLP: most NLP tasks can now adopt a similar two-stage model (pre-training followed by fine-tuning) to directly improve results.
The Transformer was introduced by Google in the 2017 paper "Attention Is All You Need" for machine translation. It attracted considerable attention, and many studies have since shown that the Transformer's ability to extract features is far stronger than that of the LSTM.
The Transformer will likely become the mainstream NLP tool and gradually displace the RNN, because the RNN's sequential dependence structure inherently limits its ability to parallelize computation.
The CNN has not become mainstream in NLP. Its biggest advantage is that convolution parallelizes easily and is therefore fast, but it has a natural weakness in capturing long-distance dependencies in sequences, which matters greatly for NLP.
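To make the parallelism point concrete, here is a minimal sketch of scaled dot-product self-attention, the core operation of the Transformer, written in NumPy. All names, shapes, and weights are illustrative assumptions, not taken from any particular library:

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax over the given axis.
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(x, w_q, w_k, w_v):
    """x: (seq_len, d_model); w_q/w_k/w_v: (d_model, d_k) projections."""
    q, k, v = x @ w_q, x @ w_k, x @ w_v
    d_k = q.shape[-1]
    # Every position attends to every other position in a single matrix
    # multiply -- this is why the Transformer parallelizes over the whole
    # sequence, whereas an RNN must step through it token by token.
    weights = softmax(q @ k.T / np.sqrt(d_k))
    return weights @ v

rng = np.random.default_rng(0)
x = rng.normal(size=(5, 8))                      # 5 tokens, model width 8
w_q, w_k, w_v = (rng.normal(size=(8, 4)) for _ in range(3))
out = self_attention(x, w_q, w_k, w_v)
print(out.shape)                                  # (5, 4)
```

Because the attention weights connect every pair of positions directly, long-distance relationships are one step apart, which also addresses the weakness of the CNN noted above.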
https://zhuanlan.zhihu.com/p/37601161 — an in-depth look at attention models in deep learning
https://jalammar.github.io/illustrated-transformer/ — The Illustrated Transformer