Table of contents
A game is just a game on its own; put "Beijing" next to it and it becomes the Winter Olympics: context decides what a word means.
Transformer: encoding and decoding in 12 steps (a stack of 6 encoder layers followed by 6 decoder layers; see the encoder-stack sketch after this list)
The self-attention mechanism, seen by taking the Transformer apart for comparison: generate the "parts" (V) and the comparison weights (K), then a feed-forward neural network adjusts the weights: a first reshaping of the input
Encoder attention exists so that each word takes its context into account
Attention mechanism: multi-head attention is used so that one head "mutinying" (going wrong) cannot make the whole model fail (see the multi-head sketch after this list)
Data flow: an embedding algorithm turns each word into a vector, all of the same size: 512 dimensions
Multiplying by weight matrices generates each word's component description and relationship description: Q, K, V
8 sets of weight matrices (one per head) guard against a single failure and cancel out the influence of the initial random weights
Word vectorization, then matrix multiplication to generate the relationship description (Q compared with K), the attention weights, and the final weighted sum of V (see the first sketch after this list)
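The last few entries describe one pass of scaled dot-product attention. Below is a minimal numpy sketch of those steps, assuming a 4-word input, 512-dimensional vectors and random illustrative weights; none of this code, and none of the variable names, come from the article.

```python
# Sketch of the outlined steps: 512-dim word vectors, weight matrices producing
# Q, K and V, attention weights from the Q-K comparison, and a weighted sum of V.
import numpy as np

d_model = 512            # embedding size used in the original Transformer
seq_len = 4              # e.g. a 4-word sentence
rng = np.random.default_rng(0)

x = rng.normal(size=(seq_len, d_model))        # word vectors (after embedding)

# One set of weight matrices turns each word into a query, key and value.
W_q = rng.normal(size=(d_model, d_model)) * 0.01
W_k = rng.normal(size=(d_model, d_model)) * 0.01
W_v = rng.normal(size=(d_model, d_model)) * 0.01
Q, K, V = x @ W_q, x @ W_k, x @ W_v

# "Relationship description": compare every query with every key.
scores = Q @ K.T / np.sqrt(d_model)            # (seq_len, seq_len)

# Attention weights: softmax over each row.
weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
weights /= weights.sum(axis=-1, keepdims=True)

# Final weighted sum of the values: each word's new, context-aware vector.
output = weights @ V
print(output.shape)                            # (4, 512)
```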
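The "multi-head" and "8 weight matrices" entries correspond to running 8 such attention computations in parallel and mixing the results. A sketch under the same assumptions; the helper name `attention` and all dimensions are illustrative, following the original Transformer's 512 = 8 x 64 split.

```python
# 8 independent Q/K/V projections, so one badly behaving head does not
# dominate the result; heads are concatenated and projected back to 512 dims.
import numpy as np

d_model, n_heads = 512, 8
d_head = d_model // n_heads                    # 64 dimensions per head
rng = np.random.default_rng(1)

def attention(x, W_q, W_k, W_v):
    """Single-head scaled dot-product attention."""
    Q, K, V = x @ W_q, x @ W_k, x @ W_v
    scores = Q @ K.T / np.sqrt(Q.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ V

x = rng.normal(size=(4, d_model))              # 4 words, 512-dim each

# 8 separate sets of weight matrices, one per head.
heads = []
for _ in range(n_heads):
    W_q = rng.normal(size=(d_model, d_head)) * 0.01
    W_k = rng.normal(size=(d_model, d_head)) * 0.01
    W_v = rng.normal(size=(d_model, d_head)) * 0.01
    heads.append(attention(x, W_q, W_k, W_v))  # each head: (4, 64)

# Concatenate the 8 heads and mix them back to 512 dimensions.
W_o = rng.normal(size=(d_model, d_model)) * 0.01
out = np.concatenate(heads, axis=-1) @ W_o     # (4, 512)
print(out.shape)
```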
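The "12 steps" entry refers to the stack of 6 encoder and 6 decoder layers, each pairing attention (context mixing) with a feed-forward network (per-word adjustment). A toy sketch of the encoder side only, under the assumption that 12 = 6 + 6; it omits residual connections, layer normalization, the learned attention projections and the decoder, so it shows the data flow rather than a real implementation.

```python
# 6 encoder layers; each layer mixes context with attention, then a
# feed-forward network reshapes each word vector independently.
import numpy as np

d_model, d_ff, n_layers = 512, 2048, 6
rng = np.random.default_rng(2)

def self_attention(x):
    """Context mixing: every word attends to every word (projection weights omitted)."""
    scores = x @ x.T / np.sqrt(d_model)
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return w @ x

def feed_forward(x, W1, b1, W2, b2):
    """Per-word adjustment applied after attention."""
    return np.maximum(0, x @ W1 + b1) @ W2 + b2

x = rng.normal(size=(4, d_model))
for _ in range(n_layers):                      # 6 encoder layers ...
    x = self_attention(x)
    W1 = rng.normal(size=(d_model, d_ff)) * 0.01
    b1 = np.zeros(d_ff)
    W2 = rng.normal(size=(d_ff, d_model)) * 0.01
    b2 = np.zeros(d_model)
    x = feed_forward(x, W1, b1, W2, b2)
# ... the decoder repeats the pattern for another 6 layers (12 "steps" in total).
print(x.shape)                                 # (4, 512)
```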