Table of contents
1 Multi-head attention mechanism
1.1 Mask
Q: Why is a mask needed?
A: Without a mask, future tokens such as "you" and "know" are visible to the model during training, as shown in the figure below. At test time, however, those tokens have not been generated yet, so the training and test conditions differ and the model performs poorly.
The required mask is shown in the figure: during training, "you" and "know" are masked out, which keeps the training condition consistent with the test condition.
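The masking above can be sketched in NumPy. This is a minimal illustration, not the original author's code; the helper names `causal_mask` and `masked_softmax` are my own. Masked positions receive a score of negative infinity, so the softmax assigns them zero attention weight:

```python
import numpy as np

def causal_mask(seq_len):
    # Strictly upper-triangular entries mark "future" positions:
    # position i may only attend to positions j <= i.
    upper = np.triu(np.ones((seq_len, seq_len)), k=1)
    return np.where(upper == 1, -np.inf, 0.0)

def masked_softmax(scores):
    # Numerically stable softmax over the last axis;
    # exp(-inf) = 0, so masked positions get zero weight.
    shifted = scores - scores.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)

# Example: 4 decoder positions attending over each other.
rng = np.random.default_rng(0)
scores = rng.normal(size=(4, 4))
weights = masked_softmax(scores + causal_mask(4))
# Row i of `weights` puts zero weight on every position j > i.
```

Because the same mask is applied during training, the model never conditions on tokens it would not have at test time.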
2 Encoder-decoder interaction layer
The output of the full encoder stack interacts with every decoder block.
The interaction works as follows:
The encoder output is projected into the K and V matrices, while the decoder supplies the Q matrix; another attention computation is then performed over Q, K, and V. The formula is:
Attention(Q, K, V) = softmax(QKᵀ / √d_k) V
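This encoder-decoder attention can be sketched as follows. It is a minimal single-head NumPy illustration under assumed shapes (the projection matrices `W_q`, `W_k`, `W_v` are hypothetical learned parameters, here initialized randomly):

```python
import numpy as np

def attention(Q, K, V):
    # Scaled dot-product attention: softmax(Q K^T / sqrt(d_k)) V.
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ V

# Assumed shapes: encoder output (src_len, d_model) supplies K and V;
# the decoder's masked-attention output (tgt_len, d_model) supplies Q.
d_model, src_len, tgt_len = 8, 5, 3
rng = np.random.default_rng(0)
enc_out = rng.normal(size=(src_len, d_model))
dec_hidden = rng.normal(size=(tgt_len, d_model))

# Hypothetical learned projections (random here, just for the sketch).
W_q, W_k, W_v = (rng.normal(size=(d_model, d_model)) for _ in range(3))
out = attention(dec_hidden @ W_q, enc_out @ W_k, enc_out @ W_v)
# `out` has one row per decoder position: shape (tgt_len, d_model).
```

Since K and V come from the encoder, each decoder position can draw on the entire source sequence, while Q carries the decoder's own state.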