Transformer: Decoder

Table of contents

1 Multi-head attention mechanism

1.1 Mask

2 Interaction layer

1 Multi-head attention mechanism

1.1 Mask

        ques: Why do we need a mask?

        ans: Without a mask, the later tokens "you" and "know" are visible to the model during training, as shown in the figure below. At test time, however, those tokens have not been generated yet, so the mismatch between training and inference hurts the model's performance.

The required mask is shown in the figure:

Masking out "you" and "know" during training keeps training consistent with inference.
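
A minimal sketch of this causal mask in PyTorch (single head, toy dimensions, written for illustration; the helper names causal_mask and masked_attention are not from the original post):

```python
import torch

def causal_mask(seq_len: int) -> torch.Tensor:
    """Lower-triangular mask: position i may only attend to positions <= i."""
    return torch.tril(torch.ones(seq_len, seq_len, dtype=torch.bool))

def masked_attention(q, k, v):
    """Scaled dot-product self-attention with the causal mask applied."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5       # (seq, seq) attention scores
    mask = causal_mask(q.size(-2)).to(scores.device)
    scores = scores.masked_fill(~mask, float("-inf"))   # hide future tokens ("you", "know")
    return torch.softmax(scores, dim=-1) @ v

# Example: 4 tokens, model dimension 8 (toy sizes chosen for illustration)
x = torch.randn(4, 8)
out = masked_attention(x, x, x)   # self-attention over the decoder input
print(out.shape)                  # torch.Size([4, 8])
```

Positions masked with -inf receive zero weight after the softmax, so each token can only look at itself and earlier tokens, matching what the decoder sees at inference time.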

2 Interaction layer

The encoder's output interacts with every decoder block.

 The specific interaction is as follows:

The encoder produces the K and V matrices, while the decoder provides the Q matrix; attention is then computed over Q, K, and V. The formula is as follows:

Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V
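
A minimal sketch of this encoder-decoder attention in PyTorch (single head, toy dimensions; the class name CrossAttention and the projections w_q, w_k, w_v are illustrative, not taken from the original post):

```python
import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Decoder's encoder-decoder attention: Q from the decoder,
    K and V from the encoder output (single head, for illustration)."""
    def __init__(self, d_model: int):
        super().__init__()
        self.w_q = nn.Linear(d_model, d_model)   # projects decoder states to Q
        self.w_k = nn.Linear(d_model, d_model)   # projects encoder output to K
        self.w_v = nn.Linear(d_model, d_model)   # projects encoder output to V

    def forward(self, decoder_x, encoder_out):
        q = self.w_q(decoder_x)
        k = self.w_k(encoder_out)
        v = self.w_v(encoder_out)
        d_k = q.size(-1)
        scores = q @ k.transpose(-2, -1) / d_k ** 0.5   # softmax(QK^T / sqrt(d_k)) V
        return torch.softmax(scores, dim=-1) @ v

# Example: encoder output of length 6, decoder input of length 4, d_model = 8
enc_out = torch.randn(6, 8)
dec_in = torch.randn(4, 8)
attn = CrossAttention(d_model=8)
print(attn(dec_in, enc_out).shape)   # torch.Size([4, 8])
```

Because K and V come from the encoder, each decoder position can attend over the entire source sequence while Q still reflects what the decoder has generated so far.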


Origin blog.csdn.net/maggieyiyi/article/details/126991415