CogView overall diagram

I am a beginner just getting started; I want to record what I have learned in the form of notes, and I hope it also helps others who are just getting started.

Table of contents

1. Summary

2. Dataset input (tokens)

3. Transformer (GPT)

4. TransformerLayer

5. Self Attention

6. MLPs


1. Summary


2. Dataset input (tokens)

1. Generation

The binary dataset is generated by cogdata (which produces the tokens).

How to use cogdata: GitHub - Sleepychord/cogdata: A light-weight data management system for large-scale pretraining

2. Tokens

Text is turned into text tokens by a SentencePiece model; images are turned into image tokens by a discretized Auto-Encoder (AE).

The purpose of tokenization is to split text and images into small pieces that each carry as much independent meaning as possible, so that they can later be mapped into the token (embedding) space, similar to word vectors. For images this matters even more: a large image becomes many small blocks, which reduces the computational load of the downstream network.
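As a rough illustration of the text side only (this is not CogView's actual preprocessing code, and the model file name is a placeholder), SentencePiece encoding might look like this:

import sentencepiece as spm

# Load a trained SentencePiece model (the file name is a placeholder).
sp = spm.SentencePieceProcessor(model_file='text_tokenizer.model')

# Text -> list of integer text tokens.
text_tokens = sp.encode('a cat wearing sunglasses', out_type=int)

# Images would analogously go through the encoder of a discretized
# auto-encoder (e.g. a VQ-VAE): the codebook index of each image patch
# serves as one image token.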

3. Data and labels

Prediction runs from left to right: for example, feeding in the first token produces a prediction for the second token, which is compared with the actual second token to compute the loss (so labels are shifted one position later than tokens).

The following code is from pretrain_gpt2.py (labels is the comparison target; tokens is the input token data):

def get_batch(data_iterator, args, timers):  # get the data for this batch
    # Items and their type.
    keys = ['text', 'loss_mask']
    datatype = torch.int64

    # Broadcast data.
    timers('data loader').start()
    if data_iterator is not None:
        data = next(data_iterator)
    else:
        data = None
    timers('data loader').stop()

    data_b = mpu.broadcast_data(keys, data, datatype)
    # Unpack the data.
    tokens_ = data_b['text'].long()
    loss_mask = data_b['loss_mask'].float()  # this loss mask presumably serves continued training (it should be None when training from scratch)
    labels = tokens_[:, 1:].contiguous()  # targets
    loss_mask = loss_mask[:, 1:].contiguous()
    tokens = tokens_[:, :-1].contiguous()  # input tokens
    # Prediction goes left to right: e.g. the first input token yields a prediction for the second token, which is compared with the target's second token to get the loss (so labels lag tokens by one position).
    attention_mask = None

    # Get the masks and position ids: position encodings, attention mask and loss mask.
    attention_mask, loss_mask, position_ids = get_masks_and_position_ids(
        tokens,
        loss_mask=loss_mask,
        attention_mask=attention_mask,
        args=args
        )
    # Convert to half precision.
    if args.fp16:
        attention_mask = attention_mask.half()

    return tokens, labels, loss_mask, attention_mask, position_ids
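As a concrete illustration of that one-position shift (my own toy example, not code from the repo):

import torch

tokens_ = torch.tensor([[11, 22, 33, 44]])  # raw sequence from the batch
labels = tokens_[:, 1:]    # [[22, 33, 44]] -> comparison targets
tokens = tokens_[:, :-1]   # [[11, 22, 33]] -> network inputs
# Position i of tokens is used to predict position i of labels,
# i.e. the model always predicts the "next" token.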

3. Transformer (GPT)

For details and code analysis, see the post on the overall construction of the network structure in CogView (Programmer Sought).

Transformer (GPT) in the overall diagram:

Word embedding converts tokens into word vectors; the Transformer is the main network structure: it predicts tokens from left to right (the preceding tokens produce the next token) and is made up of multiple Transformer blocks (layers).
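A minimal, schematic sketch of this structure (my own simplification using standard PyTorch modules, not the CogView/Megatron implementation):

import torch
import torch.nn as nn

class TinyGPT(nn.Module):
    def __init__(self, vocab_size, hidden, n_layers, n_heads, max_len):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, hidden)  # word embedding: token id -> word vector
        self.pos_emb = nn.Embedding(max_len, hidden)     # learned position embedding
        self.layers = nn.ModuleList(
            nn.TransformerEncoderLayer(hidden, n_heads, batch_first=True)
            for _ in range(n_layers)
        )
        self.head = nn.Linear(hidden, vocab_size)        # logits for the next token

    def forward(self, tokens):
        b, t = tokens.shape
        pos = torch.arange(t, device=tokens.device)
        x = self.tok_emb(tokens) + self.pos_emb(pos)
        # Causal (left-to-right) mask: each position only attends to earlier positions.
        causal = nn.Transformer.generate_square_subsequent_mask(t).to(tokens.device)
        for layer in self.layers:
            x = layer(x, src_mask=causal)
        return self.head(x)  # per-position logits over the vocabulary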

The Transformer in this figure can be subdivided as follows (without sparse-attention processing):

For details and code analysis, see the post "Transformer in CogView" (ttya's blog, CSDN).

For the single-layer TransformerLayer within it, see Section 4.


4. TransformerLayer

For details and code analysis, see the post "The single-layer TransformerLayer in CogView" (ttya's blog, CSDN).

The residual structure is used twice (it makes very deep, high-complexity models feasible and helps against vanishing and exploding gradients), and LayerNorm is used to keep training stable.
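A minimal sketch of a layer with two residual connections and LayerNorm (written here in a standard pre-LN form; it is a simplification, not the actual CogView layer, which has its own normalization details):

import torch.nn as nn

class SimpleTransformerLayer(nn.Module):
    def __init__(self, hidden, n_heads):
        super().__init__()
        self.ln1 = nn.LayerNorm(hidden)
        self.attn = nn.MultiheadAttention(hidden, n_heads, batch_first=True)
        self.ln2 = nn.LayerNorm(hidden)
        self.mlp = nn.Sequential(
            nn.Linear(hidden, 4 * hidden), nn.GELU(), nn.Linear(4 * hidden, hidden)
        )

    def forward(self, x, attn_mask=None):
        h = self.ln1(x)
        a, _ = self.attn(h, h, h, attn_mask=attn_mask)
        x = x + a                      # residual connection 1: around self-attention
        x = x + self.mlp(self.ln2(x))  # residual connection 2: around the MLP
        return x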

For Self Attention, see Section 5.

For the MLP, see Section 6.


5. Self Attention

For details and code analysis, see the post "Self Attention in CogView" (ttya's blog, CSDN).

It lets the network learn to focus on the important information points.

The attention mask added here enforces left-to-right prediction (a lower-triangular mask).
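A small sketch of how such a lower-triangular (causal) mask can be built and applied to the attention scores (an illustration only, not the repo's get_masks_and_position_ids or its attention code):

import torch
import torch.nn.functional as F

seq_len, hidden = 4, 8
q = torch.randn(seq_len, hidden)
k = torch.randn(seq_len, hidden)

# 1 = may attend, 0 = masked out; row i can only see positions <= i.
mask = torch.tril(torch.ones(seq_len, seq_len))

scores = q @ k.T / hidden ** 0.5                        # scaled dot-product scores
scores = scores.masked_fill(mask == 0, float('-inf'))   # hide future positions
weights = F.softmax(scores, dim=-1)                     # each row sums to 1 over visible positions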


6. MLPs

For details and code analysis, see the post "MLP in CogView" (ttya's blog, CSDN).
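A minimal sketch of a GPT-style MLP block (hidden size expanded 4x with a GeLU in between; the names follow the common Megatron-style convention and this is a simplification, not the CogView code):

import torch.nn as nn

class MLP(nn.Module):
    def __init__(self, hidden):
        super().__init__()
        self.dense_h_to_4h = nn.Linear(hidden, 4 * hidden)  # expand
        self.act = nn.GELU()
        self.dense_4h_to_h = nn.Linear(4 * hidden, hidden)  # project back

    def forward(self, x):
        return self.dense_4h_to_h(self.act(self.dense_h_to_4h(x)))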


Criticism and corrections are welcome in the comments, thank you~


Origin blog.csdn.net/weixin_55073640/article/details/126608401