BertModel in transformers: architecture, parameter statistics, and return value analysis

The architecture of BertModel:

Take bert-base-chinese as an example:

from transformers import BertModel

model = BertModel.from_pretrained("../model/bert-base-chinese")
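Printing the model object is the quickest way to see this architecture: it lists the three top-level modules, embeddings, encoder (a stack of 12 BertLayer blocks), and pooler. A minimal sketch based on the model loaded above:

# Inspect the architecture: embeddings -> encoder (12 x BertLayer) -> pooler
print(model)

# The key dimensions are also available on the config
print(model.config.num_hidden_layers, model.config.hidden_size)  # 12 768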

Counting the model parameters:

# Count the parameters
total_params = 0             # total number of parameters in the model
total_learnable_params = 0   # number of trainable parameters
total_embedding_params = 0   # parameters in the embeddings layer
total_encoder_params = 0     # parameters in the encoder
total_pooler_params = 0      # parameters in the pooler layer

for name, param in model.named_parameters():
    print(name, "->", param.shape)
    if param.requires_grad:
        total_learnable_params += param.numel()
    if "embedding" in name:
        total_embedding_params += param.numel()
    if "encoder" in name:
        total_encoder_params += param.numel()
    if "pooler" in name:
        total_pooler_params += param.numel()

    total_params += param.numel()
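The percentages quoted below follow directly from these counters; a minimal sketch of the extra print statements (the formatting is my own):

# Relative share of each part of the model
print("embeddings ratio:", total_embedding_params / total_params)
print("encoder ratio:", total_encoder_params / total_params)
print("pooler ratio:", total_pooler_params / total_params)
print("total params:", total_params)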

From the output above:

Embeddings layer share: 0.16254008305735163

Encoder share: 0.8316849528014959

Pooler layer share: 0.005774964141152439

Total number of parameters: 102267648
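These numbers can also be reproduced from the config alone (for bert-base-chinese: vocab_size=21128, hidden_size=768, max_position_embeddings=512, type_vocab_size=2, intermediate_size=3072, num_hidden_layers=12). The arithmetic below is my own addition, a minimal sketch:

cfg = model.config
h = cfg.hidden_size

# embeddings: word + position + token_type embeddings, plus one LayerNorm (weight + bias)
embedding_params = (cfg.vocab_size + cfg.max_position_embeddings + cfg.type_vocab_size) * h + 2 * h

# one BertLayer: Q/K/V/output projections with biases, two LayerNorms,
# and the two feed-forward projections (h -> intermediate -> h)
per_layer = 4 * (h * h + h) + 2 * h \
    + (h * cfg.intermediate_size + cfg.intermediate_size) \
    + (cfg.intermediate_size * h + h) + 2 * h
encoder_params = cfg.num_hidden_layers * per_layer

# pooler: a single Linear(h, h) with bias
pooler_params = h * h + h

print(embedding_params + encoder_params + pooler_params)  # 102267648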

Analysis of the return values:

The documentation for BertModel is here:

https://huggingface.co/docs/transformers/main/en/model_doc/bert#transformers.BertModel

A more detailed explanation follows:

last_hidden_state and pooler_output are always returned, while hidden_states is only returned when the model is called with output_hidden_states=True or the config sets config.output_hidden_states=True.

To spell this out:

In that case, outputs has a length of 3:

# outputs[0] == last_hidden_state : (batch_size, sequence_length, hidden_size)

# outputs[1] == pooler_output : (batch_size, hidden_size)

# outputs[2] == hidden_states : a tuple of 13 tensors (the embedding output plus the output of each of the 12 encoder layers), each of shape (batch_size, sequence_length, hidden_size)
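A minimal sketch of checking these shapes; the tokenizer call and the sample sentence are my own example (loading by the Hub name bert-base-chinese here, but the local path used above works the same):

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
model = BertModel.from_pretrained("bert-base-chinese")

inputs = tokenizer("今天天气很好", return_tensors="pt")
with torch.no_grad():
    outputs = model(**inputs, output_hidden_states=True)

print(len(outputs))          # 3
print(outputs[0].shape)      # last_hidden_state: (1, sequence_length, 768)
print(outputs[1].shape)      # pooler_output: (1, 768)
print(len(outputs[2]))       # 13 hidden states: embedding output + 12 encoder layers
print(outputs[2][0].shape)   # each of shape (1, sequence_length, 768)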

It follows that the first entry of hidden_states is exactly the output of the embeddings layer (input_tensor here being the batch of input ids):

model.embeddings(input_tensor) == outputs[2][0]
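A quick way to verify this, reusing model, inputs, and outputs from the sketch above (torch.allclose is used rather than exact equality to stay safe about floating-point behaviour):

# The first entry of the hidden_states tuple is the output of the embeddings module
with torch.no_grad():
    emb = model.embeddings(inputs["input_ids"], token_type_ids=inputs["token_type_ids"])

print(torch.allclose(emb, outputs[2][0]))  # True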

Reposted from blog.csdn.net/wtl1992/article/details/132048038