Architecture of BertModel:
Take bert-base-chinese as an example:
from transformers import BertModel

model = BertModel.from_pretrained("../model/bert-base-chinese")
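To see the architecture at a glance, the configuration of the loaded model can be printed; the following is a short sketch, and the values shown in the comments are the expected ones for the bert-base-chinese checkpoint:

print(model.config.vocab_size)               # 21128  (Chinese WordPiece vocabulary)
print(model.config.hidden_size)              # 768    (size of each token representation)
print(model.config.num_hidden_layers)        # 12     (Transformer encoder layers)
print(model.config.num_attention_heads)      # 12     (attention heads per layer)
print(model.config.max_position_embeddings)  # 512    (maximum sequence length)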
Counting the model parameters:
# Count the parameters
total_params = 0            # total number of parameters in the model
total_learnable_params = 0  # number of learnable (trainable) parameters
total_embedding_params = 0  # parameters in the embeddings layer
total_encoder_params = 0    # parameters in the encoder
total_pooler_params = 0     # parameters in the pooler layer

for name, param in model.named_parameters():
    print(name, "->", param.shape)
    if param.requires_grad:
        total_learnable_params += param.numel()
    if "embedding" in name:
        total_embedding_params += param.numel()
    if "encoder" in name:
        total_encoder_params += param.numel()
    if "pooler" in name:
        total_pooler_params += param.numel()
    total_params += param.numel()
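With the totals accumulated above, the proportions reported below can be printed as a quick sketch:

print("total params          :", total_params)
print("learnable params      :", total_learnable_params)
print("embedding layer ratio :", total_embedding_params / total_params)
print("encoder ratio         :", total_encoder_params / total_params)
print("pooler ratio          :", total_pooler_params / total_params)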
From the above it can be seen that:
The embedding layer accounts for 0.16254008305735163 of the parameters (about 16.3%).
The encoder accounts for 0.8316849528014959 (about 83.2%).
The pooler layer accounts for 0.005774964141152439 (about 0.6%).
Total parameters: 102267648
Return value analysis:
The documentation on BertModel is as follows:
The BertModel documentation at https://huggingface.co/docs/transformers/main/en/model_doc/bert#transformers.BertModel explains the return values in detail:
last_hidden_state and pooler_output are always returned; hidden_states is returned only when the model is called with output_hidden_states=True or configured with config.output_hidden_states=True.
Let me explain here:
With output_hidden_states=True, the length of outputs is 3:
# outputs[0] == last_hidden_state : (batch_size, sequence_length, hidden_size)
# outputs[1] == pooler_output : (batch_size, hidden_size)
# outputs[2] == hidden_states : a tuple of (num_hidden_layers + 1) tensors, each (batch_size, sequence_length, hidden_size)
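As an illustration, here is a minimal sketch of producing these outputs; the example sentence and the use of BertTokenizer with the same local path are assumptions, not part of the original snippet:

import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("../model/bert-base-chinese")
model = BertModel.from_pretrained("../model/bert-base-chinese", output_hidden_states=True)
model.eval()

inputs = tokenizer("北京欢迎你", return_tensors="pt")  # illustrative sentence
with torch.no_grad():
    outputs = model(**inputs)

print(outputs[0].shape)     # last_hidden_state: (1, sequence_length, 768)
print(outputs[1].shape)     # pooler_output: (1, 768)
print(len(outputs[2]))      # hidden_states: 13 = embeddings output + 12 encoder layers
print(outputs[2][0].shape)  # each element: (1, sequence_length, 768)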
From the hidden_states it can be seen that:
model.embeddings(input_tensor) == outputs[2][0]
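This equality can be checked numerically with torch.allclose, reusing inputs and outputs from the sketch above (input_tensor in the line above corresponds to the input_ids here):

with torch.no_grad():
    embedding_output = model.embeddings(
        input_ids=inputs["input_ids"],
        token_type_ids=inputs["token_type_ids"],
    )
print(torch.allclose(embedding_output, outputs[2][0], atol=1e-6))  # expected: True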