ONNX files and their structure, correct ONNX export, ONNX reading, ONNX creation, ONNX modification, and the ONNX parser

1. ONNX basic concepts

1. ONNX files are binary files serialized with Protobuf (a portable and efficient structured-data storage format). Protobuf's syntax for describing data is similar in spirit to JSON.
2. protoc, the compiler provided by Protobuf, compiles onnx-ml.proto (a .proto file defines the structure of the data; it can usually be found in the installed onnx package path of your environment) into onnx-ml.pb.h and onnx-ml.pb.cc for C++, or onnx_ml_pb2.py for Python.
3. The generated onnx-ml.pb.cc (or onnx_ml_pb2.py) is then used together with your own code to operate on an ONNX model file, e.g. onnx.load(), to add, delete and modify its contents.
4. Specifically: the onnx-ml.proto file describes how an ONNX file is composed and what structure it has, and it is what we keep referring back to when operating on ONNX. However, the .proto file itself (written in protobuf syntax) is only an intermediate representation; it has no capabilities of its own, i.e. it cannot do storage or transmission (serialization, deserialization, reading and writing). So we use protoc to compile the .proto file into a concrete language, which yields the .cc and .py interface files. The .cc file can be called from C++, and the generated .py file can be imported from Python; both read and write the data structures defined in the .proto file, so that code in different languages can exchange data in the protobuf format.
Protobuf is a lightweight and efficient structured-data storage format that can serialize structured data, which makes it very suitable as a storage or exchange format. For ONNX, the binary file format is exactly protobuf.
Handing onnx-ml.proto to protoc (protoc is the compiler program provided by Protobuf; run protoc --help to see its usage) compiles it into the corresponding .cc and .py files, which act as interface files. Their job is to deserialize a binary file (an .onnx file) into C++ classes or Python classes, i.e. structured data; once it is structured data, we can easily add to it, delete from it and modify it. That is why we can operate on ONNX directly (by calling onnx.load and the various make functions in onnx.helper; under the hood these call the generated .cc or .py interface files) to perform all kinds of additions, deletions and modifications.
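For intuition, here is a minimal sketch of using those generated Python bindings directly. It assumes a demo.onnx file exists; the onnx package already ships the compiled bindings as onnx.onnx_ml_pb2, so there is no need to run protoc yourself:

# Minimal sketch: the Python classes generated from onnx-ml.proto can be used directly.
# The onnx package ships them as onnx.onnx_ml_pb2 (this is what onnx.load wraps).
from onnx import onnx_ml_pb2

model = onnx_ml_pb2.ModelProto()            # structured-data class defined by onnx-ml.proto
with open("demo.onnx", "rb") as f:          # assumes demo.onnx exists
    model.ParseFromString(f.read())         # deserialize the protobuf binary into the class
print(model.graph.name, len(model.graph.node))

with open("demo_copy.onnx", "wb") as f:
    f.write(model.SerializeToString())      # serialize the structured data back to bytes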
Proto file of ONNX: https://github.com/onnx/onnx/blob/main/onnx/onnx-ml.proto

References:
Operating protobuf from Python: common usages.
Using Protobuf in C++.

2. onnx-ml.proto

Open the onnx-ml.proto file; its content is as shown in the figure below. How do we read this file? Don't be intimidated by it. The NodeProto message indicates that ONNX has a node type called node. Inside NodeProto there is an input field, meaning a node has an input attribute; it is marked repeated, i.e. a repeated type, so it is an array. Likewise it has an output attribute, also repeated, so also an array, and it has a name attribute of type string.

So: repeated means an array, and optional can simply be ignored. In input = 1, the number after the equals sign is the field id. Every field has a specific id and these ids must not conflict, but we don't need to care how they are assigned; we only care whether a field is an array and what type it holds. For example, input in the first line of the figure below is an array, so we use an index to fetch a value from it, and the elements stored in the array are strings.

[Figure: excerpt of onnx-ml.proto showing the NodeProto message definition]
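In the Python bindings these repeated fields behave like lists and the plain fields like scalars. A minimal sketch (assuming a demo.onnx exists):

import onnx

model = onnx.load("demo.onnx")
node = model.graph.node[0]        # a NodeProto
print(node.input[0])              # input is repeated string -> indexed like a list
print(list(node.output))          # output is repeated string as well
print(node.name, node.op_type)    # plain string fields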

3. ONNX structure

The opset_import of ONNX indicates the operator set (operator library) version. Opening an ONNX model, we find the following structure: at the top is the model, a graph hangs under the model, and node, initializer, input and output hang under the graph.

  • A node represents an operator, such as conv, relu, linear, etc.; they are all stored in node. Each node also has input and output, which here mean the inputs and outputs of that node; do not confuse them with the graph-level input and output below, which represent the input and output of the entire model. A node also has name and op_type: name is the node's name and can be chosen freely, while op_type corresponds to the operator type and cannot be changed casually. The node's attribute field stores things like the operator's padding, kernel shape, strides, and so on.
  • The initializer entries store the weights, i.e. tensors such as the model's weights and biases. For example, the weight and bias of a conv operator are stored in initializer.
  • input marks which tensors are the inputs of this ONNX model; each has a corresponding name, shape and data type. It represents the input of the entire network.
  • output describes the outputs of this ONNX model; it represents the output of the entire network.
  • Special case: Constant belongs to model.graph.node with op_type: Constant, so a Constant is treated as an operator. For example, the constant tensor data of an anchor grid (its shape is (bs, N, H, W, 2), where N is the number of anchor sizes and the 2 in the last dimension is wh; concretely, each pixel of each output feature map corresponds to N anchor boxes) is usually stored in model.graph.node with op_type Constant. Nodes of this kind are not displayed when the model is visualized in netron.

model: the entire ONNX model, including the graph structure, the IR and opset versions, and the producer (the exporting program). Here model refers to model = onnx.load("demo.onnx").
model.graph: the graph structure, usually the main structure we see in netron.
model.graph.node: all the nodes in the graph, as an array; nodes such as conv and bn live here, and the connections between nodes are expressed through their input and output names.
model.graph.initializer: most of the weight-type data is stored here.
model.graph.input: the inputs of the entire model are stored here, indicating which tensors are the input nodes and what their shapes are.
model.graph.output: the outputs of the entire model are stored here, indicating which tensors are the output nodes and what their shapes are.
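A minimal sketch that walks this structure (assuming a demo.onnx exists):

import onnx

model = onnx.load("demo.onnx")
print(model.ir_version, model.opset_import, model.producer_name)

# graph-level containers
print(len(model.graph.node), len(model.graph.initializer))
print([i.name for i in model.graph.input], [o.name for o in model.graph.output])

# Constant tensors are ordinary nodes with op_type "Constant"
for n in model.graph.node:
    if n.op_type == "Constant":
        print("constant node:", n.output[0])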

[Figure: the model / graph / node / initializer / input / output hierarchy of an ONNX model]

4. How to correctly export ONNX

  1. For any parameter that comes from a shape or size, avoid using the return value of tensor.size directly. For example, tensor.view(tensor.size(0), -1) uses .size; we should wrap values obtained from xx.size or xx.shape in int(), i.e. tensor.view(int(tensor.size(0)), -1), which disconnects the trace's tracking of that tensor. Otherwise, because of tracing, the .size and .shape operations drag in many extra nodes, such as a Shape node (corresponding to .shape in the code), Gather nodes, Unsqueeze nodes, and so on. These extra nodes are unnecessary, bring extra computation and memory consumption, and may not be well supported downstream. What is "disconnecting the tracking"? It means turning intermediate results that depend on the input into constants (tensors count as constants too). See the sketch after this list.
  2. For nn.Upsample or nn.functional.interpolate, use scale_factor to specify the scaling multiplier instead of the size parameter.
  3. For reshape and view operations, put the -1 on the batch dimension; the other dimensions can be computed explicitly. Do not specify the batch dimension as an explicit number. In ONNX, -1 means "compute automatically", and the batch dimension is the one whose size is uncertain, so -1 belongs there: during training and inference the batch size usually varies while the other dimensions are fixed.
  4. When torch.onnx.export specifies dynamic_axes, make only the batch dimension dynamic and avoid making other dimensions dynamic ("avoid" here meaning: it is best not to).
  5. Use opset_version=11; anything not lower than 11 is generally fine.
  6. Avoid in-place operations, e.g. y[..., 0:2] = y[..., 0:2] * 2 - 0.5. An in-place operation modifies the old memory directly instead of allocating new memory for the result; on export it brings in a scatter operator, which is poorly supported, and also adds many extra nodes.
  7. Keep tensors below 5 dimensions where possible; for example, in a ShuffleNet module you can consider merging w and h to avoid 5-dimensional tensors.
  8. Try to implement the post-processing inside the ONNX model to reduce post-processing complexity. When exporting the ONNX model we therefore usually write a simplified mypredict.py ourselves, separating pre-processing, model inference and post-processing so that the inference code reads cleanly; this makes it convenient to move the post-processing into the model and export it to ONNX. Once the ONNX has been exported, the simplified mypredict.py documents the whole inference flow, and the next step is to compile the ONNX into an engine with tensorRT and run inference in C++. Inference here means pre-processing + model inference (the generated engine run forward by tensorRT) + post-processing, so pre- and post-processing have nothing to do with tensorRT; the C++ pre- and post-processing code can essentially be rewritten directly from the Python mypredict.py, which is much more convenient. For example, yolov5's post-processing needs the anchors for some multiply-add operations; if you do that separately in the post-processing, you find you have to keep, for each model, that model's anchor information, the code becomes complex and the post-processing logic becomes troublesome. So put the post-processing logic in the model as much as possible, so that the output tensor can be decoded easily; and whereas your own post-processing might not be fast enough, if it sits inside the ONNX, tensorRT can accelerate it for you. Often the ONNX has already been exported and only then do you want to add post-processing to it. There are two ways: one is to operate on the ONNX file directly with the onnx package; adding some nodes is feasible but fairly difficult. The second is to implement the post-processing logic in pytorch, export it to its own ONNX, and then merge that ONNX into the original one. This is in fact an approach customized for complex tasks.
  9. Mastering these points keeps the export workflow below running smoothly.

The point of these rules is to simplify the process and remove unnecessary nodes such as Gather and Shape; removing them improves compatibility, because support for such nodes is simply not that good everywhere. Often it looks as if you could skip these changes, but once the requirements get complicated the problems reappear. Modify things as described and it will basically just work; after doing this there is no need to run onnx-simplifier.
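A minimal sketch (a made-up module, only to illustrate items 1, 2, 3 and 6 above):

import torch
import torch.nn as nn
import torch.nn.functional as F

class ExportFriendly(nn.Module):
    def forward(self, x):
        # item 2: use scale_factor, not size
        x = F.interpolate(x, scale_factor=2.0, mode="nearest")
        # item 6: build a new tensor instead of writing in place
        x = torch.cat([x[..., 0:2] * 2 - 0.5, x[..., 2:]], dim=-1)
        # items 1 and 3: int() disconnects the trace on .size(), and -1 goes on the batch dim
        return x.view(-1, int(x.size(1)) * int(x.size(2)) * int(x.size(3)))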

An example of using torch.onnx.export: a dummy_input is used as the model's input. As the variable name suggests, it is a fake input, so its values can be arbitrary as long as the shape is right. See the detailed explanation of the torch.onnx.export parameters.

import torch
import torch.nn as nn
import torch.nn.functional as F
import torch.onnx
import os

class Model(torch.nn.Module):
    def __init__(self):
        super().__init__()

        self.conv = nn.Conv2d(1, 1, 3, padding=1)
        self.relu = nn.ReLU()
        self.conv.weight.data.fill_(1)
        self.conv.bias.data.fill_(0)
    
    def forward(self, x):
        x = self.conv(x)
        x = self.relu(x)
        return x

# This module holds the opset 11 export code; if you want to change export details, modify the code there
# import torch.onnx.symbolic_opset11
print("The corresponding opset folder is here:", os.path.dirname(torch.onnx.__file__))

model = Model()

# If dummy is changed to torch.zeros(8, 1, 3, 3), the generated onnx graph is unaffected
dummy = torch.zeros(1, 1, 3, 3)

# The conv bias in the generated onnx graph has 1 element; this is determined by the number of output channels, which is 1
torch.onnx.export(
    model, 

    # args: the inputs passed to model; a tuple is required, hence the parentheses
    (dummy,), 

    # path of the saved file
    "demo_xx.onnx",  

    # print verbose information
    verbose=True, 

    # name the input and output nodes, which makes inspecting or manipulating them later easier
    input_names=["image"], 
    output_names=["output"], 

    # opset: how the various operators are exported, corresponding to symbolic_opset11
    opset_version=11, 

    # declares that the batch, height and width dimensions are dynamic; in onnx they get the value -1
    # usually we only make batch dynamic and avoid making anything else dynamic
    dynamic_axes={
        "image": {0: "batch", 2: "height", 3: "width"},
        "output": {0: "batch", 2: "height", 3: "width"},
    }
)

print("Done.!")

ONNX reading

import onnx
import onnx.helper as helper
import numpy as np

model = onnx.load("demo.onnx")

# print information
print("==============node info")
# print(helper.printable_graph(model.graph))
print(model)

conv_weight = model.graph.initializer[0]
conv_bias = model.graph.initializer[1]

# initializer entries have a dims attribute, which you can see by printing model
# this prints 1; dims is a repeated type in onnx-ml.proto, i.e. an array, so it must be accessed by index!
print(conv_weight.dims[0])
# take the first element of the node list
print("xxxxx", model.graph.node[0])

# the data is stored in protobuf format, so the values are kept as bytes; np.frombuffer restores them to a float32 ndarray
print(f"===================={conv_weight.name}==========================")
print(conv_weight.name, np.frombuffer(conv_weight.raw_data, dtype=np.float32))

print(f"===================={conv_bias.name}==========================")
print(conv_bias.name, np.frombuffer(conv_bias.raw_data, dtype=np.float32))
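As an alternative to np.frombuffer, a small sketch continuing from the code above, using onnx.numpy_helper, which converts between TensorProto and numpy arrays directly:

from onnx import numpy_helper

w = numpy_helper.to_array(conv_weight)   # dtype and shape are taken from the proto fields
b = numpy_helper.to_array(conv_bias)
print(w.shape, w.dtype, b)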

5. ONNX creation (in fact, it amounts to filling in the values)

import onnx # pip install onnx>=1.10.2
import onnx.helper as helper
import numpy as np

# https://github.com/onnx/onnx/blob/v1.2.1/onnx/onnx-ml.proto

nodes = [
    helper.make_node(
        name="Conv_0",   # node name; do not confuse it with op_type
        op_type="Conv",  # the node's operator type, e.g. 'Conv', 'Relu', 'Add'; see the operator list published by onnx
        inputs=["image", "conv.weight", "conv.bias"],  # names of the inputs; a node's inputs include both data inputs and the operator's weights. Input X and weight W are required, bias B is optional.
        outputs=["3"],  
        pads=[1, 1, 1, 1], # the remaining keyword arguments are the node's attributes; they are documented on the onnx website, and attributes marked with a default have default values.
        group=1,
        dilations=[1, 1],
        kernel_shape=[3, 3],
        strides=[1, 1]
    ),
    helper.make_node(
        name="ReLU_1",
        op_type="Relu",
        inputs=["3"],
        outputs=["output"]
    )
]

initializer = [
    helper.make_tensor(
        name="conv.weight",
        data_type=helper.TensorProto.DataType.FLOAT,
        dims=[1, 1, 3, 3],
        vals=np.array([1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0], dtype=np.float32).tobytes(),
        raw=True
    ),
    helper.make_tensor(
        name="conv.bias",
        data_type=helper.TensorProto.DataType.FLOAT,
        dims=[1],
        vals=np.array([0.0], dtype=np.float32).tobytes(),
        raw=True
    )
]

inputs = [
    helper.make_value_info(
        name="image",
        type_proto=helper.make_tensor_type_proto(
            elem_type=helper.TensorProto.DataType.FLOAT,
            shape=["batch", 1, 3, 3]
        )
    )
]

outputs = [
    helper.make_value_info(
        name="output",
        type_proto=helper.make_tensor_type_proto(
            elem_type=helper.TensorProto.DataType.FLOAT,
            shape=["batch", 1, 3, 3]
        )
    )
]

graph = helper.make_graph(
    name="mymodel",
    inputs=inputs,
    outputs=outputs,
    nodes=nodes,
    initializer=initializer
)

# if the domain is not ai.onnx, netron parses the model somewhat differently
opset = [
    helper.make_operatorsetid("ai.onnx", 11)
]

# producer is mainly set to stay consistent with pytorch
model = helper.make_model(graph, opset_imports=opset, producer_name="pytorch", producer_version="1.9")
onnx.save_model(model, "my.onnx")

print(model)
print("Done.!")

6. ONNX modification (adding preprocessing to ONNX)

import onnx
import onnx.helper as helper
import numpy as np

model = onnx.load("demo.onnx")

# the weights can be taken out
conv_weight = model.graph.initializer[0]
conv_bias = model.graph.initializer[1]
# modify the weights
conv_weight.raw_data = np.arange(9, dtype=np.float32).tobytes()

# create a new node yourself
newitem = helper.make_node(...)  # details omitted; look up this function's usage when needed

# replace the old node item with the new node newitem; where item comes from is omitted, just know that item is of type node
# the point is that node replacement uses CopyFrom, a protobuf function; see the protobuf documentation, under message
item.CopyFrom(newitem)

# delete a node with the remove function
# note that after deleting a node, you must reconnect the deleted node's input nodes to its output nodes
model.graph.node.remove(newitem)

# change the onnx dynamic batch to a static one
input = model.graph.input[0]
print(type(input))  # <class 'onnx.onnx_ml_pb2.ValueInfoProto'>

# since input's type is ValueInfoProto, we use make_tensor_value_info to build an object of the same type, the only difference being a static batch
new_input = helper.make_tensor_value_info(input.name, 1, [1, 3, 640, 640])  # 1 is the data type (an enum value, 1 corresponds to FLOAT; see the .proto file). [1, 3, 640, 640] is the shape; the first dimension is no longer a string, so the batch is static
# replacing the old entry with the new one turns the dynamic batch into a static one
model.graph.input[0].CopyFrom(new_input)




# add the preprocessing as new nodes into the onnx file
import torch
class Preprocess(torch.nn.Module):
	def __init__(self):
		super().__init__()
		# self.mean and self.std are defined in __init__ so they become fixed constants; if they were created in forward, exporting to onnx would generate extra nodes
		self.mean = torch.rand(1, 1, 1, 3)
		self.std = torch.rand(1, 1, 1, 3)
	def forward(self, x):
		# input:  x = B x H x W x C, uint8
		# output: y = B x C x H x W, float32, minus mean divided by std
		x = x.float()  # convert to float32, otherwise it would default to float64, which is too slow
		x = (x / 255.0 - self.mean) / self.std
		x = x.permute(0, 3, 1, 2)
		return x
pre = Preprocess()
# (torch.zeros(1, 640, 640, 3, dtype=torch.uint8),): the inputs are given as a tuple; here there is a single input
torch.onnx.export(
	pre, (torch.zeros(1, 640, 640, 3, dtype=torch.uint8),), "preprocess.onnx"
	)
# after obtaining preprocess.onnx, load it and connect it to the original onnx; this adds the preprocessing into the original onnx
pre_onnx = onnx.load("preprocess.onnx")
# concrete steps for adding the preprocessing as new nodes into the original onnx file:
# 1. prefix all of pre_onnx's node names and their input/output names, to avoid name clashes with the onnx file being merged
# 2. in the original onnx, change the node that takes image as input so that it takes pre_onnx's output instead
# 3. append all of pre_onnx's nodes to the original onnx's node list; preprocess.onnx only uses node, so only node needs to be copied, but if pre_onnx also contained initializer entries, those would have to be copied into the original onnx's initializer as well
# 4. use pre_onnx's input name as the input name of the original onnx file
# step 1
for n in pre_onnx.graph.node:
	n.name = f"pre/{n.name}"  # add the prefix; the result looks like pre/Div_4
	for i in range(len(n.input)):
		n.input[i] = f"pre/{n.input[i]}"
	for i in range(len(n.output)):
		n.output[i] = f"pre/{n.output[i]}"

# step 2
for n in model.graph.node:
	if n.name == "Conv_0":
		n.input[0] = "pre/" + pre_onnx.graph.output[0].name
# step 3
for n in pre_onnx.graph.node:
	model.graph.node.append(n)
# step 4
input_name = "pre/" + pre_onnx.graph.input[0].name
model.graph.input[0].CopyFrom(pre_onnx.graph.input[0])
model.graph.input[0].name = input_name

# after modifying the weights, save to a new onnx, otherwise the changes do not take effect
onnx.save_model(model, "demo.change.onnx")
print("Done.!")

7. ONNX parser

The purpose of the ONNX parser is to parse the onnx file and produce a model that tensorRT can understand, i.e. convert it into a model defined through tensorRT's C++ interface.

There are two options for the ONNX parser: the prebuilt libnvonnxparser.so, or building from source at https://github.com/onnx/onnx-tensorrt. The point of using the source build is better customizability: it simplifies plugin development and the model compilation workflow, allows more customization, and makes debugging easier when problems come up.
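For reference, a minimal sketch of driving the parser through tensorRT's Python API (assumes the tensorrt package is installed and a demo.onnx exists; the C++ flow through libnvonnxparser is analogous):

import tensorrt as trt

logger = trt.Logger(trt.Logger.WARNING)
builder = trt.Builder(logger)
network = builder.create_network(1 << int(trt.NetworkDefinitionCreationFlag.EXPLICIT_BATCH))
parser = trt.OnnxParser(network, logger)           # the onnx parser, fed the serialized onnx bytes

with open("demo.onnx", "rb") as f:
    if not parser.parse(f.read()):                 # fills the tensorRT network definition
        for i in range(parser.num_errors):
            print(parser.get_error(i))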


Source: blog.csdn.net/Rolandxxx/article/details/127713806