Table of contents
The underlying implementation of ONNX
Output the value of the ONNX intermediate node
Our introductory tutorial series on model deployment continues. In the previous two tutorials, we learned how to convert PyTorch models to ONNX models, and how to write custom operators for PyTorch or ONNX when the native operators are not expressive enough. So far we have always exported ONNX models through PyTorch and have hardly explored how an ONNX model itself is constructed.
You may have some questions: in what format is an ONNX model stored under the hood? How can we construct an ONNX model using only the ONNX API, without relying on a deep learning framework? If we only have an ONNX model and no source code, how can we debug it? Don't worry, today we will answer them one by one.
In this tutorial, we will focus on ONNX, a standard for defining neural networks, and explore the construction, reading, sub-model extraction, and debugging of ONNX models. First, we will learn the underlying representation of ONNX. Afterwards, we will use the ONNX API to construct and read a model. Finally, we will learn how to debug ONNX models by taking advantage of the sub-model extraction capability provided by ONNX.
The underlying implementation of ONNX
ONNX storage format
ONNX is defined with Protobuf under the hood. Protobuf, short for Protocol Buffers, is a mechanism proposed by Google to represent and serialize data. When using Protobuf, users first write a data definition file, and then store data in a binary file according to this definition. One can think of the data definition file as a data class and the binary file as an instance of that class.
Here is an example of a Protobuf data definition file:
message Person {
required string name = 1;
required int32 id = 2;
optional string email = 3;
}
This definition means that in the Person data type, the two fields name and id must be included, and the field email is optional. According to this definition file, the user can choose a programming language, define a Person class containing the member variables name, id, and email, and store an instance of this class as a binary file with Protobuf. Conversely, the user can use the binary file and the corresponding data definition file to read out an instance of the Person class.
For ONNX, the Protobuf data definition files are in its open source library. These files define the data type specifications of models, nodes, and tensors in a neural network, and the binary files are the ".onnx" files we are familiar with. Each ".onnx" file follows this data definition specification and stores all relevant data of a neural network. Generating an ONNX model directly with Protobuf is quite troublesome. Fortunately, ONNX provides many practical APIs, so we can construct and read ONNX models without knowing Protobuf at all.
Structure definition of ONNX
Before using the API to operate the ONNX model, we need to understand the structure definition rules of ONNX and learn how ONNX describes a neural network in the Protobuf definition file.
Recall that a neural network is essentially a computational graph. The nodes of the calculation graph are operators, and the edges are the tensors involved in the operation. By visualizing the ONNX model, we know that ONNX records the attribute information of all operator nodes, and stores the tensor information involved in the operation in the input and output information of the operator nodes. In fact, the structure of the ONNX model can be roughly represented by a class diagram as follows:
As shown in the figure, an ONNX model can be represented by the ModelProto class. ModelProto contains log information such as the version and creator, and also contains graph, the structure that stores the calculation graph. The GraphProto class consists of input tensor information, output tensor information, and node information. The tensor information class ValueInfoProto includes the tensor name, basic data type, and shape. The node information class NodeProto includes the operator name, the operator input tensor names, and the operator output tensor names.
Let's look at a concrete example. Suppose we have an ONNX model, model, that describes output=a*x+b. Calling print(model) produces the following output:
ir_version: 8
graph {
node {
input: "a"
input: "x"
output: "c"
op_type: "Mul"
}
node {
input: "c"
input: "b"
output: "output"
op_type: "Add"
}
name: "linear_func"
input {
name: "a"
type {
tensor_type {
elem_type: 1
shape {
dim {dim_value: 10}
dim {dim_value: 10}
}
}
}
}
input {
name: "x"
type {
tensor_type {
elem_type: 1
shape {
dim {dim_value: 10}
dim {dim_value: 10}
}
}
}
}
input {
name: "b"
type {
tensor_type {
elem_type: 1
shape {
dim {dim_value: 10}
dim {dim_value: 10}
}
}
}
}
output {
name: "output"
type {
tensor_type {
elem_type: 1
shape {
dim { dim_value: 10}
dim { dim_value: 10}
}
}
}
}
}
opset_import {version: 15}
Corresponding to the class diagram above, the information of this model consists of global information such as ir_version and opset_import, and the graph information graph. graph, in turn, contains a multiply node, an add node, three input tensors a, x, b, and an output tensor output. In the next section, we will use the API to construct this model and output this result.
Read and write ONNX models
Construct ONNX model
In the previous section, we learned that an ONNX model is organized in the following structure:
- ModelProto
  - GraphProto
    - NodeProto
    - ValueInfoProto
Now, let's put aside PyTorch and try to construct an ONNX model describing the linear function output=a*x+b entirely with ONNX's Python API. We will construct this model from the bottom up based on the structure above.
First, we can use helper.make_tensor_value_info to construct a ValueInfoProto object describing tensor information. As shown in the previous class diagram, we need to pass in three pieces of information: the name of the tensor, the basic data type of the tensor, and the shape of the tensor. In ONNX, input tensors and output tensors are represented in the same way. So here we construct ValueInfoProto objects for the three inputs a, x, b and the one output output in the same fashion, as shown in the code below:
import onnx
from onnx import helper
from onnx import TensorProto
a = helper.make_tensor_value_info('a', TensorProto.FLOAT, [10, 10])
x = helper.make_tensor_value_info('x', TensorProto.FLOAT, [10, 10])
b = helper.make_tensor_value_info('b', TensorProto.FLOAT, [10, 10])
output = helper.make_tensor_value_info('output', TensorProto.FLOAT, [10, 10])
Afterwards, we need to construct the operator node information NodeProto. This is done by passing three pieces of information to helper.make_node: the operator type, the input tensor names, and the output tensor names. Here we first construct the multiplication node describing c=a*x, and then the addition node describing output=c+b, as shown in the code below:
mul = helper.make_node('Mul', ['a', 'x'], ['c'])
add = helper.make_node('Add', ['c', 'b'], ['output'])
In computers, graphs are generally represented by a node set and an edge set. ONNX instead cleverly stores the edge information inside the node information, so no separate edge set is needed. In ONNX, if an input name of a node is the same as an output name of a previous node, the two nodes are connected by default. In the above example, the Mul node defines the output c and the Add node defines the input c, so the Mul node and the Add node are connected.
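To make the implicit edge rule concrete, here is a minimal sketch in plain Python with no ONNX dependency. The dictionaries and the helper name derive_edges are hypothetical; they only mirror the input and output names of the two nodes above and are not part of the ONNX API.

```python
# Hypothetical plain-dict stand-ins for the Mul and Add nodes above.
nodes = [
    {"op_type": "Mul", "input": ["a", "x"], "output": ["c"]},
    {"op_type": "Add", "input": ["c", "b"], "output": ["output"]},
]


def derive_edges(nodes):
    """Recover (producer, consumer, tensor name) edges from tensor names alone."""
    producer_of = {}
    for node in nodes:
        for name in node["output"]:
            producer_of[name] = node["op_type"]

    edges = []
    for node in nodes:
        for name in node["input"]:
            # A name with no producer is an input of the whole graph, not an edge.
            if name in producer_of:
                edges.append((producer_of[name], node["op_type"], name))
    return edges


edges = derive_edges(nodes)
print(edges)
```

The only edge recovered is the one carried by tensor c, which is exactly the connection between Mul and Add described above.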
It is precisely because edges are defined implicitly in this way that ONNX places a requirement on node inputs: each input of a node must be either an input of the entire model or an output of a previous node. If we remove one of the inputs a, x, b from the calculation graph (this operation will be shown in the code later), or rename the output of Mul from c to d, the resulting ONNX model no longer meets the standard.
An ONNX model that does not meet the standard may not be correctly recognized by an inference engine. ONNX provides the API onnx.checker.check_model to determine whether an ONNX model meets the standard.
Next, we use helper.make_graph to construct the computational graph GraphProto. The helper.make_graph function takes four arguments: the nodes, the graph name, the input tensor information, and the output tensor information. As shown in the following code, we pass in the previously constructed NodeProto objects and ValueInfoProto objects in order.
graph = helper.make_graph([mul, add], 'linear_func', [a, x, b], [output])
The nodes argument of make_graph has one requirement: the nodes of the computation graph must be given in topological order.
Topological order is a mathematical concept related to directed graphs. If the nodes are traversed in topological order, every input of each node is guaranteed to be the output of some earlier node (for an ONNX model, we also regard the input tensors of the calculation graph as "earlier outputs").
If you are not familiar with this concept, don't worry. Let's take the calculation graph we just constructed and use the two cases shown in the figure below to understand topological order intuitively.
Here we focus only on the Mul and Add nodes and the edge c between them. In case 1, the nodes are given in the order [Mul, Add], so when we reach Add, its input c can be found in the output of the earlier Mul. In case 2, however, the nodes are given in the order [Add, Mul], so the input edge of Add cannot be found and the computation graph cannot be constructed successfully. Here [Mul, Add] is a topological order of the directed graph, while [Add, Mul] is not.
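The two cases can also be checked mechanically. The following is a small sketch in plain Python; the function is_topologically_sorted and the tuple representation of a node are assumptions made for illustration, not ONNX API.

```python
def is_topologically_sorted(nodes, graph_inputs):
    """Return True if every node input is a graph input or an earlier node's output."""
    available = set(graph_inputs)
    for input_names, output_names in nodes:
        if any(name not in available for name in input_names):
            return False
        available.update(output_names)
    return True


# Each node is a hypothetical (input names, output names) pair.
mul = (["a", "x"], ["c"])
add = (["c", "b"], ["output"])

case_1 = is_topologically_sorted([mul, add], ["a", "x", "b"])
case_2 = is_topologically_sorted([add, mul], ["a", "x", "b"])
print(case_1, case_2)
```

In case 2 the check fails as soon as it reaches Add, because c has not been produced yet.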
Finally, we use helper.make_model to wrap the calculation graph GraphProto into the model ModelProto, and the ONNX model is complete. The make_model function can also add information such as the model producer and version. For simplicity, we do not add any extra information here, as shown in the code below:
model = helper.make_model(graph)
After constructing the model, we use the following three lines of code to check the correctness of the model, output the model in text form, and store it in a ".onnx" file. It is necessary to use onnx.checker.check_model here to check whether the model meets the ONNX standard, because onnx.save lets us store the model regardless of whether it meets the standard or not. We certainly don't want to generate a model that doesn't meet the standard.
onnx.checker.check_model(model)
print(model)
onnx.save(model, 'linear_func.onnx')
On successful execution of this code, the program will output information about the model in text format, which should be the same as the output we showed in the previous section.
To sort it out, the code to construct a model with ONNX Python API is as follows:
import onnx
from onnx import helper
from onnx import TensorProto
# input and output
a = helper.make_tensor_value_info('a', TensorProto.FLOAT, [10, 10])
x = helper.make_tensor_value_info('x', TensorProto.FLOAT, [10, 10])
b = helper.make_tensor_value_info('b', TensorProto.FLOAT, [10, 10])
output = helper.make_tensor_value_info('output', TensorProto.FLOAT, [10, 10])
# Mul
mul = helper.make_node('Mul', ['a', 'x'], ['c'])
# Add
add = helper.make_node('Add', ['c', 'b'], ['output'])
# graph and model
graph = helper.make_graph([mul, add], 'linear_func', [a, x, b], [output])
model = helper.make_model(graph)
# save model
onnx.checker.check_model(model)
print(model)
onnx.save(model, 'linear_func.onnx')
As usual, we can run the model with ONNX Runtime to see if the model is correct:
import onnxruntime
import numpy as np
sess = onnxruntime.InferenceSession('linear_func.onnx')
a = np.random.rand(10, 10).astype(np.float32)
b = np.random.rand(10, 10).astype(np.float32)
x = np.random.rand(10, 10).astype(np.float32)
output = sess.run(['output'], {'a': a, 'b': b, 'x': x})[0]
assert np.allclose(output, a * x + b)
If all goes well, this code runs without any error message, which shows that our model is equivalent to the computation a * x + b.
Read and modify ONNX models
By using the API to construct the ONNX model, we have thoroughly understood which modules ONNX consists of. Now, let's see how to read an existing ".onnx" file and extract model information from it.
First, we can read an ONNX model with the following code:
import onnx
model = onnx.load('linear_func.onnx')
print(model)
When saving the model earlier, we passed a ModelProto object to onnx.save. In the same way, what onnx.load returns when reading an ONNX model is also a ModelProto object. Printing this object should give exactly the same output as before.
Next, let's look at how to read the graph GraphProto, the nodes NodeProto, and the tensor information ValueInfoProto:
graph = model.graph
node = graph.node
input = graph.input
output = graph.output
print(node)
print(input)
print(output)
With the code above, we can access the graph, node, and tensor information of the model respectively. You may wonder how we found out the attribute names node and input used in graph.node and graph.input. In fact, the attribute names appear in the printed output of each object. Let's take the output of print(node) as an example:
[input: "a"
input: "x"
output: "c"
op_type: "Mul"
, input: "c"
input: "b"
output: "output"
op_type: "Add"
]
From this output we can see that node is actually a list, and that the objects in the list have the attributes input, output, and op_type (here input is also a list, and its two elements are both displayed). We can use the following code to get the attributes of the first node in node, Mul:
node_0 = node[0]
node_0_inputs = node_0.input
node_0_outputs = node_0.output
input_0 = node_0_inputs[0]
input_1 = node_0_inputs[1]
output = node_0_outputs[0]
op_type = node_0.op_type
print(input_0)
print(input_1)
print(output)
print(op_type)
# Output
"""
a
x
c
Mul
"""
When we want to know what attributes a certain data object of the ONNX model has, we don't need to look through the ONNX document, we just need to output the data object first, and then find out the attribute name in the output result.
After reading the information of the ONNX model, it is very easy to modify the ONNX model. We can create new nodes and tensor information according to the model construction method in the previous section, and combine them with the original model to form a new model, or directly modify the attributes of a data object without violating the ONNX specification.
Here we look at an example of directly modifying model properties:
import onnx
model = onnx.load('linear_func.onnx')
node = model.graph.node
node[1].op_type = 'Sub'
onnx.checker.check_model(model)
onnx.save(model, 'linear_func_2.onnx')
After reading in the previous linear_func.onnx model, we directly modify the type of the second node, node[1].op_type, turning the addition into a subtraction. Our model now describes the linear function a * x - b. If you are interested, you can run the new model linear_func_2.onnx with ONNX Runtime to verify whether it is equivalent to a * x - b.
Debug ONNX model
In actual deployment, if there is a problem with the ONNX model exported by the deep learning framework, it is generally solved by modifying the code of the framework instead of starting with ONNX. We treat the ONNX model as an unmodifiable black box.
Now that we have studied the principles of ONNX in depth, we can try to debug the ONNX model itself. In this section, let us see how to use the sub-model extraction function provided by ONNX to debug the ONNX model.
Sub-model extraction
ONNX officially provides developers with the function of sub-model extraction (extract). Sub-model extraction, as the name implies, is to extract a sub-model from a given ONNX model. The node set and edge set of this sub-model are all subsets of the corresponding set in the original model. Let's use PyTorch to export a more complex ONNX model and perform extraction operations on top of it:
import torch

class Model(torch.nn.Module):

    def __init__(self):
        super().__init__()
        self.convs1 = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3),
                                          torch.nn.Conv2d(3, 3, 3),
                                          torch.nn.Conv2d(3, 3, 3))
        self.convs2 = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3),
                                          torch.nn.Conv2d(3, 3, 3))
        self.convs3 = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3),
                                          torch.nn.Conv2d(3, 3, 3))
        self.convs4 = torch.nn.Sequential(torch.nn.Conv2d(3, 3, 3),
                                          torch.nn.Conv2d(3, 3, 3),
                                          torch.nn.Conv2d(3, 3, 3))

    def forward(self, x):
        x = self.convs1(x)
        x1 = self.convs2(x)
        x2 = self.convs3(x)
        x = x1 + x2
        x = self.convs4(x)
        return x

model = Model()
input = torch.randn(1, 3, 20, 20)
torch.onnx.export(model, input, 'whole_model.onnx')
The visualization of this model is shown in the figure below. Extracting a sub-model requires the names of edges, so for convenience the figure labels the edges that will be used later.
In the previous sections, we learned that edges in ONNX are represented by tensors with the same name. In other words, an edge number here is the output tensor name of the previous node and also the input tensor name of the next node. Since this model is exported with PyTorch, these tensor names are generated automatically by PyTorch.
Next, we can extract a submodel with the following code:
import onnx
onnx.utils.extract_model('whole_model.onnx', 'partial_model.onnx', ['22'], ['28'])
The visualization result of the sub-model is shown in the figure below:
From the code and the resulting graph, it is not hard to guess that this code extracts the subgraph from edge 22 to edge 28 of the original calculation graph and forms a sub-model from it. onnx.utils.extract_model is the function that performs the extraction. Its parameters are the original model path, the output model path, the input edges of the sub-model (input tensors), and the output edges of the sub-model (output tensors).
Intuitively, sub-model extraction takes all nodes between the input edges and the output edges. So what restrictions apply to this function? Based on whole_model.onnx, let's look at three examples of sub-model extraction.
Add an extra output
This time we specify an additional output tensor when extracting, as shown in the following code:
onnx.utils.extract_model('whole_model.onnx', 'submodel_1.onnx', ['22'], ['27', '31'])
We can see that the sub-model gains an additional output edge, as shown in the following figure:
Add a redundant input
If we still extract the sub-model between edge 22 and edge 28 as before, but add one more input input.1, the extracted sub-model will have a redundant input input.1, as shown in the following code:
onnx.utils.extract_model('whole_model.onnx', 'submodel_2.onnx', ['22', 'input.1'], ['28'])
As the figure below shows, whatever value is passed to this input does not affect the output of the sub-model. In general, if a subset of the sub-model's inputs is already sufficient to compute the output, then any additional "earlier" inputs are redundant.
Insufficient input information
This time, we try to extract a sub-model whose input is edge 24 and whose output is edge 28, as shown in the code and figure below:
# Error
onnx.utils.extract_model('whole_model.onnx', 'submodel_3.onnx', ['24'], ['28'])
The figure shows that to compute edge 28 from edge 24, we also need at least edge 26 or an edge further upstream as input. Edge 28 cannot be computed from edge 24 alone, so extracting the sub-model in this way raises an error.
From these examples, we can summarize the principle behind sub-model extraction: create a new model and fill in the given inputs and outputs. Then reverse all directed edges of the graph, traverse the nodes starting from the output edges, stop whenever an input edge is encountered, and take the nodes visited in this way as the nodes of the sub-model.
If the extraction principle is still not entirely clear, that is fine. We just need to make sure that, when specifying the inputs and outputs of the sub-model, the outputs are exactly determined by the inputs.
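The traversal described above can be sketched in a few lines of plain Python. This is a simplified illustration of the principle, not the actual implementation inside onnx.utils; the function extract_nodes, the toy graph, and all tensor names (in, t1, ..., out) are hypothetical. The toy graph has the same branch-and-merge shape as whole_model.onnx, with each convolution stack collapsed into one node.

```python
def extract_nodes(nodes, input_names, output_names):
    """Walk backwards from the outputs and stop at the given inputs.

    nodes is a list of (node name, input tensor names, output tensor names).
    Returns the names of the kept nodes in their original order.
    """
    producer = {out: node for node in nodes for out in node[2]}
    stop_at = set(input_names)
    kept, seen = set(), set()
    stack = list(output_names)
    while stack:
        tensor = stack.pop()
        if tensor in stop_at or tensor in seen:
            continue  # reached a sub-model input, or already handled
        seen.add(tensor)
        if tensor not in producer:
            raise ValueError(f"{tensor!r} cannot be computed from the given inputs")
        name, node_inputs, _ = producer[tensor]
        kept.add(name)
        stack.extend(node_inputs)
    return [node[0] for node in nodes if node[0] in kept]


toy_graph = [
    ("convs1", ["in"], ["t1"]),
    ("convs2", ["t1"], ["t2"]),
    ("convs3", ["t1"], ["t3"]),
    ("add", ["t2", "t3"], ["t4"]),
    ("convs4", ["t4"], ["out"]),
]

normal = extract_nodes(toy_graph, ["t1"], ["t4"])
redundant = extract_nodes(toy_graph, ["t1", "in"], ["t4"])  # "in" is never reached
try:
    extract_nodes(toy_graph, ["t2"], ["t4"])  # t3 cannot be computed from t2 alone
    insufficient_failed = False
except ValueError as err:
    insufficient_failed = True
    print("extraction failed:", err)

print(normal, redundant)
```

The three calls correspond to the normal case, the redundant input case, and the insufficient input case discussed above.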
Output the value of the ONNX intermediate node
When using ONNX models, one of the most common requirements is to be able to use the inference engine to output the value of intermediate nodes. This is mostly seen in the accuracy alignment of the deep learning framework model and the ONNX model, because as long as the value of the intermediate node can be output, the operator with deviation in accuracy can be located. Let's see how to achieve this task with submodel extraction.
In the first submodel extraction example just now, we added an output edge that was not present in the original model. Using the same principle, we can add some new outputs while keeping the original input and output unchanged, and extract a "sub-model" that can output intermediate nodes. For example:
onnx.utils.extract_model('whole_model.onnx', 'more_output_model.onnx', ['input.1'], ['31', '23', '25', '27'])
In this sub-model, while keeping the original input input.1 and output 31, we add several other edges to the outputs, as shown below:
In this way, when we run the model more_output_model.onnx with ONNX Runtime, we get more outputs.
To make debugging easier, we can also split the original model into multiple disjoint sub-models, so that each debugging session only covers part of the original model. For example:
onnx.utils.extract_model('whole_model.onnx', 'debug_model_1.onnx', ['input.1'], ['23'])
onnx.utils.extract_model('whole_model.onnx', 'debug_model_2.onnx', ['23'], ['25'])
onnx.utils.extract_model('whole_model.onnx', 'debug_model_3.onnx', ['23'], ['27'])
onnx.utils.extract_model('whole_model.onnx', 'debug_model_4.onnx', ['25', '27'], ['31'])
In this example, we split the original more complex model into four simpler sub-models, as shown in the figure below. When debugging, we can first debug the top-level sub-model, and after confirming that the top-level sub-model is correct, use its output as the input of the subsequent sub-model.
For example, for these submodels, we can first debug the first submodel and store the output 23. Then use tensor 23 as the input of the second and third sub-models, and debug these two models. Finally, use the same method to debug the fourth sub-model. It can be said that with the sub-model extraction function, even in the face of a huge model, we can extract the problematic sub-module from it and carefully debug only this sub-module.
Submodel extraction is certainly a handy ONNX debugging tool. However, in actual situations, we generally use frameworks such as PyTorch to export ONNX models. There are two problems here:
- Once the PyTorch model changes, the edge numbers of the ONNX model will also change. In this way, every time the same sub-module is extracted, it is necessary to check the serial number in the ONNX model again. Such a cumbersome debugging method will not be used in practice.
- Even if we can ensure that the edge number of ONNX does not change, it is difficult for us to match the PyTorch code with the ONNX node - when the model structure becomes very complex, it is impossible to identify the meaning of each node in ONNX.
In MMDeploy, we added model chunking to PyTorch models. Using this feature, we can export the original model into multiple disjoint sub-ONNX models by only modifying the implementation code of the PyTorch model. We'll cover it in a later tutorial.
https://github.com/open-mmlab/mmdeploy
Summary
In this tutorial, we set aside PyTorch and learned about the ONNX model itself. As usual, let's summarize the key points of this tutorial:
- ONNX uses Protobuf to define its specification and to serialize models.
- An ONNX model is mainly composed of objects of the data classes ModelProto, GraphProto, NodeProto, and ValueInfoProto.
- Using onnx.helper.make_xxx, we can construct the data objects of an ONNX model.
- onnx.save() saves a model, onnx.load() reads a model, and onnx.checker.check_model() checks whether a model complies with the specification.
- onnx.utils.extract_model() takes some nodes out of the original model and forms a new sub-model with the newly defined input and output edges.
- Using the sub-model extraction function, we can output the intermediate results of the original ONNX model in order to debug it.
So far, our study of ONNX related knowledge has come to an end. To review, we first learned how to use the API from PyTorch to ONNX; then, we learned how to use custom operators to solve the problem of insufficient expressive ability of PyTorch and ONNX; finally, we learned the debugging method of the ONNX model separately. By learning ONNX from shallow to deep, we can basically deal with most of the problems related to ONNX in model deployment.
If you want to know more about ONNX API, you can read ONNX's official Python API documentation .
However, knowing these concepts alone does not mean we can apply the PyTorch and ONNX APIs skillfully. In the next tutorial, we will use PyTorch and ONNX to write some practical tools related to ONNX models, as a summary of the past few tutorials, so stay tuned!
If you are interested, you are welcome to try MMDeploy:
https://github.com/open-mmlab/mmdeploy