Some basic operations on Python ONNX models

Recently, while quantizing a model, I changed the model format to ONNX, so the ONNX model needs to be loaded, run, and quantized (weights/inputs/outputs). I therefore spent some time learning the basic operations on ONNX models and am writing them up in this post. If there are any errors, please point them out, thank you.

One, configuring the onnx environment

The onnx environment mainly consists of two packages, onnx and onnxruntime; both can be installed with pip.

pip install onnxruntime
pip install onnx

Two, get the output layers of the onnx model

import onnx
# Load the model
model = onnx.load('onnx_model.onnx')
# Check that the model is well formed and valid
onnx.checker.check_model(model)
# Get the output layers, including layer names and dimension info
output = model.graph.output
print(output)
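
The raw protobuf printed above can be hard to read. Here is a minimal sketch (assuming all output dimensions are fixed integers) that pulls just the layer names and shapes out of the value-info protos:

# Extract the name and dimensions of each output layer from the protobuf
for out in model.graph.output:
    dims = [d.dim_value for d in out.type.tensor_type.shape.dim]
    print(out.name, dims)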

Three, get the output data of intermediate nodes

  An onnx model normally only returns the output data of its final output nodes. If we want the output data of an intermediate node, we have to add the corresponding output node information ourselves: first build the desired node (layer name, data type, dimension info), then insert it into the model's outputs with insert.

import onnx
from onnx import helper
# Load the model
model = onnx.load('onnx_model.onnx')
# Build the intermediate node: layer name, data type, dimension info
prob_info = helper.make_tensor_value_info('layer1', onnx.TensorProto.FLOAT, [1, 3, 320, 280])
# Insert the newly built node into the model's outputs
model.graph.output.insert(0, prob_info)
# Save the new model
onnx.save(model, 'onnx_model_new.onnx')

# Extension:
# To delete a given output node: item is the node to remove
# model.graph.output.remove(item)
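
As a sanity check, here is a minimal sketch of running the saved model with onnxruntime (introduced in the next section) and fetching the newly exposed intermediate output; the input shape below is only a placeholder and must be replaced with your model's actual input shape:

import numpy as np
import onnxruntime

# Load the modified model; 'layer1' is now listed among its outputs
sess = onnxruntime.InferenceSession('onnx_model_new.onnx')
# Placeholder input: replace shape/dtype with your model's actual input
x = np.random.randn(1, 3, 480, 640).astype(np.float32)
layer1_out = sess.run(['layer1'], {sess.get_inputs()[0].name: x})[0]
print(layer1_out.shape)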

Four, using InferenceSession for onnx forward inference

  For forward inference, onnx relies on the onnxruntime compute engine.
  onnxruntime is an inference engine for ONNX models. In 2017, Microsoft, Facebook, and others established ONNX, a format standard for deep learning and machine learning models, and provided an engine dedicated to ONNX model inference, onnxruntime.

import onnxruntime

# Create an InferenceSession instance and pass the model path to it
sess = onnxruntime.InferenceSession('onnxmodel.onnx')
# Input/output layer names can be read from the session metadata
input_layers_name = sess.get_inputs()[0].name
output_layers_name = [out.name for out in sess.get_outputs()]
# Call the run method of the sess instance to perform inference;
# x is the input ndarray, matching the model's expected shape and dtype
outputs = sess.run(output_layers_name, {input_layers_name: x})

1. Creating an instance: source code analysis

class InferenceSession(Session):
    """
    This is the main class used to run a model.
    """
    def __init__(self, path_or_bytes, sess_options=None, providers=[]):
        """
        :param path_or_bytes: filename or serialized model in a byte string
        :param sess_options: session options
        :param providers: providers to use for session. If empty, will use
            all available providers.
        """
        self._path_or_bytes = path_or_bytes
        self._sess_options = sess_options
        self._load_model(providers)
        self._enable_fallback = True
        Session.__init__(self, self._sess)

    def _load_model(self, providers=[]):
        if isinstance(self._path_or_bytes, str):
            self._sess = C.InferenceSession(
                self._sess_options if self._sess_options else C.get_default_session_options(), self._path_or_bytes,
                True)
        elif isinstance(self._path_or_bytes, bytes):
            self._sess = C.InferenceSession(
                self._sess_options if self._sess_options else C.get_default_session_options(), self._path_or_bytes,
                False)
        # elif isinstance(self._path_or_bytes, tuple):
        # to remove, hidden trick
        #   self._sess.load_model_no_init(self._path_or_bytes[0], providers)
        else:
            raise TypeError("Unable to load from type '{0}'".format(type(self._path_or_bytes)))

        self._sess.load_model(providers)

        self._sess_options = self._sess.session_options
        self._inputs_meta = self._sess.inputs_meta
        self._outputs_meta = self._sess.outputs_meta
        self._overridable_initializers = self._sess.overridable_initializers
        self._model_meta = self._sess.model_meta
        self._providers = self._sess.get_providers()

        # Tensorrt can fall back to CUDA. All others fall back to CPU.
        if 'TensorrtExecutionProvider' in C.get_available_providers():
            self._fallback_providers = ['CUDAExecutionProvider', 'CPUExecutionProvider']
        else:
            self._fallback_providers = ['CPUExecutionProvider']

  In the _load_model function, we can see that C.InferenceSession is used to load the model, and related operations are also delegated to that class. From the import statement from onnxruntime.capi import _pybind_state as C, it is clear that this is actually a Python interface implemented in C++; its source code is in onnxruntime\onnxruntime\python\onnxruntime_pybind_state.cc.
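
The provider fallback logic above can also be observed from Python. A minimal sketch (provider names depend on how onnxruntime was built, and 'onnxmodel.onnx' is a placeholder):

import onnxruntime

# List the execution providers available in this onnxruntime build
print(onnxruntime.get_available_providers())

# Request providers explicitly; unavailable ones fall back as described above
sess = onnxruntime.InferenceSession(
    'onnxmodel.onnx',
    providers=['CUDAExecutionProvider', 'CPUExecutionProvider'])
print(sess.get_providers())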

2. Model inference with run: source code analysis

    def run(self, output_names, input_feed, run_options=None):
        """
        Compute the predictions.

        :param output_names: name of the outputs
        :param input_feed: dictionary ``{ input_name: input_value }``
        :param run_options: See :class:`onnxruntime.RunOptions`.

        ::

            sess.run([output_name], {input_name: x})
        """
        num_required_inputs = len(self._inputs_meta)
        num_inputs = len(input_feed)
        # the graph may have optional inputs used to override initializers. allow for that.
        if num_inputs < num_required_inputs:
            raise ValueError("Model requires {} inputs. Input Feed contains {}".format(num_required_inputs, num_inputs))
        if not output_names:
            output_names = [output.name for output in self._outputs_meta]
        try:
            return self._sess.run(output_names, input_feed, run_options)
        except C.EPFail as err:
            if self._enable_fallback:
                print("EP Error: {} using {}".format(str(err), self._providers))
                print("Falling back to {} and retrying.".format(self._fallback_providers))
                self.set_providers(self._fallback_providers)
                # Fallback only once.
                self.disable_fallback()
                return self._sess.run(output_names, input_feed, run_options)
            else:
                raise

  In the run function, forward inference on the data is performed by calling self._sess.run. As before, the concrete implementation of this function lives in the C++ InferenceSession class.
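
One detail worth noting in the source above: when output_names is empty or None, run fills in every output of the model. A minimal sketch (the model file and input shape are placeholders):

import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession('onnxmodel.onnx')
x = np.random.randn(1, 3, 480, 640).astype(np.float32)
# Passing None as output_names returns all outputs of the model
all_outputs = sess.run(None, {sess.get_inputs()[0].name: x})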

Five, some problems encountered

  1. The input data dimensions or type are incorrect
[Figure: onnxruntime error message showing the model's expected input shape and type]
      As the error message shows, the model expects input data with dimensions [1, 3, 480, 640] and dtype float32; the input data must therefore be constructed to match this information, otherwise the code will raise an error.
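
To avoid this kind of error, the expected shape and dtype can be read from the session before building the input. A minimal sketch, assuming all input dimensions are fixed integers:

import numpy as np
import onnxruntime

sess = onnxruntime.InferenceSession('onnxmodel.onnx')
inp = sess.get_inputs()[0]
print(inp.name, inp.shape, inp.type)   # e.g. [1, 3, 480, 640], tensor(float)
# tensor(float) corresponds to numpy float32; build the input to match exactly
x = np.zeros(inp.shape, dtype=np.float32)
outputs = sess.run(None, {inp.name: x})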

Note: Python calls C++ code through pybind11.

Origin blog.csdn.net/CFH1021/article/details/108732114