Debugging skills and methods for model deployment

Debugging rules

  1. Make good use of the Python workflow and combine Python and C++ when debugging. Python has a relatively complete tool chain (PyCharm and so on), so when you run into a problem, do not try to debug it in C++; move the problem into the Python workflow, which is much more efficient. For example, when terminal output is too long to be displayed completely, you can inspect it in PyCharm. In deep learning most of the data are tensors and matrices, which are far easier to analyze in Python. ONNX model inference, for instance, can be reproduced in the Python workflow with onnxruntime.
  2. With pre- and post-processing removed, make sure the outputs of onnx and pytorch are consistent, so that all other factors are eliminated. This is usually easy to guarantee: for example, if both input tensors are filled with 5, the difference between the two outputs must be below 1e-4, which confirms that nothing went wrong in between (see the sketch after this list).
  3. It is generally difficult to make the preprocessing at inference time exactly match the preprocessing in pytorch (for example, pytorch uses PIL while inference uses opencv). Instead, consider storing the Python preprocessing result to a file, loading that file in C++, and running inference on it; the difference should be below 1e-4.
  4. Consider storing the results of the Python model inference to a file, working them out with numpy first, and only then reproducing the processing in C++. In short, give priority to experimenting and adjusting in Python until the problem is fully understood, then port the code to C++. This turns the work into a code-translation exercise, and problems get solved much faster.
  5. If a bug occurs, save the tensor to a file from the C++ side, load the file in Python, and debug and inspect it there. Avoid debugging in C++.
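As a concrete sketch of rules 1 and 2, the snippet below compares pytorch and onnx outputs on a constant input with onnxruntime. The Conv2d is only a toy stand-in for the real model, and "model.onnx" is a placeholder file name.

# Sketch of rules 1 and 2: compare pytorch and onnx outputs on a constant input.
# The Conv2d is a toy stand-in for the real model; "model.onnx" is a placeholder.
import numpy as np
import torch
import onnxruntime as ort

model = torch.nn.Conv2d(3, 8, 3, padding=1)   # toy stand-in for the real model
model.eval()

x = torch.full((1, 3, 224, 224), 5, dtype=torch.float32)   # input tensor of all 5s

with torch.no_grad():
    torch_out = model(x).numpy()

torch.onnx.export(model, (x,), "model.onnx")
session = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
onnx_out = session.run(None, {session.get_inputs()[0].name: x.numpy()})[0]

# Rule 2: with pre/post-processing removed, the difference should be below 1e-4
print("max abs diff:", np.abs(torch_out - onnx_out).max())
assert np.abs(torch_out - onnx_out).max() < 1e-4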

Code for loading and saving a Tensor in Python

# The following code loads a Tensor in Python
import numpy as np

def load_tensor(file):
    # File layout: header (magic number, ndims, dtype id) as 3 x uint32,
    # followed by the shape (ndims x uint32) and the raw tensor data
    with open(file, "rb") as f:
        binary_data = f.read()

    magic_number, ndims, dtype = np.frombuffer(binary_data, np.uint32, count=3, offset=0)
    assert magic_number == 0xFCCFE2E2, f"{file} not a tensor file."

    dims = np.frombuffer(binary_data, np.uint32, count=ndims, offset=3 * 4)

    if dtype == 0:
        np_dtype = np.float32
    elif dtype == 1:
        np_dtype = np.float16
    elif dtype == 2:
        np_dtype = np.int32
    elif dtype == 3:
        np_dtype = np.uint8
    else:
        assert False, f"Unsupported dtype = {dtype}, cannot convert to numpy dtype"

    return np.frombuffer(binary_data, np_dtype, offset=(ndims + 3) * 4).reshape(*dims)


def save_tensor(tensor, file):
    with open(file, "wb") as f:
        # Map the numpy dtype to the type id stored in the header
        typeid = 0
        if tensor.dtype == np.float32:
            typeid = 0
        elif tensor.dtype == np.float16:
            typeid = 1
        elif tensor.dtype == np.int32:
            typeid = 2
        elif tensor.dtype == np.uint8:
            typeid = 3

        # Header: magic number, number of dimensions, dtype id (3 x uint32),
        # followed by the shape and the raw tensor bytes
        head = np.array([0xFCCFE2E2, tensor.ndim, typeid], dtype=np.uint32).tobytes()
        f.write(head)
        f.write(np.array(tensor.shape, dtype=np.uint32).tobytes())
        f.write(tensor.tobytes())
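
A minimal round-trip sketch using the two helpers above; the file name "pred.tensor" is just a placeholder.

# Round-trip sketch: save a tensor in Python, load it back and compare
data = np.random.randn(1, 3, 8, 8).astype(np.float32)
save_tensor(data, "pred.tensor")

loaded = load_tensor("pred.tensor")
print(loaded.shape, loaded.dtype)        # (1, 3, 8, 8) float32
print(np.abs(loaded - data).max())       # 0.0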

The process of implementing a model at deployment time

  1. First, get the model's predict code running with a single image as input. Disable everything that is irrelevant to this goal; any redundant code can be modified or deleted.
  2. Write a new Python program of your own that simplifies the predict flow, so that you know the minimum dependencies and code that predict really needs. The original predict may depend on dataset, dataloader, config and so on; since inference only takes a single image as input, the goal of this simplification is to express predict in just a few lines of code. Here predict refers to the model's forward pass only and does not include pre- and post-processing.
  3. If step 2 is difficult, consider studying the line pred = model(x) directly, for example by writing torch.onnx.export(model, (x,), …) right at that point, or by storing the pred results for later study (a minimal export sketch follows this list).
  4. Analyze the pre- and post-processing and implement a simplified Python version of them.
  5. Use the simplified version for debugging, understanding, and analysis. Then decide how to arrange the pre- and post-processing reasonably, for example whether part of the post-processing can be moved into the onnx.
  6. Export the onnx and reproduce the preprocessing in C++ first (do not reproduce the post-processing at this point), making the results close (most of the time exactly identical results cannot be obtained).
  7. Store the pred results from Python to a file, then read the file in C++ and reproduce the required post-processing there, making sure the results are correct. (This avoids waiting for tensorrt's compilation and inference every time the post-processing module we wrote is debugged, which would be far too inefficient.)
  8. Connect the pre-processing, the onnx, and the post-processing to form a complete inference pipeline.

Origin blog.csdn.net/Rolandxxx/article/details/127814207