8.7 TensorRT Advanced (3) Packaging Series - Debugging Methods and Thought Discussion

Preface

I previously went through Teacher Du's "TensorRT high-performance deployment from scratch" course, but I didn't take notes and have forgotten a lot. This time I am going through it again and taking notes.

This lesson covers TensorRT advanced debugging methods and a discussion of the overall approach.

Please see the mind map below for the course syllabus

(Mind map: course syllabus)

1. Model debugging skills

In this section we learn model debugging skills and the debug methodology.

Debugging rules:

1. Make good use of the Python workflow and combine Python and C++ to debug problems (the Python ecosystem is relatively complete; use C++ as the processing tool and Python as the analysis and visualization tool)

2. With pre- and post-processing removed, make sure the ONNX and PyTorch results are consistent, ruling out every other factor; the engine can usually guarantee this as well. For example, feed both with an input tensor filled entirely with 5s: the difference between the outputs should be less than 1e-4, which confirms that nothing went wrong in the middle (a minimal comparison sketch follows this list)

3. It is usually hard to make the preprocessing exactly identical, so consider saving PyTorch's preprocessing result to a file, loading it in C++ and running inference on it; the difference in the results should be less than 1e-4 (especially important when writing plugins)

4. Consider saving the Python model's inference results to a file, implement the processing in numpy first, and only then reproduce it in C++

5. When a bug appears, save the tensor from C++ to a file and analyze it in Python; avoid debugging directly in C++

Don’t rush to write C++, use python for debugging.
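To make rule 2 concrete, the following is a minimal sketch of such a comparison. The tiny Sequential network, the 64x64 input size and the file name model.onnx are placeholders for illustration, not part of the course code:

import numpy as np
import torch
import onnxruntime as ort

# Tiny stand-in network; replace it with the real model you are exporting
model = torch.nn.Sequential(
    torch.nn.Conv2d(3, 8, 3, padding=1),
    torch.nn.ReLU(),
).eval()

# Fixed, reproducible input (e.g. a tensor filled with 5s, as suggested above)
x = torch.full((1, 3, 64, 64), 5.0)

# Export to ONNX and run the exact same input through ONNX Runtime
torch.onnx.export(model, (x,), "model.onnx", input_names=["images"], output_names=["output"])
sess = ort.InferenceSession("model.onnx", providers=["CPUExecutionProvider"])
onnx_out = sess.run(None, {"images": x.numpy()})[0]

# PyTorch reference on the same input
with torch.no_grad():
    torch_out = model(x).numpy()

# If nothing went wrong in the middle, the difference stays below ~1e-4
diff = np.abs(torch_out - onnx_out).max()
print("max abs diff:", diff)
assert diff < 1e-4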

In the previous multi-threaded YOLOv5 code, the tensor package provides two functions: save_to_file, which saves a tensor to a binary file, and load_from_file, which reads a tensor back from a binary file. Their definitions are as follows:

bool Tensor::save_to_file(const std::string& file) const{

    if(empty()) return false;

    FILE* f = fopen(file.c_str(), "wb");
    if(f == nullptr) return false;

    // header: magic number, number of dimensions, data type
    int ndims = this->ndims();
    unsigned int head[3] = {0xFCCFE2E2, static_cast<unsigned int>(ndims), static_cast<unsigned int>(dtype_)};
    fwrite(head, 1, sizeof(head), f);
    fwrite(shape_.data(), 1, sizeof(shape_[0]) * shape_.size(), f);
    fwrite(cpu(), 1, bytes_, f);
    fclose(f);
    return true;
}

bool Tensor::load_from_file(const std::string& file){

    FILE* f = fopen(file.c_str(), "rb");
    if(f == nullptr){
        INFOE("Open %s failed.", file.c_str());
        return false;
    }

    unsigned int head[3] = {0};
    fread(head, 1, sizeof(head), f);

    if(head[0] != 0xFCCFE2E2){
        fclose(f);
        INFOE("Invalid tensor file %s, magic number mismatch", file.c_str());
        return false;
    }

    int ndims = head[1];
    auto dtype = (TRT::DataType)head[2];
    vector<int> dims(ndims);
    fread(dims.data(), 1, ndims * sizeof(dims[0]), f);

    this->dtype_ = dtype;
    this->resize(dims);

    fread(this->cpu(), 1, bytes_, f);
    fclose(f);
    return true;
}

The save_to_file function is used to save the Tensor object to the specified file. It first writes a header containing the magic number, the number of dimensions and the data type, then writes the shape of the Tensor, and finally writes the Tensor data.

The load_from_file function loads a Tensor object from the specified file. It first reads and verifies the file header to obtain the tensor's number of dimensions and data type, then reads the tensor's shape and data and applies them to the current Tensor object.
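Putting the two functions together, the on-disk layout of such a .tensor file, as implied by the code above, is:

offset 0            : uint32 magic number (0xFCCFE2E2)
offset 4            : uint32 ndims (number of dimensions)
offset 8            : uint32 dtype (TRT::DataType value)
offset 12           : int32 shape[ndims] (4 bytes per dimension)
offset 12 + 4*ndims : raw tensor data (bytes_ bytes)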

The above shows how to save and load a tensor in C++. We can do the same in Python; the implementation is as follows:

import numpy as np

def load_tensor(file):
    
    with open(file, "rb") as f:
        binary_data = f.read()

    magic_number, ndims, dtype = np.frombuffer(binary_data, np.uint32, count=3, offset=0)
    assert magic_number == 0xFCCFE2E2, f"{file} not a tensor file."
    
    dims = np.frombuffer(binary_data, np.uint32, count=ndims, offset=3 * 4)

    if dtype == 0:
        np_dtype = np.float32
    elif dtype == 1:
        np_dtype = np.float16
    else:
        assert False, f"Unsupport dtype = {dtype}, can not convert to numpy dtype"
        
    return np.frombuffer(binary_data, np_dtype, offset=(ndims + 3) * 4).reshape(*dims)


def save_tensor(tensor, file):

    with open(file, "wb") as f:
        typeid = 0
        if tensor.dtype == np.float32:
            typeid = 0
        elif tensor.dtype == np.float16:
            typeid = 1
        elif tensor.dtype == np.int32:
            typeid = 2
        elif tensor.dtype == np.uint8:
            typeid = 3

        head = np.array([0xFCCFE2E2, tensor.ndim, typeid], dtype=np.uint32).tobytes()
        f.write(head)
        f.write(np.array(tensor.shape, dtype=np.uint32).tobytes())
        f.write(tensor.tobytes())

The implementation of the Python version is actually no different from the C++ version

The load_tensor function reads binary data from the specified file. It first parses the header to obtain the magic number, number of dimensions and data type, then parses the tensor's shape and data based on that information, and finally returns a numpy array with the shape and data type already set.

The save_tensor function is used to save the numpy array to the specified file. It first encodes the magic number, the number of dimensions and the data type of the array into a binary header, then writes the shape of the array, and finally writes the data of the array.
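A quick way to sanity-check the two helpers is a pure-Python round trip; this is just a sketch that assumes load_tensor and save_tensor are defined as above:

import numpy as np

# Round trip: save a known array, load it back, compare byte for byte
original = np.arange(24, dtype=np.float32).reshape(2, 3, 4)
save_tensor(original, "roundtrip.tensor")
restored = load_tensor("roundtrip.tensor")

assert restored.shape == original.shape
assert restored.dtype == original.dtype
assert np.array_equal(restored, original)
print("round trip ok:", restored.shape, restored.dtype)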

Now let's walk through saving a tensor in Python and loading it in C++. The Python code is as follows:

def save_tensor(tensor, file):

    with open(file, "wb") as f:
        typeid = 0
        if tensor.dtype == np.float32:
            typeid = 0
        elif tensor.dtype == np.float16:
            typeid = 1
        elif tensor.dtype == np.int32:
            typeid = 2
        elif tensor.dtype == np.uint8:
            typeid = 3

        head = np.array([0xFCCFE2E2, tensor.ndim, typeid], dtype=np.uint32).tobytes()
        f.write(head)
        f.write(np.array(tensor.shape, dtype=np.uint32).tobytes())
        f.write(tensor.tobytes())

data = np.arange(100, dtype=np.float32).reshape(10, 10, 1)
save_tensor(data, "data.tensor")

The C++ code is as follows:

#include "trt-tensor.hpp"

int main(){

    TRT::Tensor tensor;
    tensor.load_from_file("../data.tensor");
    
    float* ptr = tensor.cpu<float>();
    INFO("tensor.shape = %s, dtype = %d", tensor.shape_string(), tensor.type());

    for(int i = 0; i < tensor.count(); ++i){
        INFO("%d -> = %f", i, ptr[i]);
    }

    return 0;
}

The execution effect is as follows:

Figure 1-1 Saving in Python, loading in C++

We can see that the result matches what we expected, with no loss. We can also change the type to uint8 and take a look; the result is as follows:

Figure 1-2 Saving in Python, loading in C++ (uint8)

There is no problem here either. That covers saving in Python and reading in C++; next, let's look at saving in C++ and reading in Python.

In yolov5.cpp, around lines 214/216, save the input and output tensors:

input->save_to_file("input.tensor");
output->save_to_file("output.tensor");

Run as follows:

Figure 1-3 Saving from C++

Next, we go to python to load the saved input and output, the code is as follows:

import numpy as np

def load_tensor(file):
    
    with open(file, "rb") as f:
        binary_data = f.read()

    magic_number, ndims, dtype = np.frombuffer(binary_data, np.uint32, count=3, offset=0)
    assert magic_number == 0xFCCFE2E2, f"{file} not a tensor file."
    
    dims = np.frombuffer(binary_data, np.uint32, count=ndims, offset=3 * 4)

    if dtype == 0:
        np_dtype = np.float32
    elif dtype == 1:
        np_dtype = np.float16
    else:
        assert False, f"Unsupport dtype = {dtype}, can not convert to numpy dtype"
        
    return np.frombuffer(binary_data, np_dtype, offset=(ndims + 3) * 4).reshape(*dims)

input = load_tensor("workspace/input.tensor")
output = load_tensor("workspace/output.tensor")

print(input.shape, output.shape)

# restore the original image
image = input * 255
image = image.transpose(0, 2, 3, 1)[0].astype(np.uint8)[..., ::-1]

import cv2
cv2.imwrite("image.jpg", image)
print("save done.")

The running effect is as follows:

Figure 1-4 Loading in Python

Figure 1-5 Image recovered by Python

The recovered image is exactly the same as the original, which shows that nothing went wrong in between.
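Beyond this visual check, the same tensor files support a numerical comparison. A minimal sketch, assuming the load_tensor function from the script above and a hypothetical pred.tensor saved from the Python side as a reference output:

import numpy as np

# "pred.tensor" is a hypothetical reference saved from Python with save_tensor();
# replace it with wherever you stored your PyTorch/ONNX reference output
ref = load_tensor("pred.tensor")
out = load_tensor("workspace/output.tensor")

diff = np.abs(ref.astype(np.float32) - out.astype(np.float32))
print("max abs diff:", diff.max(), "mean abs diff:", diff.mean())
assert diff.max() < 1e-4, "engine output diverges from the python reference"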

This covers how to save a tensor, how to load it, and how to exchange it between Python and C++; you can also wrap it up into your own utilities.

Finally, let's look at the process of implementing a model:

1. First get the project's predict code running with a single image as input. Comment out everything unrelated to the goal; feel free to modify or delete anything redundant.

2. Write your own Python program that simplifies the predict process, so you understand the minimum dependencies and code that predict actually needs

3. If the second step is difficult, consider studying the pred = model(x) step directly: for example, call torch.onnx.export(model, (x,), ...) right at that point, or save the pred results to a file for later study (see the sketch after this list), etc.

4. Analyze the pre-processing and post-processing and implement a simplified version

5. Use the simplified version for debugging, understanding and analysis. Then consider how to arrange the pre- and post-processing sensibly, for example whether part of the post-processing can be moved into the onnx

6. Export the onnx and reproduce the preprocessing in C++ first, making the result as close as possible to Python's (most of the time you will not get an identical result)

7. After storing the pred results on python, use C++ to read and reproduce the required post-processing part. Make sure the results are correct

8. Connect the pre- and post-processing with the onnx to form the complete inference pipeline
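For steps 3 and 7, the save_tensor/load_tensor helpers shown earlier fit naturally: dump pred to a .tensor file right after the forward pass, then develop the post-processing against it in numpy, or load the same file from C++ via Tensor::load_from_file. Below is a minimal sketch with a toy linear layer standing in for the real network; the file name pred.tensor is only an example:

import numpy as np
import torch

# Toy stand-in for "pred = model(x)"; in practice pred is your network's raw output
model = torch.nn.Linear(10, 5).eval()
x = torch.ones(1, 10)
with torch.no_grad():
    pred = model(x)

# Save pred with the save_tensor helper from earlier for later study
save_tensor(pred.numpy(), "pred.tensor")

# Later (in numpy, or via Tensor::load_from_file in C++), load the reference back
pred_ref = load_tensor("pred.tensor")
print(pred_ref.shape, pred_ref.dtype)
assert np.array_equal(pred_ref, pred.numpy())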

Summary

In this lesson we learned the debugging methodology. First we encapsulated tensor saving and loading, which enables interaction between Python and C++. Then we discussed the process of implementing a model: when you get a new project, the first thing to do is run predict on a single image; then implement a simple predict program yourself, or consider exporting the onnx directly or saving the pred results. Next, extract the pre-processing and post-processing through debugging and analysis, and export the onnx. Reproduce the preprocessing in C++ and check whether the results are close to Python's; also read the pred saved from Python in C++ and reproduce the postprocessing there. Finally, connect everything into the complete inference.

This is the workflow recommended by Teacher Du, which can simplify the process and facilitate development and debugging.


Origin blog.csdn.net/qq_40672115/article/details/132527266