[Model Deployment] Getting Started Tutorial (2): Solving Difficulties in Model Deployment


Welcome back to the Getting Started with Model Deployment tutorial series! In the previous tutorial, we deployed a simple super-resolution model and everything went smoothly. However, that model still has a flaw: its magnification factor is fixed at 3, so we cannot enlarge an image by an arbitrary factor. Now, let's try to deploy a model that supports dynamic magnification, and experience some of the difficulties that can arise in model deployment.

Common Challenges in Model Deployment

In the previous tutorial, deployment went smoothly and we did not hit any problems. That is because the SRCNN model contains only a few simple operators, and these convolution and interpolation operators are well supported by every intermediate representation and inference engine. As soon as a model's operations become slightly more complicated, we may have to spend considerable effort making the model compatible. In practice, model deployment generally runs into the following kinds of difficulties:

  • Model dynamism. For performance reasons, inference frameworks tend to assume that a model's input shape, output shape, and structure are static. To make a model more versatile, we need to make its inputs, outputs, or structure dynamic during deployment while changing the original logic as little as possible.
  • Implementation of new operators. Deep learning evolves rapidly, and new operators are often proposed faster than ONNX maintainers can support them. To deploy the latest models, deployment engineers frequently have to add support for new operators in ONNX and in the inference engines themselves.
  • Compatibility between intermediate representations and inference engines. Because each inference engine has its own implementation, it is hard to achieve uniform support for ONNX. To ensure a model behaves identically across inference engines, deployment engineers often have to tailor the model code to a particular engine, which adds a lot of deployment work.

We will describe how to solve these problems in detail in subsequent tutorials. If terms from the previous article such as ONNX, inference engine, intermediate representation, and operator are unfamiliar, don't worry: you can read the introduction to model deployment to learn the related concepts.

Now, let's make some small modifications to the original SRCNN model, experience the deployment difficulty caused by model dynamism, and learn one way to solve it.

Problem: Implementing Dynamically Upscaled Super-Resolution Models

In the original SRCNN, the magnification factor of the image is hard-coded into the model:

class SuperResolutionNet(nn.Module): 
    def __init__(self, upscale_factor): 
        super().__init__() 
        self.upscale_factor = upscale_factor 
        self.img_upsampler = nn.Upsample( 
            scale_factor=self.upscale_factor, 
            mode='bicubic', 
            align_corners=False) 
 
... 
 
def init_torch_model(): 
    torch_model = SuperResolutionNet(upscale_factor=3) 
 

We use upscale_factor to control the model's magnification. When initializing the model, we set upscale_factor to 3 by default, producing a PyTorch model that upscales by 3x. This PyTorch model is then converted into ONNX format. If we need a model that upscales by 4x, we have to regenerate the model and convert it to ONNX all over again.
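For example, getting a 4x model under the static design means rebuilding the network and exporting it again. Here is a minimal sketch, reusing the export call from the previous tutorial (the file name srcnn_4x.onnx is just illustrative):

import torch

# Every new factor requires building and exporting a separate model.
torch_model = SuperResolutionNet(upscale_factor=4)
# ... load the pretrained weights as before ...

x = torch.randn(1, 3, 256, 256)
with torch.no_grad():
    torch.onnx.export(torch_model, x, "srcnn_4x.onnx",
                      opset_version=11,
                      input_names=['input'],
                      output_names=['output'])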

Now, suppose we are building a super-resolution application whose users want to set the magnification freely, and all we hand over to them is one .onnx file plus an application that runs the super-resolution model. The magnification must be changeable without modifying the .onnx file.

Therefore, we must modify the original model so that the magnification becomes an input to inference. Starting from the Python script in the previous article, we make a few modifications and obtain this script:

import torch 
from torch import nn 
from torch.nn.functional import interpolate 
import torch.onnx 
import cv2 
import numpy as np 
 
 
class SuperResolutionNet(nn.Module): 
 
    def __init__(self): 
        super().__init__() 
 
        self.conv1 = nn.Conv2d(3, 64, kernel_size=9, padding=4) 
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1, padding=0) 
        self.conv3 = nn.Conv2d(32, 3, kernel_size=5, padding=2) 
 
        self.relu = nn.ReLU() 
 
    def forward(self, x, upscale_factor): 
        x = interpolate(x, 
                        scale_factor=upscale_factor, 
                        mode='bicubic', 
                        align_corners=False) 
        out = self.relu(self.conv1(x)) 
        out = self.relu(self.conv2(out)) 
        out = self.conv3(out) 
        return out 
 
 
def init_torch_model(): 
    torch_model = SuperResolutionNet() 
 
    state_dict = torch.load('srcnn.pth')['state_dict'] 
 
    # Adapt the checkpoint 
    for old_key in list(state_dict.keys()): 
        new_key = '.'.join(old_key.split('.')[1:]) 
        state_dict[new_key] = state_dict.pop(old_key) 
 
    torch_model.load_state_dict(state_dict) 
    torch_model.eval() 
    return torch_model 
 
 
model = init_torch_model() 
 
input_img = cv2.imread('face.png').astype(np.float32) 
 
# HWC to NCHW 
input_img = np.transpose(input_img, [2, 0, 1]) 
input_img = np.expand_dims(input_img, 0) 
 
# Inference 
torch_output = model(torch.from_numpy(input_img), 3).detach().numpy() 
 
# NCHW to HWC 
torch_output = np.squeeze(torch_output, 0) 
torch_output = np.clip(torch_output, 0, 255) 
torch_output = np.transpose(torch_output, [1, 2, 0]).astype(np.uint8) 
 
# Show image 
cv2.imwrite("face_torch_2.png", torch_output) 

Before the modification, nn.Upsample fixed the magnification at initialization time, whereas PyTorch's interpolate operator selects the magnification at run time. So in the new script we replace nn.Upsample with interpolate, which lets the model support dynamic magnification. We pass the magnification factor 3 as the second argument when running inference, and the result is saved to "face_torch_2.png". If everything works, the contents of "face_torch_2.png" and "face_torch.png" are exactly the same.
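The API difference is easy to see in isolation. A minimal sketch comparing the two calls (the shapes in the comments assume a 32x32 input):

import torch
from torch.nn.functional import interpolate

x = torch.randn(1, 3, 32, 32)

# nn.Upsample: the factor is frozen when the module is constructed
up = torch.nn.Upsample(scale_factor=3, mode='bicubic', align_corners=False)
print(up(x).shape)  # torch.Size([1, 3, 96, 96])

# interpolate: the factor is chosen at each call
print(interpolate(x, scale_factor=4, mode='bicubic', align_corners=False).shape)
# torch.Size([1, 3, 128, 128])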

With these simple modifications, the PyTorch model now supports dynamic magnification. Let's try to export it:

x = torch.randn(1, 3, 256, 256) 
 
with torch.no_grad(): 
    torch.onnx.export(model, (x, 3), 
                      "srcnn2.onnx", 
                      opset_version=11, 
                      input_names=['input', 'factor'], 
                      output_names=['output']) 
 

Running this script produces a long list of errors. Like it or not, we have run into a compatibility issue in model deployment.

Solution: Custom Operator

If we use the PyTorch model directly, making its input dynamic takes only a few lines of code; in model deployment, we end up spending several times as long to achieve the same thing. Let's work through the problem step by step, experience the difficulty of model deployment first-hand, and learn how a custom operator can make the super-resolution model dynamic.

The errors occur because, when a PyTorch model is exported to ONNX, all of the model's input parameters must be of type torch.Tensor, but the second argument we pass in, "3", is a plain integer. This violates the requirements of PyTorch's ONNX export, so we have to modify the model's inputs. To ensure every input parameter is a torch.Tensor, we make the following changes:

... 
 
class SuperResolutionNet(nn.Module): 
 
    def forward(self, x, upscale_factor): 
        x = interpolate(x, 
                        scale_factor=upscale_factor.item(), 
                        mode='bicubic', 
                        align_corners=False) 
 
... 
 
# Inference 
# Note that the second input is torch.tensor(3) 
torch_output = model(torch.from_numpy(input_img), torch.tensor(3)).detach().numpy() 
 
... 
 
with torch.no_grad(): 
    torch.onnx.export(model, (x, torch.tensor(3)), 
                      "srcnn2.onnx", 
                      opset_version=11, 
                      input_names=['input', 'factor'], 
                      output_names=['output']) 

Since the scale_factor parameter of PyTorch's interpolate must be a plain number, we use torch.Tensor.item() to convert a one-element torch.Tensor into a Python number. At inference time, we then pass torch.tensor(3) instead of 3 so that every input meets the requirements. If you run the script now, no error is reported, whether you run the model directly or export it to ONNX.
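As a quick illustration of what torch.Tensor.item() does:

import torch

factor = torch.tensor(3)
print(factor.item())        # 3
print(type(factor.item()))  # <class 'int'>: a plain Python number, not a Tensor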

However, the export raises a TracerWarning, which says that some values may fail to be traced correctly. What is going on here? Let's visualize the generated srcnn2.onnx with Netron:

Although we gave the model two inputs for inference, the ONNX model looks exactly the same as before, with only one input named "input". This is because we used torch.Tensor.item() to pull the data out of the Tensor, an operation that cannot be recorded when exporting to ONNX, which is why PyTorch raised the TracerWarning. As a result, the scale factor of the interpolate function is still baked in as the fixed value 3, and the exported "srcnn2.onnx" is identical to the original "srcnn.onnx".
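We can confirm this without Netron by listing the graph inputs with the onnx package. A small sanity check, assuming onnx is installed:

import onnx

model = onnx.load("srcnn2.onnx")
# 'factor' was baked in as a constant during tracing,
# so only 'input' remains as a graph input.
print([i.name for i in model.graph.input])  # expected: ['input']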

Directly modifying the original model doesn't seem to work. We have to start from the principles of PyTorch-to-ONNX conversion and make the ONNX model express our intent explicitly.

Looking carefully at the ONNX model visualized in Netron, we can see that whether we use the earlier nn.Upsample or the later interpolate in PyTorch, the interpolation operation is ultimately converted into the Resize operator defined by ONNX. In other words, the so-called conversion from PyTorch to ONNX is really a mapping from each PyTorch operation to an operator defined by ONNX.

Click the operator to see its detailed parameters as follows:

Expanding scales, we can see that it is a one-dimensional tensor of length 4 whose content is [1, 1, 3, 3], the scaling factor for each dimension of the Resize operation; its type is Initializer, meaning its values are initialized directly from constants. If we can emit an ONNX Resize operator ourselves and make scales a variable rather than a constant, just like the X above it, then this super-resolution model can scale dynamically.
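The same information can be read programmatically. A sketch, assuming the onnx package, that lists the graph's node types and prints any small constant initializers such as the Resize scales:

import onnx
from onnx import numpy_helper

model = onnx.load("srcnn.onnx")
print([node.op_type for node in model.graph.node])  # should contain 'Resize'

for init in model.graph.initializer:
    arr = numpy_helper.to_array(init)
    if arr.size <= 4:  # skip the large convolution weights
        print(init.name, arr)  # the Resize scales appear as e.g. [1. 1. 3. 3.]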

The existing PyTorch interpolation operators come with a fixed set of rules for mapping to the ONNX Resize operator, and the scales of the resulting Resize can only be a constant, which does not meet our needs. We have to define our own PyTorch interpolation operator and then map it to the ONNX Resize operator we want.

The following script defines such a PyTorch interpolation operator and uses it in the model. We first verify the operator's correctness by running the model:

import torch 
from torch import nn 
from torch.nn.functional import interpolate 
import torch.onnx 
import cv2 
import numpy as np 
 
 
class NewInterpolate(torch.autograd.Function): 
 
    @staticmethod 
    def symbolic(g, input, scales): 
        return g.op("Resize", 
                    input, 
                    g.op("Constant", 
                         value_t=torch.tensor([], dtype=torch.float32)), 
                    scales, 
                    coordinate_transformation_mode_s="pytorch_half_pixel", 
                    cubic_coeff_a_f=-0.75, 
                    mode_s='cubic', 
                    nearest_mode_s="floor") 
 
    @staticmethod 
    def forward(ctx, input, scales): 
        scales = scales.tolist()[-2:] 
        return interpolate(input, 
                           scale_factor=scales, 
                           mode='bicubic', 
                           align_corners=False) 
 
 
class StrangeSuperResolutionNet(nn.Module): 
 
    def __init__(self): 
        super().__init__() 
 
        self.conv1 = nn.Conv2d(3, 64, kernel_size=9, padding=4) 
        self.conv2 = nn.Conv2d(64, 32, kernel_size=1, padding=0) 
        self.conv3 = nn.Conv2d(32, 3, kernel_size=5, padding=2) 
 
        self.relu = nn.ReLU() 
 
    def forward(self, x, upscale_factor): 
        x = NewInterpolate.apply(x, upscale_factor) 
        out = self.relu(self.conv1(x)) 
        out = self.relu(self.conv2(out)) 
        out = self.conv3(out) 
        return out 
 
 
def init_torch_model(): 
    torch_model = StrangeSuperResolutionNet() 
 
    state_dict = torch.load('srcnn.pth')['state_dict'] 
 
    # Adapt the checkpoint 
    for old_key in list(state_dict.keys()): 
        new_key = '.'.join(old_key.split('.')[1:]) 
        state_dict[new_key] = state_dict.pop(old_key) 
 
    torch_model.load_state_dict(state_dict) 
    torch_model.eval() 
    return torch_model 
 
 
model = init_torch_model() 
factor = torch.tensor([1, 1, 3, 3], dtype=torch.float) 
 
input_img = cv2.imread('face.png').astype(np.float32) 
 
# HWC to NCHW 
input_img = np.transpose(input_img, [2, 0, 1]) 
input_img = np.expand_dims(input_img, 0) 
 
# Inference 
torch_output = model(torch.from_numpy(input_img), factor).detach().numpy() 
 
# NCHW to HWC 
torch_output = np.squeeze(torch_output, 0) 
torch_output = np.clip(torch_output, 0, 255) 
torch_output = np.transpose(torch_output, [1, 2, 0]).astype(np.uint8) 
 
# Show image 
cv2.imwrite("face_torch_3.png", torch_output) 

If the model runs normally, a 3x-enlarged super-resolution image is saved as "face_torch_3.png", and its content is exactly the same as "face_torch.png".

In the script above, we defined the custom PyTorch interpolation operator as follows:

class NewInterpolate(torch.autograd.Function): 
 
    @staticmethod 
    def symbolic(g, input, scales): 
        return g.op("Resize", 
                    input, 
                    g.op("Constant", 
                         value_t=torch.tensor([], dtype=torch.float32)), 
                    scales, 
                    coordinate_transformation_mode_s="pytorch_half_pixel", 
                    cubic_coeff_a_f=-0.75, 
                    mode_s='cubic', 
                    nearest_mode_s="floor") 
 
    @staticmethod 
    def forward(ctx, input, scales): 
        scales = scales.tolist()[-2:] 
        return interpolate(input, 
                           scale_factor=scales, 
                           mode='bicubic', 
                           align_corners=False) 

Before going into the implementation in detail, let's clarify our plan. We want the new interpolation operator to have two inputs: the image to operate on and the scale factor. As mentioned earlier, to hook into the scales parameter of the ONNX Resize operator, the scale factor must be a length-4 tensor of the form [1, 1, x, x], where x is the magnification; in the earlier 3x model this parameter was fixed to [1, 1, 3, 3]. So in our interpolation operator, the second input of the model is a tensor [1, 1, h, w], where h and w are the magnification factors for the image height and width respectively (following the NCHW dimension order).
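In other words, building the second input for an arbitrary factor is a one-liner; for readability we can wrap it in a helper (make_factor is a hypothetical function, not part of the original script):

import torch

def make_factor(h: float, w: float) -> torch.Tensor:
    # Build the length-4 scales tensor [1, 1, h, w] that NewInterpolate
    # expects; the batch and channel dimensions are never scaled.
    return torch.tensor([1, 1, h, w], dtype=torch.float)

factor = make_factor(3, 3)  # equivalent to torch.tensor([1, 1, 3, 3], dtype=torch.float)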

With the inputs settled, let's look at the operator's implementation. An operator's inference behavior is determined by its forward method. The first parameter of this method must be ctx, and the following parameters are the operator's custom inputs; here we define two, the image and the scale factor. To ensure correct inference, the [1, 1, h, w] input has to be connected to the original interpolate function. Our approach is to take the last two elements of the input tensor and pass them, as a list, to the scale_factor parameter of interpolate.

Next, we need to decide how the new operator maps to an ONNX operator. This mapping is determined by the operator's symbolic method, whose first parameter must be g; the remaining parameters are the operator's custom inputs, the same as in forward. The ONNX operator itself is constructed with g.op, and each keyword argument of g.op maps to an attribute of the ONNX operator, with a suffix indicating the attribute's type: for example, mode_s becomes the string attribute mode, and cubic_coeff_a_f becomes the float attribute cubic_coeff_a.

The other attributes can be filled in by following the ONNX definition of the Resize operator. Note, however, that we now want scales to be determined dynamically by the input. So instead of a constant, we pass the scales argument of the symbolic method into the Resize operator's scales position.

Next, let's export the new model as an ONNX model:

x = torch.randn(1, 3, 256, 256) 
 
with torch.no_grad(): 
    torch.onnx.export(model, (x, factor), 
                      "srcnn3.onnx", 
                      opset_version=11, 
                      input_names=['input', 'factor'], 
                      output_names=['output']) 

Visualize the exported "srcnn3.onnx":

As you can see, the exported ONNX model now has two inputs, just as we expected! The second input represents the scaling factor of the image.

When verifying the PyTorch model and exporting it to ONNX, we used a scale of 3x for both width and height. Now, running inference with ONNX Runtime, let's try a 4x scale:

import onnxruntime 
 
input_factor = np.array([1, 1, 4, 4], dtype=np.float32) 
ort_session = onnxruntime.InferenceSession("srcnn3.onnx") 
ort_inputs = {'input': input_img, 'factor': input_factor} 
ort_output = ort_session.run(None, ort_inputs)[0] 
 
ort_output = np.squeeze(ort_output, 0) 
ort_output = np.clip(ort_output, 0, 255) 
ort_output = np.transpose(ort_output, [1, 2, 0]).astype(np.uint8) 
cv2.imwrite("face_ort_3.png", ort_output) 

Running the above code produces "face_ort_3.png", a super-resolution image whose side lengths are enlarged 4x. The dynamic super-resolution model works! By simply changing input_factor, we can freely control the image's magnification.
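For instance, a 2x result from the very same .onnx file needs nothing more than a different factor tensor (reusing ort_session and input_img from the script above):

# Same session, same model file: only the runtime input changes.
input_factor = np.array([1, 1, 2, 2], dtype=np.float32)
ort_output = ort_session.run(None, {'input': input_img, 'factor': input_factor})[0]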

What we just did was bypass a limitation of PyTorch itself and hand-craft an ONNX operator from scratch. In fact, we can not only construct existing ONNX operators this way, but also define brand-new ONNX operators to extend ONNX's expressive power. Later tutorials will cover how to define new ONNX operators.

Summary

Across the first two tutorials, we have walked through the entire deployment pipeline and successfully deployed a super-resolution model that supports dynamic magnification. Along the way, we learned not only how to call each framework's APIs to deploy a model, but also how to analyze and work around the problems that come up during deployment.

Once again, let's summarize the key points of this tutorial:

  • Common difficulties in model deployment include: model dynamism, implementation of new operators, and compatibility between intermediate representations and inference engines.
  • Converting PyTorch to ONNX means converting each PyTorch operation into an operator defined by ONNX. For example, both Upsample and interpolate in PyTorch become the ONNX Resize operator after conversion.
  • By overriding the symbolic method of an operator that inherits from torch.autograd.Function, we can change how that operator is mapped to ONNX.

With this, "deploying the first model" comes to an end. Feel like you haven't learned enough yet? Don't worry: in the next few tutorials, we will use the open-source model deployment toolbox MMDeploy to dig into the ONNX intermediate representation and the ONNX Runtime and TensorRT inference engines, so that everyone can learn to deploy more complex models. Stay tuned!

https://github.com/open-mmlab/mmdeploy

Series Portal

OpenMMLab: Interpretation of TorchScript (1): Getting to know TorchScript for the first time

OpenMMLab: Interpretation of TorchScript (2): Torch jit tracer implementation analysis

OpenMMLab: Interpretation of TorchScript (3): subgraph rewriter in jit

OpenMMLab: Interpretation of TorchScript (4): Alias analysis in Torch jit

OpenMMLab: Introduction to Model Deployment (1): Introduction to Model Deployment

OpenMMLab: Introduction to Model Deployment Tutorial (2): Solving Difficulties in Model Deployment

OpenMMLab: Introductory Tutorial for Model Deployment (3): PyTorch to ONNX Detailed Explanation

OpenMMLab: Model Deployment Tutorial (4): Support more ONNX operators in PyTorch

OpenMMLab: Introduction to Model Deployment Tutorial (5): Modification and Debugging of ONNX Model
