DINO inference module implementation

How do you put a trained model to practical use? That is the job of what we usually call the inference module. An earlier post covered how to run inference with DETR. Today's post covers the inference process for DINO.
In fact, the official DINO code already provides an implementation of the inference module. Here the blogger walks through that process, organizes it, and gives solutions to the problems that come up.

First, import the required packages:

import torch, json
from main import build_model_main
from util.slconfig import SLConfig
from util.visualizer import COCOVisualizer
from util import box_ops

Then build the model and load the weight file. Note that build_model_main and SLConfig come from the DINO repository, so if you want to run DINO inference this way, the inference code has to be placed inside the DINO project. The blogger's goal is to separate the model from the training code and deploy it in the cloud as an object detection interface.

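model_config_path = "config/DINO/DINO_4scale.py"  # change to the path of your model config file
model_checkpoint_path = "checkpoint_best_regular.pth"  # change to the path of your checkpoint file
args = SLConfig.fromfile(model_config_path)
args.device = 'cuda'
model, criterion, postprocessors = build_model_main(args)
# map_location='cpu' loads the weights into host memory first, whatever device they were saved from
checkpoint = torch.load(model_checkpoint_path, map_location='cpu')
model.load_state_dict(checkpoint['model'])
_ = model.eval()  # switch to inference (evaluation) mode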
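
The load-then-eval pattern above can be tried in isolation without the DINO project. The following is only a minimal sketch: the toy nn.Linear model, the helper names save_checkpoint and load_checkpoint, and the file name toy_checkpoint.pth are all hypothetical stand-ins, and the only thing borrowed from the code above is the assumption that the checkpoint is a dict with a 'model' key.

```python
import os
import tempfile

import torch
from torch import nn


def save_checkpoint(model, path):
    # Same layout the DINO code reads: a dict whose 'model' entry is the state dict.
    torch.save({"model": model.state_dict()}, path)


def load_checkpoint(model, path, device="cpu"):
    # map_location moves the saved tensors onto `device` while loading,
    # so a checkpoint written on a GPU machine can be opened on a CPU-only one.
    checkpoint = torch.load(path, map_location=device)
    model.load_state_dict(checkpoint["model"])
    model.to(device)
    model.eval()  # inference mode: disables dropout, freezes batch-norm statistics
    return model


# Hypothetical toy model standing in for the detector.
source = nn.Linear(4, 2)
restored = nn.Linear(4, 2)

with tempfile.TemporaryDirectory() as tmp:
    ckpt_path = os.path.join(tmp, "toy_checkpoint.pth")
    save_checkpoint(source, ckpt_path)
    load_checkpoint(restored, ckpt_path)

print(torch.equal(source.weight, restored.weight))
```

Keeping loading in one small function like this is one possible first step toward separating the model from the rest of the project, since a service would only need to call it once at startup.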

Load the category names for the labels. The blogger uses a reduced version of the COCO dataset. You can create your own label file in the same format.

with open('util/coco_name.json') as f:
    id2name = json.load(f)
    id2name = {int(k): v for k, v in id2name.items()}

The label file has the following format:

{"1": "car", "2": "truck", "3": "bus"}
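
JSON object keys are always strings, while the model predicts integer label ids, which is why the keys are converted with int(). A minimal, self-contained sketch of that step follows. The helper name load_id2name is hypothetical, and the label file is written to a temporary directory here only so the example runs on its own.

```python
import json
import os
import tempfile


def load_id2name(path):
    """Read a {"id": "name"} JSON file and return a dict keyed by integer id."""
    with open(path, encoding="utf-8") as f:
        raw = json.load(f)  # keys come back as strings: {"1": "car", ...}
    return {int(k): v for k, v in raw.items()}


with tempfile.TemporaryDirectory() as tmp:
    label_path = os.path.join(tmp, "coco_name.json")
    with open(label_path, "w", encoding="utf-8") as f:
        json.dump({"1": "car", "2": "truck", "3": "bus"}, f)
    id2name = load_id2name(label_path)

print(id2name[2])
```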

Next, read the image and run inference. Remember to wrap the inference in torch.no_grad(), which turns off gradient tracking. Inference does not need gradient updates, and keeping the computation graph around can exhaust GPU memory.
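
The effect of torch.no_grad() can be seen on a tiny example that is independent of DINO. This is only an illustrative sketch: the tensors weights and inputs are made up and stand in for model parameters and an input image.

```python
import torch

# Stand-in for model parameters: a tensor that normally records gradients.
weights = torch.ones(3, requires_grad=True)
inputs = torch.tensor([1.0, 2.0, 3.0])

# Outside no_grad, the result is attached to a computation graph.
tracked = (weights * inputs).sum()

# Inside no_grad, the same computation builds no graph,
# so no intermediate results are kept for a backward pass.
with torch.no_grad():
    untracked = (weights * inputs).sum()

print(tracked.requires_grad)    # True
print(untracked.requires_grad)  # False
```

Both results hold the same value. Only the bookkeeping for backpropagation differs, and that bookkeeping is what consumes extra memory during inference.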

with torch.no_grad():

Load the image for inference:

    from PIL import Image
    import datasets.transforms as T
    image = Image.open("figs/1.jpg").convert("RGB")  # load image
    # transform images
    transform = T.Compose([
        T.RandomResize([800], max_size=1333),
        T.ToTensor(),
        T.Normalize([0.485, 0.456, 0.406], [0.229, 0.224, 0.225])
    ])
    image, _ = transform(image, None)
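    # predict on the image (image[None] adds the batch dimension)
    output = model.cuda()(image[None].cuda())
    # a target size of [1.0, 1.0] leaves the predicted boxes in normalized coordinates
    output = postprocessors['bbox'](output, torch.Tensor([[1.0, 1.0]]).cuda())[0]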
    # visualize outputs
    threshold = 0.3  # confidence threshold: keep only detections scoring above it
    vslzr = COCOVisualizer()
    scores = output['scores']
    labels = output['labels']
    boxes = box_ops.box_xyxy_to_cxcywh(output['boxes'])
    select_mask = scores > threshold

    box_label = [id2name[int(item)] for item in labels[select_mask]]
    pred_dict = {
        'boxes': boxes[select_mask],
        'size': torch.Tensor([image.shape[1], image.shape[2]]),
        'box_label': box_label
    }
    vslzr.visualize(image, pred_dict, savedir=None, dpi=100)
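
The post-processing above does two things: it converts boxes from corner format (x0, y0, x1, y1) to center format (cx, cy, w, h), and it keeps only detections whose score exceeds the threshold. The sketch below restates that logic in plain Python on made-up numbers so it can be checked by hand. The function select_detections and the sample scores, labels, and boxes are hypothetical, and the conversion is written out from the usual definition of the two formats rather than copied from box_ops.

```python
def box_xyxy_to_cxcywh(box):
    """Convert one box from (x0, y0, x1, y1) corners to (cx, cy, w, h)."""
    x0, y0, x1, y1 = box
    return [(x0 + x1) / 2, (y0 + y1) / 2, x1 - x0, y1 - y0]


def select_detections(scores, labels, boxes, id2name, threshold=0.3):
    """Keep detections scoring above `threshold` and attach readable names."""
    kept = []
    for score, label, box in zip(scores, labels, boxes):
        if score > threshold:
            kept.append({
                "label": id2name[int(label)],
                "score": score,
                "box": box_xyxy_to_cxcywh(box),
            })
    return kept


# Made-up predictions in normalized coordinates, for illustration only.
id2name = {1: "car", 2: "truck", 3: "bus"}
scores = [0.9, 0.2, 0.5]
labels = [1, 2, 3]
boxes = [
    [0.0, 0.0, 0.5, 0.5],
    [0.25, 0.25, 0.75, 0.75],
    [0.5, 0.0, 1.0, 1.0],
]

detections = select_detections(scores, labels, boxes, id2name)
for det in detections:
    print(det["label"], det["score"], det["box"])
```

A list of plain dicts like this is also easy to serialize to JSON, which may be convenient if the detector is later exposed as a cloud interface.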

The inference result is visualized as follows:

[Image: inference result]

Origin blog.csdn.net/pengxiang1998/article/details/131445470