The YOLOv8 ONNX inference and post-processing process

Recently I needed to deploy YOLOv8 on the Horizon Rising Sun X3, but Horizon does not provide an official example of the model's post-processing, so I had to work it out myself. Since Horizon's model quantization only supports ONNX models with opset 10/11, the opset must be set to 11 when exporting. Configure the ONNX output and opset 11 in the default.yaml file, then export the ONNX model.
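
For reference, a minimal export sketch (assuming the ultralytics package and a local yolov8n.pt weight file; the opset argument is the part that matters for the Horizon toolchain):

# Minimal export sketch: assumes the ultralytics package and a local yolov8n.pt.
# The key point for the Horizon toolchain is opset=11.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")                          # load the trained detection model
model.export(format="onnx", opset=11, imgsz=640)    # writes an .onnx file next to the weights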

Local ONNX inference on my own computer's CPU takes about 50 ms per frame, i.e. roughly 20 FPS. The following describes the debugging of YOLOv8's post-processing:

1. First start with the predict_cli function

 2. After step 1, enter the stream_inference function (inference):

 setup_model, called with the default hyperparameters, identifies the format of the backend model (ONNX here) and sets it up.

3. Enter setup_source

 check_imgsz checks that the image size is valid and consistent, and then load_inference_source prepares the inference source data.

 Then there is the LoadImages class, which uses LetterBox. The role of LetterBox is to resize the input image to the specified size (the `imgsz` parameter) and pad the borders with a constant color so that the aspect ratio is preserved.
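
A minimal letterbox sketch with OpenCV and NumPy (not the actual ultralytics implementation; the pad color and rounding details are simplified):

import cv2
import numpy as np

def letterbox(img, new_shape=(640, 640), pad_value=114):
    """Resize while keeping aspect ratio, then pad to new_shape (simplified sketch)."""
    h, w = img.shape[:2]
    r = min(new_shape[0] / h, new_shape[1] / w)          # scale ratio
    new_unpad = (int(round(w * r)), int(round(h * r)))   # resized (width, height)
    dw = new_shape[1] - new_unpad[0]                     # total horizontal padding
    dh = new_shape[0] - new_unpad[1]                     # total vertical padding
    img = cv2.resize(img, new_unpad, interpolation=cv2.INTER_LINEAR)
    top, bottom = dh // 2, dh - dh // 2
    left, right = dw // 2, dw - dw // 2
    return cv2.copyMakeBorder(img, top, bottom, left, right,
                              cv2.BORDER_CONSTANT, value=(pad_value,) * 3)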

 
 

This code first uses np.stack to combine the shapes of the images processed by LetterBox into a new array s. Then np.unique finds the unique rows of s and counts them. If there is exactly one unique row, all images have the same shape and rectangular (rect) inference can be used. In short, this code computes the shape of every image in the dataset and decides whether rect inference is possible; it also stores the dataset's transforms and size.
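
A sketch of that shape check (illustrative names; `images` is assumed to be the list of loaded frames and letterbox() is the sketch above):

import numpy as np

# one (h, w, c) shape per letterboxed image in the dataset
s = np.stack([letterbox(im).shape for im in images], axis=0)
rect = np.unique(s, axis=0).shape[0] == 1  # True -> all shapes equal -> rect inference possible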

 transforms is None, so at each iteration every image is resized with LetterBox and stacked into a new array im. Then all images in im are converted from BGR to RGB, and their layout is changed from BHWC (batch, height, width, channels) to BCHW (batch, channels, height, width).

4. Preprocessing before the input is fed to ONNX, i.e. preprocess

 The data is normalized (pixel values scaled to [0, 1]) and kept in full precision rather than half precision.
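
A sketch of the whole preprocessing step under those assumptions (letterbox() is the sketch above; input images are BGR arrays from OpenCV):

import numpy as np

def preprocess(bgr_images, imgsz=(640, 640)):
    """Letterbox, BGR->RGB, BHWC->BCHW, scale to [0, 1], keep fp32 (no half precision)."""
    im = np.stack([letterbox(img, imgsz) for img in bgr_images])  # (B, H, W, C)
    im = im[..., ::-1].transpose(0, 3, 1, 2)                      # BGR->RGB, BHWC->BCHW
    return np.ascontiguousarray(im, dtype=np.float32) / 255.0     # normalize to [0, 1]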

5. Inference

 The shape of the output is 1*84*8400; the coco128 dataset is used here, so there are 80 classes (84 = 4 box coordinates + 80 class scores).
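
For reference, a minimal onnxruntime call that reproduces this output shape (file names are placeholders; preprocess() is the sketch above):

import cv2
import onnxruntime as ort

sess = ort.InferenceSession("yolov8n.onnx", providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name
pred = sess.run(None, {input_name: preprocess([cv2.imread("bus.jpg")])})[0]
print(pred.shape)  # (1, 84, 8400): 84 = 4 box coordinates + 80 class scores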

6. Now to the topic of this article: post-processing (postprocess)

(1) The non_max_suppression method in the ops code is called

 The model generally has two outputs, the prediction result and the loss output, so the first element of the prediction is selected. Several variables are then recorded (a sketch of how they are computed follows this list):

nc: number of classes

nm: number of extra mask attributes per box (0 for a pure detection model)

mi: index where the mask attributes start (4 + nc)

xc: a 1*8400 bool Tensor indicating whether each detection box passes the confidence threshold

output: a list of placeholder tensors that will hold the attributes of each kept detection box for every image.
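
A sketch of how these variables are derived from the (1, 84, 8400) prediction, based on my reading of the ultralytics ops code (variable names and defaults may differ between versions; pred is the onnxruntime output from the sketch above):

import torch

prediction = torch.from_numpy(pred)            # (1, 84, 8400) raw ONNX output
conf_thres = 0.25                              # confidence threshold
nc = 80                                        # number of classes
nm = prediction.shape[1] - nc - 4              # extra mask attributes per box (0 here)
mi = 4 + nc                                    # column where the mask attributes start
xc = prediction[:, 4:mi].amax(1) > conf_thres  # (1, 8400) bool mask of candidate boxes
output = [torch.zeros((0, 6 + nm))] * prediction.shape[0]  # per-image result placeholders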

(2) Start to process the prediction results, xi is the index, and x is the tensor of 84*8400

 Taking the built-in bus.jpg as an example, 48 candidate boxes in the model output pass the confidence threshold, so x becomes a 48*84 tensor.

x is then sliced along its 84 columns into 4 (box), 80 (cls) and 0 (mask).

In the xywh2xyxy function the four box elements are converted from x, y (center point), width, height to (x1, y1), (x2, y2). multi_label is false, so the index and confidence of the highest-scoring class are returned; then box (48*4), conf (48*1), j (48*1) and mask (48*0) are concatenated along dim 1 to obtain a (48*6) tensor, as in the sketch below.
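
A sketch of this slicing and concatenation for one image, continuing the variables from the previous sketch (the exact ultralytics code differs slightly between versions):

xi = 0                                            # image index
x = prediction.transpose(-1, -2)[xi][xc[xi]]      # (48, 84) candidate rows for this image

box, cls, mask = x.split((4, nc, nm), 1)          # 4 box cols, 80 class cols, 0 mask cols

# xywh (center x, center y, width, height) -> xyxy (x1, y1, x2, y2)
xy, wh = box[:, :2], box[:, 2:] / 2
box = torch.cat((xy - wh, xy + wh), 1)

conf, j = cls.max(1, keepdim=True)                # best class score and class index per box
x = torch.cat((box, conf, j.float(), mask), 1)    # (48, 6): x1, y1, x2, y2, conf, cls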

n is the number of boxes. The boxes are then sorted by confidence (descending) and at most max_nms of them are kept, so here x is still a (48*6) tensor. Next, a per-class offset c is computed for the 48 boxes from their class index: c is the class index multiplied by max_wh (a large constant), or 0 for class-agnostic NMS; adding it to the box coordinates shifts boxes of different classes far apart so that NMS only suppresses within a class. The returned c is a (48*1) tensor; a sketch follows.
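
A sketch of the score sort, per-class offset, and the torchvision NMS call (max_wh, max_nms and max_det are constants/defaults in the ultralytics code; treat the exact values here as approximate):

import torchvision

iou_thres, max_wh, max_nms, max_det = 0.45, 7680, 30000, 300

n = x.shape[0]                                   # number of candidate boxes (48 here)
if n > max_nms:                                  # keep only the highest-confidence boxes
    x = x[x[:, 4].argsort(descending=True)[:max_nms]]

c = x[:, 5:6] * max_wh                           # class offset (use 0 for class-agnostic NMS)
boxes, scores = x[:, :4] + c, x[:, 4]            # shifted boxes: different classes never overlap
i = torchvision.ops.nms(boxes, scores, iou_thres)[:max_det]
result = x[i]                                    # (5, 6) in this debug run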

(3) The nms function from torchvision is used; it returns i, an index tensor of the remaining detection boxes. In this debug run i contains 5 indices, so 5 boxes remain. The NMS procedure is as follows (a plain PyTorch sketch of it comes after the list):

     ① First, according to the bounding boxes predicted by the model, the confidence score of each bounding box is calculated.

     ② Next, starting from the bounding box with the highest confidence score, all bounding boxes are sorted in descending order of confidence score.

     ③ Then, the bounding box with the highest confidence score is selected and added to the output list.

     ④ For each remaining bounding box, compute its IoU (intersection over union) with the box just added to the output list. If the IoU is greater than some threshold (commonly 0.5), the bounding box is removed from the list; otherwise it is kept.

     ⑤ Repeat steps 3 and 4 until all bounding boxes are processed.
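
A plain PyTorch sketch of that procedure (torchvision.ops.nms itself is a compiled C++/CUDA op; this is only to make the steps above concrete):

import torch

def nms_sketch(boxes, scores, iou_threshold=0.5):
    """Greedy NMS: boxes (N, 4) in xyxy format, scores (N,). Returns kept indices."""
    order = scores.argsort(descending=True)      # steps 1-2: sort by confidence
    keep = []
    while order.numel() > 0:
        i = order[0]
        keep.append(i.item())                    # step 3: keep the current best box
        if order.numel() == 1:
            break
        rest = order[1:]
        # step 4: IoU between the kept box and all remaining boxes
        lt = torch.max(boxes[i, :2], boxes[rest, :2])
        rb = torch.min(boxes[i, 2:], boxes[rest, 2:])
        wh = (rb - lt).clamp(min=0)
        inter = wh[:, 0] * wh[:, 1]
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_r = (boxes[rest, 2] - boxes[rest, 0]) * (boxes[rest, 3] - boxes[rest, 1])
        iou = inter / (area_i + area_r - inter)
        order = rest[iou <= iou_threshold]       # step 5: drop overlapping boxes, repeat
    return torch.tensor(keep, dtype=torch.int64)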

Here is the content of the official torchvision functions:

source code: 

import torch
from torch.jit.annotations import Tuple
from torch import Tensor
from ._box_convert import _box_cxcywh_to_xyxy, _box_xyxy_to_cxcywh, _box_xywh_to_xyxy, _box_xyxy_to_xywh
import torchvision
from torchvision.extension import _assert_has_ops


def nms(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor:
    """
    Performs non-maximum suppression (NMS) on the boxes according
    to their intersection-over-union (IoU).

    NMS iteratively removes lower scoring boxes which have an
    IoU greater than iou_threshold with another (higher scoring)
    box.

    If multiple boxes have the exact same score and satisfy the IoU
    criterion with respect to a reference box, the selected box is
    not guaranteed to be the same between CPU and GPU. This is similar
    to the behavior of argsort in PyTorch when repeated values are present.

    Parameters
    ----------
    boxes : Tensor[N, 4])
        boxes to perform NMS on. They
        are expected to be in (x1, y1, x2, y2) format
    scores : Tensor[N]
        scores for each one of the boxes
    iou_threshold : float
        discards all overlapping
        boxes with IoU > iou_threshold

    Returns
    -------
    keep : Tensor
        int64 tensor with the indices
        of the elements that have been kept
        by NMS, sorted in decreasing order of scores
    """
    _assert_has_ops()
    return torch.ops.torchvision.nms(boxes, scores, iou_threshold)



@torch.jit._script_if_tracing
def batched_nms(
    boxes: Tensor,
    scores: Tensor,
    idxs: Tensor,
    iou_threshold: float,
) -> Tensor:
    """
    Performs non-maximum suppression in a batched fashion.

    Each index value correspond to a category, and NMS
    will not be applied between elements of different categories.

    Parameters
    ----------
    boxes : Tensor[N, 4]
        boxes where NMS will be performed. They
        are expected to be in (x1, y1, x2, y2) format
    scores : Tensor[N]
        scores for each one of the boxes
    idxs : Tensor[N]
        indices of the categories for each one of the boxes.
    iou_threshold : float
        discards all overlapping boxes
        with IoU > iou_threshold

    Returns
    -------
    keep : Tensor
        int64 tensor with the indices of
        the elements that have been kept by NMS, sorted
        in decreasing order of scores
    """
    if boxes.numel() == 0:
        return torch.empty((0,), dtype=torch.int64, device=boxes.device)
    # strategy: in order to perform NMS independently per class.
    # we add an offset to all the boxes. The offset is dependent
    # only on the class idx, and is large enough so that boxes
    # from different classes do not overlap
    else:
        max_coordinate = boxes.max()
        offsets = idxs.to(boxes) * (max_coordinate + torch.tensor(1).to(boxes))
        boxes_for_nms = boxes + offsets[:, None]
        keep = nms(boxes_for_nms, scores, iou_threshold)
        return keep



def remove_small_boxes(boxes: Tensor, min_size: float) -> Tensor:
    """
    Remove boxes which contains at least one side smaller than min_size.

    Arguments:
        boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format
        min_size (float): minimum size

    Returns:
        keep (Tensor[K]): indices of the boxes that have both sides
            larger than min_size
    """
    ws, hs = boxes[:, 2] - boxes[:, 0], boxes[:, 3] - boxes[:, 1]
    keep = (ws >= min_size) & (hs >= min_size)
    keep = torch.where(keep)[0]
    return keep



def clip_boxes_to_image(boxes: Tensor, size: Tuple[int, int]) -> Tensor:
    """
    Clip boxes so that they lie inside an image of size `size`.

    Arguments:
        boxes (Tensor[N, 4]): boxes in (x1, y1, x2, y2) format
        size (Tuple[height, width]): size of the image

    Returns:
        clipped_boxes (Tensor[N, 4])
    """
    dim = boxes.dim()
    boxes_x = boxes[..., 0::2]
    boxes_y = boxes[..., 1::2]
    height, width = size

    if torchvision._is_tracing():
        boxes_x = torch.max(boxes_x, torch.tensor(0, dtype=boxes.dtype, device=boxes.device))
        boxes_x = torch.min(boxes_x, torch.tensor(width, dtype=boxes.dtype, device=boxes.device))
        boxes_y = torch.max(boxes_y, torch.tensor(0, dtype=boxes.dtype, device=boxes.device))
        boxes_y = torch.min(boxes_y, torch.tensor(height, dtype=boxes.dtype, device=boxes.device))
    else:
        boxes_x = boxes_x.clamp(min=0, max=width)
        boxes_y = boxes_y.clamp(min=0, max=height)

    clipped_boxes = torch.stack((boxes_x, boxes_y), dim=dim)
    return clipped_boxes.reshape(boxes.shape)



def box_convert(boxes: Tensor, in_fmt: str, out_fmt: str) -> Tensor:
    """
    Converts boxes from given in_fmt to out_fmt.
    Supported in_fmt and out_fmt are:

    'xyxy': boxes are represented via corners, x1, y1 being top left and x2, y2 being bottom right.

    'xywh' : boxes are represented via corner, width and height, x1, y2 being top left, w, h being width and height.

    'cxcywh' : boxes are represented via centre, width and height, cx, cy being center of box, w, h
    being width and height.

    Arguments:
        boxes (Tensor[N, 4]): boxes which will be converted.
        in_fmt (str): Input format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh'].
        out_fmt (str): Output format of given boxes. Supported formats are ['xyxy', 'xywh', 'cxcywh']

    Returns:
        boxes (Tensor[N, 4]): Boxes into converted format.
    """

    allowed_fmts = ("xyxy", "xywh", "cxcywh")
    if in_fmt not in allowed_fmts or out_fmt not in allowed_fmts:
        raise ValueError("Unsupported Bounding Box Conversions for given in_fmt and out_fmt")

    if in_fmt == out_fmt:
        return boxes.clone()

    if in_fmt != 'xyxy' and out_fmt != 'xyxy':
        # convert to xyxy and change in_fmt xyxy
        if in_fmt == "xywh":
            boxes = _box_xywh_to_xyxy(boxes)
        elif in_fmt == "cxcywh":
            boxes = _box_cxcywh_to_xyxy(boxes)
        in_fmt = 'xyxy'

    if in_fmt == "xyxy":
        if out_fmt == "xywh":
            boxes = _box_xyxy_to_xywh(boxes)
        elif out_fmt == "cxcywh":
            boxes = _box_xyxy_to_cxcywh(boxes)
    elif out_fmt == "xyxy":
        if in_fmt == "xywh":
            boxes = _box_xywh_to_xyxy(boxes)
        elif in_fmt == "cxcywh":
            boxes = _box_cxcywh_to_xyxy(boxes)
    return boxes



def box_area(boxes: Tensor) -> Tensor:
    """
    Computes the area of a set of bounding boxes, which are specified by its
    (x1, y1, x2, y2) coordinates.

    Arguments:
        boxes (Tensor[N, 4]): boxes for which the area will be computed. They
            are expected to be in (x1, y1, x2, y2) format

    Returns:
        area (Tensor[N]): area for each box
    """
    return (boxes[:, 2] - boxes[:, 0]) * (boxes[:, 3] - boxes[:, 1])



# implementation from https://github.com/kuangliu/torchcv/blob/master/torchcv/utils/box.py
# with slight modifications
def box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
    """
    Return intersection-over-union (Jaccard index) of boxes.

    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.

    Arguments:
        boxes1 (Tensor[N, 4])
        boxes2 (Tensor[M, 4])

    Returns:
        iou (Tensor[N, M]): the NxM matrix containing the pairwise IoU values for every element in boxes1 and boxes2
    """
    area1 = box_area(boxes1)
    area2 = box_area(boxes2)

    lt = torch.max(boxes1[:, None, :2], boxes2[:, :2])  # [N,M,2]
    rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])  # [N,M,2]

    wh = (rb - lt).clamp(min=0)  # [N,M,2]
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N,M]

    iou = inter / (area1[:, None] + area2 - inter)
    return iou



# Implementation adapted from https://github.com/facebookresearch/detr/blob/master/util/box_ops.py
def generalized_box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:
    """
    Return generalized intersection-over-union (Jaccard index) of boxes.

    Both sets of boxes are expected to be in (x1, y1, x2, y2) format.

    Arguments:
        boxes1 (Tensor[N, 4])
        boxes2 (Tensor[M, 4])

    Returns:
        generalized_iou (Tensor[N, M]): the NxM matrix containing the pairwise generalized_IoU values
        for every element in boxes1 and boxes2
    """

    # degenerate boxes gives inf / nan results
    # so do an early check
    assert (boxes1[:, 2:] >= boxes1[:, :2]).all()
    assert (boxes2[:, 2:] >= boxes2[:, :2]).all()

    area1 = box_area(boxes1)
    area2 = box_area(boxes2)

    lt = torch.max(boxes1[:, None, :2], boxes2[:, :2])  # [N,M,2]
    rb = torch.min(boxes1[:, None, 2:], boxes2[:, 2:])  # [N,M,2]

    wh = (rb - lt).clamp(min=0)  # [N,M,2]
    inter = wh[:, :, 0] * wh[:, :, 1]  # [N,M]

    union = area1[:, None] + area2 - inter

    iou = inter / union

    lti = torch.min(boxes1[:, None, :2], boxes2[:, :2])
    rbi = torch.max(boxes1[:, None, 2:], boxes2[:, 2:])

    whi = (rbi - lti).clamp(min=0)  # [N,M,2]
    areai = whi[:, :, 0] * whi[:, :, 1]

    return iou - (areai - union) / areai

The above is the official source code; some explanations of the functions follow:

       def nms(boxes: Tensor, scores: Tensor, iou_threshold: float) -> Tensor:

        The basic NMS function. Given a set of bounding boxes and their scores, it sorts the boxes by score and removes boxes whose IoU with a higher-scoring kept box exceeds the threshold. It consists of the following steps:

        It receives boxes (N, 4), where N is the number of bounding boxes, scores (N,) with the score of each box, and the IoU threshold. It returns keep, the indices of the kept bounding boxes sorted by score from highest to lowest.

       def batched_nms(
            boxes: Tensor,
            scores: Tensor,
            idxs: Tensor,
            iou_threshold: float, ) -> Tensor:

        A PyTorch function that executes NMS in batch mode. The function supports grouping bounding boxes by category and doing NMS on each category independently. Implementation consists of the following steps:

        It receives boxes (N, 4), where N is the number of bounding boxes, scores (N,) with the score of each box, idxs (N,) with the category index of each box, and the IoU threshold. If boxes is empty, an empty tensor is returned, indicating there are no bounding boxes to keep. Otherwise an offset is computed for each class so that boxes of different classes cannot overlap: the class index of each box is multiplied by a value larger than the maximum of all box coordinates and cast to the same dtype as boxes. The offset boxes and the corresponding scores are then passed to nms, which effectively performs NMS on each class independently. It returns keep, the indices of the retained bounding boxes sorted by score from highest to lowest.

        def remove_small_boxes(boxes: Tensor, min_size: float) -> Tensor:

        The function remove_small_boxes removes bounding boxes that have at least one side smaller than min_size. It includes the following steps:

        It takes boxes (N, 4), where N is the number of bounding boxes, and min_size (the minimum size). It computes the width and height of each bounding box and determines which boxes have both width and height greater than or equal to min_size. It returns keep, the indices of the kept boxes.

        def clip_boxes_to_image(boxes: Tensor, size: Tuple[int, int]) -> Tensor:

        clip_boxes_to_image is used to clip bounding boxes to within the bounds of a given image. This function consists of the following steps:

        It receives boxes (N, 4) and size (a two-tuple giving the image height and width), extracts the x and y coordinates of the boxes into the boxes_x and boxes_y tensors, and reads the image height and width. In tracing (JIT) mode it clamps boxes_x to the range [0, width] and boxes_y to [0, height] using torch.max/torch.min; otherwise it uses the clamp function to do the same. It then stacks the clipped boxes_x and boxes_y back into one tensor and returns clipped_boxes of shape (N, 4), where every bounding box has been clipped to the image bounds.

        def box_convert(boxes: Tensor, in_fmt: str, out_fmt: str) -> Tensor:

        Here is a PyTorch function for converting bounding box formats.

        xyxy: Indicates that the bounding box is represented by the coordinates of the upper left corner and the lower right corner.
        xywh: Indicates that the bounding box is represented by the coordinates of the upper left corner and the width and height.
        cxcywh: Indicates that the bounding box is represented by the center point coordinates and width and height.

        _box_xyxy_to_xywh: Convert bounding box from `'xyxy'` format to `'xywh'` format.
        _box_xyxy_to_cxcywh: Convert bounding box from `'xyxy'` format to `'cxcywh'` format.
        _box_xywh_to_xyxy: Convert bounding box from `'xywh'` format to `'xyxy'` format.
        _box_cxcywh_to_xyxy: Convert bounding box from `'cxcywh'` format to `'xyxy'` format.

        Returns the transformed bounding boxes boxes.

        def box_area(boxes: Tensor) -> Tensor:

        Computes the area of each bounding box; this one is straightforward.

        def box_iou(boxes1: Tensor, boxes2: Tensor) -> Tensor:

        Computes the intersection-over-union (IoU) of bounding boxes. box_iou receives the input parameters boxes1 and boxes2, two sets of bounding boxes in (x1, y1, x2, y2) format.

        The areas area1 and area2 of the two sets are computed with box_area. For each box pair (i, j), the maxima of the top-left corners and the minima of the bottom-right corners are taken, giving lt and rb. The width and height of each pair are wh[i, j] = rb[i, j] - lt[i, j], with negative values truncated to 0. The intersection area is inter[i, j] = wh[i, j, 0] * wh[i, j, 1], the union is union[i, j] = area1[i] + area2[j] - inter[i, j], and the IoU is iou[i, j] = inter[i, j] / union[i, j]. The returned iou has shape (N, M), where N and M are the numbers of boxes in the two sets.

Then the 5 indices select rows from x (48*6), giving a (5*6) tensor, and the final output is also 5*6.

The last step is to scale the boxes back to the original image size! After that, the next step is to get this whole process running on the Horizon board!
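
A sketch of that final rescaling, undoing the letterbox gain and padding (a simplified version of what the ultralytics scale_boxes helper does; boxes are xyxy tensors and shapes are (height, width)):

def scale_boxes_sketch(boxes, letterbox_shape, orig_shape):
    """Map xyxy boxes from the letterboxed image back to the original image size."""
    gain = min(letterbox_shape[0] / orig_shape[0], letterbox_shape[1] / orig_shape[1])
    pad_x = (letterbox_shape[1] - orig_shape[1] * gain) / 2      # horizontal padding
    pad_y = (letterbox_shape[0] - orig_shape[0] * gain) / 2      # vertical padding
    boxes = boxes.clone()
    boxes[:, [0, 2]] = (boxes[:, [0, 2]] - pad_x) / gain
    boxes[:, [1, 3]] = (boxes[:, [1, 3]] - pad_y) / gain
    boxes[:, [0, 2]] = boxes[:, [0, 2]].clamp(0, orig_shape[1])  # clip x to image width
    boxes[:, [1, 3]] = boxes[:, [1, 3]].clamp(0, orig_shape[0])  # clip y to image height
    return boxes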


Origin blog.csdn.net/w1036427372/article/details/130048333