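Table of contents

Overview
1. Environment setup
- Obtaining the `YOLOv5` `ONNX` model
- Installing the `opencv-python` module
2. Key code
3. Sample code (runnable)
- 3.1 Unencapsulated version (plain functions)
- 3.2 Encapsulated for class-style calls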

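Overview

This document explains how to run inference with a `YOLOv5` model on the `python` platform using `dnn`, the deep neural network module of `opencv-python`.

It covers the following topics:

- Installing the `opencv-python` module
- The `YOLOv5` model formats
- Loading a model in `ONNX` format
- Preprocessing the image data
- Model inference
- Post-processing the inference results, including `NMS` and converting coordinates from `cxcywh` to `xyxy`
- Key method calls and their parameters
- Complete sample code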

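1. Environment setup

Obtaining the `YOLOv5` `ONNX` model

The official pretrained `YOLOv5` models can be downloaded from the official download link; they are provided in `.pt` format. The official `YOLOv5` project provides a script that converts a `pt` model into an `ONNX` model.

Model export command:

python export.py --weights yolov5s.pt --include onnx

Note: for installing and configuring the environment required to run the export script, see the project's official README; it is not repeated here.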

Installing the `opencv-python` module

Create and activate a virtual environment

conda create -n opencv python=3.8 -y
conda activate opencv

Install the `opencv-python` module with `pip`
```
pip install opencv-python
```
Note: when `opencv-python` is installed with `pip`, the default build supports `CPU` inference only. If `GPU` inference is required, the module has to be compiled and installed from source. That procedure is involved and is not covered here.

2. Key code

2.1 Loading the model

The `opencv-python` module provides the `readNetFromONNX` method for loading models in `ONNX` format.

import cv2
model = cv2.dnn.readNetFromONNX(model_path)

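2.2 Preprocessing the image data

The preprocessing steps include resizing, normalization, colour-channel conversion, and conversion to the NCHW dimension order.

Before the `resize`, there is a very common trick for handling non-square images: take the longest side of the image, create a square with that side length, and paste the original image into the top-left corner, leaving the rest filled with zeros. The benefit is that the aspect ratio of the original image is unchanged and its content is not altered either.

# image preprocessing, the trick is to make the frame to be a square but not twist the image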
row, col, _ = frame.shape  # get the row and column of the origin frame array
_max = max(row, col)  # get the max value of row and column
input_image = np.zeros((_max, _max, 3), dtype=np.uint8)  # create a new array with the max value
input_image[:row, :col, :] = frame  # paste the original frame  to make the input_image to be a square

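Once the image has been padded, continue with resizing, normalization, and colour-channel conversion.

blob = cv2.dnn.blobFromImage(image, scalefactor=1 / 255.0, size=(640,640), swapRB=True, crop=False)

- `image`: the input image data in `numpy.ndarray` format; its `shape` is `(H,W,C)` and its channel order is `BGR`.
- `scalefactor`: the normalization factor for the image data, usually `1/255.0`.
- `size`: the size the image is resized to. It is determined by the model's input requirements; here it is `(640,640)`.
- `swapRB`: whether to swap the colour channels, i.e. convert `BGR` to `RGB`. `True` swaps them, `False` does not. `opencv` reads image data in `BGR` order while the `YOLOv5` model expects `RGB`, so the channels must be swapped here.
- `crop`: whether to crop the image. `False` means no cropping.

The `blobFromImage` function returns a 4-dimensional Mat object in NCHW dimension order. Here the shape of the data is `(1,3,640,640)`.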
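To make the NCHW layout concrete, here is a minimal NumPy-only sketch of the same three steps (channel swap, scaling, axis reordering) for an image that has already been resized to 640x640. The helper name `to_blob` is hypothetical and the resize step is deliberately left out; this is an illustration of the layout, not how `blobFromImage` is implemented.

```python
import numpy as np


def to_blob(image_bgr: np.ndarray) -> np.ndarray:
    """Hypothetical helper: mimic scalefactor=1/255 and swapRB=True on an already-resized image."""
    rgb = image_bgr[:, :, ::-1]                   # BGR -> RGB, the effect of swapRB=True
    scaled = rgb.astype(np.float32) / 255.0       # scalefactor = 1 / 255.0
    chw = scaled.transpose(2, 0, 1)               # (H, W, C) -> (C, H, W)
    return np.ascontiguousarray(chw[np.newaxis])  # add the batch axis -> (N, C, H, W)


# Dummy 640x640 BGR image whose blue channel is 255 everywhere
image = np.zeros((640, 640, 3), dtype=np.uint8)
image[:, :, 0] = 255

blob = to_blob(image)
print(blob.shape)  # (1, 3, 640, 640)
```

After the swap, the blue values end up in the last channel plane (`blob[0, 2]`), which is a quick way to check that the channel order is what the model expects.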

2.3 Model inference

Set the inference backend and target

After loading the model, the device used for inference has to be set. The inference device is usually the `CPU`, which is configured as follows.
```
model.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
model.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
```
If the `opencv-python` module in your environment supports `GPU` inference, the model can also be configured to run on the `GPU`, as follows.
```
model.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
model.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
```
Note: to check whether the `opencv-python` module supports `GPU` inference, call `cv2.cuda.getCudaEnabledDeviceCount()`. A return value greater than 0 indicates that `GPU` inference is supported; otherwise it is not.
Set the model input data
```
model.setInput(blob)
```
`blob` is the data produced by the preprocessing in the previous step.
Call the model's forward-propagation method `forward`
```
outputs = model.forward()
```
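`outputs` is the output of the model inference. Its shape is `(1,25200,5+nc)`: `25200` is the number of candidate boxes the model predicts across its grids, and `5+nc` is the number of values predicted for each of them, where the `5` values are `x,y,w,h,conf` and `nc` is the number of classes.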
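As a rough check on where `25200` comes from, the sketch below assumes the standard YOLOv5 layout of three detection heads with strides 8, 16 and 32 and three anchors per grid cell (these values are assumptions about the exported model, not something this article states). It then slices a dummy array of the same shape into its three parts.

```python
import numpy as np

INPUT_SIZE = 640        # model input side length used in this article
STRIDES = (8, 16, 32)   # assumed strides of the three detection heads
ANCHORS_PER_CELL = 3    # assumed number of anchors per grid cell

# 80x80 + 40x40 + 20x20 grid cells, three predictions per cell
num_predictions = sum((INPUT_SIZE // s) ** 2 for s in STRIDES) * ANCHORS_PER_CELL
print(num_predictions)  # 25200

nc = 80  # number of classes for a COCO-trained model
outputs = np.zeros((1, num_predictions, 5 + nc), dtype=np.float32)  # stand-in for model.forward()

predictions = outputs[0]             # drop the batch axis -> (25200, 5 + nc)
boxes_cxcywh = predictions[:, 0:4]   # centre x, centre y, width, height
confidences = predictions[:, 4]      # the conf value of each candidate box
class_scores = predictions[:, 5:]    # one score per class
```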

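2.4 Post-processing the inference results

The raw inference result contains many overlapping `bbox`, so it has to go through `NMS` and then be filtered by comparing each `bbox` confidence with the user-defined confidence threshold. What remains is the final set of `bbox` with their classes and confidences.

2.4.1 NMS

The `opencv-python` module provides the `NMSBoxes` method for `NMS` processing.

cv2.dnn.NMSBoxes(bboxes, scores, score_threshold, nms_threshold, eta=None, top_k=None)

- `bboxes`: the list of `bbox`. Its `shape` is `(N,4)`, where `N` is the number of `bbox` and `4` stands for the `x,y,w,h` of each `bbox` (`x,y` is the top-left corner).
- `scores`: the list of confidences corresponding to the `bbox`. Its `shape` is `(N,1)`, where `N` is the number of `bbox`.
- `score_threshold`: the confidence threshold; any `bbox` whose confidence is below it is filtered out.
- `nms_threshold`: the `NMS` threshold.

The `NMSBoxes` function returns the list of indices of the `bbox` that are kept. Its `shape` is `(M,)`, where `M` is the number of remaining `bbox`.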
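The idea behind `NMS` can be sketched in plain NumPy as a greedy loop: take the highest-scoring box, drop every remaining box that overlaps it too much, and repeat. The functions `iou_xywh` and `greedy_nms` below are hypothetical illustrations of that idea, not OpenCV's implementation, and they ignore the `eta` and `top_k` parameters.

```python
import numpy as np


def iou_xywh(box, others):
    """IoU between one [left, top, width, height] box and an array of such boxes."""
    x1 = np.maximum(box[0], others[:, 0])
    y1 = np.maximum(box[1], others[:, 1])
    x2 = np.minimum(box[0] + box[2], others[:, 0] + others[:, 2])
    y2 = np.minimum(box[1] + box[3], others[:, 1] + others[:, 3])
    inter = np.clip(x2 - x1, 0, None) * np.clip(y2 - y1, 0, None)
    union = box[2] * box[3] + others[:, 2] * others[:, 3] - inter
    return inter / union


def greedy_nms(boxes, scores, score_threshold, nms_threshold):
    """Return the indices of the boxes kept by greedy non-maximum suppression."""
    boxes = np.asarray(boxes, dtype=np.float32)
    scores = np.asarray(scores, dtype=np.float32)
    # candidates sorted by descending score, low-confidence boxes removed up front
    order = [int(i) for i in np.argsort(-scores) if scores[i] > score_threshold]
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        if not order:
            break
        rest = np.array(order)
        overlaps = iou_xywh(boxes[best], boxes[rest])
        # keep only the candidates that do not overlap the chosen box too much
        order = [int(i) for i, o in zip(rest, overlaps) if o <= nms_threshold]
    return keep


boxes = [[10, 10, 100, 100], [12, 12, 100, 100], [300, 300, 50, 50], [0, 0, 20, 20]]
scores = [0.9, 0.8, 0.75, 0.3]
print(greedy_nms(boxes, scores, score_threshold=0.6, nms_threshold=0.4))  # [0, 2]
```

In this toy input the second box almost coincides with the first and is suppressed, the third does not overlap anything and is kept, and the fourth is dropped by the score threshold before the loop starts.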

2.4.2 Score threshold filtering

Using the list of `bbox` indices returned by `NMS`, filter out every `bbox` whose confidence is below `score_threshold`.
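A minimal sketch of this filtering step with a NumPy boolean mask follows. All values, including the index array standing in for the `NMS` result, are made up for illustration.

```python
import numpy as np

# Made-up confidences and per-class scores for four candidate boxes (two classes)
confidences = np.array([0.92, 0.55, 0.71, 0.64], dtype=np.float32)
class_scores = np.array([[0.1, 0.9], [0.8, 0.2], [0.3, 0.7], [0.6, 0.4]], dtype=np.float32)

kept_after_nms = np.array([0, 2, 3])  # hypothetical indices returned by NMS
score_threshold = 0.7

mask = confidences[kept_after_nms] > score_threshold     # True where the box is confident enough
final_indices = kept_after_nms[mask]                     # indices into the original arrays
class_ids = class_scores[final_indices].argmax(axis=1)   # most probable class of each kept box

print(final_indices.tolist(), class_ids.tolist())  # [0, 2] [1, 1]
```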

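2.4.3 Converting and restoring bbox coordinates

The `bbox` coordinates output by the `YOLOv5` model are in `cxcywh` format and need to be converted to `xyxy` format. In addition, because the image was resized earlier, the `bbox` coordinates have to be scaled back to the size of the original image.
The conversion works as follows.

# size of the original image (after padding); numpy shape order is (height, width, channels)
image_height, image_width, _ = input_image.shape
# compute the scale factors
x_factor = image_width / INPUT_WIDTH  #  640
y_factor = image_height / INPUT_HEIGHT #  640

# convert cxcywh coordinates to xyxy coordinates
x1 = int((x - w / 2) * x_factor)
y1 = int((y - h / 2) * y_factor)
w = int(w * x_factor)
h = int(h * y_factor)
x2 = x1 + w
y2 = y1 + h

`x1`, `y1`, `x2`, `y2` are the `xyxy` coordinates of the `bbox`.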
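The same conversion can be applied to all boxes at once. The sketch below is one possible vectorized version; the helper name `cxcywh_to_xyxy` is hypothetical, it assumes the padded image is square so a single scale factor serves both axes, and the clipping to the image bounds is an extra safeguard rather than part of the steps above.

```python
import numpy as np


def cxcywh_to_xyxy(boxes, padded_size, input_size=640):
    """Hypothetical helper: map model-space cxcywh boxes to xyxy pixels of the padded square image."""
    boxes = np.asarray(boxes, dtype=np.float32)
    factor = padded_size / input_size  # one factor for x and y because the padded image is square
    cx, cy, w, h = boxes[:, 0], boxes[:, 1], boxes[:, 2], boxes[:, 3]
    xyxy = np.stack([cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2], axis=1) * factor
    # assumption: clip to the image so boxes never extend past its borders
    return np.clip(xyxy, 0, padded_size - 1).astype(int)


# A 1280x720 frame is padded to 1280x1280, so the factor is 1280 / 640 = 2
print(cxcywh_to_xyxy([[320, 160, 100, 60]], padded_size=1280).tolist())  # [[540, 260, 740, 380]]
```

Because the original frame sits in the top-left corner of the padded square, coordinates in the padded image are also valid coordinates in the original frame and can be drawn on it directly.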

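3. Sample code (runnable)

Two versions of the source code are provided. The first combines and calls plain functions, which is convenient for debugging. The second is packaged for class-style use, which makes it easier to integrate into other projects.

3.1 Unencapsulated version (plain functions)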

"""
running the onnx model inference with opencv dnn module

"""
from typing import List

import cv2
import numpy as np
import time
from pathlib import Path


def build_model(model_path: str) -> cv2.dnn_Net:
    """
    build the model with opencv dnn module
    Args:
        model_path: the path of the model, the model should be in onnx format

    Returns:
        the model object
    """
    # check if the model file exists
    if not Path(model_path).exists():
        raise FileNotFoundError(f"model file {model_path} not found")
    model = cv2.dnn.readNetFromONNX(model_path)

    # check if the opencv-python in your environment supports cuda
    cuda_available = cv2.cuda.getCudaEnabledDeviceCount() > 0

    if cuda_available:  # if cuda is available, use cuda
        model.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        model.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    else:  # if cuda is not available, use cpu
        model.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        model.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)
    return model


def inference(image: np.ndarray, model: cv2.dnn_Net) -> np.ndarray:
    """
    inference the model with the input image
    Args:
        image: the input image in numpy array format, the shape should be (height, width, channel),
        the color channels should be in BGR order, like the original opencv image format
        model: the model object

    Returns:
        the output data of the model, the shape should be (1, 25200, nc+5), nc is the number of classes
    """
    # image preprocessing, include resize, normalization, channel swap like BGR to RGB, and convert to blob format
    # get a 4-dimensional Mat with NCHW dimensions order.
    blob = cv2.dnn.blobFromImage(image, 1 / 255.0, (INPUT_WIDTH, INPUT_HEIGHT), swapRB=True, crop=False)

    # the alternative way to get the blob
    # rgb = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    # input_image = cv2.resize(src=rgb, dsize=(INPUT_WIDTH, INPUT_HEIGHT))
    # blob_img = np.float32(input_image) / 255.0
    # input_x = blob_img.transpose((2, 0, 1))
    # blob = np.expand_dims(input_x, 0)

    if cv2.cuda.getCudaEnabledDeviceCount() > 0:
        model.setPreferableBackend(cv2.dnn.DNN_BACKEND_CUDA)
        model.setPreferableTarget(cv2.dnn.DNN_TARGET_CUDA)
    else:
        model.setPreferableBackend(cv2.dnn.DNN_BACKEND_OPENCV)
        model.setPreferableTarget(cv2.dnn.DNN_TARGET_CPU)

    # set the input data
    model.setInput(blob)

    start = time.perf_counter()
    # inference
    outs = model.forward()

    end = time.perf_counter()

    print("inference time: ", end - start)

    # the shape of the output data is (1, 25200, nc+5), nc is the number of classes
    return outs


def xywh_to_xyxy(bbox_xywh, image_width, image_height):
    """
    Convert bounding box coordinates from (center_x, center_y, width, height) to (x_min, y_min, x_max, y_max) format.

    Parameters:
        bbox_xywh (list or tuple): Bounding box coordinates in (center_x, center_y, width, height) format.
        image_width (int): Width of the image.
        image_height (int): Height of the image.

    Returns:
        tuple: Bounding box coordinates in (x_min, y_min, x_max, y_max) format.
    """
    center_x, center_y, width, height = bbox_xywh
    x_min = max(0, int(center_x - width / 2))
    y_min = max(0, int(center_y - height / 2))
    x_max = min(image_width - 1, int(center_x + width / 2))
    y_max = min(image_height - 1, int(center_y + height / 2))
    return x_min, y_min, x_max, y_max


def wrap_detection(
        input_image: np.ndarray,
        output_data: np.ndarray,
        labels: List[str],
        confidence_threshold: float = 0.6
) -> (List[int], List[float], List[List[int]]):
    # the shape of the output_data is (25200,5+nc),
    # the first 5 elements are [x, y, w, h, confidence], the rest are prediction scores of each class

    image_height, image_width, _ = input_image.shape  # numpy shape order is (height, width, channels)
    x_factor = image_width / INPUT_WIDTH
    y_factor = image_height / INPUT_HEIGHT

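    # cv2.dnn.NMSBoxes expects [left, top, width, height], so shift the box centres to the top-left corner first
    nms_boxes = output_data[:, 0:4].copy()
    nms_boxes[:, 0:2] -= nms_boxes[:, 2:4] / 2

    indices = cv2.dnn.NMSBoxes(nms_boxes.tolist(), output_data[:, 4].tolist(), 0.6, 0.4)
    # flatten the result: depending on the OpenCV version it may be an (N, 1) array, or an empty tuple
    indices = np.array(indices, dtype=int).reshape(-1)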

    raw_boxes = output_data[:, 0:4][indices]
    raw_confidences = output_data[:, 4][indices]
    raw_class_prediction_probabilities = output_data[:, 5:][indices]

    criteria = raw_confidences > confidence_threshold
    raw_class_prediction_probabilities = raw_class_prediction_probabilities[criteria]
    raw_boxes = raw_boxes[criteria]
    raw_confidences = raw_confidences[criteria]

    bounding_boxes, confidences, class_ids = [], [], []
    for class_prediction_probability, box, confidence in zip(raw_class_prediction_probabilities, raw_boxes,
                                                             raw_confidences):
        #
        # find the least and most probable classes' indices and their probabilities
        # min_val, max_val, min_loc, mac_loc = cv2.minMaxLoc(class_prediction_probability)
        most_probable_class_index = np.argmax(class_prediction_probability)
        label = labels[most_probable_class_index]
        confidence = float(confidence)

        # bounding_boxes.append(box)
        # confidences.append(confidence)
        # class_ids.append(most_probable_class_index)

        x, y, w, h = box
        left = int((x - 0.5 * w) * x_factor)
        top = int((y - 0.5 * h) * y_factor)
        width = int(w * x_factor)
        height = int(h * y_factor)
        bounding_box = [left, top, width, height]
        bounding_boxes.append(bounding_box)
        confidences.append(confidence)
        class_ids.append(most_probable_class_index)

    return class_ids, confidences, bounding_boxes

coco_class_names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
                    "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
                    "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack",
                    "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball",
                    "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
                    "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
                    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
                    "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
                    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
                    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier",
                    "toothbrush"]
# generate different colors for coco classes
colors = np.random.uniform(0, 255, size=(len(coco_class_names), 3))

INPUT_WIDTH = 640
INPUT_HEIGHT = 640
CONFIDENCE_THRESHOLD = 0.7
NMS_THRESHOLD = 0.45

def video_detector(video_src):
    cap = cv2.VideoCapture(video_src)

    # 3. inference and show the result in a loop
    while cap.isOpened():
        success, frame = cap.read()
        start = time.perf_counter()
        if not success:
            break
        # image preprocessing, the trick is to make the frame to be a square but not twist the image
        row, col, _ = frame.shape  # get the row and column of the origin frame array
        _max = max(row, col)  # get the max value of row and column
        input_image = np.zeros((_max, _max, 3), dtype=np.uint8)  # create a new array with the max value
        input_image[:row, :col, :] = frame  # paste the original frame  to make the input_image to be a square
        # inference
        output_data = inference(input_image, net)  # the shape of output_data is (1, 25200, 85)

        # 4. wrap the detection result
        class_ids, confidences, boxes = wrap_detection(input_image, output_data[0], coco_class_names)

        # 5. draw the detection result on the frame
        for (class_id, confidence, box) in zip(class_ids, confidences, boxes):
            color = colors[int(class_id) % len(colors)]
            label = coco_class_names[int(class_id)]

            xmin, ymin, width, height = box
            cv2.rectangle(frame, (xmin, ymin), (xmin + width, ymin + height), color, 2)
            # cv2.rectangle(frame, box, color, 2)
            # cv2.rectangle(frame, [box[0], box[1], box[2], box[3]], color, thickness=2)

            # cv2.rectangle(frame, (box[0], box[1] - 20), (box[0] + 100, box[1]), color, -1)
            cv2.putText(frame, str(label), (box[0], box[1] - 5), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
        finish = time.perf_counter()
        FPS = round(1.0 / (finish - start), 2)
        cv2.putText(frame, str(FPS), (10, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
        # 6. show the frame
        cv2.imshow("frame", frame)

        # 7. press 'q' to exit
        if cv2.waitKey(1) == ord('q'):
            break

    # 8. release the capture and destroy all windows
    cap.release()
    cv2.destroyAllWindows()

if __name__ == '__main__':
    # there are 4 steps to use opencv dnn module to inference onnx model exported by yolov5 and show the result

    # 1. load the model
    model_path = Path("weights/yolov5s.onnx")
    net = build_model(str(model_path))
    # 2. load the video capture
    # video_source = 0
    video_source = 'rtsp://admin:[email protected]:554/h264/ch1/main/av_stream'
    video_detector(video_source)

    exit(0)

3.2 Encapsulated for class-style calls

from typing import List

import onnx
from torchvision import transforms

from torchvision.ops import nms,box_convert
import cv2
import time
import numpy as np
import onnxruntime as ort
import torch

INPUT_WIDTH = 640
INPUT_HEIGHT = 640

def wrap_detection(
        input_image: np.ndarray,
        output_data: np.ndarray,
        labels: List[str],
        confidence_threshold: float = 0.6
) -> (List[int], List[float], List[List[int]]):
    # the shape of the output_data is (25200,5+nc),
    # the first 5 elements are [x, y, w, h, confidence], the rest are prediction scores of each class

    image_height, image_width, _ = input_image.shape  # numpy shape order is (height, width, channels)
    x_factor = image_width / INPUT_WIDTH
    y_factor = image_height / INPUT_HEIGHT

    # cv2.dnn.NMSBoxes expects [left, top, width, height], so shift the box centres to the top-left corner first
    nms_boxes = output_data[:, 0:4].copy()
    nms_boxes[:, 0:2] -= nms_boxes[:, 2:4] / 2

    nms_start = time.perf_counter()
    indices = cv2.dnn.NMSBoxes(nms_boxes.tolist(), output_data[:, 4].tolist(), 0.6, 0.4)
    # flatten the result: depending on the OpenCV version it may be an (N, 1) array, or an empty tuple
    indices = np.array(indices, dtype=int).reshape(-1)
    nms_finish = time.perf_counter()
    print(f"nms time: {nms_finish - nms_start}")
    # print(indices)
    raw_boxes = output_data[:, 0:4][indices]
    raw_confidences = output_data[:, 4][indices]
    raw_class_prediction_probabilities = output_data[:, 5:][indices]

    criteria = raw_confidences > confidence_threshold
    raw_class_prediction_probabilities = raw_class_prediction_probabilities[criteria]
    raw_boxes = raw_boxes[criteria]
    raw_confidences = raw_confidences[criteria]

    bounding_boxes, confidences, class_ids = [], [], []
    for class_prediction_probability, box, confidence in zip(raw_class_prediction_probabilities, raw_boxes,
                                                             raw_confidences):
        #
        # find the least and most probable classes' indices and their probabilities
        # min_val, max_val, min_loc, mac_loc = cv2.minMaxLoc(class_prediction_probability)
        most_probable_class_index = np.argmax(class_prediction_probability)
        label = labels[most_probable_class_index]
        confidence = float(confidence)

        # bounding_boxes.append(box)
        # confidences.append(confidence)
        # class_ids.append(most_probable_class_index)

        x, y, w, h = box
        left = int((x - 0.5 * w) * x_factor)
        top = int((y - 0.5 * h) * y_factor)
        width = int(w * x_factor)
        height = int(h * y_factor)
        bounding_box = [left, top, width, height]
        bounding_boxes.append(bounding_box)
        confidences.append(confidence)
        class_ids.append(most_probable_class_index)

    return class_ids, confidences, bounding_boxes


coco_class_names = ["person", "bicycle", "car", "motorcycle", "airplane", "bus", "train", "truck", "boat",
                    "traffic light", "fire hydrant", "stop sign", "parking meter", "bench", "bird", "cat",
                    "dog", "horse", "sheep", "cow", "elephant", "bear", "zebra", "giraffe", "backpack",
                    "umbrella", "handbag", "tie", "suitcase", "frisbee", "skis", "snowboard", "sports ball",
                    "kite", "baseball bat", "baseball glove", "skateboard", "surfboard", "tennis racket",
                    "bottle", "wine glass", "cup", "fork", "knife", "spoon", "bowl", "banana", "apple",
                    "sandwich", "orange", "broccoli", "carrot", "hot dog", "pizza", "donut", "cake", "chair",
                    "couch", "potted plant", "bed", "dining table", "toilet", "tv", "laptop", "mouse",
                    "remote", "keyboard", "cell phone", "microwave", "oven", "toaster", "sink",
                    "refrigerator", "book", "clock", "vase", "scissors", "teddy bear", "hair drier",
                    "toothbrush"]

colors = np.random.uniform(0, 255, size=(len(coco_class_names), 3))
if __name__ == '__main__':
    # Load the model
    model_path = "weights/yolov5s.onnx"
    onnx_model = onnx.load(model_path)
    onnx.checker.check_model(onnx_model)
    session = ort.InferenceSession(model_path, providers=['CUDAExecutionProvider',"CPUExecutionProvider"])
    capture = cv2.VideoCapture(0)

    trans = transforms.Compose([
        transforms.Resize((640, 640)),
        transforms.ToTensor()
    ])

    from PIL import Image

    while capture.isOpened():
        success, frame = capture.read()
        start = time.perf_counter()
        if not success:
            break
        rows, cols, channels = frame.shape
        # Preprocessing
        max_size = max(rows, cols)
        input_image = np.zeros((max_size, max_size, 3), dtype=np.uint8)
        input_image[:rows, :cols, :] = frame
        input_image = cv2.cvtColor(input_image, cv2.COLOR_BGR2RGB)

        inputs = trans(Image.fromarray(input_image))
        inputs = inputs.unsqueeze(0)
        print(inputs.shape)
        # inputs.to('cuda')
        ort_inputs = {session.get_inputs()[0].name: inputs.numpy()}
        ort_outs = session.run(None, ort_inputs)
        out_prob = ort_outs[0][0]
        print(out_prob.shape)

        scores = out_prob[:, 4] # Confidence scores are in the 5th column (0-indexed)
        class_ids = out_prob[:, 5:].argmax(axis=1)  # Class labels are from the 6th column onwards
        bounding_boxes_xywh = out_prob[:, :4]  # Bounding boxes in cxcywh format

        # Filter out boxes based on confidence threshold
        confidence_threshold = 0.7
        mask = scores >= confidence_threshold
        class_ids = class_ids[mask]
        bounding_boxes_xywh = bounding_boxes_xywh[mask]
        scores = scores[mask]

        bounding_boxes_xywh = torch.tensor(bounding_boxes_xywh, dtype=torch.float32)

        # Convert bounding boxes from xywh to xyxy format
        bounding_boxes_xyxy = box_convert(bounding_boxes_xywh, in_fmt='cxcywh', out_fmt='xyxy')

        # Perform Non-Maximum Suppression to filter candidate boxes


        scores = torch.tensor(scores, dtype=torch.float32)
        # torchvision.ops.nms also accepts CPU tensors, so the boxes and scores stay on the CPU here
        nms_start = time.perf_counter()
        keep_indices = nms(bounding_boxes_xyxy, scores, 0.4)
        nms_end = time.perf_counter()
        print(f"NMS took {nms_end - nms_start} seconds")
        # index the NumPy array with a NumPy index array so class_ids stays one-dimensional
        class_ids = class_ids[keep_indices.numpy()]
        confidences = scores[keep_indices]
        bounding_boxes = bounding_boxes_xyxy[keep_indices]

        # class_ids, confidences, bounding_boxes = wrap_detection(input_image, out_prob[0], coco_class_names, 0.6)
        # break

        for i in range(len(keep_indices)):
            try:
                class_id = class_ids[i]
            except IndexError as e:
                print(e)
                print(class_ids,i, len(keep_indices))
                break
            confidence = confidences[i]
            box = bounding_boxes[i]
            color = colors[int(class_id) % len(colors)]
            label = coco_class_names[int(class_id)]

            # cv2.rectangle(frame, box, color, 2)

            # boxes are in the 640x640 model space; scale them back to the padded square, and thus to the frame
            scale = max_size / INPUT_WIDTH
            xmin, ymin, xmax, ymax = (int(v * scale) for v in box)
            cv2.rectangle(frame, (xmin, ymin), (xmax, ymax), color, 2)
            # cv2.rectangle(frame, box, color, 2)
            # cv2.rectangle(frame, [box[0], box[1], box[2], box[3]], color, thickness=2)

            cv2.rectangle(frame, (xmin, ymin - 20), (xmin + 100, ymin), color, -1)
            cv2.putText(frame, str(label), (xmin, ymin - 5), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)

        finish = time.perf_counter()
        FPS = round(1.0 / (finish - start), 2)
        cv2.putText(frame, f"FPS: {FPS}", (10, 50), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 0), 2)
        # 6. show the frame
        cv2.imshow("frame", frame)

        # 7. press 'q' to exit
        if cv2.waitKey(1) == ord('q'):
            break

    # 8. release the capture and destroy all windows
    capture.release()
    cv2.destroyAllWindows()

    exit(0)
