[Object Detection in Practice] A practical object detection framework (built with Python and PyTorch)


Foreword

Object detection is a core task in computer vision, with applications in object tracking, autonomous driving, intelligent security, and other fields. In practice, different scenarios and datasets often call for different detection algorithms, so a flexible, extensible object detection framework is very useful.

This article introduces a practical object detection framework developed with Python and PyTorch. It supports common detection algorithms (such as Faster R-CNN, SSD, and YOLOv5), multi-task learning, and distributed training. The example below walks through how to use the framework for object detection.

Data preparation

First, we need to prepare the dataset. This example uses the COCO dataset, which can be downloaded from the official website. We only need the training set, the validation set, and the annotations, arranged in a single folder like this:

data/
├── train2017/
├── val2017/
└── annotations/
    ├── instances_train2017.json
    └── instances_val2017.json

Here, train2017/ and val2017/ contain the training and validation images respectively, and annotations/ contains the COCO annotation files (in JSON format).
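For reference, the 2017 images and annotations can be downloaded from the official COCO server (the standard public links, current as of writing) and unpacked into the layout above:

!wget http://images.cocodataset.org/zips/train2017.zip -P data/
!wget http://images.cocodataset.org/zips/val2017.zip -P data/
!wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip -P data/
!cd data && unzip train2017.zip && unzip val2017.zip && unzip annotations_trainval2017.zip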

Next, we use the COCO API to read the data and its annotations. The COCO API is the official set of Python tools for reading, processing, and visualizing COCO datasets; it can be downloaded and installed from GitHub.

!git clone https://github.com/cocodataset/cocoapi.git
!cd cocoapi/PythonAPI && python setup.py install --user
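If a source build is not needed, pycocotools is also published on PyPI, so the same API can usually be installed directly with pip:

!pip install pycocotools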

We can then write a dataset module, coco.py, which calls the COCO API to read images and annotations and convert them into PyTorch tensors:

import os
import torch
import torchvision.transforms as T
from PIL import Image
from pycocotools.coco import COCO
from torch.utils.data import DataLoader, Dataset


class CocoDataset(Dataset):
    def __init__(self, root, year, mode, transforms=None):
        super().__init__()
        ann_dir = os.path.join(root, "annotations")
        img_dir = os.path.join(root, f"{
      
      mode}{
      
      year}")
        ann_file = os.path.join(ann_dir, f"instances_{
      
      mode}{
      
      year}.json")
        self.coco = COCO(ann_file)
        self.image_ids = sorted(self.coco.getImgIds())
        self.img_dir = img_dir
        self.transforms = transforms

    def __getitem__(self, index):
        image_id = self.image_ids[index]
        image_info = self.coco.loadImgs(image_id)[0]
        image_path = os.path.join(self.img_dir, image_info["file_name"])
        image = Image.open(image_path).convert("RGB")

        ann_ids = self.coco.getAnnIds(imgIds=image_id)
        annotations = self.coco.loadAnns(ann_ids)
        boxes = []
        labels = []
        for ann in annotations:
            box = ann["bbox"]
            label = ann["category_id"]
            boxes.append(box)
            labels.append(label)

        boxes = torch.FloatTensor(boxes)
        labels = torch.LongTensor(labels)
        target = {"boxes": boxes, "labels": labels}

        if self.transforms:
            image, target = self.transforms(image, target)

        return image, target

    def __len__(self):
        return len(self.image_ids)
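As a quick sanity check (assuming the layout above and that the dataset has been downloaded), we can instantiate the dataset and inspect one sample; without transforms, it returns a PIL image and a target dict whose boxes are still in COCO's [x, y, w, h] format:

# quick sanity check of the dataset
dataset = CocoDataset("./data", "2017", "train", transforms=None)
image, target = dataset[0]
print(len(dataset), image.size, target["boxes"].shape, target["labels"][:5])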

In this dataset class, we preprocess the images and annotations with custom transforms. Specifically, we use the following:

class Resize(object):
    def __init__(self, size):
        self.size = size

    def __call__(self, image, target=None):
        w, h = image.size
        image = image.resize(self.size)
        if target is not None:
            boxes = target["boxes"].clone()
            boxes[:, [0, 2]] = boxes[:, [0, 2]] * self.size[0] / w
            boxes[:, [1, 3]] = boxes[:, [1, 3]] * self.size[1] / h
            target["boxes"] = boxes
        return image, target


class ToTensor(object):
    def __call__(self, image, target=None):
        image = T.ToTensor()(image)
        if target is not None:
            boxes = target["boxes"]
            labels = target["labels"]
            # convert boxes from COCO's [x, y, w, h] to [x1, y1, x2, y2]
            boxes = torch.cat([boxes[:, :2], boxes[:, :2] + boxes[:, 2:]], dim=-1)
            target = {"boxes": boxes.float(), "labels": labels.long()}
        return image, target

Resize scales the image to the specified size and adjusts the box coordinates accordingly; ToTensor converts the PIL.Image into a PyTorch tensor and converts the boxes from COCO's [x, y, w, h] format to [x1, y1, x2, y2], ready for training.
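These transforms take both an image and a target, so torchvision's built-in T.Compose (which passes a single argument) cannot chain them. A minimal two-argument Compose, assumed here and used later as the transforms object, might look like this:

class Compose(object):
    """Chains transforms that accept (image, target) pairs."""
    def __init__(self, transforms):
        self.transforms = transforms

    def __call__(self, image, target=None):
        for t in self.transforms:
            image, target = t(image, target)
        return image, target


# assumed composition used by the data loaders below
transforms = Compose([Resize((800, 800)), ToTensor()])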

Model definition

Next, we define the model. This example uses the Faster R-CNN algorithm, which consists of a region proposal network (to generate candidate boxes) and a detection head (to predict the location and class of each object). Here we build a simplified Faster R-CNN-style detection head on top of a pretrained VGG16 backbone:

import torch
import torchvision
from torch import nn


class FasterRCNN(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.backbone = torchvision.models.vgg16(pretrained=True).features
        feature_map_size = 512
        self.roi_pool = nn.AdaptiveMaxPool2d((7, 7))
        self.fc1 = nn.Sequential(
            nn.Linear(512 * 7 * 7, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5)
        )
        self.fc2 = nn.Sequential(
            nn.Linear(4096, 4096),
            nn.ReLU(inplace=True),
            nn.Dropout(p=0.5)
        )
        self.cls_score = nn.Linear(4096, num_classes + 1)
        self.bbox_pred = nn.Linear(4096, 4 * (num_classes + 1))

    def forward(self, x, proposals=None):
        x = self.backbone(x)
        x = self.roi_pool(x)
        x = x.view(x.size(0), -1)
        x = self.fc1(x)
        x = self.fc2(x)
        cls_scores = self.cls_score(x)
        bbox_deltas = self.bbox_pred(x)

        if proposals is not None:
            cls_scores = cls_scores[proposals.batch_indices]    # [N, num_classes + 1]
            bbox_deltas = bbox_deltas[proposals.batch_indices]  # [N, 4 * (num_classes + 1)]
            proposals_boxes = proposals.boxes                   # [N, 4] in [x1, y1, x2, y2] format

            # decode the per-class regression deltas (dx, dy, dw, dh) against the proposals
            num_pred_classes = cls_scores.size(-1)              # foreground classes + background
            bbox_deltas = bbox_deltas.view(-1, num_pred_classes, 4)
            x1, y1, x2, y2 = proposals_boxes.unbind(1)
            widths = (x2 - x1).unsqueeze(1)
            heights = (y2 - y1).unsqueeze(1)
            ctr_x = (x1 + 0.5 * (x2 - x1)).unsqueeze(1)
            ctr_y = (y1 + 0.5 * (y2 - y1)).unsqueeze(1)
            dx, dy, dw, dh = bbox_deltas.unbind(2)
            pred_ctr_x = dx * widths + ctr_x
            pred_ctr_y = dy * heights + ctr_y
            pred_w = torch.exp(dw) * widths
            pred_h = torch.exp(dh) * heights
            proposals_boxes = torch.stack([
                pred_ctr_x - 0.5 * pred_w, pred_ctr_y - 0.5 * pred_h,
                pred_ctr_x + 0.5 * pred_w, pred_ctr_y + 0.5 * pred_h], dim=-1)

            return proposals_boxes, cls_scores

        return cls_scores, bbox_deltas

In this model, we use VGG16 as the backbone and take its last feature map. An adaptive pooling layer then resizes the feature map to a fixed size (7x7 here), and two fully connected layers process the pooled features. Finally, separate heads predict the class scores and bounding-box regression deltas, which are compared against the labels to compute the loss during training. At inference time, a proposal generator produces candidate boxes that are fed into the detection head to obtain the final predictions.
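The forward pass above, as well as the training and prediction code later, relies on a generate_proposals helper that the original post never defines. As a stand-in, here is a minimal, hypothetical sketch that simply lays a fixed grid of square boxes over each image and records which image each box belongs to; its boxes and batch_indices attributes are exactly what the forward pass expects. A real Faster R-CNN would use a learned region proposal network instead.

from types import SimpleNamespace

import torch


def generate_proposals(images, model, grid_size=4, box_size=200):
    # Hypothetical placeholder proposal generator: a fixed grid of square boxes
    # per image. A real implementation would run a learned Region Proposal Network;
    # the model argument is kept only to match the call sites below.
    batch_size, _, height, width = images.shape
    ys = torch.linspace(0, height - box_size, grid_size)
    xs = torch.linspace(0, width - box_size, grid_size)
    boxes, batch_indices = [], []
    for b in range(batch_size):
        for y in ys.tolist():
            for x in xs.tolist():
                boxes.append([x, y, x + box_size, y + box_size])
                batch_indices.append(b)
    return SimpleNamespace(
        boxes=torch.tensor(boxes, dtype=torch.float32),
        batch_indices=torch.tensor(batch_indices, dtype=torch.long))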

Training the model

With the data loader and model in place, we can start training. Here we use PyTorch Lightning to simplify the training loop. PyTorch Lightning is an efficient, concise, and extensible framework on top of PyTorch that provides many practical features (such as automated training loops, distributed training, and logging) to help users build and train models quickly.

We define a LightningModule that encapsulates the model and optimizer, and override its training_step and validation_step methods to define the training and validation logic:

import torch
import pytorch_lightning as pl

class DetectionModule(pl.LightningModule):
    def __init__(self, num_classes, lr=0.001, batch_size=32):
        super().__init__()
        self.num_classes = num_classes
        self.lr = lr
        self.batch_size = batch_size

        # define model
        self.model = FasterRCNN(num_classes)

        # define optimizer
        self.optimizer = torch.optim.Adam(self.model.parameters(), lr=lr)

    def forward(self, x, proposals=None):
        return self.model(x, proposals)

    def configure_optimizers(self):
        return self.optimizer

    def training_step(self, batch, batch_idx):
        images, targets = batch
        proposals = generate_proposals(images, self.model)
        outputs = self.model(images, proposals)
        loss = compute_loss(outputs, targets)
        self.log('train_loss', loss)
        return loss

    def validation_step(self, batch, batch_idx):
        images, targets = batch
        proposals = generate_proposals(images, self.model)
        outputs = self.model(images, proposals)
        loss = compute_loss(outputs, targets)
        self.log('val_loss', loss)

In this LightningModule, we first store the training hyperparameters (number of classes, learning rate, and batch size). The model and optimizer are created in __init__, and the optimizer is returned from configure_optimizers. training_step and validation_step each process one batch, compute the loss, and log it; they rely on the generate_proposals helper above and on a compute_loss function (which computes the classification and box-regression losses) that is not shown here.
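One practical detail the snippet above glosses over: each image contains a different number of boxes, so PyTorch's default collate function cannot batch the target dicts. A minimal, assumed detection_collate helper (passed to the data loaders below via collate_fn) keeps the targets as a plain list:

import torch


def detection_collate(batch):
    # stack the (equally sized) images, but keep per-image targets as a list,
    # since each image has a different number of boxes and labels
    images, targets = zip(*batch)
    return torch.stack(images, dim=0), list(targets)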

Next, we can create a Trainer and use the following command to start training:

from pytorch_lightning import Trainer
from pytorch_lightning.callbacks import ModelCheckpoint

# example hyperparameters (adjust as needed)
num_classes = 90      # raw COCO category ids run from 1 to 90 (not contiguous)
batch_size = 8
lr = 0.001
max_epochs = 12

# create data loaders (using the transforms and detection_collate defined above)
train_loader = DataLoader(
    CocoDataset("./data", "2017", "train", transforms=transforms),
    batch_size=batch_size, shuffle=True, num_workers=4, collate_fn=detection_collate)
val_loader = DataLoader(
    CocoDataset("./data", "2017", "val", transforms=transforms),
    batch_size=batch_size, shuffle=False, num_workers=4, collate_fn=detection_collate)

# create model and trainer
detector = DetectionModule(num_classes=num_classes, lr=lr, batch_size=batch_size)
checkpoint_callback = ModelCheckpoint(monitor='val_loss')
trainer = Trainer(accelerator="gpu", devices=1, max_epochs=max_epochs, callbacks=[checkpoint_callback])
trainer.fit(detector, train_loader, val_loader)

Here, the DataLoaders feed the data to the Trainer, and the ModelCheckpoint callback keeps the checkpoint with the best validation loss. Finally, fit trains the model: during training, PyTorch Lightning runs the optimization loop, logs training information, and computes the validation metrics. If needed, distributed training can be enabled through the Trainer's strategy options (for example, DDP or Horovod).
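For example, multi-GPU training with the DDP strategy only requires changing the Trainer arguments; a sketch (the exact argument names depend on the PyTorch Lightning version) looks like this:

trainer = Trainer(
    accelerator="gpu", devices=4, strategy="ddp",  # 4 GPUs with DistributedDataParallel
    max_epochs=max_epochs, callbacks=[checkpoint_callback])
trainer.fit(detector, train_loader, val_loader)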

Predicting objects

After training, we can use the model to detect objects in new images. Specifically, we read an image, feed it to the model, and obtain the position and class of each detected object:

import matplotlib.pyplot as plt
import torch
import torchvision.transforms.functional as F
from PIL import Image

def predict_image(model, image_path):
    model.eval()
    # resize the image itself so the predicted boxes line up with what we draw
    image = Image.open(image_path).convert("RGB").resize((800, 800))
    image_tensor = F.to_tensor(image)
    with torch.no_grad():
        proposals = generate_proposals(image_tensor.unsqueeze(0), model)
        boxes, scores = model(image_tensor.unsqueeze(0), proposals)
    boxes, labels, scores = filter_predictions(boxes, scores, 0.5)
    draw_boxes(image, boxes, labels)
    plt.imshow(image)
    plt.axis("off")
    plt.show()

predict_image(detector, "test.jpg")

In this prediction function, we first read the image and convert it to a PyTorch tensor. We then feed the tensor into the model to obtain candidate boxes and class scores. Finally, filter_predictions keeps only the high-confidence detections and draw_boxes draws their bounding boxes and class labels on the image, so we can visualize the model's predictions.
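The original post does not show filter_predictions and draw_boxes. Under the assumption that the model returns per-proposal, per-class boxes of shape [N, num_classes + 1, 4] and class scores of shape [N, num_classes + 1] (as in the forward pass above), minimal sketches of these two helpers might look like this:

import torch
import torchvision.ops as ops
from PIL import ImageDraw


def filter_predictions(boxes, scores, score_threshold=0.5, iou_threshold=0.5):
    # softmax over the class scores, pick the best non-background class per proposal,
    # keep detections above the score threshold, then apply class-agnostic NMS
    probs = torch.softmax(scores, dim=-1)
    best_scores, labels = probs[:, 1:].max(dim=-1)
    labels = labels + 1                                   # shift past the background column
    boxes = boxes[torch.arange(boxes.size(0)), labels]    # box predicted for the chosen class
    keep = best_scores > score_threshold
    boxes, labels, best_scores = boxes[keep], labels[keep], best_scores[keep]
    keep = ops.nms(boxes, best_scores, iou_threshold)
    return boxes[keep], labels[keep], best_scores[keep]


def draw_boxes(image, boxes, labels):
    # draw each box and its class id on the PIL image in place
    draw = ImageDraw.Draw(image)
    for box, label in zip(boxes.tolist(), labels.tolist()):
        draw.rectangle(box, outline="red", width=2)
        draw.text((box[0], box[1]), str(label), fill="red")
    return image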
