Image processing practice 02 - YOLOv5 object detection

yolov5

YOLOv5 is an object detection algorithm and one of the later releases in the YOLO (You Only Look Once) series. It uses a backbone based on CSPNet (Cross Stage Partial Network) together with a series of improvements such as multi-scale training, data augmentation and mixed-precision training, which gives it both fast detection speed and good detection accuracy.

YOLOv5 supports many detection tasks, such as general object detection, face detection and vehicle detection, and can be applied to practical scenarios such as intelligent security, autonomous driving and robot vision. At the same time, YOLOv5 provides pre-trained models and open-source code, making it easy for developers to train and deploy their own models.

github address: https://github.com/ultralytics/yolov5/blob/master/README.zh-CN.md
official website: https://ultralytics.com/

Development history

YOLO (You Only Look Once) is a series of target detection models developed by Joseph Redmon and others. The following is the development history of the YOLO series:

  1. YOLOv1: First proposed in 2015, it is the first version of the YOLO series. YOLOv1 divides the image into grids and turns object detection into a regression problem, predicting the bounding boxes and class probabilities for each grid cell. However, YOLOv1 suffers from inaccurate localization and poor performance on small objects.

  2. YOLOv2 (YOLO9000): Proposed in 2016, it is the second version of the YOLO series. YOLOv2 improves detection performance by introducing the Darknet-19 network, anchor boxes and multi-scale prediction. It also introduces joint training on detection and classification datasets (the YOLO9000 approach), which allows it to detect many more object categories.

  3. YOLOv3: Proposed in 2018, it is the third version of the YOLO series. YOLOv3 addresses the problems of YOLOv2 by introducing multi-scale prediction, an FPN-style structure and smaller anchor boxes, which improves detection accuracy and in particular the ability to detect small objects.

  4. YOLOv4: Proposed in 2020, it is the fourth version of the YOLO series. YOLOv4 introduces a series of improvements on top of YOLOv3, including CSPDarknet53 as the backbone network and SAM and PANet modules for feature aggregation, together with many training tricks, which improves both detection performance and speed.

  5. YOLOv5: Proposed in 2020, it is the fifth version of the YOLO series. YOLOv5 adopts a lightweight network structure, improves detection speed, and has kept adding new features across releases, such as the YOLOv5-seg segmentation models, Paddle Paddle export, the YOLOv5 AutoCache automatic caching feature and Comet logging and visualization integration.

Overall, the YOLO series models have improved the performance and speed of target detection through continuous improvement and optimization, and have made important breakthroughs in the field of computer vision.

YOLOv8

YOLOv8 is a variant of the YOLO series model, which is improved and optimized based on YOLOv5. The YOLOv8 model includes functions such as Detect, Segment, Pose, Track and Classify. Below is a brief description of these features:

  1. Detect: The YOLOv8 model can perform real-time object detection on targets in images or videos. It completes the detection task by predicting the bounding box and category information of the target.

  2. Segment: The YOLOv8 model also supports the function of target segmentation, which is to classify each pixel in the image and segment different target areas. This feature can be used to identify different objects in images and perform more precise positioning and analysis.

  3. Pose estimation (Pose): The YOLOv8 model can also perform pose estimation on detected targets, that is, predict keypoints such as human body joints. This is very useful for applications that need to know the orientation and configuration of a target, such as human posture analysis and robot navigation.

  4. Track: The YOLOv8 model also has the function of target tracking, that is, continuously tracking the position and trajectory of the same target in the video. This is very important for applications such as video surveillance and autonomous driving.

  5. Classify: In addition to the target detection and segmentation functions, the YOLOv8 model can also classify the detected targets, that is, give the target category information. This is important for understanding the properties of the target and conducting more fine-grained analysis.

All in all, the YOLOv8 model integrates a variety of functions, including detection, segmentation, pose estimation, tracking and classification, etc., giving it a wider range of applications and more powerful functions.
Github address: https://github.com/ultralytics/ultralytics
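
As a quick illustration of how these functions are exposed, here is a minimal sketch using the ultralytics Python package (an assumption of this example is that the package has been installed with pip install ultralytics; the yolov8n.pt weights are downloaded automatically on first use):

from ultralytics import YOLO

# load a pre-trained YOLOv8 detection model (yolov8n-seg.pt, yolov8n-pose.pt or yolov8n-cls.pt
# would load the segmentation, pose estimation or classification variants instead)
model = YOLO("yolov8n.pt")
# run detection on an image; results is a list with one Results object per image
results = model("https://ultralytics.com/images/bus.jpg")
for r in results:
    print(r.boxes.xyxy, r.boxes.conf, r.boxes.cls)  # bounding boxes, confidences and class ids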

v5 getting started example

Install

Clone the repo and install requirements.txt in a Python >= 3.7.0 environment with PyTorch >= 1.7.

micromamba create --prefix d:/python380 python=3.8  # create a Python 3.8 virtual environment
micromamba activate d:/python380
git clone https://github.com/ultralytics/yolov5  # clone
cd yolov5
pip install -r requirements.txt  # install
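
As an optional sanity check (a small sketch, not part of the official install steps), you can confirm that PyTorch imports correctly and see whether a GPU is visible:

import torch
print(torch.__version__)            # should be >= 1.7
print(torch.cuda.is_available())    # True only if a usable NVIDIA GPU and a CUDA build of PyTorch are present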

Source code directory structure

yolov5/
├── data/                  # dataset configuration directory
│   ├── coco.yaml            # COCO dataset config, with download addresses and the loading script
│   ├── ImageNet.yaml        # ImageNet dataset config
│   ├── custom.yaml          # custom dataset config
│   └── ...                  # other dataset config files
├── models/                # model definition directory
│   ├── common.py            # common functions and layer/class definitions
│   ├── experimental.py      # experimental model definitions
│   ├── export.py            # script for exporting models to ONNX
│   ├── models.py            # YOLOv5 model definitions
│   ├── yolo.py              # YOLO class definition
│   └── ...                  # other model definition files
├── utils/                 # utilities directory
│   ├── autoanchor.py        # automatic anchor box generation tool
│   ├── datasets.py          # dataset processing tools
│   ├── general.py           # general utility functions
│   ├── google_utils.py      # Google Cloud utilities
│   ├── loss.py              # loss function definitions
│   ├── metrics.py           # evaluation metric definitions
│   ├── torch_utils.py       # PyTorch helpers
│   ├── wandb_logging.py     # Weights & Biases logging tools
│   └── ...                  # other utility files
├── runs/                  # output directory for training and prediction results
│   ├── detect/              # output of detect.py runs, one exp[N] subdirectory per run
│   ├── train/               # output of train.py runs, one exp[N] subdirectory per run, containing the trained model and validation results
├── weights/               # pre-trained model weights directory
├── .gitignore             # Git ignore configuration
├── Dockerfile             # Docker container build file
├── LICENSE                # license file
├── README.md              # project documentation
├── requirements.txt       # project dependency list
├── train.py               # training script
├── detect.py              # prediction (inference) script
├── export.py              # exports the YOLOv5 PyTorch model to other formats
├── hubconf.py             # defines the torch.hub entry points used by torch.hub.load
└── ...                    # other source files

Through the yolov5 data/ configs you can find many commonly used training datasets, such as ImageNet and coco128, together with their download addresses, so you do not have to hunt for them yourself.

Model download

Download address: https://github.com/ultralytics/yolov5/releases

v6.1

The version used here is v6.1, one of the YOLOv5 release tags.

Pretrained Checkpoints

Pretrained Checkpoints is a name for pre-trained weight files. In deep learning, pre-training weights refer to model parameters obtained through unsupervised learning or supervised learning on large-scale data sets. These parameters can often be used to initialize a new model, thereby speeding up model training and improving model performance.

Pretrained Checkpoints refer to pre-trained weight files that have been trained and can be used to initialize a new model and continue to train the model to adapt to new tasks or data sets. This method is called transfer learning and can greatly improve the training efficiency and generalization ability of the model. In the field of computer vision, common pre-trained networks include VGG, ResNet, Inception, MobileNet, etc.
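
To make this concrete: a YOLOv5 release checkpoint such as yolov5n6.pt is simply a Python dict saved with torch.save. A minimal inspection sketch, assuming the file has been downloaded and the script is run from the yolov5 repo root (unpickling needs the repo's model classes on the path):

import torch

ckpt = torch.load('yolov5n6.pt', map_location='cpu')
print(list(ckpt.keys()))     # typically includes 'model', 'ema', 'epoch', 'optimizer', ...
print(ckpt['model'].names)   # the 80 COCO class names the checkpoint was trained on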

Model overview

Explanation of the following model columns

Column               Explanation
Model                model name
size (pixels)        input image size in pixels
mAPval 0.5:0.95      mean average precision (%) on the validation set, averaged over IoU thresholds from 0.5 to 0.95
mAPval 0.5           average precision (%) on the validation set at an IoU threshold of 0.5
Speed CPU b1 (ms)    inference speed on a CPU with batch size 1, in milliseconds
Speed V100 b1 (ms)   inference speed on an NVIDIA V100 GPU with batch size 1, in milliseconds
Speed V100 b32 (ms)  inference speed on an NVIDIA V100 GPU with batch size 32, in milliseconds
params (M)           number of model parameters, in millions
FLOPs @640 (B)       floating point operations, in billions, at an input size of 640
Model      size(pixels)  mAPval 0.5:0.95  mAPval 0.5  Speed CPU b1(ms)  Speed V100 b1(ms)  Speed V100 b32(ms)  params(M)  FLOPs @640(B)
YOLOv5n    640           28.0             45.7        45                6.3                0.6                 1.9        4.5
YOLOv5s    640           37.4             56.8        98                6.4                0.9                 7.2        16.5
YOLOv5m    640           45.4             64.1        224               8.2                1.7                 21.2       49.0
YOLOv5l    640           49.0             67.3        430               10.1               2.7                 46.5       109.1
YOLOv5x    640           50.7             68.9        766               12.1               4.8                 86.7       205.7
YOLOv5n6   1280          36.0             54.4        153               8.1                2.1                 3.2        4.6
YOLOv5s6   1280          44.8             63.7        385               8.2                3.6                 12.6       16.8
YOLOv5m6   1280          51.3             69.3        887               11.1               6.8                 35.7       50.0
YOLOv5l6   1280          53.7             71.3        1784              15.8               10.5                76.8       111.4

v7.0

The new YOLOv5 v7.0 instance segmentation models are the world's fastest and most accurate, surpassing all current SOTA benchmarks. They are kept very simple to use and can easily be trained, validated and deployed.
The main goal of this release is to introduce a super simple YOLOv5 segmentation workflow, similar to the existing object detection models.
Important updates

  • Segmentation Model ⭐ New: SOTA YOLOv5-seg COCO pre-trained segmentation model is provided for the first time (#9052 developed by @glenn-jocher, @AyushExel and @Laughing-q)
  • Paddle Paddle export: Use python export.py --include paddle to export any YOLOv5 model (cls, seg, det) to Paddle format (#9459 developed by @glenn-jocher)
  • YOLOv5 AutoCache: Using python train.py --cache ram now scans available memory and compares to predicted dataset RAM usage. This reduces caching risk and should help improve usage of the dataset caching feature, resulting in significantly faster training. (#10027 developed by @glenn-jocher)
  • Comet logging and visualization integration: Free forever, Comet can save YOLOv5 models, resume training, and make interactive visualizations and debug predictions. (#9232 developed by @DN6)
Model        size(pixels)  mAP box 50-95  mAP mask 50-95  Train time 300 epochs A100 (hours)  Speed ONNX CPU (ms)  Speed TRT A100 (ms)  params(M)  FLOPs @640(B)
YOLOv5n-seg  640           27.6           23.4            80:17                                62.7                 1.2                  2.0        7.1
YOLOv5s-seg  640           37.6           31.7            88:16                                173.3                1.4                  7.6        26.4
YOLOv5m-seg  640           45.0           37.1            108:36                               427.0                2.2                  22.0       70.8
YOLOv5l-seg  640           49.0           39.9            66:43 (2x)                           857.4                2.9                  47.9       147.7
YOLOv5x-seg  640           50.7           41.4            62:56 (3x)                           1579.2               4.5                  88.8       265.7

Here I choose a v6.1 model, yolov5n6.pt, and place it in the root directory of the yolov5 project.

predict

Because the pre-trained model can already detect a set of categories, we can look at the names in data/coco.yaml and see that there are 80 categories in total.
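
For example, a small sketch (run from the yolov5 repo root) that prints those class names:

import yaml

with open('data/coco.yaml', encoding='utf-8') as f:
    coco = yaml.safe_load(f)
print(len(coco['names']))    # 80
print(coco['names'])         # {0: 'person', 1: 'bicycle', 2: 'car', ...} (a plain list in older versions)
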
In yolov5, you can use the ./detect.py script to run object detection.
The following is a detailed explanation of the common parameters of detect.py:

  1. --source: Specify the input source, which can be an image path, video file path or camera index (the default is the current directory data/images, which contains only two pictures).

  2. --weights: Specify the path to the model weights file. It can be a local path or a model name that is downloaded automatically from the YOLOv5 releases. The default is yolov5s.pt in the current directory.

  3. --data: Specifies the configuration file for the dataset to use. The configuration file of the data set contains information such as the path of the data set, category labels, training set, verification set, and test set division. The default is data/coco128.yaml, which is optional.

  4. --img-size: Specify the inference image size in pixels, given as height and width (e.g. 640 480) or as a single value such as 640. The default is 640x640.

  5. --conf-thres: Confidence threshold, ranging from 0 to 1. Only detections whose confidence exceeds this threshold are kept; the default is 0.25.

  6. --iou-thres: The IoU (intersection over union) threshold used by NMS (non-maximum suppression), ranging from 0 to 1. Overlapping boxes whose IoU exceeds this threshold are suppressed; the default is 0.45 (see the IoU sketch after this list).

  7. --max-det: The maximum number of detections per image; the default is 1000.

  8. --device: Specify the device to use, e.g. 0, 0,1,2,3 or cpu. If not specified, a CUDA device is selected automatically when available, otherwise the CPU is used.

  9. --view-img: Display the image window during detection.

  10. --save-txt: Save the detection results to *.txt files.

  11. --save-conf: Save the confidence of the detection results.

  12. --save-crop: Save the cropped image of the detection result.

  13. --half: Use half-precision floating point numbers for inference.

These parameters can be adjusted to your needs to obtain the best detection results. Run the script with --help to see more parameter options and descriptions.
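
To make --conf-thres and --iou-thres concrete, here is a small sketch (not taken from the yolov5 source) of how the IoU between two boxes in (x1, y1, x2, y2) pixel format is computed; during NMS a lower-confidence box is dropped when its IoU with a higher-confidence box of the same class exceeds --iou-thres:

def iou(box_a, box_b):
    # boxes are (x1, y1, x2, y2) corner coordinates in pixels
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)

print(iou((0, 0, 100, 100), (50, 50, 150, 150)))  # about 0.14, below the default 0.45, so both boxes would be kept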

Run the prediction command

python ./detect.py --source ./data/images --weight ./yolov5n6.pt

The output is as follows:

(D:\condaenv\yolov5) D:\code1\yolov5-master\yolov5-master>python ./detect.py --source ./data/images --weight ./yolov5n6.pt
detect: weights=['./yolov5n6.pt'], source=./data/images, data=data\coco128.yaml, imgsz=[640, 640], conf_thres=0.25, iou_thres=0.45, max_det=1000, device=, view_img=False, save_txt=False, save_conf=False, save_crop=False, nosave=False, classes=None, agnostic_nms=False, augment=False, visualize=False, update=False, project=runs\detect, name=exp, exist_ok=False, line_thickness=3, hide_labels=False, hide_conf=False, half=False, dnn=False, vid_stride=1
YOLOv5  2023-5-30 Python-3.8.16 torch-2.0.1+cpu CPU

Fusing layers...
YOLOv5n6 summary: 280 layers, 3239884 parameters, 0 gradients
image 1/2 D:\code1\yolov5-master\yolov5-master\data\images\bus.jpg: 640x512 4 persons, 1 bus, 211.9ms
image 2/2 D:\code1\yolov5-master\yolov5-master\data\images\zidane.jpg: 384x640 3 persons, 1 tie, 152.9ms
Speed: 1.0ms pre-process, 182.4ms inference, 3.0ms NMS per image at shape (1, 3, 640, 640)
Results saved to runs\detect\exp8

Find runs\detect\exp8 and open the directory to view the images with the detections drawn on them.

Training model

Reference from the official website: https://docs.ultralytics.com/yolov5/tutorials/train_custom_data/#before-you-start

Prepare dataset

Create the dataset yaml

COCO128 is an example of a small tutorial dataset consisting of the first 128 images from COCO train2017. These 128 images are used for both training and validation to verify that our training process can overfit. data/coco128.yaml is the dataset configuration file, which defines the following:
1) Dataset root directory path and relative path to the training/validation/test image directory (or *.txt file containing the image path);
2) Category name dictionary.

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ../datasets/coco128  # dataset root dir
train: images/train2017  # train images (relative to 'path') 128 images
val: images/train2017  # val images (relative to 'path') 128 images
test:  # test images (optional)

# Classes (80 COCO classes)
names:
  0: person
  1: bicycle
  2: car
  ...
  77: teddy bear
  78: hair drier
  79: toothbrush

# Download script/URL (optional)
download: https://ultralytics.com/assets/coco128.zip

After downloading https://ultralytics.com/assets/coco128.zip, the directory structure is as follows.
Insert image description here
Here I use it to train a model that distinguishes the front and the back of an ID card. I create a new idcard directory in the project root and a mul subdirectory below it; this directory is used only for this multi-class ID card training task, and all of the dataset lives under the mul directory.

# Train/val/test sets as 1) dir: path/to/imgs, 2) file: path/to/imgs.txt, or 3) list: [path/to/imgs1, path/to/imgs2, ..]
path: ./idcard/mul  # dataset root dir
train: images  # train images
val: images  # val images
test: images   # test images

# Classes
names:
  0: idcard_z  # front of the ID card
  1: idcard_f  # back of the ID card

Note that yolov5 resolves the train/val/test entries relative to path, and for each images directory it automatically looks for a labels directory at the same level. In this example the real training image directory is ./idcard/mul/train/images, with the labels in ./idcard/mul/train/labels.
The validation set directory is ./idcard/mul/val/images, and the test set directory is ./idcard/mul/test/images.

Generally speaking, a common practice is to split the data into a training set, a validation set and a test set, for example 70% training, 15% validation and 15% test. A split like this is generally suitable for smaller datasets; for larger datasets the validation and test proportions can be adjusted.
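
As a sketch of how such a split could be scripted (the source folder idcard/raw_images below is an assumption for illustration, not part of yolov5; the target folders match the layout described above):

import random, shutil
from pathlib import Path

random.seed(0)
src = Path('idcard/raw_images')                 # hypothetical folder holding all collected images
imgs = sorted(src.glob('*.jpg'))
random.shuffle(imgs)
n = len(imgs)
splits = {'train': imgs[:int(0.7 * n)],
          'val':   imgs[int(0.7 * n):int(0.85 * n)],
          'test':  imgs[int(0.85 * n):]}
for name, files in splits.items():
    out = Path(f'idcard/mul/{name}/images')
    out.mkdir(parents=True, exist_ok=True)
    for f in files:
        shutil.copy(f, out / f.name)            # the matching *.txt labels go into idcard/mul/<name>/labels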

Create labels

After annotating the images with an annotation tool (labelme, labelimg), export the labels in YOLO format, with one *.txt file per image (if an image contains no objects, no *.txt file is needed). The *.txt files follow these rules:

  • Each object occupies one line.
  • Each line has the format: class x_center y_center width height.
    The box coordinates must be in normalized xywh format (values between 0 and 1). If your box coordinates are in pixels, divide x_center and width by the image width, and y_center and height by the image height (see the conversion sketch after this list).
  • Class indices start from zero (the first index is 0) and correspond to the indices in the names section of the dataset yaml.
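
A small conversion sketch, assuming a box given as pixel corner coordinates (xmin, ymin, xmax, ymax):

def to_yolo_line(cls_id, xmin, ymin, xmax, ymax, img_w, img_h):
    # convert pixel corner coordinates to a normalized "class x_center y_center width height" line
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{cls_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"

print(to_yolo_line(0, 100, 200, 300, 400, 640, 480))  # "0 0.312500 0.625000 0.312500 0.416667"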

It is recommended to use labelimg for annotation.

pip install labelimg -i https://pypi.tuna.tsinghua.edu.cn/simple

Switch to the current environment and run the labelimg command to open it.

Click Open Dir and select the directory of images to annotate (the idcard/mul/train/images directory), click Change Save Dir and select your idcard/mul/train/labels directory, and choose the YOLO format. Then open the images one by one and annotate them; the usual steps are:

  1. Press w to start a rectangular box and drag it over the target you want to label. When the box is drawn, a label prompt pops up. Note that you should first annotate the class whose index is 0 in the dataset yaml, then the class with index 1, so that the label order stays consistent; after that you can simply pick the label from the pop-up list.
  2. After the annotation is complete, press Ctrl+S to save.
  3. Press the d key on the keyboard to switch to the next image, and keep pressing w to draw boxes until all images are done.

A classes.txt file will appear in your labels directory. Check that its order is consistent with the dataset yaml; if they differ, do not edit classes.txt, but adjust the dataset yaml so that the two match.
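
This check can also be scripted; a small sketch, assuming the file paths below (the classes.txt path is hypothetical and depends on where labelimg saved it):

import yaml

with open('idcard/mul/train/labels/classes.txt', encoding='utf-8') as f:
    classes = [line.strip() for line in f if line.strip()]
with open('idcard/mul/idcard.yaml', encoding='utf-8') as f:
    names = yaml.safe_load(f)['names']
for i, cls_name in enumerate(classes):
    # names may be a dict {0: 'idcard_z', ...} or a list; both support integer indexing here
    if names[i] != cls_name:
        print(f"Mismatch at index {i}: yaml has {names[i]!r}, classes.txt has {cls_name!r}")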

train

I prepared about 350 annotated images here, and the recognition rate after training was 98%.
Run training with train.py:

#  --weight specifies the initial weights; use it to fine-tune and train your own model.
python train.py --batch-size 4 --epochs 10 --data .\idcard\mul\idcard.yaml --weight .\yolov5n6.pt

After training finishes, runs\train\exp[N]\weights\best.pt is the trained model. You can use detect.py as before, pointing it at this model, to run predictions.

python ./detect.py --source .\idcard\mul\test\images --weight .\runs\train\exp3\weights\best.pt

View the prediction images under runs\detect\exp[N]\

Model application

We need to use the generated best.pt model in our own application. We can load it with torch.hub:

# Use the local yolov5-master repo that was used for training; best.pt has been copied to the current directory
import cv2
import torch
import matplotlib.pyplot as plt

model = torch.hub.load('D:\\code1\\yolov5-master\\yolov5-master', 'custom', path='./best.pt', source='local')  # local repo
# print(model)
# Read the image
img = cv2.imread('../images/zm.jpg')
# Run prediction
results = model(img)
resultLabel = []
# Parse the prediction results
for result in results.xyxy[0]:
    x1, y1, x2, y2, conf, cls = result.tolist()
    if conf > 0.5:
        # Draw the bounding box and label
        cv2.rectangle(img, (int(x1), int(y1)), (int(x2), int(y2)), (0, 255, 0), 2)
        cv2.putText(img, f"{model.names[int(cls)]} {conf:.2f}", (int(x1), int(y1 - 10)), cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 2)
        resultLabel.append(model.names[int(cls)])
# Print and display the result (convert from OpenCV's BGR to RGB for matplotlib)
print("Predicted labels:", resultLabel)
plt.imshow(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
plt.show()


This is the official online usage: the program automatically downloads the ultralytics/yolov5 repository and the yolov5s model, which can be quite slow.

import torch
# Model
model = torch.hub.load("ultralytics/yolov5", "yolov5s")  # or yolov5n - yolov5x6, custom
# Images
img = "https://ultralytics.com/images/zidane.jpg"  # or file, Path, PIL, OpenCV, numpy, list
# Inference
results = model(img)
# Results
results.print()  # or .show(), .save(), .crop(), .pandas(), etc.

Origin: blog.csdn.net/liaomin416100569/article/details/131213359