YOLOv5 Tutorial: How to Train and Evaluate a Model on Someone Else's Dataset

Table of contents

1. Preface and Dataset

2. Split the Dataset and Modify the Configuration Files

    1. Put the images and .txt annotation files into the corresponding VOCData folders

    2. Convert .txt files to .xml files

    3. Create split_train_val.py in the VOCData directory and run it

    4. Convert .xml format to yolo_txt format

    5. Set up the test file

    6. Configuration file

3. Clustering to Obtain Prior Anchor Boxes

    1. Generate the anchors file

    2. Modify the model configuration file

4. Model Training

    1. Start training

    2. Training process

5. Testing

    1. Run val.py to test

    2. Test the video frame rate

6. Others

    1. Convert weight files from .pt to .pth

    2. Convert files from .xml to .json

1. Preface and Dataset

My earlier tutorial received a lot of support from many friends, and I am very grateful to everyone. Today I am posting a slightly more advanced article.

YOLOv5 for Beginners, the Whole Process: Training + Digit Recognition (the article that walks through creating split_train_val.py in the yolov5 root directory)

The article above teaches you how to build an object-detection weight model from scratch. For undergraduates coasting through a project (such as me), most of the time goes into finding and labeling datasets.

Many friends have asked me where to find datasets. I covered my approach in another post, "How to find object-detection datasets for YOLOv5?" (Niu Da 2023, CSDN blog), but many of the sources there are image-only collections without annotation files.

However, some competitions and open-source projects provide images plus annotations for contestants. For example, one small competition provides a ready-made dataset with sample images:

Training set: 28,773 endoscopic images of colorectal polyps (raw images, labels, and a Readme file).
The network disk link is as follows:
Link: https://pan.baidu.com/s/1n08y04DokW5LyF0t7tMIog
Extract code: tmkn

Finally, I will show everyone how to call val.py to run the test set and obtain the relevant metrics.

2. Split the Dataset and Modify the Configuration Files

The overall structure still follows my earlier article, YOLOv5 for Beginners, the Whole Process: Training + Digit Recognition (Niu Da 2023, CSDN blog).

1. Put the images and .txt annotation files into the corresponding VOCData folders
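
For reference, this is roughly the directory layout the scripts below assume; images and labels are what you supply, and the other folders are created along the way:

yolov5/
└── VOCData/
    ├── images/          images you supply (.jpg or .png)
    ├── labels/          YOLO-format .txt annotations you supply
    ├── Annotations/     .xml files generated by txt_to_xml.py
    ├── ImageSets/Main/  split lists generated by split_train_val.py
    └── dataSet_path/    absolute-path lists generated by text_to_yolo.py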

2. Convert .txt files to .xml files

Create a txt_to_xml.py file under the VOCData folder and copy in the following code. Remember to modify the paths and the class names.

import os
import xml.etree.ElementTree as ET
from PIL import Image
import numpy as np

# Image folder; the trailing / must not be omitted
img_path = 'D:/python_work/yolov5/VOCData/images/'

# txt folder; the trailing / must not be omitted
labels_path = 'D:/python_work/yolov5/VOCData/labels/'

# Folder where the .xml files are stored; the trailing / must not be omitted
annotations_path = 'D:/python_work/yolov5/VOCData/Annotations/'

labels = os.listdir(labels_path)

# Class names
classes = ["GHP", "APC"]

# Image height, width and depth
sh = sw = sd = 0

def write_xml(imgname, sw, sh, sd, filepath, labeldicts):
    '''
    imgname: image name without the extension
    '''

    # Create the Annotation root node
    root = ET.Element('Annotation')

    # Create the filename child node (no extension)
    ET.SubElement(root, 'filename').text = str(imgname)

    # Create the size child node
    sizes = ET.SubElement(root, 'size')
    ET.SubElement(sizes, 'width').text = str(sw)
    ET.SubElement(sizes, 'height').text = str(sh)
    ET.SubElement(sizes, 'depth').text = str(sd)

    for labeldict in labeldicts:
        objects = ET.SubElement(root, 'object')
        ET.SubElement(objects, 'name').text = labeldict['name']
        ET.SubElement(objects, 'pose').text = 'Unspecified'
        ET.SubElement(objects, 'truncated').text = '0'
        ET.SubElement(objects, 'difficult').text = '0'
        bndbox = ET.SubElement(objects, 'bndbox')
        ET.SubElement(bndbox, 'xmin').text = str(int(labeldict['xmin']))
        ET.SubElement(bndbox, 'ymin').text = str(int(labeldict['ymin']))
        ET.SubElement(bndbox, 'xmax').text = str(int(labeldict['xmax']))
        ET.SubElement(bndbox, 'ymax').text = str(int(labeldict['ymax']))
    tree = ET.ElementTree(root)
    tree.write(filepath, encoding='utf-8')


for label in labels:
    with open(labels_path + label, 'r') as f:
        img_id = os.path.splitext(label)[0]  # filename without the .txt extension
        contents = f.readlines()
        labeldicts = []

        # !!! Check your image format here: change '.jpg' to '.png' if your images are .png
        img = np.array(Image.open(img_path + img_id + '.jpg'))

        # Image height, width and depth
        sh, sw, sd = img.shape[0], img.shape[1], img.shape[2]

        for content in contents:
            content = content.strip('\n').split()
            x = float(content[1]) * sw
            y = float(content[2]) * sh
            w = float(content[3]) * sw
            h = float(content[4]) * sh

            # Coordinate conversion: x_center y_center width height -> xmin ymin xmax ymax
            new_dict = {'name': classes[int(content[0])],
                        'difficult': '0',
                        'xmin': x + 1 - w / 2,
                        'ymin': y + 1 - h / 2,
                        'xmax': x + 1 + w / 2,
                        'ymax': y + 1 + h / 2
                        }
            labeldicts.append(new_dict)
        write_xml(img_id, sw, sh, sd, annotations_path + img_id + '.xml', labeldicts)
# Source: https://zhuanlan.zhihu.com/p/383660741

The generated .xml files will be stored in the VOCData\Annotations folder.

3. Create the program split_train_val.py in the VOCData directory and run it

No modifications are needed.

# coding:utf-8

import os
import random
import argparse

parser = argparse.ArgumentParser()
# Path to the xml files; adjust for your own data (xml files usually live under Annotations)
parser.add_argument('--xml_path', default='Annotations', type=str, help='input xml label path')
# Output path for the dataset split; use ImageSets/Main under your own data
parser.add_argument('--txt_path', default='ImageSets/Main', type=str, help='output txt label path')
opt = parser.parse_args()

trainval_percent = 1.0  # proportion taken by train + val; no test set is split off here
train_percent = 0.9     # proportion taken by the training set; adjust as you like
xmlfilepath = opt.xml_path
txtsavepath = opt.txt_path
total_xml = os.listdir(xmlfilepath)
if not os.path.exists(txtsavepath):
    os.makedirs(txtsavepath)

num = len(total_xml)
list_index = range(num)
tv = int(num * trainval_percent)
tr = int(tv * train_percent)
trainval = random.sample(list_index, tv)
train = random.sample(trainval, tr)

file_trainval = open(txtsavepath + '/trainval.txt', 'w')
file_test = open(txtsavepath + '/test.txt', 'w')
file_train = open(txtsavepath + '/train.txt', 'w')
file_val = open(txtsavepath + '/val.txt', 'w')

for i in list_index:
    name = total_xml[i][:-4] + '\n'  # strip the .xml extension
    if i in trainval:
        file_trainval.write(name)
        if i in train:
            file_train.write(name)
        else:
            file_val.write(name)
    else:
        file_test.write(name)

file_trainval.close()
file_train.close()
file_val.close()
file_test.close()

After running, the split lists (test.txt, train.txt, trainval.txt, and val.txt) will be generated under VOCData\ImageSets\Main.
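
Each of these lists simply holds one image ID per line, without an extension. For example (the IDs here are hypothetical):

# VOCData/ImageSets/Main/train.txt (illustrative contents)
polyp_00001
polyp_00002
polyp_00005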

4. Convert .xml format to yolo_txt format

Create the program text_to_yolo.py in the VOCData directory and run it.

Change the classes list at the top to your own classes.

Then change the paths to your own, and check whether the image suffix written near the end of the script should be .png or .jpg.

# -*- coding: utf-8 -*-
import xml.etree.ElementTree as ET
import os
from os import getcwd

sets = ['train', 'val', 'test']
classes = ["GHP", "APC"]  # change to your own classes
abs_path = os.getcwd()
print(abs_path)


def convert(size, box):
    # Convert xmin/xmax/ymin/ymax pixel coordinates to normalized
    # x_center y_center width height
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return x, y, w, h


def convert_annotation(image_id):
    in_file = open('D:/python_work/yolov5/VOCData/Annotations/%s.xml' % (image_id), encoding='UTF-8')
    out_file = open('D:/python_work/yolov5/VOCData/labels/%s.txt' % (image_id), 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)
    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        # difficult = obj.find('Difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text), float(xmlbox.find('ymin').text),
             float(xmlbox.find('ymax').text))
        b1, b2, b3, b4 = b
        # Clamp out-of-bounds annotations
        if b2 > w:
            b2 = w
        if b4 > h:
            b4 = h
        b = (b1, b2, b3, b4)
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    out_file.close()
    in_file.close()


wd = getcwd()
for image_set in sets:
    if not os.path.exists('D:/python_work/yolov5/VOCData/labels/'):
        os.makedirs('D:/python_work/yolov5/VOCData/labels/')
    image_ids = open('D:/python_work/yolov5/VOCData/ImageSets/Main/%s.txt' % (image_set)).read().strip().split()

    if not os.path.exists('D:/python_work/yolov5/VOCData/dataSet_path/'):
        os.makedirs('D:/python_work/yolov5/VOCData/dataSet_path/')

    # Relative path: run this script from the VOCData directory
    list_file = open('dataSet_path/%s.txt' % (image_set), 'w')
    for image_id in image_ids:
        # Change .jpg to .png here if your images are .png
        list_file.write('D:/python_work/yolov5/VOCData/images/%s.jpg\n' % (image_id))
        convert_annotation(image_id)
    list_file.close()

Here, labels holds the annotation files for the images. Each image corresponds to one .txt file, and each line of that file describes one target in the format class x_center y_center width height, with all coordinates normalized to the image size. This is the yolo_txt format.
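
For example, one line of a label file might read as follows (the values are illustrative): class 0 (GHP here), box centered at (0.48, 0.53), roughly 0.21 of the image wide and 0.34 high:

0 0.481250 0.533333 0.212500 0.341667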

The dataSet_path folder contains one txt file per split. Files such as train.txt list the absolute paths of the split images; for example, train.txt contains the absolute path of every training-set image.

5. Set up the test file

Here I did not build a separate test set (laziness): I simply took 100 lines from train.txt and put them into test.txt as the test file.
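
A minimal sketch of that shortcut, assuming the split lists live in VOCData/dataSet_path; note that these 100 images were also trained on, so the resulting test metrics will be optimistic:

import itertools

# Copy the first 100 training-image paths into test.txt
with open('dataSet_path/train.txt') as src, open('dataSet_path/test.txt', 'w') as dst:
    dst.writelines(itertools.islice(src, 100))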

6. Configuration file

Create a new myvoc.yaml file (the name is up to you) under the data folder in the yolov5 directory and open it with Notepad.

Its content is: the paths to the training, validation, and test lists (train.txt, val.txt, and test.txt; these can be relative paths), plus the number of target classes and the class names.
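
A minimal sketch of what myvoc.yaml could look like, using the paths produced by the scripts above (adjust everything to your own layout):

train: D:/python_work/yolov5/VOCData/dataSet_path/train.txt
val: D:/python_work/yolov5/VOCData/dataSet_path/val.txt
test: D:/python_work/yolov5/VOCData/dataSet_path/test.txt

nc: 2                  # number of classes
names: ["GHP", "APC"]  # class names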

3. Clustering to Obtain Prior Anchor Boxes

1. Generate the anchors file

Create two programs, kmeans.py and clauculate_anchors.py, in the VOCData directory.

You do not need to run kmeans.py; just run clauculate_anchors.py.

The kmeans.py program is as follows. It needs no changes and is not run directly; if it raises an error, check line 13 (see the comment there).

import numpy as np
 
 
def iou(box, clusters):
    """
    Calculates the Intersection over Union (IoU) between a box and k clusters.
    :param box: tuple or array, shifted to the origin (i. e. width and height)
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: numpy array of shape (k,) where k is the number of clusters
    """
    x = np.minimum(clusters[:, 0], box[0])
    y = np.minimum(clusters[:, 1], box[1])
    if np.count_nonzero(x == 0) > 0 or np.count_nonzero(y == 0) > 0:
        raise ValueError("Box has no area")  # if this error is raised, you can change this line to pass
 
    intersection = x * y
    box_area = box[0] * box[1]
    cluster_area = clusters[:, 0] * clusters[:, 1]
 
    iou_ = intersection / (box_area + cluster_area - intersection)
 
    return iou_
 
 
def avg_iou(boxes, clusters):
    """
    Calculates the average Intersection over Union (IoU) between a numpy array of boxes and k clusters.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param clusters: numpy array of shape (k, 2) where k is the number of clusters
    :return: average IoU as a single float
    """
    return np.mean([np.max(iou(boxes[i], clusters)) for i in range(boxes.shape[0])])
 
 
def translate_boxes(boxes):
    """
    Translates all the boxes to the origin.
    :param boxes: numpy array of shape (r, 4)
    :return: numpy array of shape (r, 2)
    """
    new_boxes = boxes.copy()
    for row in range(new_boxes.shape[0]):
        new_boxes[row][2] = np.abs(new_boxes[row][2] - new_boxes[row][0])
        new_boxes[row][3] = np.abs(new_boxes[row][3] - new_boxes[row][1])
    return np.delete(new_boxes, [0, 1], axis=1)
 
 
def kmeans(boxes, k, dist=np.median):
    """
    Calculates k-means clustering with the Intersection over Union (IoU) metric.
    :param boxes: numpy array of shape (r, 2), where r is the number of rows
    :param k: number of clusters
    :param dist: distance function
    :return: numpy array of shape (k, 2)
    """
    rows = boxes.shape[0]
 
    distances = np.empty((rows, k))
    last_clusters = np.zeros((rows,))
 
    np.random.seed()
 
    # the Forgy method will fail if the whole array contains the same rows
    clusters = boxes[np.random.choice(rows, k, replace=False)]
 
    while True:
        for row in range(rows):
            distances[row] = 1 - iou(boxes[row], clusters)
 
        nearest_clusters = np.argmin(distances, axis=1)
 
        if (last_clusters == nearest_clusters).all():
            break
 
        for cluster in range(k):
            clusters[cluster] = dist(boxes[nearest_clusters == cluster], axis=0)
 
        last_clusters = nearest_clusters
 
    return clusters
 
 
if __name__ == '__main__':
    a = np.array([[1, 2, 3, 4], [5, 7, 6, 8]])
    print(translate_boxes(a))

Now run clauculate_anchors.py.

It calls the clustering in kmeans.py to generate new anchors.

The program is as follows:

You need to change the file paths on lines 9 and 13 and the class names on line 16.

# -*- coding: utf-8 -*-
# Compute prior anchor boxes from the label files

import os
import numpy as np
import xml.etree.cElementTree as et
from kmeans import kmeans, avg_iou

FILE_ROOT = "D:/python_work/yolov5/VOCData/"  # root path
ANNOTATION_ROOT = "Annotations"  # annotation folder inside the dataset
ANNOTATION_PATH = FILE_ROOT + ANNOTATION_ROOT

ANCHORS_TXT_PATH = "D:/python_work/yolov5/VOCData/anchors.txt"  # where the anchors file is saved

CLUSTERS = 9
CLASS_NAMES = ['GHP', 'APC']  # class names


def load_data(anno_dir, class_names):
    xml_names = os.listdir(anno_dir)
    boxes = []
    for xml_name in xml_names:
        xml_pth = os.path.join(anno_dir, xml_name)
        tree = et.parse(xml_pth)

        width = float(tree.findtext("./size/width"))
        height = float(tree.findtext("./size/height"))

        for obj in tree.findall("./object"):
            cls_name = obj.findtext("name")
            if cls_name in class_names:
                xmin = float(obj.findtext("bndbox/xmin")) / width
                ymin = float(obj.findtext("bndbox/ymin")) / height
                xmax = float(obj.findtext("bndbox/xmax")) / width
                ymax = float(obj.findtext("bndbox/ymax")) / height

                box = [xmax - xmin, ymax - ymin]
                boxes.append(box)
            else:
                continue
    return np.array(boxes)


if __name__ == '__main__':

    anchors_txt = open(ANCHORS_TXT_PATH, "w")

    train_boxes = load_data(ANNOTATION_PATH, CLASS_NAMES)
    count = 1
    best_accuracy = 0
    best_anchors = []
    best_ratios = []

    for i in range(10):  ##### adjustable; do not set it too large, or it will take a long time
        anchors_tmp = []
        clusters = kmeans(train_boxes, k=CLUSTERS)
        idx = clusters[:, 0].argsort()
        clusters = clusters[idx]
        # print(clusters)

        for j in range(CLUSTERS):
            anchor = [round(clusters[j][0] * 640, 2), round(clusters[j][1] * 640, 2)]
            anchors_tmp.append(anchor)
            print(f"Anchors:{anchor}")

        temp_accuracy = avg_iou(train_boxes, clusters) * 100
        print("Train_Accuracy:{:.2f}%".format(temp_accuracy))

        ratios = np.around(clusters[:, 0] / clusters[:, 1], decimals=2).tolist()
        ratios.sort()
        print("Ratios:{}".format(ratios))
        print(20 * "*" + " {} ".format(count) + 20 * "*")

        count += 1

        if temp_accuracy > best_accuracy:
            best_accuracy = temp_accuracy
            best_anchors = anchors_tmp
            best_ratios = ratios

    anchors_txt.write("Best Accuracy = " + str(round(best_accuracy, 2)) + '%' + "\r\n")
    anchors_txt.write("Best Anchors = " + str(best_anchors) + "\r\n")
    anchors_txt.write("Best Ratios = " + str(best_ratios))
    anchors_txt.close()

Running it generates the anchors file. If the generated file is empty, just run it again.

The Best Anchors line (the second line) will be needed later; these are the anchor values obtained by hand.

2. Modify the model configuration file

Select a model. The models folder under the yolov5 directory holds the model configuration files; there are n, s, m, l, and x versions, which grow progressively larger (and as the architecture grows, training time grows with it).

Here I choose yolov5s.yaml and open it with Notepad.

Two parameters mainly need changing (see the sketch after this list):

Change nc: to your own number of label classes (the screenshot was not updated; it shows yolov5m opened by mistake...).
Modify the anchors according to Best Anchors in anchors.txt, rounding each value to an integer (up or down).
Keep the anchors format in the yaml unchanged and map the values one-to-one in order; for example, the six values I boxed fill the first anchors line (all 18 values must be changed).
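
A sketch of the two edits in yolov5s.yaml; the anchor numbers below are hypothetical rounded Best Anchors values, filled in from smallest to largest:

nc: 2  # number of classes

anchors:
  - [65, 65, 91, 93, 118, 121]      # P3/8, the six smallest values
  - [146, 152, 183, 189, 227, 230]  # P4/16, the middle six values
  - [278, 284, 343, 350, 442, 450]  # P5/32, the six largest values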

4. Model Training

1. Start training

Open the Anaconda terminal, cd into the yolov5 folder, and activate the corresponding environment (mine is named yolov5).

Then enter the following training command. (Because the dataset is nearly 30,000 images, my little graphics card could barely cope, so I only trained for about 20 epochs to demonstrate; normal training should be at least 50 epochs.)

python train.py --weights weights/yolov5s.pt --cfg models/yolov5s.yaml --data data/myvoc.yaml --epoch 22 --batch-size 8 --img 640 --device 0

Parameter explanation:

--weights weights/yolov5s.pt: you may need to change this path. I keep all the YOLOv5 .pt files in a weights directory; if you do not have one, adjust the path.

--epoch 22: train for 22 epochs

--batch-size 8: update the weights after every 8 images

--device 0: train on GPU 0 (use --device cpu to train on the CPU instead)

2. Training process

The 22 epochs took over 3 hours...

The trained model will be saved under runs/train/expxx/weights in the yolov5 directory.

5. Testing

Unlike before, when we just ran the model on pictures and videos to eyeball the results, this time we run it on the test set and get metrics back.

1. Run val.py to test

Again, enter the following in the Anaconda console:

(yolov5) D:\python_work\yolov5>python val.py --weights runs/train/exp12/weights/best.pt --data data/myvoc.yaml --img-size 640 --iou-thres 0.5 --conf-thres 0.4 --batch-size 8 --task test

The test output will be generated under runs/val/expxx.

It includes the predicted JSON data and the F1, P, PR, and R curves, etc.

2. Test the video frame rate

Find detect.py in the yolov5 root directory and open it.

Mainly modify weights and source (around line 218). Run it on a test video; the results will appear under yolov5\runs\detect\exp...

    parser.add_argument('--weights', nargs='+', type=str, default='runs/train/exp12/weights/best.pt', help='model.pt path(s)')
    parser.add_argument('--source', type=str, default='sytest1.mp4', help='source') #file/dir/URL/glob/screen/0(webcam)
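
After saving these defaults you can simply run python detect.py. Equivalently, you can leave detect.py untouched and pass the same values on the command line:

python detect.py --weights runs/train/exp12/weights/best.pt --source sytest1.mp4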

Finally, use the following snippet to check the frame rate (though I feel this method is not rigorous; corrections are welcome):

import cv2

cap = cv2.VideoCapture('result1.mp4')

# Read the frame rate and total frame count from the video metadata
fps = cap.get(cv2.CAP_PROP_FPS)
total_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
print(fps)

# Release the video
cap.release()
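
Note that this only reads the FPS stored in the file's metadata, which normally just mirrors the input video. A rough sketch of a more direct measurement, timing actual inference over the whole video; it assumes the weight path from the training step and loads the model through torch.hub, a documented YOLOv5 entry point:

import time

import cv2
import torch

# Load the custom-trained weights (hypothetical path from the training step)
model = torch.hub.load('ultralytics/yolov5', 'custom',
                       path='runs/train/exp12/weights/best.pt')

cap = cv2.VideoCapture('sytest1.mp4')
frames = 0
start = time.time()
while True:
    ok, frame = cap.read()
    if not ok:
        break
    # YOLOv5 expects RGB images; OpenCV reads BGR
    model(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
    frames += 1
cap.release()

print('average end-to-end FPS: %.2f' % (frames / (time.time() - start)))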

6. Others

1. Convert weight files from .pt to .pth

import torch

# Convert a .pt file to a .pth file. Both extensions are just pickled
# PyTorch checkpoints, so this simply re-saves the same object under
# the other name. map_location='cpu' avoids needing the training GPU.
pt_file_path = 'D:/python_work/yolov5/runs/train/exp12/weights/last.pt'
pth_file_path = 'D:/python_work/yolov5/runs/train/exp12/weights/last.pth'
model_weights = torch.load(pt_file_path, map_location='cpu')
torch.save(model_weights, pth_file_path)

2. Convert files from .xml to .json

import os
import json
import xml.etree.ElementTree as ET

# Folder containing all the .xml files
xml_folder = "data/test/labels"


# Parse one XML file
def parse_xml(xml_path):
    tree = ET.parse(xml_path)
    root = tree.getroot()

    # Pull the needed fields out of the XML file and store them in a dict
    filename = root.find("filename").text
    width = int(root.find("size/width").text)
    height = int(root.find("size/height").text)

    object_list = []
    for obj in root.findall("object"):
        name = obj.find("name").text
        xmin = int(obj.find("bndbox/xmin").text)
        ymin = int(obj.find("bndbox/ymin").text)
        xmax = int(obj.find("bndbox/xmax").text)
        ymax = int(obj.find("bndbox/ymax").text)

        obj_dict = {
            "name": name,
            "xmin": xmin,
            "ymin": ymin,
            "xmax": xmax,
            "ymax": ymax
        }
        object_list.append(obj_dict)

    # Return the parsed info as a dict
    return {
        "filename": filename,
        "width": width,
        "height": height,
        "objects": object_list
    }


# Parse all .xml files and collect the results in a list
annotation_list = []
for xml_file in os.listdir(xml_folder):
    if xml_file.endswith(".xml"):
        xml_path = os.path.join(xml_folder, xml_file)
        annotation = parse_xml(xml_path)
        annotation_list.append(annotation)

# Write the list to a .json file
json_path = "data/test/output.json"
with open(json_path, "w") as f:
    json.dump(annotation_list, f)

Original article: blog.csdn.net/m0_62237233/article/details/130556772