Training YOLOv7 on a custom dataset

To use YOLOv7 for comparative experiments, you need to deploy the YOLO environment again and convert the COCO-format dataset to YOLO format. The
COCO dataset I use here was converted from the WiderPerson dataset and has already had some preprocessing applied.

Environment

Ubuntu 18.04, CUDA 11.2, NVIDIA T4

Project deployment

Clone the project:

git clone https://gitcode.net/mirrors/WongKinYiu/yolov7.git

Create and activate a conda environment:

conda create -n yolo python=3.8
conda activate yolo

Install the dependencies:

 pip install -r requirements.txt

The YOLO environment is quite easy to set up.
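Before moving on, it is worth a quick check that the installed PyTorch build can see the GPU (a minimal sanity check, assuming requirements.txt pulled in a CUDA-enabled torch):

import torch

print(torch.__version__)
print(torch.cuda.is_available())  # should print True on the CUDA 11.2 / T4 machine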

Dataset format conversion

# Convert a COCO-format dataset to a YOLO-format dataset
# --json_path  path to the input JSON annotation file
# --save_path  folder where the output .txt labels are saved

import os
import json
from tqdm import tqdm
import argparse

parser = argparse.ArgumentParser()
# Replace this with the location of your own JSON file
parser.add_argument('--json_path',
                    default='/home/ubuntu/conda/data/annotations/instances_val2017.json', type=str,
                    help="input: coco format(json)")
# Where the .txt label files are saved
parser.add_argument('--save_path', default='/home/ubuntu/conda/data/labels/val/', type=str,
                    help="specify where to save the output dir of labels")
arg = parser.parse_args()


def convert(size, box):
    # size = (width, height); box = COCO (x_min, y_min, width, height)
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = box[0] + box[2] / 2.0
    y = box[1] + box[3] / 2.0
    w = box[2]
    h = box[3]
    # round() fixes the number of decimal places of (x, y, w, h)
    x = round(x * dw, 6)
    w = round(w * dw, 6)
    y = round(y * dh, 6)
    h = round(h * dh, 6)
    return (x, y, w, h)  # normalized (x_center, y_center, width, height)


if __name__ == '__main__':
    json_file = arg.json_path  # COCO "object instances" style annotations
    ana_txt_save_path = arg.save_path  # output directory

    data = json.load(open(json_file, 'r'))
    if not os.path.exists(ana_txt_save_path):
        os.makedirs(ana_txt_save_path)

    # COCO category ids are not contiguous; remap them to 0..n-1 before writing
    id_map = {}
    with open(os.path.join(ana_txt_save_path, 'classes.txt'), 'w') as f:
        # write classes.txt, one class name per line
        for i, category in enumerate(data['categories']):
            f.write(f"{category['name']}\n")
            id_map[category['id']] = i
    # print(id_map)
    # Change this to wherever the list of image paths should be written
    # (use 'val2017.txt' when converting the validation split).
    list_file = open(os.path.join(ana_txt_save_path, 'train2017.txt'), 'w')
    for img in tqdm(data['images']):
        filename = img["file_name"]
        img_width = img["width"]
        img_height = img["height"]
        img_id = img["id"]
        head, tail = os.path.splitext(filename)
        ana_txt_name = head + ".txt"  # label file shares its stem with the image
        f_txt = open(os.path.join(ana_txt_save_path, ana_txt_name), 'w')
        for ann in data['annotations']:
            if ann['image_id'] == img_id:
                box = convert((img_width, img_height), ann["bbox"])
                f_txt.write("%s %s %s %s %s\n" % (id_map[ann["category_id"]], box[0], box[1], box[2], box[3]))
        f_txt.close()
        # write the image's absolute path into train2017.txt (or val2017.txt)
        list_file.write('/home/ubuntu/conda/data/images/%s.jpg\n' % (head))
    list_file.close()
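Assuming the script is saved as coco2yolo.py (a file name of my choosing), it is run once per split; the train-split JSON path below is an assumption mirroring the val path above:

python coco2yolo.py --json_path /home/ubuntu/conda/data/annotations/instances_train2017.json --save_path /home/ubuntu/conda/data/labels/train/
python coco2yolo.py --json_path /home/ubuntu/conda/data/annotations/instances_val2017.json --save_path /home/ubuntu/conda/data/labels/val/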

The generated files look like this. First, train2017.txt records the absolute paths of the dataset images.

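With the paths used in this post, its lines look like this (the file names are illustrative):

/home/ubuntu/conda/data/images/000001.jpg
/home/ubuntu/conda/data/images/000002.jpg
...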

Each image then gets its own annotation file.

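Every line of a label file is class_id x_center y_center width height, all normalized to [0, 1]. As a worked example with numbers of my own choosing: a COCO bbox [100, 120, 200, 160] in a 640x480 image gives x = (100 + 200/2)/640 = 0.3125, y = (120 + 160/2)/480 ≈ 0.416667, w = 200/640 = 0.3125, h = 160/480 ≈ 0.333333, so for class 0 convert() produces the line:

0 0.3125 0.416667 0.3125 0.333333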

Next the dataset needs to be split: all images of the custom dataset currently sit in a single folder, so the training and validation images must be separated according to the paths listed in train2017.txt and val2017.txt.

import shutil
import os

# Copy the validation images listed in val2017.txt into their own folder.
# (Run it again with train2017.txt and the train folder for the training split.)
f = open("/home/ubuntu/conda/data/labels/val/val2017.txt")
dstpath = "/home/ubuntu/conda/data/image/val/"
lines = f.readlines()
for line in lines:
    line = line.replace("\n", "")       # strip the trailing newline
    fpath, fname = os.path.split(line)  # keep only the file name
    print(fname)
    shutil.copy(line, dstpath + fname)  # copy the image into the val folder
f.close()

At this point, dataset preparation is complete.
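As a quick sanity check (a sketch using this post's paths; note that classes.txt and the list file also end in .txt, so the label count will be slightly higher than the image count):

import glob

# Compare image and label counts for the validation split.
imgs = glob.glob('/home/ubuntu/conda/data/image/val/*.jpg')
labels = glob.glob('/home/ubuntu/conda/data/labels/val/*.txt')
print(len(imgs), 'images,', len(labels), 'label files')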

Start training

Dataset configuration file modification

Start with the dataset configuration file. Since we began from the COCO dataset, we modify data/coco.yaml directly.
Three places need to change: the dataset paths, the number of classes, and the class names.

The modified file looks like this:

train: /home/ubuntu/conda/data/images/train/
val: /home/ubuntu/conda/data/images/val/
nc: 1
names: [ 'pedestrians' ]

train.py configuration file modification

parser = argparse.ArgumentParser()
parser.add_argument('--weights', type=str, default='/home/ubuntu/conda/yolov7/weights/yolov7_training.pt', help='initial weights path')
parser.add_argument('--cfg', type=str, default='/home/ubuntu/conda/yolov7/cfg/training/yolov7.yaml', help='model.yaml path')
parser.add_argument('--data', type=str, default='data/coco.yaml', help='data.yaml path')
parser.add_argument('--hyp', type=str, default='data/hyp.scratch.p5.yaml', help='hyperparameters path')
parser.add_argument('--epochs', type=int, default=200)
parser.add_argument('--batch-size', type=int, default=4, help='total batch size for all GPUs')
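With these defaults in place, training starts with a plain python train.py. Equivalently, the same options can be passed on the command line (the --device flag is yolov7's standard GPU selector, not shown in the excerpt above):

python train.py --weights weights/yolov7_training.pt --cfg cfg/training/yolov7.yaml --data data/coco.yaml --hyp data/hyp.scratch.p5.yaml --epochs 200 --batch-size 4 --device 0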

The model configuration file also needs to be modified. Whichever model config you train with, change its number of classes; here I use /home/ubuntu/conda/yolov7/cfg/training/yolov7.yaml.
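Only the class count near the top of that file needs to change; for this single-class dataset the edited line is (the rest of the stock yolov7.yaml stays as-is):

nc: 1  # number of classes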

Then download the pre-trained weights.
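The weights are published on the yolov7 GitHub releases page; one way to fetch the file referenced in --weights above (URL valid at the time of writing):

wget -P /home/ubuntu/conda/yolov7/weights/ https://github.com/WongKinYiu/yolov7/releases/download/v0.1/yolov7_training.pt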



Origin blog.csdn.net/pengxiang1998/article/details/131138743