Complete YOLOv5 object detection from scratch (2): Make and train your own dataset

Previous articles

Complete YOLOv5 object detection from scratch (1): Preparation

Table of contents

Previous articles

1. Picture preparation

1. Video slicing

2. Annotate images with labelimg

2. Make the YOLOv5 dataset

1. Prepare pictures and labels

2. Generate the .txt dataset and split the training and validation sets in one step

3. Train the model

1. Modify the dataset configuration file

2. Modify the model configuration file

3. Modify the main function parameters

4. Enable TensorBoard to track training progress

4. Model Validation

1. Verify a picture or video

2. Verify with the camera


1. Picture preparation

1. Video slicing

Prepare the video(s) to be fed to the network, place them in the ./video folder, and slice them into frames with the following program:

import cv2
import os

# Folder containing the source video clips
input_path = "./video"
# Grab one frame every N frames
frame_interval = 7

filenames = os.listdir(input_path)

# Create a new folder named after the source folder with _frame appended
frame_path = '{}_frame'.format(input_path)
if not os.path.exists(frame_path):
    os.mkdir(frame_path)

cap = cv2.VideoCapture()
for filename in filenames:
    filepath = os.sep.join([input_path, filename])
    cap.open(filepath)
    # Total number of frames in the video
    n_frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
    # If the opening frames are poor quality, skip them:
    # for i in range(42):
    #     cap.read()
    for i in range(n_frames):
        ret, frame = cap.read()
        # Save a screenshot every frame_interval frames
        if i % frame_interval == 0:
            imagename = '{}_{:0>6d}.jpg'.format(filename.split('.')[0], i)
            imagepath = os.sep.join([frame_path, imagename])
            print('exported {}!'.format(imagepath))
            cv2.imwrite(imagepath, frame)
cap.release()

# Rename the exported images; adjust the path to wherever your frames were saved
path = "D:/wyc/myfirsttest/video_frame"
# List every file in the directory
fileList = os.listdir(path)
for m, name in enumerate(fileList):
    oldname = path + os.sep + name
    newname = path + os.sep + "train" + str(m + 1) + ".jpg"
    os.rename(oldname, newname)
    print(oldname, '======>', newname)

The sliced frames are saved in ./video_frame. Be sure to rename the new images (the second half of the script above does this) so that they are easy to manage in later training.

2. Annotate images with labelimg

Download the .exe file of labelimg
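If you would rather install labelimg with pip than use the .exe (assuming a working Python environment), the PyPI package provides the same tool:

pip install labelImg
labelImg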

Interface introduction:

Open Dir: open the folder containing the images to annotate; all of its images are listed at the lower right

Change Save Dir: choose where the VOC-format .xml label files are saved

View → Auto Save Mode: turn on auto-saving

With the keyboard switched to English input: w starts drawing a bounding box, a/d jump to the previous/next image

2. Make the YOLOv5 dataset

1. Prepare pictures and labels

A YOLOv5 VOC-style dataset consists of three parts: the images (.jpg), the VOC-format labels (.xml), and the YOLOv5-format labels (.txt).

The structure of the new VOC folder under the root directory of yolov5 is as follows:

VOC——VOC2007——Annotations
           |——JPEGImages
           |——YOLOLabels

The first two folder names can be whatever you prefer, but the three subfolders are best given these conventional names so that YOLO can read them without problems.
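If you prefer creating this structure from code rather than by hand, a minimal sketch:

import os

# Create VOC/VOC2007 with the three conventional subfolders
for sub in ("Annotations", "JPEGImages", "YOLOLabels"):
    os.makedirs(os.path.join("VOC", "VOC2007", sub), exist_ok=True)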

Place the annotated .xml labels in Annotations and the images in JPEGImages. Since the storage location of the labels has changed, batch-modify the image paths recorded in the .xml files in VOC2007:

import xml.etree.ElementTree as ET
import os

path = "./path-where-the-xml-files-are/"      # remember the trailing slash!
sv_path = "./path-to-save-the-modified-xml/"  # remember the trailing slash!
imgpath = "./new-image-folder-path/"          # the folder the images now live in

files = os.listdir(path)  # every file name under the path
for xmlFile in files:
    if xmlFile.endswith('.xml'):
        tree = ET.ElementTree(file=path + xmlFile)  # open and parse the xml file
        root = tree.getroot()  # document root element
        # root[0] is the first child of <annotation> (<folder>); overwrite its text
        root[0].text = 'ImageSets'
        # root[2] is the <path> node; point it at the new image location
        root[2].text = imgpath + xmlFile
        root[2].text = root[2].text.replace('xml', 'jpg')
        # the changes only live in memory, so write them back out
        tree.write(sv_path + xmlFile)

2. Generate the .txt dataset and split the training and validation sets in one step

Remember to modify the paths and the target class name first:

import xml.etree.ElementTree as ET
import os
import random
from shutil import copyfile

classes = ["Mr.C"]  # change this to your own label name(s)
# classes = ["ball"]

TRAIN_RATIO = 80  # percentage of images that go into the training set

def clear_hidden_files(path):
    # Recursively delete "._" metadata files that would confuse the loaders
    dir_list = os.listdir(path)
    for i in dir_list:
        abspath = os.path.join(os.path.abspath(path), i)
        if os.path.isfile(abspath):
            if i.startswith("._"):
                os.remove(abspath)
        else:
            clear_hidden_files(abspath)

def convert(size, box):
    # Convert a VOC box (xmin, xmax, ymin, ymax) to normalized YOLO (x, y, w, h)
    dw = 1. / size[0]
    dh = 1. / size[1]
    x = (box[0] + box[1]) / 2.0
    y = (box[2] + box[3]) / 2.0
    w = box[1] - box[0]
    h = box[3] - box[2]
    return (x * dw, y * dh, w * dw, h * dh)

def convert_annotation(image_id):
    in_file = open('VOC/VOC2007/Annotations/%s.xml' % image_id)
    out_file = open('VOC/VOC2007/YOLOLabels/%s.txt' % image_id, 'w')
    tree = ET.parse(in_file)
    root = tree.getroot()
    size = root.find('size')
    w = int(size.find('width').text)
    h = int(size.find('height').text)

    for obj in root.iter('object'):
        difficult = obj.find('difficult').text
        cls = obj.find('name').text
        if cls not in classes or int(difficult) == 1:
            continue
        cls_id = classes.index(cls)
        xmlbox = obj.find('bndbox')
        b = (float(xmlbox.find('xmin').text), float(xmlbox.find('xmax').text),
             float(xmlbox.find('ymin').text), float(xmlbox.find('ymax').text))
        bb = convert((w, h), b)
        out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
    in_file.close()
    out_file.close()

wd = os.getcwd()
data_base_dir = os.path.join(wd, "VOC/")
if not os.path.isdir(data_base_dir):
    os.mkdir(data_base_dir)
work_space_dir = os.path.join(data_base_dir, "VOC2007/")
if not os.path.isdir(work_space_dir):
    os.mkdir(work_space_dir)
annotation_dir = os.path.join(work_space_dir, "Annotations/")
if not os.path.isdir(annotation_dir):
    os.mkdir(annotation_dir)
clear_hidden_files(annotation_dir)
image_dir = os.path.join(work_space_dir, "JPEGImages/")
if not os.path.isdir(image_dir):
    os.mkdir(image_dir)
clear_hidden_files(image_dir)
yolo_labels_dir = os.path.join(work_space_dir, "YOLOLabels/")
if not os.path.isdir(yolo_labels_dir):
    os.mkdir(yolo_labels_dir)
clear_hidden_files(yolo_labels_dir)
yolov5_images_dir = os.path.join(data_base_dir, "images/")
if not os.path.isdir(yolov5_images_dir):
    os.mkdir(yolov5_images_dir)
clear_hidden_files(yolov5_images_dir)
yolov5_labels_dir = os.path.join(data_base_dir, "labels/")
if not os.path.isdir(yolov5_labels_dir):
    os.mkdir(yolov5_labels_dir)
clear_hidden_files(yolov5_labels_dir)
yolov5_images_train_dir = os.path.join(yolov5_images_dir, "train/")
if not os.path.isdir(yolov5_images_train_dir):
    os.mkdir(yolov5_images_train_dir)
clear_hidden_files(yolov5_images_train_dir)
yolov5_images_test_dir = os.path.join(yolov5_images_dir, "val/")
if not os.path.isdir(yolov5_images_test_dir):
    os.mkdir(yolov5_images_test_dir)
clear_hidden_files(yolov5_images_test_dir)
yolov5_labels_train_dir = os.path.join(yolov5_labels_dir, "train/")
if not os.path.isdir(yolov5_labels_train_dir):
    os.mkdir(yolov5_labels_train_dir)
clear_hidden_files(yolov5_labels_train_dir)
yolov5_labels_test_dir = os.path.join(yolov5_labels_dir, "val/")
if not os.path.isdir(yolov5_labels_test_dir):
    os.mkdir(yolov5_labels_test_dir)
clear_hidden_files(yolov5_labels_test_dir)

train_file = open(os.path.join(wd, "yolov5_train.txt"), 'w')
test_file = open(os.path.join(wd, "yolov5_val.txt"), 'w')
list_imgs = os.listdir(image_dir)  # list image files
for i in range(0, len(list_imgs)):
    path = os.path.join(image_dir, list_imgs[i])
    if not os.path.isfile(path):
        continue
    image_path = image_dir + list_imgs[i]
    voc_path = list_imgs[i]
    (nameWithoutExtention, extention) = os.path.splitext(os.path.basename(image_path))
    annotation_name = nameWithoutExtention + '.xml'
    annotation_path = os.path.join(annotation_dir, annotation_name)
    label_name = nameWithoutExtention + '.txt'
    label_path = os.path.join(yolo_labels_dir, label_name)
    if not os.path.exists(annotation_path):
        continue
    # Draw a number in [1, 100]; below TRAIN_RATIO -> training set, else validation set
    prob = random.randint(1, 100)
    if prob < TRAIN_RATIO:  # training set
        train_file.write(image_path + '\n')
        convert_annotation(nameWithoutExtention)  # convert the label
        copyfile(image_path, yolov5_images_train_dir + voc_path)
        copyfile(label_path, yolov5_labels_train_dir + label_name)
    else:  # validation set
        test_file.write(image_path + '\n')
        convert_annotation(nameWithoutExtention)  # convert the label
        copyfile(image_path, yolov5_images_test_dir + voc_path)
        copyfile(label_path, yolov5_labels_test_dir + label_name)
train_file.close()
test_file.close()

The file structure after generation:

VOC——images————train
   |         |——val
   |
   |——labels————train
   |         |——val
   | 
   |——VOC2007——……

Each file in labels records the class and normalized position of every annotated target; check that no file is empty or contains negative numbers:
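A minimal sketch for that check (assuming the labels live under VOC/labels/train and VOC/labels/val as generated above):

import glob

# Flag label files that are empty or contain negative values
for txt in glob.glob("VOC/labels/*/*.txt"):
    values = open(txt).read().split()
    if not values:
        print("empty label file:", txt)
    elif any(float(v) < 0 for v in values):
        print("negative value in:", txt)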

3. Train the model

1. Modify the dataset configuration file

Find the ./data folder in the root directory, copy voc.yaml, rename it (e.g. myvoc.yaml, the name the train.py defaults below point at), and modify the parameters as follows:

train: ./path-to-train-image-folder  # absolute paths are safest, same below
val: ./path-to-val-image-folder
nc: 1  # number of classes
names: ['your-target-name']  # class names

Comment out everything else.
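For example, a filled-in version (paths taken from the structure generated above and the class name used in this article; adjust them to your machine):

train: ./VOC/images/train
val: ./VOC/images/val
nc: 1  # number of classes
names: ['Mr.C']  # class names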

2. Modify the model configuration file

Find the ./models folder in the root directory, select the .yaml of the model you want to train (e.g. yolov5s.yaml), and modify the parameter:

nc: 1  # number of classes

Leave everything else untouched.

3. Modify the main function parameters

Find train.py in the root directory; its configuration section:

def parse_opt(known=False):
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default=ROOT / 'yolov5s.pt', help='initial weights path')
    parser.add_argument('--cfg', type=str, default=ROOT/'models/yolov5s.yaml', help='model.yaml path')
    parser.add_argument('--data', type=str, default=ROOT / 'data/myvoc.yaml', help='dataset.yaml path')
    parser.add_argument('--hyp', type=str, default=ROOT / 'data/hyps/hyp.scratch-low.yaml', help='hyperparameters path')
    parser.add_argument('--epochs', type=int, default=150)
    parser.add_argument('--batch-size', type=int, default=16, help='total batch size for all GPUs, -1 for autobatch')
    parser.add_argument('--imgsz', '--img', '--img-size', type=int, default=640, help='train, val image size (pixels)')
    parser.add_argument('--rect', action='store_true', help='rectangular training')
    parser.add_argument('--resume', nargs='?', const=True, default=False, help='resume most recent training')
    parser.add_argument('--nosave', action='store_true', help='only save final checkpoint')
    parser.add_argument('--noval', action='store_true', help='only validate final epoch')
    parser.add_argument('--noautoanchor', action='store_true', help='disable AutoAnchor')
    parser.add_argument('--evolve', type=int, nargs='?', const=300, help='evolve hyperparameters for x generations')
    parser.add_argument('--bucket', type=str, default='', help='gsutil bucket')
    parser.add_argument('--cache', type=str, nargs='?', const='ram', help='--cache images in "ram" (default) or "disk"')
    parser.add_argument('--image-weights', action='store_true', help='use weighted image selection for training')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--multi-scale', action='store_true', help='vary img-size +/- 50%%')
    parser.add_argument('--single-cls', action='store_true', help='train multi-class data as single-class')
    parser.add_argument('--optimizer', type=str, choices=['SGD', 'Adam', 'AdamW'], default='SGD', help='optimizer')
    parser.add_argument('--sync-bn', action='store_true', help='use SyncBatchNorm, only available in DDP mode')
    parser.add_argument('--workers', type=int, default=8, help='max dataloader workers (per RANK in DDP mode)')
    parser.add_argument('--project', default=ROOT / 'runs/train', help='save to project/name')
    parser.add_argument('--name', default='exp', help='save to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--quad', action='store_true', help='quad dataloader')
    parser.add_argument('--cos-lr', action='store_true', help='cosine LR scheduler')
    parser.add_argument('--label-smoothing', type=float, default=0.0, help='Label smoothing epsilon')
    parser.add_argument('--patience', type=int, default=100, help='EarlyStopping patience (epochs without improvement)')
    parser.add_argument('--freeze', nargs='+', type=int, default=[0], help='Freeze layers: backbone=10, first3=0 1 2')
    parser.add_argument('--save-period', type=int, default=-1, help='Save checkpoint every x epochs (disabled if < 1)')
    parser.add_argument('--local_rank', type=int, default=-1, help='DDP parameter, do not modify')

    # Weights & Biases arguments
    parser.add_argument('--entity', default=None, help='W&B: Entity')
    parser.add_argument('--upload_dataset', nargs='?', const=True, default=False, help='W&B: Upload data, "val" option')
    parser.add_argument('--bbox_interval', type=int, default=-1, help='W&B: Set bounding-box image logging interval')
    parser.add_argument('--artifact_alias', type=str, default='latest', help='W&B: Version of dataset artifact to use')

Parameter analysis:

    --weights: path of the initial weights file
    --cfg: path of the model .yaml file
    --data: path of the dataset .yaml file
    --hyp: path of the hyperparameter file
    --epochs: number of training epochs
    --batch-size: how many images are fed per batch
    --imgsz: input image size
    --rect: rectangular training, default False
    --resume: resume the most recent interrupted training run
    --nosave: only save the final checkpoint, default False
    --noval: only validate on the final epoch, default False
    --noautoanchor: disable automatic anchor adjustment, default False
    --evolve: evolve hyperparameters, off by default
    --bucket: gsutil bucket, generally unused
    --cache: cache images in RAM (or on disk) in advance to speed up training
    --image-weights: use weighted image selection for training
    --device: training device: cpu, 0 (a single GPU, i.e. cuda:0), or 0,1,2,3 (multiple GPUs)
    --multi-scale: multi-scale training, default False
    --single-cls: treat the dataset as single-class, default False
    --optimizer: optimizer choice: SGD, Adam, or AdamW
    --sync-bn: cross-GPU synchronized BatchNorm, for DDP mode
    --local_rank: DDP parameter, do not modify
    --workers: maximum number of dataloader workers
    --project: where the trained model is saved
    --name: name of the directory the model is saved under
    --exist-ok: reuse an existing project/name directory instead of creating a new one

In --weights, set default to the path of the .pt weights you want to start training from.

In --cfg, set default to the matching model .yaml.

In --data, set default to your dataset configuration .yaml.

In --epochs, set default to the number of training epochs (this directly affects training quality and time).

Then you can run train.py to train the model!
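You can also pass the options on the command line instead of editing the defaults; a sketch using the file names assumed in this article:

python train.py --weights yolov5s.pt --cfg models/yolov5s.yaml --data data/myvoc.yaml --epochs 150 --batch-size 16 --imgsz 640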

4. Enable TensorBoard to track training progress

Open a command prompt in the root directory, activate the virtual environment, and type:

tensorboard --logdir=runs/train

You can then view the training progress in a browser at the address it prints (http://localhost:6006).

After the model is trained, use TensorBoard to view the results:

tensorboard --logdir=runs

4. Model Validation

The training outputs are saved in ./runs/train/exp, including two trained .pt weight files under weights/ (last.pt from the final epoch and best.pt, the best-performing checkpoint).

Open detect.py in the root directory, configuration section:

def parse_opt():
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', nargs='+', type=str, default=ROOT / 'best.pt', help='model path(s)')
    parser.add_argument('--source', type=str, default=0, help='file/dir/URL/glob, 0 for webcam')
    parser.add_argument('--data', type=str, default=ROOT / 'data/coco128.yaml', help='(optional) dataset.yaml path')
    parser.add_argument('--imgsz', '--img', '--img-size', nargs='+', type=int, default=[640], help='inference size h,w')
    parser.add_argument('--conf-thres', type=float, default=0.25, help='confidence threshold')
    parser.add_argument('--iou-thres', type=float, default=0.45, help='NMS IoU threshold')
    parser.add_argument('--max-det', type=int, default=1000, help='maximum detections per image')
    parser.add_argument('--device', default='', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--view-img', action='store_true', help='show results')
    parser.add_argument('--save-txt', action='store_true', help='save results to *.txt')
    parser.add_argument('--save-conf', action='store_true', help='save confidences in --save-txt labels')
    parser.add_argument('--save-crop', action='store_true', help='save cropped prediction boxes')
    parser.add_argument('--nosave', action='store_true', help='do not save images/videos')
    parser.add_argument('--classes', nargs='+', type=int, help='filter by class: --classes 0, or --classes 0 2 3')
    parser.add_argument('--agnostic-nms', action='store_true', help='class-agnostic NMS')
    parser.add_argument('--augment', action='store_true', help='augmented inference')
    parser.add_argument('--visualize', action='store_true', help='visualize features')
    parser.add_argument('--update', action='store_true', help='update all models')
    parser.add_argument('--project', default=ROOT / 'runs/detect', help='save results to project/name')
    parser.add_argument('--name', default='exp', help='save results to project/name')
    parser.add_argument('--exist-ok', action='store_true', help='existing project/name ok, do not increment')
    parser.add_argument('--line-thickness', default=3, type=int, help='bounding box thickness (pixels)')
    parser.add_argument('--hide-labels', default=False, action='store_true', help='hide labels')
    parser.add_argument('--hide-conf', default=False, action='store_true', help='hide confidences')
    parser.add_argument('--half', action='store_true', help='use FP16 half-precision inference')
    parser.add_argument('--dnn', action='store_true', help='use OpenCV DNN for ONNX inference')
    opt = parser.parse_args()
    opt.imgsz *= 2 if len(opt.imgsz) == 1 else 1  # expand
    print_args(FILE.stem, opt)
    return opt

Parameter analysis:

--weights: path of the weights file
--source: test data: an image/video path, '0' (the computer's built-in camera), or a stream such as rtsp
--project: save path for the predicted images/videos (runs/detect by default)
--imgsz: network input image size
--conf-thres: confidence threshold
--iou-thres: IoU threshold for NMS
--device: GPU or CPU used for inference
--view-img: display the image/video while predicting, default False
--save-txt: save the predicted box coordinates as .txt files, default False
--classes: keep only the given classes, e.g. 0 or 0 2 3
--agnostic-nms: class-agnostic NMS (suppress overlapping boxes across classes), default False
--augment: test-time augmentation (multi-scale, flips, etc.) during inference

Modify --weights the same way as for train.py, pointing it at your trained .pt file.

1. Verify a picture or video:

Put the picture under ./data/images, modify the path in --source, run detect.py, and the results are saved in ./runs/detect/exp.

Verifying a video works the same way as verifying a picture; see the command sketch below.
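A command-line sketch of the same step (assuming the best checkpoint from the training run above):

python detect.py --weights runs/train/exp/weights/best.pt --source data/images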

2. Verify with the camera:

Change --source to:

parser.add_argument('--source', type=str, default=0, help='file/dir/URL/glob, 0 for webcam')

The camera that the default index 0 refers to is the same one OpenCV's VideoCapture opens.
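If you're not sure which index your camera has, a quick standalone check with OpenCV (a minimal sketch):

import cv2

# Index 0 is the same device detect.py opens when --source is 0
cap = cv2.VideoCapture(0)
print("camera 0 opened:", cap.isOpened())
cap.release()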

With that, we have finished training YOLOv5 on our own dataset.

Origin blog.csdn.net/WZT725/article/details/123416017