Fight recognition (AI+Python+PyQt5) (1)

        I recently worked on a fight recognition project. Since development material on the topic felt scarce at the time, I am writing up a summary here for reference. Without further ado, let's look at the result first.


1. Research Status

        At present there are three mainstream approaches to fight detection:

(1) Fight detection based on object detection. The main idea is to treat fighting as a detection category and recognize fighting behavior by detecting and classifying it directly. There is little published work in this direction and no publicly available dataset, so if you want to go down this road you have to build your own dataset and explore it yourself.

(2) Fight detection based on skeleton keypoints. The main idea is to extract human skeleton keypoints with frameworks such as OpenPose and then write rules on top of the keypoints to make the judgment. Some people are doing fight detection this way, but when people become entangled during a fight it is hard to make an accurate judgment from the keypoints alone.

(3) Fight detection based on video understanding. The main idea is to make the judgment from the temporal dimension. Fighting depends strongly on temporal context, so pure object detection is prone to false and missed detections, and when people overlap and occlude each other heavily, skeleton-based behavior recognition also has serious limitations. Fight detection based on video understanding handles these problems better, but it is also harder to implement.


2. Chosen approach

        Here I chose option (1): fight recognition based on object detection. As mentioned above, datasets for this task are very scarce; after searching repeatedly I finally obtained a decent dataset from abroad. Since the task differs from a general object detection task, I annotated the dataset myself, without involving third-party annotators, to make sure the labels are reasonable and accurate.

The basic flow is:

Labelme annotation -> annotation organization and format conversion -> model training -> deployment


2.1 Labeling

        The development work was done on Windows 11 with the open source labelme tool. This was my first time using it, and it turned out to be quite good, with very complete functionality. The foreign dataset I obtained comes as videos, so the videos first have to be converted into images and then annotated. For details you can refer to the article below, which is well written.

Labelme annotation video https://www.pudn.com/news/623b0a3f49c1dc3c8980863b.html
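As a rough sketch (not from the article linked above), frames can be dumped from a video with OpenCV before annotation; the video file name and the sampling interval below are assumptions:

import os
import cv2

video_path = "fight_cam1.mp4"   # assumed input video
out_dir = "frames"
os.makedirs(out_dir, exist_ok=True)

cap = cv2.VideoCapture(video_path)
idx, saved = 0, 0
while True:
    ok, frame = cap.read()
    if not ok:
        break
    if idx % 5 == 0:  # keep every 5th frame to reduce near-duplicate images
        cv2.imwrite(os.path.join(out_dir, "cam1_%06d.jpg" % saved), frame)
        saved += 1
    idx += 1
cap.release()
print("saved %d frames" % saved)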

Fig.1 Using Labelme for data labeling

         Using a few days of spare time, I annotated a couple of thousand images and then removed the invalid ones. The information of the final annotated dataset is as follows:

Fight dataset information:
Annotation tool: Labelme
Dataset name: fight dataset
Image quantity / format: about 800 images / jpg
Image resolution: 1920*1080
Annotation file format: json
Classified by category: no

2.2 Annotation organization and format conversion

        The data annotated with Labelme cannot be fed to training directly; it has to be converted first. Since the YOLO algorithm will be used, the Labelme format needs to be converted into the YOLO format. Here is the conversion script:

"""
2023.1.1
该代码实现了labelme导出的json文件,批量转换成yolo需要的txt文件,且包含了坐标归一化

原来labelme标注之后的是:1.jpg  1.json

经过该脚本处理后,得到的是1.jpg 1.json 1.txt

"""
import os
import numpy as np
import json
from glob import glob
import cv2
from sklearn.model_selection import train_test_split
from os import getcwd

classes = ["NOFight", "Fight", "Person"]
# 1.标签路径
labelme_path = "Data20200108/"
isUseTest = False  # 是否创建test集
# 3.获取待处理文件
files = glob(labelme_path + "*.json")
files = [i.replace("\\", "/").split("/")[-1].split(".json")[0] for i in files]
print(files)
if isUseTest:
    trainval_files, test_files = train_test_split(files, test_size=0.1, random_state=55)
else:
    trainval_files = files
# split
train_files, val_files = train_test_split(trainval_files, test_size=0.1, random_state=55)


def convert(size, box):
    dw = 1. / (size[0])
    dh = 1. / (size[1])
    x = (box[0] + box[1]) / 2.0 - 1
    y = (box[2] + box[3]) / 2.0 - 1
    w = box[1] - box[0]
    h = box[3] - box[2]
    x = x * dw
    w = w * dw
    y = y * dh
    h = h * dh
    return (x, y, w, h)


wd = getcwd()
print(wd)


def ChangeToYolo5(files, txt_Name):
    if not os.path.exists('tmp/'):
        os.makedirs('tmp/')
    list_file = open('tmp/%s.txt' % (txt_Name), 'w')
    for json_file_ in files:
        json_filename = labelme_path + json_file_ + ".json"
        imagePath = labelme_path + json_file_ + ".jpg"
        list_file.write('%s/%s\n' % (wd, imagePath))
        out_file = open('%s/%s.txt' % (labelme_path, json_file_), 'w')
        json_file = json.load(open(json_filename, "r", encoding="utf-8"))
        height, width, channels = cv2.imread(labelme_path + json_file_ + ".jpg").shape
        for multi in json_file["shapes"]:
            points = np.array(multi["points"])
            xmin = min(points[:, 0]) if min(points[:, 0]) > 0 else 0
            xmax = max(points[:, 0]) if max(points[:, 0]) > 0 else 0
            ymin = min(points[:, 1]) if min(points[:, 1]) > 0 else 0
            ymax = max(points[:, 1]) if max(points[:, 1]) > 0 else 0
            label = multi["label"]
            if xmax <= xmin:
                pass
            elif ymax <= ymin:
                pass
            else:
                cls_id = classes.index(label)
                b = (float(xmin), float(xmax), float(ymin), float(ymax))
                bb = convert((width, height), b)
                out_file.write(str(cls_id) + " " + " ".join([str(a) for a in bb]) + '\n')
                print(json_filename, xmin, ymin, xmax, ymax, cls_id)


ChangeToYolo5(train_files, "train")
ChangeToYolo5(val_files, "val")
# ChangeToYolo5(test_files, "test")
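To sanity-check the conversion, a quick script (not part of the original post; the image path is an assumption) can redraw the YOLO boxes on an image:

import cv2

classes = ["NOFight", "Fight", "Person"]
img_path = "Data20200108/cam1_100000.jpg"      # assumed example image
txt_path = img_path.replace(".jpg", ".txt")    # label file written by the script above

img = cv2.imread(img_path)
h, w = img.shape[:2]
with open(txt_path) as f:
    for line in f:
        cls_id, x, y, bw, bh = [float(v) for v in line.split()]
        # Convert the normalized centre/size format back to pixel corners
        x1, y1 = int((x - bw / 2) * w), int((y - bh / 2) * h)
        x2, y2 = int((x + bw / 2) * w), int((y + bh / 2) * h)
        cv2.rectangle(img, (x1, y1), (x2, y2), (0, 255, 0), 2)
        cv2.putText(img, classes[int(cls_id)], (x1, max(y1 - 5, 0)),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.6, (0, 255, 0), 2)
cv2.imwrite("check_labels.jpg", img)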

2.3 Model training

        Here I use the well-received yolov5 algorithm and develop in PyCharm. A brief introduction to yolov5: it is a well-designed deep learning network that works very well for object detection tasks. Its authors also thoughtfully provide a series of scripts for model conversion and deployment, so it is very easy to use; the repo is actively maintained and has many contributors, and choosing it saves time, effort and worry. I use the yolov5 v6.1 release here; other versions have not been verified.

Fig.2 yolov5-v6.1

         Download the repo locally and open it with PyCharm:

Fig.3 pycharm open project

        Then use Anaconda to configure the yolov5 environment. I won't expand on that here; friends who don't know how to configure it can refer to: anaconda configures virtual environment

        My graphics card is a 3070 Ti, so I configured a virtual environment with CUDA support, as shown in the figure below. With CUDA the training speed is fast, roughly 20-40 times that of the CPU. If you have no discrete NVIDIA card, you need to configure a CPU-only environment instead; it also works, but is much slower.

Fig.4 anaconda configuration virtual environment
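As a minimal sketch (the environment name is an assumption; the package versions match the torch 1.12.1+cu113 setup mentioned in the next paragraph), the environment can be created roughly like this:

conda create -n yolov5 python=3.8
conda activate yolov5
pip install torch==1.12.1+cu113 torchvision==0.13.1+cu113 --extra-index-url https://download.pytorch.org/whl/cu113
pip install -r requirements.txt   # run inside the yolov5 repo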

         I use torch 1.12.1+cu113 here. I see many blogs still teaching readers to use torch 1.7.1, which is really misleading. Because yolov5 is being used to train our own dataset, a few things need to be configured, including organizing the dataset and writing the configuration file; detailed instructions are given below.


2.3.1 Dataset configuration:

         The fight dataset here is Data20200108. Under images there are train and val folders holding all the original images; the annotation files, all converted to txt, are stored under labels and are likewise split into train and val.

Fig.5 Configure your own dataset
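In other words, the layout looks roughly like this (the exact names are assumed from the description above):

datasets/Data20200108/
    images/
        train/    # original training jpgs
        val/      # original validation jpgs
    labels/
        train/    # converted YOLO txt labels for the training images
        val/      # converted YOLO txt labels for the validation images
    train_list.txt
    val_list.txt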

        The train_list.txt and val_list.txt files here hold the paths of the training images and of the validation images respectively. An excerpt:

datasets/Data20200108/images/train/cam1_100000.jpg
datasets/Data20200108/images/train/cam1_100001.jpg
datasets/Data20200108/images/train/cam1_100002.jpg
datasets/Data20200108/images/train/cam1_100003.jpg
datasets/Data20200108/images/train/cam1_100004.jpg
datasets/Data20200108/images/train/cam1_100005.jpg
datasets/Data20200108/images/train/cam1_100006.jpg
datasets/Data20200108/images/train/cam1_100007.jpg
datasets/Data20200108/images/train/cam1_100008.jpg

2.3.2 Writing the configuration file:

        Under the data folder, we create a new fight_person.yaml file and write the configuration into it.

Fig.6 Configuration file writing 
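A minimal sketch of what fight_person.yaml might contain, assuming the list files from section 2.3.1 sit under the dataset root (the exact paths depend on your layout; the class names match the conversion script above):

train: datasets/Data20200108/train_list.txt
val: datasets/Data20200108/val_list.txt
nc: 3
names: ['NOFight', 'Fight', 'Person']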

        With that, all the preparation is done. Next we open the training script train.py and set the following parameters to start training.

--data fight_person.yaml --weights yolov5s.pt --img 640 --batch-size 8  --device 0  --epochs 10 --workers 0

        A brief explanation of each parameter (the full launch command is shown after the list):

--data: the configuration file, which contains the image paths, the categories and other information

--weights: the pretrained weight file; here I choose yolov5s.pt

--img: training resolution; I keep the default 640

--batch-size: batch size; this depends on the machine, the more capable the machine, the larger it can be

--device: device; write cpu if there is no graphics card, 0 for a single graphics card, and 0,1,2,... for multiple cards according to the actual setup

--epochs: the number of training epochs

--workers: the number of dataloader worker processes (0 is the safest choice on Windows)
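
Putting it together, training is launched from the yolov5 root roughly like this (assuming the environment configured above is active):

python train.py --data fight_person.yaml --weights yolov5s.pt --img 640 --batch-size 8 --device 0 --epochs 10 --workers 0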


2.3.3 Start training

        After training finishes, some training logs are generated under runs/train/. Let's pick out a few key ones:

Fig.7 Training index graph

         The information in the figure above is fairly complete. We can clearly see that the loss drops quickly and the mAP rises quickly, meaning the model converges soon. For detection tasks the main evaluation metric is mAP; for details you can refer to:

mAP interpretation https://blog.csdn.net/HUAI_BI_TONG/article/details/121212733

        Simply put, the larger the mAP, the better the detector. The folder generated automatically by training also contains some images:

Fig.8 Manually annotated boxes

Fig.9 Boxes predicted by the model

        As you can see, the results are quite good: the predicted boxes basically coincide with the manually annotated boxes, which means our model fits the data well.
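
Before moving on to the interface, a quick way to try the trained model is yolov5's torch.hub interface; this is just a sketch (runs/train/exp/weights/best.pt is yolov5's default save path, and the test image path is an assumption):

import torch

# Load the trained weights through yolov5's hub interface
model = torch.hub.load('ultralytics/yolov5', 'custom', path='runs/train/exp/weights/best.pt')
model.conf = 0.4  # confidence threshold (assumed value)

# Run inference on a single image (assumed path) and inspect the detections
results = model('datasets/Data20200108/images/val/cam1_100000.jpg')
results.print()
print(results.pandas().xyxy[0])  # boxes, confidences and class names as a DataFrame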


3. Conclusion    

        That is the whole process of training your own dataset with yolov5: dataset annotation and preparation, training environment configuration, analysis of the training process, and so on. In the next article we will build the user interface. Artificial intelligence dressed up in a fancy coat really packs a punch.

Origin blog.csdn.net/opencv_yys/article/details/128609117