Train your own classifier with OpenCV and Python

1. Classifier production

1. Sample preparation

Collect the positive samples and negative samples you need and save them in different folders.

Create a new project in PyCharm. In the project structure, positive samples are placed in the has_mask folder and negative samples are placed in the no_mask folder.
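Before going further, it can help to confirm how many images each folder actually holds, since those counts matter later for options such as -num, -numPos and -numNeg. The following is a minimal sketch, assuming the samples are .jpg, .jpeg or .png files and that the script is run from the folder containing has_mask and no_mask; the helper name count_images is hypothetical.

```python
import os

IMAGE_EXTENSIONS = {'.jpg', '.jpeg', '.png'}  # assumed sample formats


def count_images(folder):
    """Return the number of image files directly inside `folder`."""
    total = 0
    for name in os.listdir(folder):
        full_path = os.path.join(folder, name)
        extension = os.path.splitext(name)[1].lower()
        # Count regular files only; sub-folders are ignored
        if os.path.isfile(full_path) and extension in IMAGE_EXTENSIONS:
            total += 1
    return total


if __name__ == '__main__':
    for folder in ('has_mask', 'no_mask'):
        if os.path.isdir(folder):
            print(folder, count_images(folder))
```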

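Install OpenCV and copy the files from the OpenCV package into the project's mask folder. The opencv_createsamples.exe and opencv_traincascade.exe tools used below are run from that folder.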

2. Sample production

(1) Picture renaming

To facilitate batch processing of samples, we need to rename the samples. The renaming code is as follows:

import os


def rename_samples(path, start):
    # Rename every file in `path` to start, start+1, ... keeping its extension
    count = start
    for file in sorted(os.listdir(path)):
        old_path = os.path.join(path, file)
        if os.path.isdir(old_path):  # skip sub-folders
            continue
        filetype = os.path.splitext(file)[1]
        new_path = os.path.join(path, str(count) + filetype)
        os.rename(old_path, new_path)
        count += 1


# Positive samples: file names start at 1000.jpg
rename_samples(r'E:\pycharmWorkspace\maskTest\mask\has_mask', 1000)

# Negative samples: file names start at 10000.jpg
rename_samples(r'E:\pycharmWorkspace\maskTest\mask\no_mask', 10000)
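One caveat with renaming in place: if a target name such as 1000.jpg already exists (for example when the script is run a second time), os.rename can fail on Windows or overwrite a file on other systems. A possible workaround, sketched here under the assumption that every file in the folder is a sample, is to rename in two passes through temporary names. The function name renumber_files is hypothetical.

```python
import os
import uuid


def renumber_files(folder, start):
    """Rename every file in `folder` to start, start+1, ... keeping extensions."""
    names = sorted(n for n in os.listdir(folder)
                   if os.path.isfile(os.path.join(folder, n)))

    # Pass 1: move each file to a unique temporary name, so that no final
    # name can collide with a file that has not been renamed yet.
    temp_names = []
    for name in names:
        extension = os.path.splitext(name)[1]
        temp = 'tmp_' + uuid.uuid4().hex + extension
        os.rename(os.path.join(folder, name), os.path.join(folder, temp))
        temp_names.append(temp)

    # Pass 2: assign the final sequential names.
    final_names = []
    for offset, temp in enumerate(temp_names):
        extension = os.path.splitext(temp)[1]
        final = str(start + offset) + extension
        os.rename(os.path.join(folder, temp), os.path.join(folder, final))
        final_names.append(final)
    return final_names
```

Calling renumber_files(folder, 1000) twice in a row should leave the same set of file names, which the single-pass version does not guarantee.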

(2) Resize the pictures

Resize the positive samples to 20×20 pixels to improve training accuracy; this matches the -w 20 -h 20 window size passed to the training tools below. The negative samples should be no smaller than 50×50 pixels.

import cv2

# First and last file numbers of the positive samples (range() excludes the end value)
for n in range(1000, 1099):
    path = r'C:\Users\Administrator\Desktop\mask\mask/' + str(n) + '.jpg'
    img = cv2.imread(path)  # read the picture
    if img is None:  # skip missing or unreadable files
        continue
    img = cv2.resize(img, (20, 20))  # resize the sample to 20x20
    cv2.imwrite(path, img)

# First and last file numbers of the negative samples (range() excludes the end value)
for n in range(10000, 10099):
    path = r'C:\Users\Administrator\Desktop\mask\no_mask/' + str(n) + '.jpg'
    img = cv2.imread(path)  # read the picture
    if img is None:  # skip missing or unreadable files
        continue
    img = cv2.resize(img, (80, 80))  # resize the sample to 80x80
    cv2.imwrite(path, img)
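Note that cv2.resize to a fixed size stretches pictures that are not square. If that distortion is a concern, one option (not part of the original workflow) is to crop a centered square before resizing. The arithmetic is sketched below; the helper name center_square_box is hypothetical, and the commented lines assume img is an image loaded with cv2.imread.

```python
def center_square_box(width, height):
    """Return (x, y, side) of the largest centered square in a width x height image."""
    side = min(width, height)
    x = (width - side) // 2   # horizontal offset of the square
    y = (height - side) // 2  # vertical offset of the square
    return x, y, side


# Possible use with an OpenCV image (a NumPy array of shape height x width x channels):
#   h, w = img.shape[:2]
#   x, y, side = center_square_box(w, h)
#   square = img[y:y + side, x:x + side]
#   small = cv2.resize(square, (20, 20))
```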

The OpenCV library for Python (opencv-python) is used here. Install it with pip from the PyCharm terminal. The following command uses the Tsinghua PyPI mirror, which can help when installation from the default index is slow.

pip install -i https://pypi.tuna.tsinghua.edu.cn/simple opencv-python --no-cache-dir

3. Generate resource record files

In the console, change to the has_mask folder.

Run the following command to create a file listing the image paths:

dir /b/s/p/w *.jpg > have_mask.txt

This generates a have_mask.txt file in the has_mask folder; move it to the mask directory.

Change to the no_mask folder and repeat the steps above to create no_mask.txt. The mask directory then contains both have_mask.txt and no_mask.txt.
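The dir command above is specific to the Windows console. A rough Python equivalent, which can be handy on other systems or when the listing needs to be regenerated from a script, might look like the sketch below. It assumes .jpg samples, and the function name write_image_list is hypothetical.

```python
import os


def write_image_list(folder, list_path, extension='.jpg'):
    """Write the absolute path of every `extension` file under `folder` to `list_path`."""
    paths = []
    for root, _dirs, files in os.walk(folder):  # walks sub-folders too, like dir /s
        for name in files:
            if name.lower().endswith(extension):
                paths.append(os.path.abspath(os.path.join(root, name)))
    paths.sort()
    with open(list_path, 'w', encoding='utf-8') as f:
        for path in paths:
            f.write(path + '\n')  # one path per line, like dir /b
    return paths


# Possible use from inside the mask directory:
#   write_image_list('has_mask', 'have_mask.txt')
#   write_image_list('no_mask', 'no_mask.txt')
```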

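After that, the positive sample list needs to be preprocessed. Append " 1 0 0 20 20" to the end of every line in have_mask.txt, meaning one object located at x=0, y=0 with a width of 20 and a height of 20. The following code does this.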

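# Annotation suffix: 1 object at x=0, y=0 with width 20 and height 20
suffix = ' 1 0 0 20 20'
info_path = r'E:\pycharmWorkspace\maskTest\mask\have_mask.txt'

# Read all lines first, then rewrite the file so each line is annotated exactly once
with open(info_path, 'r', encoding='utf-8') as f:
    lines = [line.strip() for line in f if line.strip()]

with open(info_path, 'w', encoding='utf-8') as f:
    for line in lines:
        # Skip lines that already carry the suffix, so the script can be re-run safely
        if not line.endswith(suffix):
            line += suffix
        print(line)
        f.write(line + '\n')

# no_mask.txt is left unchanged: the background file passed to -bg
# only lists image paths, without annotations.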
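Each line of the positive list is expected to have the form "path count x y w h", with four numbers per object. A small checker like the following can catch malformed lines before they reach opencv_createsamples. This is only a sketch: it assumes paths contain no spaces, and the function name parse_info_line is hypothetical.

```python
def parse_info_line(line):
    """Split '<path> <count> <x> <y> <w> <h> ...' into (path, list of boxes)."""
    tokens = line.split()
    if len(tokens) < 2:
        raise ValueError('expected "<path> <count> <x> <y> <w> <h>": ' + line)
    path = tokens[0]
    count = int(tokens[1])
    numbers = [int(token) for token in tokens[2:]]
    # Every object contributes exactly four numbers: x, y, width, height
    if len(numbers) != 4 * count:
        raise ValueError('expected %d box values, found %d' % (4 * count, len(numbers)))
    boxes = [tuple(numbers[i:i + 4]) for i in range(0, len(numbers), 4)]
    return path, boxes


# Possible use:
#   with open('have_mask.txt', encoding='utf-8') as f:
#       for line in f:
#           if line.strip():
#               parse_info_line(line)
```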

4. Generate vec file

In the terminal, change to the mask folder and run the following command:

opencv_createsamples.exe -vec havemask.vec -info have_mask.txt -num 400 -w 20 -h 20

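Description of the opencv_createsamples.exe parameters:

-vec <vec_file_name>
	Output file containing the positive samples used for training. It should have a .vec file extension.

-info <file_name>
	Name of the file describing the input sample collection: the image file names and the location of the target object in each image (for example, a .dat file you created yourself).

-img <image_file_name>
	An alternative to -info (one of the two must be provided). With -img you supply a single cropped positive example. In this mode multiple output samples are generated, all derived from that one input.

-bg <background_file_name>
	Background description file. It contains a list of image file names; these images are randomly chosen as backgrounds for the object.

-num <number_of_samples>
	Number of positive samples to generate.

-bgcolor <background_color>
	Background color (currently grayscale images are assumed); the background color denotes the transparent color. Because image compression can cause color deviations, the color tolerance can be specified with -bgthresh. All pixels between bgcolor-bgthresh and bgcolor+bgthresh are treated as transparent.

-bgthresh <background_color_threshold>
	Tolerance around -bgcolor, as described above.

-inv
	If specified, the colors of the foreground image are inverted.

-randinv
	If specified, the colors are inverted randomly.

-maxidev <max_intensity_deviation>
	Maximum intensity deviation of pixels in the foreground samples.

-maxxangle <max_x_rotation_angle>
	Maximum rotation angle about the X axis; must be given in radians.

-maxyangle <max_y_rotation_angle>
	Maximum rotation angle about the Y axis; must be given in radians.

-maxzangle <max_z_rotation_angle>
	Maximum rotation angle about the Z axis; must be given in radians.

-show
	A useful debugging option. If specified, each sample is displayed. Pressing Esc continues sample creation without displaying the samples.

-w <sample_width>
	Width of the output samples, in pixels.

-h <sample_height>
	Height of the output samples, in pixels.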
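Since the three rotation options take radians, angles that are easier to think about in degrees need converting first. A minimal sketch of that conversion follows; the helper name rotation_args and the rounding to four decimals are assumptions made for illustration.

```python
import math


def rotation_args(max_x_deg, max_y_deg, max_z_deg):
    """Build the -maxxangle/-maxyangle/-maxzangle arguments from angles in degrees."""
    args = []
    for flag, degrees in (('-maxxangle', max_x_deg),
                          ('-maxyangle', max_y_deg),
                          ('-maxzangle', max_z_deg)):
        # math.radians converts degrees to radians; four decimals is plenty here
        args += [flag, str(round(math.radians(degrees), 4))]
    return args


if __name__ == '__main__':
    print(' '.join(rotation_args(30, 30, 15)))
```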

This produces the havemask.vec file.
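To check how many samples actually ended up in the .vec file (useful when -num is larger than the number of lines in the list file), the file header can be read directly. The sketch below assumes the layout written by the opencv_createsamples source as I understand it: a 32-bit sample count, a 32-bit vector size (width × height), and two unused 16-bit values, all little-endian. Treat that layout, and the function name read_vec_header, as assumptions rather than a documented format.

```python
import struct

# Assumed header layout: count (int32), width*height (int32), two unused int16 values
VEC_HEADER = struct.Struct('<iihh')


def read_vec_header(path):
    """Return (sample_count, vector_size) read from the header of a .vec file."""
    with open(path, 'rb') as f:
        data = f.read(VEC_HEADER.size)
    if len(data) != VEC_HEADER.size:
        raise ValueError('file too short to be a .vec file: ' + path)
    count, vec_size, _unused_min, _unused_max = VEC_HEADER.unpack(data)
    return count, vec_size


# Possible use: for 20x20 samples the vector size should be 400
#   count, vec_size = read_vec_header('havemask.vec')
```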

5. Train the model

Create an xml folder in the current folder to hold the training output (the -data option below points to it), then create a new start.bat file with the following content:

opencv_traincascade.exe -data xml -vec havemask.vec -bg no_mask.txt -numPos 100 -numNeg 100 -numStages 20 -w 20 -h 20 -mode ALL

pause

Run start.bat in the terminal.
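The same training run can also be launched from Python, which makes it harder to lose a space between options and lets the output folder be created automatically. This is a sketch under the assumption that opencv_traincascade.exe is in the current folder; the function names build_train_command and run_training are hypothetical, and the option values mirror the start.bat command.

```python
import os
import subprocess


def build_train_command(vec_file, bg_file, num_pos, num_neg,
                        num_stages=20, width=20, height=20,
                        data_dir='xml', exe='opencv_traincascade.exe'):
    """Return the opencv_traincascade command as a list of arguments."""
    return [exe, '-data', data_dir, '-vec', vec_file, '-bg', bg_file,
            '-numPos', str(num_pos), '-numNeg', str(num_neg),
            '-numStages', str(num_stages),
            '-w', str(width), '-h', str(height), '-mode', 'ALL']


def run_training(command):
    """Create the -data folder if needed, then run the training command."""
    data_dir = command[command.index('-data') + 1]
    os.makedirs(data_dir, exist_ok=True)
    subprocess.run(command, check=True)  # raises CalledProcessError on failure


# Possible use from inside the mask folder:
#   run_training(build_train_command('havemask.vec', 'no_mask.txt', 100, 100))
```

Passing the arguments as a list keeps each option and value separate, so there is no shell quoting to get wrong.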

After training completes, the output files appear in the xml folder. The cascade.xml file is our trained classifier.

2. Test classifier

Run the following code:

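import cv2

# Load the trained classifier
mask_detector = cv2.CascadeClassifier(r'E:\pycharmWorkspace\maskTest\mask\xml\cascade.xml')
img = cv2.imread(r'D:\0001.jpg')
if img is None:
    raise FileNotFoundError(r'Could not read D:\0001.jpg')
# Convert to a grayscale picture
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
# Run detection: scale factor 1.1, at least 5 neighbors, object size between 50x50 and 200x200
mask_face = mask_detector.detectMultiScale(gray, 1.1, 5, cv2.CASCADE_SCALE_IMAGE, (50, 50), (200, 200))
for (x2, y2, w2, h2) in mask_face:
    # Draw a green box and a label for each detection
    cv2.rectangle(img, (x2, y2), (x2 + w2, y2 + h2), (0, 255, 0), 2)
    cv2.putText(img, "have_mask", (x2, y2), cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 255, 0), 2)
# Show and save the annotated picture once all detections are drawn
cv2.imshow('mask', img)
cv2.imwrite(r'D:/test.jpg', img)
cv2.waitKey()
cv2.destroyAllWindows()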

In this test, the detection results were not very good.
