learn better from others,
be the better one.
—— "Weika Zhixiang"
Foreword
The previous four articles covered training on the MNIST dataset and running inference with OpenCV. Real projects usually need a model trained on their own data, so this article shows how to train on your own dataset with pyTorch.
Generate your own training images
In the previous article, "Getting Started with pyTorch (4): Exporting the MNIST Model and Recognizing Digits with C++ OpenCV DNN", OpenCV inference was implemented in Visual Studio. That article also showed that an image has to be preprocessed before inference: grayscale conversion, binarization, contour detection, and sorting of the contours. Since all of that is already in place, we only need to modify that code slightly so that it saves each extracted digit image, and those saved images are the data we want to train on. Here is the source code:
#include <iostream>
#include <chrono>
#include <time.h>
#include <opencv2/opencv.hpp>
#include <opencv2/dnn/dnn.hpp>

using namespace cv;
using namespace std;

// iType: 0 = extract the digit images and save them, 1 = run DNN inference
int iType = 1;
dnn::Net net;

// Sort rectangles from left to right, top to bottom
void SortRect(vector<Rect>& inputrects) {
    for (size_t i = 0; i < inputrects.size(); ++i) {
        for (size_t j = i; j < inputrects.size(); ++j) {
            // Rect i lies entirely above rect j: the order is already correct
            if (inputrects[i].y + inputrects[i].height < inputrects[j].y) {
            }
            // Same row: order by x
            else if (inputrects[i].y <= inputrects[j].y + inputrects[j].height) {
                if (inputrects[i].x > inputrects[j].x) {
                    swap(inputrects[i], inputrects[j]);
                }
            }
            // Rect i belongs to a later row than rect j
            else if (inputrects[i].y > inputrects[j].y + inputrects[j].height) {
                swap(inputrects[i], inputrects[j]);
            }
        }
    }
}

// Prepare a digit image for the MNIST model so that a non-square crop
// is not squashed when it is resized to 28*28
void DealInputMat(Mat& src, int row = 28, int col = 28, int tmppadding = 5) {
    int w = src.cols;
    int h = src.rows;
    // Compare width and height and pad the shorter side with black first,
    // so the image is close to square and keeps its proportions at 28*28
    if (w > h) {
        int tmptopbottompadding = (w - h) / 2 + tmppadding;
        copyMakeBorder(src, src, tmptopbottompadding, tmptopbottompadding, tmppadding, tmppadding,
            BORDER_CONSTANT, Scalar(0));
    }
    else {
        int tmpleftrightpadding = (h - w) / 2 + tmppadding;
        copyMakeBorder(src, src, tmppadding, tmppadding, tmpleftrightpadding, tmpleftrightpadding,
            BORDER_CONSTANT, Scalar(0));
    }
    // Size takes (width, height), i.e. (col, row)
    resize(src, src, Size(col, row));
}

// Get the current system time as a yyyyMMddHHmmss string
// (_localtime64_s and sprintf_s are MSVC-specific)
string GetCurrentSystemTime()
{
    auto t = chrono::system_clock::to_time_t(std::chrono::system_clock::now());
    struct tm ptm {};
    _localtime64_s(&ptm, &t);
    char date[60] = { 0 };
    sprintf_s(date, "%d%02d%02d%02d%02d%02d",
        (int)ptm.tm_year + 1900, (int)ptm.tm_mon + 1, (int)ptm.tm_mday,
        (int)ptm.tm_hour, (int)ptm.tm_min, (int)ptm.tm_sec);
    return string(date);
}

int main(int argc, char** argv) {
    // ONNX model file
    string onnxfile = "D:/Business/DemoTEST/CPP/OpenCVMinistDNN/torchminist/ResNet.onnx";
    // Test image file
    string testfile = "D:/Business/DemoTEST/CPP/OpenCVMinistDNN/testpic/test3.png";
    // Directory where the extracted images are saved
    string savefile = "D:/Business/DemoTEST/CPP/OpenCVMinistDNN/findcontoursMat";

    if (iType == 1) {
        net = dnn::readNetFromONNX(onnxfile);
        if (net.empty()) {
            cout << "Failed to load the ONNX file!" << endl;
            return -1;
        }
    }

    // Read the image, convert to grayscale, apply Gaussian blur
    Mat src = imread(testfile);
    if (src.empty()) {
        cout << "Failed to read the test image!" << endl;
        return -1;
    }
    // Keep a copy of the source image
    Mat backsrc;
    src.copyTo(backsrc);
    cvtColor(src, src, COLOR_BGR2GRAY);
    GaussianBlur(src, src, Size(3, 3), 0.5, 0.5);
    // Binarize; THRESH_BINARY_INV gives white digits on black, matching MNIST
    threshold(src, src, 0, 255, THRESH_BINARY_INV | THRESH_OTSU);

    Mat kernel = getStructuringElement(MORPH_RECT, Size(3, 3));
    // Opening first to remove noise
    morphologyEx(src, src, MORPH_OPEN, kernel, Point(-1, -1));
    // Then dilate 3 times so that broken handwritten strokes are joined
    morphologyEx(src, src, MORPH_DILATE, kernel, Point(-1, -1), 3);
    imshow("src", src);

    vector<vector<Point>> contours;
    vector<Vec4i> hierarchy;
    vector<Rect> rects;
    // Find the contours
    findContours(src, contours, hierarchy, RETR_EXTERNAL, CHAIN_APPROX_NONE);
    for (size_t i = 0; i < contours.size(); ++i) {
        RotatedRect rect = minAreaRect(contours[i]);
        Rect outrect = rect.boundingRect();
        // The bounding box can extend past the image edge; clip it so the ROI below is valid
        outrect &= Rect(0, 0, src.cols, src.rows);
        // Add it to the list of rectangles
        rects.push_back(outrect);
    }

    // Sort from left to right, top to bottom
    SortRect(rects);

    // Process each extracted region
    for (size_t i = 0; i < rects.size(); ++i) {
        Mat tmpsrc = src(rects[i]);
        DealInputMat(tmpsrc);

        if (iType == 1) {
            //Mat inputBlob = dnn::blobFromImage(tmpsrc, 0.3081, Size(28, 28), Scalar(0.1307), false, false);
            Mat inputBlob = dnn::blobFromImage(tmpsrc, 1, Size(28, 28), Scalar(), false, false);
            // Set the network input
            net.setInput(inputBlob, "input");
            // Run the prediction
            Mat output = net.forward("output");
            // Find the position of the largest value in the output
            Point maxLoc;
            minMaxLoc(output, NULL, NULL, NULL, &maxLoc);
            cout << "Prediction: " << maxLoc.x << endl;
            // Draw the extracted region and show the recognized digit
            rectangle(backsrc, rects[i], Scalar(255, 0, 255));
            putText(backsrc, to_string(maxLoc.x), Point(rects[i].x, rects[i].y), FONT_HERSHEY_PLAIN, 5, Scalar(255, 0, 255), 1, -1);
        }
        else {
            string filename = savefile + "/" + GetCurrentSystemTime() + "-" + to_string(i) + ".jpg";
            cout << filename << endl;
            imwrite(filename, tmpsrc);
        }
    }

    imshow("backsrc", backsrc);
    waitKey(0);
    return 0;
}
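The left-to-right, top-to-bottom ordering that SortRect aims for can also be reached by grouping the boxes into rows first. The Python sketch below only illustrates that idea and is not a translation of the C++ above: sort_reading_order and the (x, y, w, h) tuple layout are hypothetical names chosen for the example, and it assumes that boxes on the same line overlap vertically.

```python
def sort_reading_order(rects):
    """Order (x, y, w, h) boxes row by row, left to right within each row."""
    rows = []          # each entry is the list of boxes that form one row
    row_bottom = None  # lowest bottom edge seen so far in the current row
    for rect in sorted(rects, key=lambda r: (r[1], r[0])):
        x, y, w, h = rect
        if row_bottom is None or y > row_bottom:
            # The box starts below the current row: open a new row
            rows.append([rect])
            row_bottom = y + h
        else:
            # The box overlaps the current row vertically: same row
            rows[-1].append(rect)
            row_bottom = max(row_bottom, y + h)
    ordered = []
    for row in rows:
        ordered.extend(sorted(row, key=lambda r: r[0]))
    return ordered


boxes = [(50, 10, 20, 30), (10, 12, 20, 30), (10, 60, 20, 30), (60, 58, 20, 30)]
print(sort_reading_order(boxes))
```

Grouping into rows before sorting by x keeps the comparison simple, at the cost of assuming the handwriting sits on reasonably straight lines.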
Key points
Added a parameter, iType: 0 extracts and saves the digit images, and 1 runs the inference from the previous article.
Added a function, GetCurrentSystemTime, that returns the current time. Its main use is to put a timestamp into the file name when an image is saved.
Added a variable, savefile, for the directory where the images are saved.
With these changes, setting iType to 1 still runs the original DNN inference, and setting it to 0 saves each extracted image with imwrite.
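For readers following along in Python, the file naming scheme described above (save directory, timestamp, index of the crop) can be sketched as follows. This is a minimal illustration rather than part of the original program: crop_filename and its now parameter are hypothetical names, and the example date is arbitrary.

```python
from datetime import datetime


def crop_filename(save_dir, index, now=None):
    """Build '<save_dir>/<yyyyMMddHHmmss>-<index>.jpg' for one extracted digit."""
    now = now or datetime.now()
    return f"{save_dir}/{now.strftime('%Y%m%d%H%M%S')}-{index}.jpg"


# A fixed time is passed in so the result is reproducible
example_name = crop_filename("findcontoursMat", 0, datetime(2022, 5, 1, 9, 30, 5))
print(example_name)
# prints findcontoursMat/20220501093005-0.jpg
```

The timestamp only has a resolution of one second, so the index is what keeps the crops from a single run apart.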
Next we create a small dataset ourselves: write digits in a drawing tool and make 10 images, one for each digit from 0 to 9.
Running the program in extraction mode gives the result shown in the screenshot: each handwritten 9 is cropped out on its own and saved to the specified directory.
Also create a mydata directory under the datasets directory (the training script reads from ../datasets/mydata), create a train directory inside it, and create one folder for each digit, 0 to 9, inside train. With this layout, pyTorch assigns the labels automatically from the folder names under train when it loads the data, so we do not have to label each image by hand. The extracted digit images therefore have to go into the matching folder.
Move the images of the digit 9 that were just generated into the 9 folder, and do the same for the other digits.
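The folder layout and the move step can also be scripted. The sketch below is one possible way to do it under stated assumptions: make_class_dirs and move_crops are hypothetical helpers, and the demo runs inside a temporary directory with an empty placeholder file instead of real crops.

```python
import shutil
import tempfile
from pathlib import Path


def make_class_dirs(root, splits=("train", "test"), classes=range(10)):
    """Create <root>/<split>/<digit>/ for every split and digit."""
    root = Path(root)
    for split in splits:
        for cls in classes:
            (root / split / str(cls)).mkdir(parents=True, exist_ok=True)
    return root


def move_crops(crop_dir, class_dir):
    """Move every .jpg in crop_dir into class_dir and return how many were moved."""
    moved = 0
    for path in sorted(Path(crop_dir).glob("*.jpg")):
        shutil.move(str(path), str(Path(class_dir) / path.name))
        moved += 1
    return moved


# Demo in a throwaway directory so that no real data is touched
with tempfile.TemporaryDirectory() as tmp:
    data_root = make_class_dirs(Path(tmp) / "mydata")
    crops = Path(tmp) / "findcontoursMat"
    crops.mkdir()
    (crops / "20220501093005-0.jpg").write_bytes(b"")  # placeholder, not a real image
    moved_count = move_crops(crops, data_root / "train" / "9")
    train_dirs = sorted(p.name for p in (data_root / "train").iterdir())
    print(moved_count, train_dirs)
```

In practice the crops for one digit would be extracted in one run and moved into that digit's folder before the next digit is processed.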
The test set is prepared the same way, except that after copying the images over we deleted most of them and then applied some further processing. With that, the image extraction is finished, and the next step is to train with pyTorch.
Train on your own dataset with pyTorch
Create a new file, trainmydata.py. The training process is much the same as before, except that we continue training from the existing model: the script first loads the originally trained weights and then trains on the new data. Here is the code:
import time

import matplotlib.pyplot as plt
import torch
import torch.optim as optim
from torch.utils.data import DataLoader
from torchvision import datasets
from torchvision import transforms

import trainModel as tm

## Number of training epochs
epoch_times = 15
## Best accuracy so far; the model is saved whenever the current accuracy beats it
toppredicted = 0.0
## Learning rate
learnrate = 0.01
## Momentum: when the previous update and the current gradient point the same way,
## the step gets larger, which speeds up convergence
momentnum = 0.5
## Prefix the model trained on our own data with "my"
savemodel_name = "my" + tm.savemodel_name

## Lists used for plotting
## Accuracy per epoch
predict_list = []
## Epoch numbers
epoch_list = []
## Loss per epoch
loss_list = []

## 0.1307 is the mean and 0.3081 the standard deviation (precomputed for MNIST)
transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.1307,), std=(0.3081,))
])

## Training dataset location
train_mydata = datasets.ImageFolder(
    root = '../datasets/mydata/train',
    transform = transform
)
train_mydataloader = DataLoader(train_mydata, batch_size=64, shuffle=True, num_workers=0)

## Test dataset location
test_mydata = datasets.ImageFolder(
    root = '../datasets/mydata/test',
    transform = transform
)
test_mydataloader = DataLoader(test_mydata, batch_size=1, shuffle=True, num_workers=0)

## Choose GPU training when available, otherwise CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

## Load the previously trained model; map_location lets weights saved on a GPU load on a CPU-only machine
model = tm.Net(tm.train_name)
model.load_state_dict(torch.load(tm.savemodel_name, map_location=device))
model.to(device)

## Optimizer
optimizer = optim.SGD(model.parameters(), lr= learnrate, momentum= momentnum)

## Training function
def train(epoch):
    model.train()
    for batch_idx, data in enumerate(train_mydataloader, 0):
        inputs, target = data
        ## Move the batch to the selected device
        inputs, target = inputs.to(device), target.to(device)
        optimizer.zero_grad()

        # Forward pass, backward pass, update
        outputs = model(inputs)
        loss = model.criterion(outputs, target)
        loss.backward()
        optimizer.step()

    ## Record the loss of the last batch once per epoch, so the list has one entry per epoch
    loss_list.append(loss.item())
    print("progress:", epoch, 'loss=', loss.item())

def test():
    ## toppredicted is a module-level variable that this function modifies
    global toppredicted
    correct = 0
    total = 0
    model.eval()
    ## No gradients are computed inside this block
    with torch.no_grad():
        for data in test_mydataloader:
            inputs, labels = data
            ## Move the batch to the selected device
            inputs, labels = inputs.to(device), labels.to(device)
            outputs = model(inputs)
            ## outputs holds 10 scores per image; torch.max over dim=1 returns
            ## (largest score, its index), and the index is the predicted digit
            _, predicted = torch.max(outputs.data, dim=1)
            total += labels.size(0)
            correct += (predicted == labels).sum().item()

    currentpredicted = (100 * correct / total)
    ## Save the model when the accuracy beats the best one so far
    if currentpredicted > toppredicted:
        toppredicted = currentpredicted
        torch.save(model.state_dict(), savemodel_name)
        print(savemodel_name+" saved, currentpredicted:%d %%" % currentpredicted)

    predict_list.append(currentpredicted)
    print('Accuracy on test set: %d %%' % currentpredicted)

## Start training
timestart = time.time()
for epoch in range(epoch_times):
    train(epoch)
    test()
timeend = time.time() - timestart
print("use time: {:.0f}m {:.0f}s".format(timeend // 60, timeend % 60))

## Create the figure
fig, (axloss, axpredict) = plt.subplots(nrows=1, ncols=2, figsize=(8,6))

# Loss plot
axloss.plot(range(epoch_times), loss_list, label = 'loss', color='r')
## Set the ticks
axloss.set_xticks(range(epoch_times)[::1])
axloss.set_xticklabels(range(epoch_times)[::1])
axloss.set_xlabel('Epoch')
axloss.set_ylabel('Value')
axloss.set_title(tm.train_name+' loss')
# Add the legend
axloss.legend(loc = 0)

# Accuracy plot
axpredict.plot(range(epoch_times), predict_list, label = 'predict', color='g')
## Set the ticks
axpredict.set_xticks(range(epoch_times)[::1])
axpredict.set_xticklabels(range(epoch_times)[::1])
# axpredict.set_yticks(range(100)[::5])
# axpredict.set_yticklabels(range(100)[::5])
axpredict.set_xlabel('Epoch')
axpredict.set_ylabel('Accuracy (%)')
axpredict.set_title(tm.train_name+' accuracy')
# Add the legend
axpredict.legend(loc = 0)

# Show the figure
plt.show()
Key points
Prefix the file name of the model trained on your own data with "my" so that it does not overwrite the original trained model.
Load the training set and the test set
A line, transforms.Grayscale(num_output_channels=1), is added to the transform. The images saved with imwrite in OpenCV are binary images, but they arrive at the transform as 3-channel images (ImageFolder's default loader converts every file to RGB), whereas our training data in pyTorch is 1×28×28, that is, single-channel. This line converts each loaded image to a single channel.
datasets.ImageFolder reads the data in the train directory directly and loads each image together with its label, which it takes from the subfolder the image sits in.
Load the trained model
The model is restored with load_state_dict and then trained on our own data. The training procedure itself is the same as in the original train function.
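How folder names turn into labels can be illustrated without torchvision. The sketch below mimics the documented behaviour of ImageFolder (class folders are sorted by name and numbered from 0) using only the standard library. folder_labels is a hypothetical helper, not a torchvision function, and the files it creates are empty placeholders.

```python
import tempfile
from pathlib import Path


def folder_labels(root):
    """Return (class_to_idx, samples) for a directory laid out as root/<class>/<file>."""
    root = Path(root)
    classes = sorted(p.name for p in root.iterdir() if p.is_dir())
    class_to_idx = {name: idx for idx, name in enumerate(classes)}
    samples = [
        (path.name, class_to_idx[name])
        for name in classes
        for path in sorted((root / name).iterdir())
        if path.is_file()
    ]
    return class_to_idx, samples


with tempfile.TemporaryDirectory() as tmp:
    for digit in range(10):
        class_dir = Path(tmp) / str(digit)
        class_dir.mkdir()
        (class_dir / f"sample-{digit}.jpg").write_bytes(b"")  # placeholder file
    class_to_idx, samples = folder_labels(tmp)
    print(class_to_idx["9"], samples[0])
```

With folders named 0 to 9 the index equals the digit. Because the names are sorted as strings, a folder named 10 would land between 1 and 2, so the mapping is worth checking whenever the class names are not single digits.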
I saved very little data, and the images in the test set are the same as those in the training set, so the accuracy reached 100% by the third of the 15 epochs. Because the test images were also used for training, this figure only shows that the pipeline works; it does not measure how well the model generalizes. With that, a simple round of training on your own dataset is done.
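If a more meaningful accuracy figure is wanted, some files per digit can be held out before training so that the two sets do not overlap. The following is a minimal sketch of that idea, not what was done in this article: split_files, the 20% test fraction, and the file names are assumptions made for the example.

```python
import random


def split_files(files, test_fraction=0.2, seed=0):
    """Shuffle file names reproducibly and split them into (train, test) lists."""
    files = sorted(files)
    rng = random.Random(seed)
    rng.shuffle(files)
    # Keep at least one test file whenever there is anything to split
    n_test = max(1, int(len(files) * test_fraction)) if files else 0
    return files[n_test:], files[:n_test]


names = [f"9-{i}.jpg" for i in range(10)]
train_files, test_files = split_files(names)
print(len(train_files), len(test_files))
```

Each file name ends up in exactly one of the two lists, which can then be moved into the train and test folders for that digit.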
Previous articles in this series
Getting Started with pyTorch (4): Exporting the MNIST Model and Recognizing Digits with C++ OpenCV DNN
Getting Started with pyTorch (3): GoogleNet and ResNet Training