Getting started with pyTorch (5) - training your own data set

learn better from others,

be the better one.

—— "Weika Zhixiang"


Foreword

The previous four articles covered training on the MNIST data set and inference with OpenCV. In real projects you often need to train on your own data, so this article explains how to train pyTorch on a data set you build yourself.


Generate your own training images

In the previous article, "Getting started with pyTorch (4) - export the Minist model, C++ OpenCV DNN for recognition", OpenCV inference was implemented in Visual Studio. That article showed that an image must be preprocessed before inference: it is converted to grayscale and binarized, and the digit contours are then found and sorted. Since all of that is already in place, we only need to modify that code slightly and save each extracted digit region as an image. These images are the data we want to train on. Here is the source code first:

#pragma once
#include<iostream>
#include<chrono>
#include<time.h>
#include<opencv2/opencv.hpp>
#include<opencv2/dnn/dnn.hpp>


using namespace cv;
using namespace std;


//Parameter iType: 0 = extract and save the digit images, 1 = run DNN inference
int iType = 1;


dnn::Net net;


//Sort the rectangles into reading order
void SortRect(vector<Rect>& inputrects) {
  for (int i = 0; i < inputrects.size(); ++i) {
    for (int j = i; j < inputrects.size(); ++j) {
      //Rect i lies entirely above rect j, so the order is already correct
      //(the original compared rect i with itself, which was never true)
      if (inputrects[i].y + inputrects[i].height < inputrects[j].y) {
      }
      //Same row: put the leftmost rectangle first
      else if (inputrects[i].y <= inputrects[j].y + inputrects[j].height) {
        if (inputrects[i].x > inputrects[j].x) {
          swap(inputrects[i], inputrects[j]);
        }
      }
      //Rect i is in a lower row than rect j: swap them
      else if (inputrects[i].y > inputrects[j].y + inputrects[j].height) {
        swap(inputrects[i], inputrects[j]);
      }
    }
  }
}


//Prepare a crop for the MNIST-style DNN input, so that a rectangular crop is not squashed when resized to 28*28
void DealInputMat(Mat& src, int row = 28, int col = 28, int tmppadding = 5) {
  int w = src.cols;
  int h = src.rows;
  //Compare width and height and pad with black first, so the image is nearly square and keeps its proportions when scaled to 28*28
  if (w > h) {
    int tmptopbottompadding = (w - h) / 2 + tmppadding;
    copyMakeBorder(src, src, tmptopbottompadding, tmptopbottompadding, tmppadding, tmppadding,
      BORDER_CONSTANT, Scalar(0));
  }
  else {
    int tmpleftrightpadding = (h - w) / 2 + tmppadding;
    copyMakeBorder(src, src, tmppadding, tmppadding, tmpleftrightpadding, tmpleftrightpadding,
      BORDER_CONSTANT, Scalar(0));
  }
  //Size takes (width, height), i.e. (col, row)
  resize(src, src, Size(col, row));
}


// Get the current system time as a yyyyMMddHHmmss string
const string GetCurrentSystemTime()
{
  auto t = chrono::system_clock::to_time_t(std::chrono::system_clock::now());
  struct tm ptm { 60, 59, 23, 31, 11, 1900, 6, 365, -1 };
  // _localtime64_s and sprintf_s are specific to the Microsoft compiler
  _localtime64_s(&ptm, &t);
  char date[60] = { 0 };
  sprintf_s(date, "%d%02d%02d%02d%02d%02d",
    (int)ptm.tm_year + 1900, (int)ptm.tm_mon + 1, (int)ptm.tm_mday,
    (int)ptm.tm_hour, (int)ptm.tm_min, (int)ptm.tm_sec);
  // Return by value; wrapping the temporary in move() is unnecessary
  return std::string(date);
}


int main(int argc, char** argv) {
  //Path of the ONNX model file
  string onnxfile = "D:/Business/DemoTEST/CPP/OpenCVMinistDNN/torchminist/ResNet.onnx";


  //Test image file
  string testfile = "D:/Business/DemoTEST/CPP/OpenCVMinistDNN/testpic/test3.png";


  //Directory where the extracted images are saved
  string savefile = "D:/Business/DemoTEST/CPP/OpenCVMinistDNN/findcontoursMat";


  if (iType == 1) {
    net = dnn::readNetFromONNX(onnxfile);
    if (net.empty()) {
      cout << "Failed to load the ONNX file!" << endl;
      return -1;
    }
  }


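  //Read the image, convert it to grayscale, apply a Gaussian blur
  Mat src = imread(testfile);
  if (src.empty()) {
    cout << "Failed to read the test image!" << endl;
    return -1;
  }
  //Keep a copy of the source image
  Mat backsrc;
  src.copyTo(backsrc);
  cvtColor(src, src, COLOR_BGR2GRAY);
  GaussianBlur(src, src, Size(3, 3), 0.5, 0.5);
  //Binarize; THRESH_BINARY_INV gives white digits on a black background, matching MNIST
  threshold(src, src, 0, 255, THRESH_BINARY_INV | THRESH_OTSU);


  //Dilate so that broken handwritten strokes join up; 3 iterations are used here
  Mat kernel = getStructuringElement(MORPH_RECT, Size(3, 3));
  //Apply an opening first to remove noise specks
  morphologyEx(src, src, MORPH_OPEN, kernel, Point(-1, -1));
  morphologyEx(src, src, MORPH_DILATE, kernel, Point(-1, -1), 3);
  imshow("src", src);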


  vector<vector<Point>> contours;
  vector<Vec4i> hierarchy;
  vector<Rect> rects;


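  //Find the contours
  findContours(src, contours, hierarchy, RETR_EXTERNAL, CHAIN_APPROX_NONE);
  for (int i = 0; i < contours.size(); ++i) {
    RotatedRect rect = minAreaRect(contours[i]);
    Rect outrect = rect.boundingRect();
    //Clip to the image, since the bounding box of a rotated rectangle can extend past the border
    outrect &= Rect(0, 0, src.cols, src.rows);
    //Add it to the rectangle list
    rects.push_back(outrect);
  }


  //Sort left to right, top to bottom
  SortRect(rects);
  //Process each extracted digit region
  for (int i = 0; i < rects.size(); ++i) {
    //Clone the ROI so the padding added next is pure black
    //(copyMakeBorder may otherwise reuse pixels around the ROI)
    Mat tmpsrc = src(rects[i]).clone();
    DealInputMat(tmpsrc);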


    if (iType == 1) {
      //Mat inputBlob = dnn::blobFromImage(tmpsrc, 0.3081, Size(28, 28), Scalar(0.1307), false, false);
      Mat inputBlob = dnn::blobFromImage(tmpsrc, 1, Size(28, 28), Scalar(), false, false);


      //Set the network input
      net.setInput(inputBlob, "input");
      //Run the forward pass to get the prediction
      Mat output = net.forward("output");


      //Find the position of the highest score in the output
      Point maxLoc;
      minMaxLoc(output, NULL, NULL, NULL, &maxLoc);


      cout << "Predicted value: " << maxLoc.x << endl;


      //Draw the crop rectangle and the recognized digit
      rectangle(backsrc, rects[i], Scalar(255, 0, 255));
      putText(backsrc, to_string(maxLoc.x), Point(rects[i].x, rects[i].y), FONT_HERSHEY_PLAIN, 5, Scalar(255, 0, 255), 1, -1);
    }
    else {
      //Save the crop as <timestamp>-<index>.jpg
      string filename = savefile + "/" + GetCurrentSystemTime() + "-" + to_string(i) + ".jpg";
      cout << filename << endl;
      imwrite(filename, tmpsrc);
    }
  }


  imshow("backsrc", backsrc);




  waitKey(0);
  return 0;
}
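
The SortRect function in the listing above is meant to put the digit boxes into reading order: rows from top to bottom, and left to right within a row. To illustrate that idea, here is a minimal Python sketch that groups boxes into rows by vertical overlap and then sorts each row by x. It is an alternative formulation, not a port of the C++ code; the name sort_reading_order and the (x, y, w, h) tuple layout are assumptions made for this example.

```python
from typing import List, Tuple

Box = Tuple[int, int, int, int]  # (x, y, w, h), a hypothetical layout


def sort_reading_order(boxes: List[Box]) -> List[Box]:
    """Order boxes row by row (top to bottom), left to right within a row."""
    rows: List[List[Box]] = []
    # Visit boxes from top to bottom so that rows are created in order.
    for box in sorted(boxes, key=lambda b: b[1]):
        _, y, _, _ = box
        for row in rows:
            # A box joins a row when it starts above the bottom edge of the
            # row's first box, i.e. the two overlap vertically.
            _, ry, _, rh = row[0]
            if y <= ry + rh:
                row.append(box)
                break
        else:
            # No existing row overlaps this box: it starts a new row.
            rows.append([box])
    # Sort each row by x and concatenate the rows.
    return [b for row in rows for b in sorted(row, key=lambda b: b[0])]


if __name__ == "__main__":
    boxes = [(120, 12, 30, 40), (10, 10, 30, 40), (60, 100, 30, 40), (5, 105, 30, 40)]
    print(sort_reading_order(boxes))
```

Grouping into rows first avoids comparing boxes from different rows by x, which is the case a pairwise swap loop has to handle with extra care.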

Key points

A parameter, iType, was added: 0 extracts and saves the digit images, and 1 runs the inference from the previous article.

A function that returns the current time, GetCurrentSystemTime, was added. Its main use is to put the time into the file name when an image is saved.
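
As a rough illustration of the naming scheme, the following Python sketch builds the same kind of file name: a second-resolution timestamp, a dash, the index of the crop, and the .jpg extension. The function name crop_filename and its parameters are hypothetical and only mirror what the C++ listing does.

```python
from datetime import datetime
from typing import Optional


def crop_filename(save_dir: str, index: int, now: Optional[datetime] = None) -> str:
    """Build '<save_dir>/<yyyyMMddHHmmss>-<index>.jpg'."""
    now = now or datetime.now()
    # %Y%m%d%H%M%S matches the "%d%02d%02d%02d%02d%02d" format of the C++ code.
    stamp = now.strftime("%Y%m%d%H%M%S")
    return f"{save_dir}/{stamp}-{index}.jpg"


if __name__ == "__main__":
    fixed = datetime(2022, 12, 27, 9, 5, 3)
    print(crop_filename("findcontoursMat", 0, fixed))
```

Within one run the index keeps the names apart. Two runs started within the same second would produce the same names, so a finer timestamp may be worth considering when extracting in quick succession.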

A variable, savefile, was added for the directory in which the extracted images are saved.

Depending on iType, the program either runs the original DNN inference (1) or saves each extracted image with imwrite (0).
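
Before an image is saved or passed to the network, DealInputMat in the listing pads the crop to a near-square so that the resize to 28*28 does not distort the digit. The arithmetic can be sketched in Python as follows. The function names are hypothetical, and the default pad of 5 corresponds to tmppadding in the C++ code.

```python
from typing import Tuple


def square_padding(w: int, h: int, pad: int = 5) -> Tuple[int, int, int, int]:
    """Return (top, bottom, left, right) border widths for a w x h crop."""
    if w > h:
        extra = (w - h) // 2 + pad  # grow the shorter, vertical side
        return extra, extra, pad, pad
    extra = (h - w) // 2 + pad      # grow the shorter, horizontal side
    return pad, pad, extra, extra


def padded_size(w: int, h: int, pad: int = 5) -> Tuple[int, int]:
    """Width and height of the crop after the border has been added."""
    top, bottom, left, right = square_padding(w, h, pad)
    return w + left + right, h + top + bottom


if __name__ == "__main__":
    # A tall, narrow digit such as "1": 20 px wide, 60 px high.
    print(square_padding(20, 60))
    print(padded_size(20, 60))
```

When the difference between width and height is odd, the integer division leaves the result one pixel short of a perfect square, which is why the padded image is only nearly square.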

Next, we make some data of our own: using a drawing tool, write digits by hand and produce 10 pictures, one for each digit from 0 to 9.


Running the program with iType set to 0 on the picture of the digit 9 crops each handwritten 9 into its own image and saves it to the specified directory.

Next, create a mydata directory under the datasets directory, create a train directory inside it for the training data, and create folders named 0 to 9 under train. With this layout, pyTorch assigns each image its label from the folder it is in, so we do not have to label the images one by one. The extracted digit images must therefore be placed in the matching folders.
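
The folder layout can also be created with a few lines of Python instead of by hand. This is only a sketch: the function name make_digit_folders is hypothetical, and the test split is included because the training script later in this article also reads a mydata/test directory.

```python
import tempfile
from pathlib import Path
from typing import List, Sequence


def make_digit_folders(root: Path, splits: Sequence[str] = ("train", "test")) -> List[Path]:
    """Create <root>/<split>/<digit> for every split and every digit 0-9."""
    created = []
    for split in splits:
        for digit in range(10):
            folder = root / split / str(digit)
            # exist_ok=True makes the call safe to repeat.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created


if __name__ == "__main__":
    # A temporary directory stands in for the real datasets directory.
    with tempfile.TemporaryDirectory() as tmp:
        folders = make_digit_folders(Path(tmp) / "mydata")
        print(len(folders), "folders created")
```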

Move the images of the digit 9 that were just generated into the 9 folder, and do the same for the other digits.

The test set is prepared in the same way under a test directory: the images were copied over and a large part of the copies was then deleted. With that, the image preparation is complete, and the next step is training with pyTorch.
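
Copying and then deleting by hand can be replaced by a small script. The sketch below copies a random fraction of every class folder from train to test. The name copy_subset and the keep and seed parameters are assumptions for illustration, not part of the article's code.

```python
import random
import shutil
from pathlib import Path


def copy_subset(train_dir: Path, test_dir: Path, keep: float = 0.2, seed: int = 0) -> int:
    """Copy a random fraction `keep` of each class folder from train to test."""
    rng = random.Random(seed)  # fixed seed so the selection is repeatable
    copied = 0
    for class_dir in sorted(p for p in train_dir.iterdir() if p.is_dir()):
        files = sorted(p for p in class_dir.iterdir() if p.is_file())
        if not files:
            continue
        # Keep at least one image per class so no label is missing from the test set.
        count = max(1, int(len(files) * keep))
        target = test_dir / class_dir.name
        target.mkdir(parents=True, exist_ok=True)
        for src in rng.sample(files, count):
            shutil.copy2(src, target / src.name)
            copied += 1
    return copied
```

Because the files are copied, every test image also remains in the training set, as in the article. Moving the files instead (shutil.move) would give a test set the model has not seen during training.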


Training your own data set with pyTorch

Create a new file, trainmydata.py. The training process is much the same as before, but here we continue training from the earlier result: the script first loads the previously trained model and then trains it on the new data. As before, the code comes first:

import torch
import time
from torchvision import datasets
from torch.utils.data import DataLoader
from torchvision import transforms
import torch.optim as optim
import matplotlib.pyplot as plt
import matplotlib as mpl
import trainModel as tm


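## Number of training epochs
epoch_times = 15


## Best accuracy so far; the model is saved whenever the test accuracy exceeds it
toppredicted = 0.0


## Learning rate
learnrate = 0.01 
## Momentum: when the accumulated update direction agrees with the current gradient, the step gets larger, which speeds up convergence
momentnum = 0.5


## Prefix the model trained on our own data with "my"
savemodel_name = "my" + tm.savemodel_name


## Lists used for plotting
## Accuracy per epoch
predict_list = []
## Epoch numbers
epoch_list = []
## Loss per epoch
loss_list = []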


transform = transforms.Compose([
    transforms.Grayscale(num_output_channels=1),
    transforms.ToTensor(),
    transforms.Normalize(mean=(0.1307,), std=(0.3081,))
]) ## Normalize: 0.1307 is the mean and 0.3081 the standard deviation (precomputed values, used as-is)


## Training data set location
train_mydata = datasets.ImageFolder(
    root = '../datasets/mydata/train', 
    transform = transform
)
train_mydataloader = DataLoader(train_mydata, batch_size=64, shuffle=True, num_workers=0)


## Test data set location
test_mydata = datasets.ImageFolder(
    root = '../datasets/mydata/test', 
    transform = transform
)
test_mydataloader = DataLoader(test_mydata, batch_size=1, shuffle=True, num_workers=0)




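## Load the previously trained model
model = tm.Net(tm.train_name)
## map_location="cpu" lets weights saved on a GPU load on a CPU-only machine
model.load_state_dict(torch.load(tm.savemodel_name, map_location="cpu"))


## Train on the GPU when one is available, otherwise on the CPU
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
model.to(device)


## Optimizer
optimizer = optim.SGD(model.parameters(), lr= learnrate, momentum= momentnum)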


## Training function
def train(epoch):
    model.train()
    for batch_idx, data in enumerate(train_mydataloader, 0):
        inputs, target = data
        ## Move the batch to the selected device (CPU or GPU)
        inputs, target = inputs.to(device), target.to(device)


        optimizer.zero_grad()


        # Forward pass, backward pass, parameter update
        outputs = model(inputs)
        loss = model.criterion(outputs, target)
        loss.backward()
        optimizer.step()


    ## Record and print the loss of the last batch of this epoch
    loss_list.append(loss.item())
    print("progress:", epoch, 'loss=', loss.item())






def test():
    correct = 0 
    total = 0
    model.eval()
    ## Gradients are not computed inside this with block
    with torch.no_grad():
        for data in test_mydataloader:
            inputs, labels = data
            ## Move the batch to the selected device (CPU or GPU)
            inputs, labels = inputs.to(device), labels.to(device)


            outputs = model(inputs)
            ## outputs holds one row of class scores per image; torch.max over dim=1 returns (max score, index), and the index is the predicted digit 0-9
            _, predicted = torch.max(outputs.data, dim=1)


            total += labels.size(0)
            correct += (predicted == labels).sum().item()


    currentpredicted = (100 * correct / total)
    ## Declare toppredicted as global so that the module-level variable can be modified inside the function; otherwise an error is raised
    global toppredicted
    ## Save the model when the accuracy beats the previous best
    if currentpredicted > toppredicted:
        toppredicted = currentpredicted
        torch.save(model.state_dict(), savemodel_name)
        print(savemodel_name+" saved, currentpredicted:%d %%" % currentpredicted)


    predict_list.append(currentpredicted)    
    print('Accuracy on test set: %d %%' % currentpredicted)        


## Start training
timestart = time.time()
for epoch in range(epoch_times):
    train(epoch)
    test()
timeend = time.time() - timestart
print("use time: {:.0f}m {:.0f}s".format(timeend // 60, timeend % 60))






## Use a font that can render Chinese characters in the plot (SimHei must be installed)
mpl.rcParams["font.sans-serif"] = ["SimHei"]
## Make the minus sign render correctly with that font
mpl.rcParams["axes.unicode_minus"] = False


## Create the figure
fig, (axloss, axpredict) = plt.subplots(nrows=1, ncols=2, figsize=(8,6))


# Loss plot
axloss.plot(range(epoch_times), loss_list, label = 'loss', color='r')
## Set the ticks
axloss.set_xticks(range(epoch_times)[::1])
axloss.set_xticklabels(range(epoch_times)[::1])


axloss.set_xlabel('Epoch')
axloss.set_ylabel('Value')
axloss.set_title(tm.train_name+' loss')
# Add the legend
axloss.legend(loc = 0)


# Accuracy plot
axpredict.plot(range(epoch_times), predict_list, label = 'predict', color='g')
## Set the ticks
axpredict.set_xticks(range(epoch_times)[::1])
axpredict.set_xticklabels(range(epoch_times)[::1])
# axpredict.set_yticks(range(100)[::5])
# axpredict.set_yticklabels(range(100)[::5])


axpredict.set_xlabel('Epoch')
axpredict.set_ylabel('Accuracy (%)')
axpredict.set_title(tm.train_name+' accuracy')
# Add the legend
axpredict.legend(loc = 0)


# Show the figure
plt.show()

Key points

The model file trained on our own data gets a "my" prefix so that it does not overwrite the original trained model.

Load training set and test set

The transform has one extra line, transforms.Grayscale(num_output_channels=1). The images that OpenCV saved with imwrite are black and white, but ImageFolder's default loader converts every image it reads to 3-channel RGB, whereas our training data in pyTorch is 1x28x28, that is, single-channel. This line converts each loaded picture back to a single channel.
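
The two steps after Grayscale are plain arithmetic, which the following sketch works through for a single pixel: ToTensor scales an 8-bit value to the range 0 to 1, and Normalize then subtracts the mean and divides by the standard deviation. The helper name normalize_pixel is hypothetical; the constants are the ones passed to transforms.Normalize in the script.

```python
MEAN, STD = 0.1307, 0.3081  # values passed to transforms.Normalize in the script


def normalize_pixel(value_0_255: int) -> float:
    """Apply the ToTensor scaling (divide by 255), then (x - mean) / std."""
    x = value_0_255 / 255.0
    return (x - MEAN) / STD


if __name__ == "__main__":
    # Black background (0) and white stroke (255) of a binarized digit.
    for v in (0, 255):
        print(v, round(normalize_pixel(v), 4))
```

The black background therefore ends up slightly below zero and the white strokes well above it, which is the value range the model saw during its original training.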

datasets.ImageFolder reads the data in the train directory directly and loads each image together with its label, which is taken from the folder the image is in.
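
To make the labeling rule concrete, the sketch below reproduces in plain Python how folder names can be turned into label indices: the sub-folder names are sorted as strings and numbered from 0. As far as I understand it, this mirrors how ImageFolder builds its class_to_idx mapping, but treat it as an illustration rather than the library's code; folder_labels is a hypothetical name.

```python
import os
import tempfile
from typing import Dict


def folder_labels(root: str) -> Dict[str, int]:
    """Map each sub-folder name to an index, in sorted (string) order."""
    classes = sorted(entry.name for entry in os.scandir(root) if entry.is_dir())
    return {name: idx for idx, name in enumerate(classes)}


if __name__ == "__main__":
    # A temporary directory with folders 0-9 stands in for mydata/train.
    with tempfile.TemporaryDirectory() as tmp:
        for digit in range(10):
            os.makedirs(os.path.join(tmp, str(digit)))
        print(folder_labels(tmp))
```

For folders named 0 to 9 the index equals the digit. With multi-character names the string order matters: a folder named 10 would sort before one named 2.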

Load the trained model

The model is restored by loading the previously saved weights with load_state_dict and is then trained on our own data. The training procedure is the same as in the original training script.
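
The training step relies on optim.SGD with the learnrate (0.01) and momentnum (0.5) values set at the top of the script. To show what the momentum term does, here is a small pure-Python sketch of the update rule on the toy function f(x) = x**2, using the velocity form v = momentum * v + grad followed by x = x - lr * v. The function sgd_momentum is hypothetical and operates on a single number instead of model parameters.

```python
from typing import List


def sgd_momentum(x0: float, lr: float = 0.01, momentum: float = 0.5, steps: int = 3) -> List[float]:
    """Minimize f(x) = x**2 and return the value of x after each step."""
    x, v = x0, 0.0
    history = []
    for _ in range(steps):
        grad = 2.0 * x            # derivative of x**2
        v = momentum * v + grad   # velocity accumulates past gradients
        x = x - lr * v            # parameter update
        history.append(x)
    return history


if __name__ == "__main__":
    print("momentum 0.5:", sgd_momentum(1.0))
    print("momentum 0.0:", sgd_momentum(1.0, momentum=0.0))
```

Starting from x = 1, both runs take the same first step, but from the second step on the run with momentum moves further, because the previous gradient points in the same direction as the current one.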


Because I saved very little data here, and the pictures in the test set are copies of pictures in the training set, the accuracy reached 100% by the third of the 15 training rounds. Since the test images were also seen during training, this figure says little about how the model handles new handwriting. With that, a simple round of training on your own data set is done.
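
If you want to know how much of your own test set also appears in the training set, comparing file contents is enough. The sketch below hashes every file in both directories and reports the share of distinct test files that have an identical copy in train. The function names are hypothetical and the check only finds byte-identical files.

```python
import hashlib
from pathlib import Path
from typing import Set


def file_hashes(folder: Path) -> Set[str]:
    """SHA-256 digest of every file below `folder`."""
    return {
        hashlib.sha256(p.read_bytes()).hexdigest()
        for p in folder.rglob("*")
        if p.is_file()
    }


def overlap_ratio(train_dir: Path, test_dir: Path) -> float:
    """Fraction of distinct test files whose content also occurs in train."""
    test = file_hashes(test_dir)
    if not test:
        return 0.0
    return len(test & file_hashes(train_dir)) / len(test)
```

A ratio close to 1.0 corresponds to the situation described above, where the test accuracy mostly reflects how well the model has memorized the training images.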


Previous articles in this series

Getting started with pyTorch (4) - export the Minist model, C++ OpenCV DNN for recognition

Getting started with pyTorch (3) - GoogleNet and ResNet training

Getting Started with pyTorch (2) - Common Network Layer Functions and Convolutional Neural Network Training

Source: blog.csdn.net/Vaccae/article/details/128451088