OpenCV development notes (seventy-three): Red Fat Man takes you to use opencv+dnn+yolov3 to identify objects in 8 minutes

This article is original; if reproduced, please cite the original source.
Blog address of this article: https://blog.csdn.net/qq21497936/article/details/109201809
Dear readers: knowledge is infinite but manpower is finite, so either change the requirements, find a professional, or study it yourself with Red Fat Man's (Red Imitation) blog posts.
Development technology collection (including Qt practical technology, Raspberry Pi, 3D, OpenCV, OpenGL, ffmpeg, OSG, single-chip microcomputer, software and hardware combination, etc.) is continuously being updated... (click on the portal)

OpenCV development column (click on the portal)

Previous: " OpenCV Development Notes (72): Red Fat Man takes you to use opencv+dnn+tensorFlow to identify objects in 8 minutes "
Next: Continued to add...


Preface

  The cascade classifier does not perform very well; its accuracy is low compared with deep learning. The previous chapter used a tensorflow model with the dnn module; this chapter uses the yolov3 model to recognize specific classes.


Demo

  320x320, confidence threshold 0.6
  608x608, confidence threshold 0.6 (608 set in the .cfg)


yolov3 model download


  CSDN: https://download.csdn.net/download/qq21497936/12995972
  QQ group: 1047134658 (click " file " to search for " yolov3 ", the group will be updated simultaneously with the blog post)


OpenCV deep recognition basic process

  Opencv3.4.x supports models from several deep learning frameworks.

Supported models

  The deep learning models supported by opencv3.4.x:
- Caffe: .caffemodel
  official website: http://caffe.berkeleyvision.org
- tensorflow: .pb
  official website: https://www.tensorflow.org
- Torch: .t7 | .net
  official website: http://torch.ch
- darknet: .weights
  official website: https://pjreddie.com/darknet
- DLDT: .bin
  official website: https://software.intel.com/openvino-toolkit

Operation steps: yolov3

  Models generated by different deep learning frameworks differ somewhat in how they are run and in the layout of their output data. The steps below describe how to use a darknet-trained yolov3 model with opencv.

Step 1: Read the classification file

  Each model file has a corresponding classification file. The classification file contains one class name per line, and the row index (counting from 0) is the class id reported by recognition.

std::string classesFile = "E:/qtProject/openCVDemo/dnnData/" \
                    "yolov3/coco.names";
// Read the class names and cache them
std::ifstream ifs(classesFile);
std::vector<std::string> classes;
std::string classLine;
while(std::getline(ifs, classLine))
{
    classes.push_back(classLine);
}
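For reference, coco.names is a plain text file with one class label per line; the standard COCO file shipped with darknet has 80 lines and begins:

```
person
bicycle
car
motorbike
aeroplane
bus
...
```

The class id produced in Step 5 (classIdPoint.x) indexes directly into this list, so the file must match the model the weights were trained with.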

Step 2: Load the model and configuration files, and build a neural network.

  Depending on the model, use the matching cv::dnn::readNetFromXXX function to read it (see the list of supported models above).
  The yolov3 model is as follows:

std::string modelWeights = "E:/qtProject/openCVDemo/dnnData/" \
                       "yolov3/yolov3.weights";
std::string modelCfg = "E:/qtProject/openCVDemo/dnnData/" \
                       "yolov3/yolov3.cfg";
// Load the yolov3 model
cv::dnn::Net net = cv::dnn::readNetFromDarknet(modelCfg, modelWeights);
if(net.empty())
{
    qDebug() << __FILE__ << __LINE__ << "net is empty!!!";
    return;
}

Step 3: Add the picture to be predicted into the neural network

  To recognize a picture, it must first be fed into the neural network. With the yolov3 model, pay special attention to normalizing the data first and then resizing it to the specified input size, as follows:

// Read the picture to recognize
mat = cv::imread("E:/testFile/15.jpg");
if(!mat.data)
{
    qDebug() << __FILE__ << __LINE__ << "Failed to read image!!!";
    return;
}
//    cv::dnn::blobFromImage(mat, blob);
//  The parameters below must be set, otherwise inference goes haywire
cv::dnn::blobFromImage(mat,
                     blob,
                     1.0f/255,
                     cv::Size(320, 320),
                     cv::Scalar(0, 0, 0),
                     true,
                     false);
net.setInput(blob);

  Increasing the input width and height improves detection accuracy; ideally match the size given in the .cfg file. This Demo uses 320x320, while the actual .cfg file specifies 608x608, which testing showed to be the size with the best recognition.

Step 4: Classify prediction and obtain the recognition result

  Once the input is set, recognition is a forward pass (classification prediction). The yolov3 model has 3 output layers, so first obtain their names and then pass them explicitly when predicting; otherwise the prediction will fail.

// Get the names of all layers, then the output layers
std::vector<cv::String> layerNames = net.getLayerNames();
std::vector<cv::String> outPutNames;
std::vector<int> outLayers = net.getUnconnectedOutLayers();
for(int index = 0; index < outLayers.size(); index++)
{
    outPutNames.push_back(layerNames[outLayers[index] - 1]);
    qDebug() << __FILE__ << __LINE__
             << QString(layerNames[outLayers[index] - 1].c_str());
}
// Inference: the output layer names to predict can be passed in
std::vector<cv::Mat> probs;
net.forward(probs, outPutNames);

  The prediction results are stored in probs of type std::vector<cv::Mat>. Each element prob is a cv::Mat in which each row is one detected candidate; the columns are: 0 = center x, 1 = center y, 2 = width, 3 = height (all normalized to the range 0..1), 4 = objectness score, and 5 onward = one score per class.
  (Note: see Step 5 for how these values are used.)
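To make the column layout concrete, here is a dependency-free sketch of decoding a single output row into a pixel-space box. Box and decodeRow are illustrative names, not part of OpenCV; the row values are made up, and the same arithmetic appears in the Step 5 loop.

```cpp
#include <algorithm>
#include <vector>

// A decoded detection in pixel coordinates
struct Box { int left, top, width, height, classId; float score; };

// row: one yolov3 output row (e.g. 85 columns for COCO:
// 4 box values, 1 objectness score, 80 class scores), all in 0..1
Box decodeRow(const std::vector<float>& row, int imgW, int imgH)
{
    // Columns 5.. are class scores; pick the best one
    auto best = std::max_element(row.begin() + 5, row.end());
    int classId = static_cast<int>(best - (row.begin() + 5));
    // Columns 0-3 are center x/y and width/height, relative to the image
    int w = static_cast<int>(row[2] * imgW);
    int h = static_cast<int>(row[3] * imgH);
    int left = static_cast<int>(row[0] * imgW) - w / 2;
    int top  = static_cast<int>(row[1] * imgH) - h / 2;
    return { left, top, w, h, classId, *best };
}
```

For a 640x480 image, a row with center (0.5, 0.5) and size (0.2, 0.4) decodes to a 128x192 rect whose top-left corner is (256, 144).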

Step 5: Filter by confidence, classify, and draw boxes on the output mat

  The output-parsing step differs between models; for yolov3 it is as follows:

// Confidence threshold: detections above it are boxed with a rect
for(int index = 0; index < probs.size(); index++)
{
    for (int row = 0; row < probs[index].rows; row++)
    {
        // Get the highest-scoring class among this detection's class scores
        cv::Mat scores = probs[index].row(row).colRange(5, probs[index].cols);
        cv::Point classIdPoint;
        double confidence;
        // Get the value and location of the maximum score
        cv::minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
        if(confidence > 0.6)
        {
            qDebug() << __FILE__ << __LINE__ << confidence << classIdPoint.x;
            int centerX = (int)(probs[index].at<float>(row, 0) * mat.cols);
            int centerY = (int)(probs[index].at<float>(row, 1) * mat.rows);
            int width   = (int)(probs[index].at<float>(row, 2) * mat.cols);
            int height  = (int)(probs[index].at<float>(row, 3) * mat.rows);
            int left = centerX - width / 2;
            int top = centerY - height / 2;
            cv::Rect objectRect(left, top, width, height);
            cv::rectangle(mat, objectRect, cv::Scalar(255, 0, 0), 2);
            cv::String label = cv::format("%s:%.4f",
                                          classes[classIdPoint.x].data(),
                                          confidence);
            cv::putText(mat,
                        label,
                        cv::Point(left, top - 10),
                        cv::FONT_HERSHEY_SIMPLEX,
                        0.4,
                        cv::Scalar(0, 0, 255));
            qDebug() << __FILE__ << __LINE__
                    << centerX << centerY << width << height;
        }
    }
}
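The loop above draws every row whose best class score exceeds 0.6, so one object is often boxed several times by overlapping detections from the three output layers. OpenCV provides cv::dnn::NMSBoxes to thin these out; below is a dependency-free sketch of the same greedy idea (Det, iou, and nms are illustrative names, not OpenCV APIs).

```cpp
#include <algorithm>
#include <vector>

// A candidate detection in pixel coordinates
struct Det { int left, top, width, height; float score; };

// Intersection-over-union of two axis-aligned boxes
static float iou(const Det& a, const Det& b)
{
    int x1 = std::max(a.left, b.left);
    int y1 = std::max(a.top, b.top);
    int x2 = std::min(a.left + a.width, b.left + b.width);
    int y2 = std::min(a.top + a.height, b.top + b.height);
    float inter = (float)std::max(0, x2 - x1) * (float)std::max(0, y2 - y1);
    float uni = (float)a.width * a.height + (float)b.width * b.height - inter;
    return uni > 0.0f ? inter / uni : 0.0f;
}

// Keep the highest-scoring box, drop boxes that overlap it too much, repeat
std::vector<Det> nms(std::vector<Det> dets, float iouThresh)
{
    std::sort(dets.begin(), dets.end(),
              [](const Det& a, const Det& b) { return a.score > b.score; });
    std::vector<Det> kept;
    for (const Det& d : dets)
    {
        bool overlaps = false;
        for (const Det& k : kept)
            if (iou(d, k) > iouThresh) { overlaps = true; break; }
        if (!overlaps)
            kept.push_back(d);
    }
    return kept;
}
```

In practice the real cv::dnn::NMSBoxes call takes the rects, scores, a score threshold, and an IoU threshold, and returns the indices of the boxes to keep.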

Function prototype

Read yolov3 model and configuration file function prototype

Net readNetFromDarknet(const String &cfgFile,
                      const String &darknetModel = String());

  Reads a network model stored in Darknet model files.

  • Parameter 1 : The path of the .cfg file with the text description of the network architecture;
  • Parameter 2 : The path of the .weights file of the learned network;

Read the picture (need to identify) function prototype

void blobFromImage(InputArray image,
                  OutputArray blob,
                  double scalefactor=1.0,
                  const Size& size = Size(),
                  const Scalar& mean = Scalar(),
                  bool swapRB=false,
                  bool crop=false,
                  int ddepth=CV_32F);

  Creates a blob from an image, optionally resizing it and cropping it from the center.

  • Parameter 1 : input image (1, 3 or 4 channels);
  • Parameter 2 : output blob;
  • Parameter 3 : scale factor multiplied into the image values;
  • Parameter 4 : the size of the output image;
  • Parameter 5 : scalar of mean values subtracted from the channels; the values are intended to be in (mean-R, mean-G, mean-B) order if the image has BGR ordering and swapRB is true;
  • Parameter 6 : swapRB flag, indicating the swap of the first and last channels; necessary for three-channel BGR images;
  • Parameter 7 : crop flag, indicating whether the image is cropped after resizing;
  • Parameter 8 : depth of the output blob, CV_32F or CV_8U;
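Putting parameters 3 and 5 together: each channel value conceptually has the mean subtracted and is then multiplied by the scale factor. A quick arithmetic sketch (toBlob is an illustrative helper, not an OpenCV function):

```cpp
#include <cassert>
#include <cmath>

// Per-channel mapping conceptually performed by blobFromImage:
// subtract the mean from the channel value, then scale the result.
float toBlob(float channelValue, float mean, float scalefactor)
{
    return (channelValue - mean) * scalefactor;
}
```

With the Demo's parameters (scalefactor 1/255, zero mean), a pixel channel of 255 maps to roughly 1.0 and 0 maps to 0.0, i.e. the input is normalized to 0..1 before entering the network, which is what the yolov3 weights expect.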

Set the neural network input function prototype

void cv::dnn::Net::setInput(InputArray blob,
                      const String& name = "",
                      double scalefactor = 1.0,
                      const Scalar& mean = Scalar());

  Set the new input value for the network.

  • Parameter 1 : a new blob; should have CV_32F or CV_8U depth;
  • Parameter 2 : the name of the input layer;
  • Parameter 3 : optional normalization scale;
  • Parameter 4 : optional mean subtraction value;

Returns the names of all layers (in their index order).

std::vector<String> getLayerNames() const;

Returns the indexes of the layers with unconnected outputs.

std::vector<int> getUnconnectedOutLayers() const;

Deep detection and recognition (forward prediction) function prototype

Mat cv::dnn::Net::forward(const String& outputName = String());

  Forward prediction: returns the first output blob of the specified layer, by default the last layer; all layer names can be obtained with cv::dnn::Net::getLayerNames(). Since yolov3 has several output layers, the Demo uses the overload that fills blobs for a list of named layers:

void cv::dnn::Net::forward(OutputArrayOfArrays outputBlobs,
                           const std::vector<String>& outBlobNames);

  • Parameter 1 : outputName, the name of the output layer to read

Demo

void OpenCVManager::testYoloV3()
{
    std::string classesFile = "E:/qtProject/openCVDemo/dnnData/" \
                              "yolov3/coco.names";
    std::string modelWeights = "E:/qtProject/openCVDemo/dnnData/" \
                          "yolov3/yolov3.weights";
    std::string modelCfg = "E:/qtProject/openCVDemo/dnnData/" \
                           "yolov3/yolov3.cfg";

    // Read the class names and cache them
    std::ifstream ifs(classesFile);
    std::vector<std::string> classes;
    std::string classLine;
    while(std::getline(ifs, classLine))
    {
        classes.push_back(classLine);
    }

    // Load the yolov3 model
    cv::dnn::Net net = cv::dnn::readNetFromDarknet(modelCfg, modelWeights);
    if(net.empty())
    {
        qDebug() << __FILE__ << __LINE__ << "net is empty!!!";
        return;
    }

    cv::Mat mat;
    cv::Mat blob;

    // Get the names and indexes of all layers
    std::vector<cv::String> layerNames = net.getLayerNames();
    int lastLayerId = net.getLayerId(layerNames[layerNames.size() - 1]);
    cv::Ptr<cv::dnn::Layer> lastLayer = net.getLayer(cv::dnn::DictValue(lastLayerId));
    qDebug() << __FILE__ << __LINE__
             << QString(lastLayer->type.c_str())
             << QString(lastLayer->getDefaultName().c_str())
             << QString(layerNames[layerNames.size()-1].c_str());

    // Get the output layers
    std::vector<cv::String> outPutNames;
    std::vector<int> outLayers = net.getUnconnectedOutLayers();
    for(int index = 0; index < outLayers.size(); index++)
    {
        outPutNames.push_back(layerNames[outLayers[index] - 1]);
        qDebug() << __FILE__ << __LINE__ 
                 << QString(layerNames[outLayers[index] - 1].c_str());
    }

    while(true)
    {
        // Read the picture to recognize
        mat = cv::imread("E:/testFile/15.jpg");
        if(!mat.data)
        {
            qDebug() << __FILE__ << __LINE__ << "Failed to read image!!!";
            return;
        }

//        cv::dnn::blobFromImage(mat, blob);
        // The parameters below must be set, otherwise inference goes haywire
        cv::dnn::blobFromImage(mat,
                               blob,
                               1.0f/255,
                               cv::Size(320, 320),
                               cv::Scalar(0, 0, 0),
                               true,
                               false);
        net.setInput(blob);
        // Inference: the output layer names to predict can be passed in
        std::vector<cv::Mat> probs;
        net.forward(probs, outPutNames);

        // Show the time spent on recognition
        std::vector<double> layersTimes;
        double freq = cv::getTickFrequency() / 1000;
        double t = net.getPerfProfile(layersTimes) / freq;
        std::string label = cv::format("Inference time: %.2f ms", t);
        cv::putText(mat,
                  label,
                  cv::Point(0, 15),
                  cv::FONT_HERSHEY_SIMPLEX,
                  0.5,
                  cv::Scalar(255, 0, 0));
        // Confidence threshold: detections above it are boxed with a rect
        for(int index = 0; index < probs.size(); index++)
        {
            for (int row = 0; row < probs[index].rows; row++)
            {
                // Get the highest-scoring class among this detection's class scores
                cv::Mat scores = probs[index].row(row).colRange(5, probs[index].cols);
                cv::Point classIdPoint;
                double confidence;
                // Get the value and location of the maximum score
                cv::minMaxLoc(scores, 0, &confidence, 0, &classIdPoint);
                if(confidence > 0.6)
                {
                    qDebug() << __FILE__ << __LINE__ << confidence << classIdPoint.x;
                    int centerX = (int)(probs[index].at<float>(row, 0) * mat.cols);
                    int centerY = (int)(probs[index].at<float>(row, 1) * mat.rows);
                    int width   = (int)(probs[index].at<float>(row, 2) * mat.cols);
                    int height  = (int)(probs[index].at<float>(row, 3) * mat.rows);
                    int left = centerX - width / 2;
                    int top = centerY - height / 2;
                    cv::Rect objectRect(left, top, width, height);
                    cv::rectangle(mat, objectRect, cv::Scalar(255, 0, 0), 2);
                    cv::String label = cv::format("%s:%.4f",
                                                  classes[classIdPoint.x].data(),
                                                  confidence);
                    cv::putText(mat,
                                label,
                                cv::Point(left, top - 10),
                                cv::FONT_HERSHEY_SIMPLEX,
                                0.4,
                                cv::Scalar(0, 0, 255));
                    qDebug() << __FILE__ << __LINE__
                            << centerX << centerY << width << height;
                }
            }
        }

        cv::imshow(_windowTitle.toStdString(), mat);
        cv::waitKey(0);
    }
}

Corresponding project template v1.65.0

  openCVDemo_v1.65.0_Basic template_yolov3 classification detection.rar.


Pitfalls

Pitfall 1: Error when loading the model

Cause
  The model file failed to load.
Solution
  Check whether the file exists, whether the path is correct, and whether the model file matches the configuration file.

Pitfall 2: Error when predicting on the blob

Cause
  No output layer names were passed when predicting; for yolov3 the three output layer names must be passed to forward() (note: a tensorflow model works without them).
Solution
  Pass the output layer names to forward(), as shown in Step 4.


