OpenCV notes_5



feature point matching

DMatch stores matching results


The DMatch class is one of the data structures used to describe feature matching results in the OpenCV library. It provides member variables and methods for representing matching results. The following is a detailed explanation of the DMatch class:

class DMatch {
public:
    int queryIdx;      // index of the descriptor in the query image
    int trainIdx;      // index of the descriptor in the training image
    int imgIdx;        // index of the image the matched descriptor belongs to
    float distance;    // distance between the two descriptors

    DMatch();  // default constructor
    DMatch(int _queryIdx, int _trainIdx, int _imgIdx, float _distance);  // constructor with explicit values
};

The DMatch class has four member variables:

  1. queryIdx: the index of the descriptor in the query image. In feature matching there is usually a query image and a training image; queryIdx is the index, within the matching result, of the descriptor that comes from the query image.

  2. trainIdx: the index of the descriptor in the training image, i.e. which training descriptor this match refers to.

  3. imgIdx: the index of the image the descriptor belongs to. In scenarios where one set of query descriptors is matched against several training images, imgIdx indicates which training image the matched descriptor comes from.

  4. distance: the distance between the two descriptors. Feature matching uses some distance metric to measure descriptor similarity, and distance stores that value; smaller means more similar.

The DMatch class also provides two constructors:

  1. Default constructor DMatch(): creates an uninitialized DMatch object.
  2. DMatch(int _queryIdx, int _trainIdx, int _imgIdx, float _distance): creates a DMatch object with the given queryIdx, trainIdx, imgIdx and distance.

The DMatch class is used to store the result of feature matching, and can manage and process the matching information between descriptors by creating and manipulating DMatch objects.
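As a minimal illustration (the index and distance values below are made up), DMatch objects can be constructed directly and sorted by distance so that the best match comes first:

#include <opencv2/core.hpp>
#include <algorithm>
#include <vector>

int main()
{
	// Hypothetical matches: (queryIdx, trainIdx, imgIdx, distance)
	std::vector<cv::DMatch> matches = {
		cv::DMatch(0, 3, 0, 45.0f),
		cv::DMatch(1, 7, 0, 12.0f),
		cv::DMatch(2, 1, 0, 30.0f)
	};

	// Sort by descriptor distance, smallest (best) first
	std::sort(matches.begin(), matches.end(),
		[](const cv::DMatch &a, const cv::DMatch &b) { return a.distance < b.distance; });

	// matches.front() now holds the pair with the smallest distance
	return 0;
}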

DescriptorMatcher::match feature point descriptor (one-to-one) matching


cv::DescriptorMatcher::match is a function in OpenCV for matching two given sets of feature descriptors. It helps find correspondences between descriptors with similar features in different images or point clouds.

The following is a detailed explanation of the function:

void DescriptorMatcher::match(
    InputArray queryDescriptors,     // input query descriptors (features)
    InputArray trainDescriptors,     // input training descriptors (features)
    std::vector<DMatch>& matches,    // output array of match results
    InputArray mask = noArray()      // optional mask specifying which query/train descriptor pairs may be matched
)

Parameter Description:

  • queryDescriptors: The input array of query descriptors (features), usually from the query image or point cloud.
  • trainDescriptors: The input array of training descriptors (features), usually from the training image or point cloud.
  • matches: The output matching result array. matches is a reference of type std::vector<DMatch>, where DMatch is a structure describing one correspondence. Each DMatch contains the following information:
    • queryIdx: The index of the query descriptor.
    • trainIdx: The index of the training descriptor.
    • imgIdx: The index of the image the descriptor belongs to.
    • distance: A distance or similarity measure between the descriptors.
  • mask: Optional mask specifying which query descriptors may be matched against which training descriptors. It must be a CV_8UC1 matrix with as many rows as there are query descriptors and as many columns as there are training descriptors. If you don't need a mask, leave it as the default noArray().

Below is the result matches of BFMatcher brute-force matching. Note that every index of queryDescriptors (the query descriptors, say of size n) is compared against every index of trainDescriptors (the training descriptors, say of size m), so the time complexity is n*m.
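As a supplementary sketch (the file names and ORB settings here are placeholders, not part of the demo further below), brute-force one-to-one matching can also be made stricter by enabling BFMatcher's cross-check option, which keeps a pair only when the two descriptors pick each other as best match:

#include <opencv2/opencv.hpp>
#include <iostream>
#include <vector>
using namespace cv;

int main()
{
	Mat img1 = imread("query.jpg", IMREAD_GRAYSCALE);   // placeholder file names
	Mat img2 = imread("train.jpg", IMREAD_GRAYSCALE);
	if (img1.empty() || img2.empty())
		return -1;

	Ptr<ORB> orb = ORB::create(500);
	std::vector<KeyPoint> kpts1, kpts2;
	Mat desc1, desc2;
	orb->detectAndCompute(img1, noArray(), kpts1, desc1);
	orb->detectAndCompute(img2, noArray(), kpts2, desc2);

	// crossCheck = true keeps only mutually best matches (stricter one-to-one matching)
	BFMatcher matcher(NORM_HAMMING, true);
	std::vector<DMatch> matches;
	matcher.match(desc1, desc2, matches);   // internally compares every query descriptor with every training descriptor

	std::cout << "cross-checked matches: " << matches.size() << std::endl;
	return 0;
}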

DescriptorMatcher::knnMatch feature point descriptor (one-to-many) matching


DescriptorMatcher::knnMatch is one of the functions used for feature matching in OpenCV. It can be used to match descriptors of images or feature points, returning the k best candidates for each query descriptor.

The following is a detailed explanation of the knnMatch function:

void DescriptorMatcher::knnMatch(
    InputArray queryDescriptors,
    InputArray trainDescriptors,
    std::vector<std::vector<DMatch>>& matches,
    int k,
    InputArray mask = noArray(),
    bool compactResult = false
)

Parameters:

  • queryDescriptors: Descriptors of the query image or feature points, usually a descriptor matrix produced by a feature point detector (such as SIFT, SURF, ORB, etc.).
  • trainDescriptors: Descriptors of the training image or feature points, usually a descriptor matrix produced by a feature point detector.
  • matches: The output matching result, a two-dimensional vector: matches[i] holds the (up to k) training descriptors matched to queryDescriptors[i], so each query descriptor maps one-to-many. Traverse matches to read the results.
  • k: The number of best matches to return. For each query descriptor, the function returns the top k training descriptors that best match it.
  • mask: Optional mask for filtering matches. If a mask is provided, only descriptor pairs whose corresponding mask entry is non-zero will be matched.
  • compactResult: Only meaningful when a mask is used. If compactResult is false, the matches vector has the same size as the number of rows of queryDescriptors (fully masked-out queries give empty entries). If compactResult is true, matches contains no entries for fully masked-out query descriptors, so its size equals the number of query rows with a non-zero mask.
    For each query descriptor the k nearest neighbours are found and sorted in ascending order of distance, best match first. An example with a mask (only the first query descriptor is allowed to match):
	cv::Mat mask(descriptors1.rows, descriptors2.rows, CV_8U, cv::Scalar(0));
	mask.row(0) = 255;

	vector<vector<DMatch>> matches, matches1;   // containers for the match results
	BFMatcher matcher(NORM_HAMMING);            // brute-force matcher using Hamming distance
	matcher.knnMatch(descriptors1, descriptors2, matches, 200, mask, true);    // compactResult = true
	matcher.knnMatch(descriptors1, descriptors2, matches1, 200, mask, false);  // compactResult = false

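A common use of the one-to-many result is Lowe's ratio test: with k = 2, a match is kept only when the best distance is clearly smaller than the second-best one. A short sketch reusing descriptors1 and descriptors2 from the snippet above (the 0.7 factor is just a typical choice, not a fixed rule):

	BFMatcher matcher(NORM_HAMMING);
	vector<vector<DMatch>> knn;
	matcher.knnMatch(descriptors1, descriptors2, knn, 2);   // two best candidates per query descriptor

	vector<DMatch> good;
	for (size_t i = 0; i < knn.size(); ++i)
	{
		if (knn[i].size() == 2 && knn[i][0].distance < 0.7f * knn[i][1].distance)
			good.push_back(knn[i][0]);   // keep only clearly unambiguous matches
	}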

DescriptorMatcher::radiusMatch feature point descriptor (one-to-many) specific range matching


DescriptorMatcher::radiusMatch is a function in OpenCV for radius matching within a given descriptor set. For each query descriptor it finds all training descriptors whose distance is less than or equal to the given radius value.

The following is a detailed explanation of the DescriptorMatcher::radiusMatch function:

void DescriptorMatcher::radiusMatch(
    InputArray queryDescriptors,                   // query descriptor set
    InputArray trainDescriptors,                   // training descriptor set
    std::vector<std::vector<DMatch>>& matches,     // two-dimensional vector storing the match results
    float maxDistance,                             // maximum distance threshold
    InputArray mask = noArray(),                   // optional mask
    bool compactResult = false                     // whether to drop entries for fully masked-out queries
) const;

Parameter explanation:

  • queryDescriptors: An input array containing a collection of query descriptors. Each query descriptor is a floating-point vector, usually generated by a feature detector.
  • trainDescriptors: Descriptors of training images or feature points. Usually a descriptor matrix extracted by a feature point detector.
  • matches: A two-dimensional vector used to store the matching results. The matching result of each query descriptor will be stored as a vector of DMatch objects, representing the index and distance of the matched training descriptor.
  • maxDistance: Maximum distance threshold. Only matches with a distance less than or equal to this value will be accepted.
  • mask (optional parameter): An optional mask specifying which query/training descriptor pairs may participate in matching. If a mask is provided, only descriptor pairs whose corresponding mask entry is non-zero will be matched.
  • compactResult (optional parameter): A boolean used only when a mask is supplied. If false, the matches vector has the same size as the number of query descriptors (fully masked-out queries give empty entries); if true, matches contains no entries for fully masked-out query descriptors, so its size may be smaller than the number of query descriptors.

maxDistance is the maximum distance threshold of DescriptorMatcher::radiusMatch. During radius matching, only matches whose distance is less than or equal to maxDistance are accepted.
Concretely, for each query descriptor, radiusMatch finds all training descriptors whose distance is less than or equal to maxDistance; the results are stored in the two-dimensional vector passed as matches.
Adjusting maxDistance controls how strict the matching is. A smaller maxDistance imposes a stricter condition, so only very similar descriptors are matched; a larger maxDistance is more permissive and more descriptor pairs are considered matches.
A suitable maxDistance has to be chosen according to the application and the characteristics of the descriptor. In practice the threshold is usually found by experimenting and tuning until the matching quality is satisfactory.
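A minimal radiusMatch sketch, reusing the ORB descriptors descriptors1 and descriptors2 from the earlier demos; the threshold of 40 (Hamming distance) is only illustrative and has to be tuned as described above:

	BFMatcher matcher(NORM_HAMMING);
	vector<vector<DMatch>> radiusMatches;
	matcher.radiusMatch(descriptors1, descriptors2, radiusMatches, 40.0f);   // keep every match with distance <= 40

	for (size_t i = 0; i < radiusMatches.size(); ++i)
		cout << "query " << i << " has " << radiusMatches[i].size() << " candidates within the radius" << endl;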

BFMatcher brute-force matching - introduced in this article (Introduction of the Feature2D Class)


drawMatches draw feature point matching results


The drawMatches function is a function in OpenCV for drawing the matched feature points between two images. It connects corresponding feature points in the two images and draws these connecting lines in the resulting image.

The function prototype is as follows:

void drawMatches(
    InputArray img1, // first image
    const std::vector<KeyPoint>& keypoints1, // feature points of the first image
    InputArray img2, // second image
    const std::vector<KeyPoint>& keypoints2, // feature points of the second image
    const std::vector<DMatch>& matches1to2, // matches from the first image to the second
    OutputArray outImg, // output image with the drawn matches
    const Scalar& matchColor = Scalar::all(-1), // color of the connecting lines
    const Scalar& singlePointColor = Scalar::all(-1), // color of the single feature points
    const std::vector<char>& matchesMask = std::vector<char>(), // mask selecting which matches are drawn
    int flags = DrawMatchesFlags::DEFAULT // drawing flags
);

Parameter Description:

  • img1: the first image (grayscale or color).
  • keypoints1: the feature points of the first image, of type std::vector<KeyPoint>; each feature point contains its coordinates and other attributes.
  • img2: the second image (same type as img1).
  • keypoints2: the feature points of the second image, same type as keypoints1.
  • matches1to2: the feature point matching result, of type std::vector<DMatch>, describing the matches from the first image to the second.
  • outImg: the output image with the drawn matches, of type OutputArray.
  • matchColor: the color of the connecting lines, a Scalar value; the default Scalar::all(-1) means a random color.
  • singlePointColor: the color of the single (unmatched) feature points, a Scalar value; the default Scalar::all(-1) means a random color.
  • matchesMask: mask vector identifying which matches are drawn. Its length must equal the length of matches1to2; it is empty by default (all matches are drawn).
  • flags: drawing flags, which can be a combination of the following values:
    • DrawMatchesFlags::DEFAULT: default flag, the output image is created and all matches are drawn.
    • DrawMatchesFlags::DRAW_OVER_OUTIMG: draw the matches over the existing content of outImg instead of creating a new output image.
    • DrawMatchesFlags::NOT_DRAW_SINGLE_POINTS: do not draw feature points that have no match.
    • DrawMatchesFlags::DRAW_RICH_KEYPOINTS: draw the size and orientation of the feature points.
      The drawMatches function makes it easy to visualize feature point matching results, helping us analyze and understand how well the matching worked.

Code demo:

void features(Mat &img, vector<KeyPoint> &keypoints, Mat &descriptors)
{
	Ptr<ORB> orb = ORB::create(1000);
	orb->detectAndCompute(img, Mat(), keypoints, descriptors);

	//cv::Ptr<cv::SIFT> detector = cv::SIFT::create();
	//detector->detectAndCompute(img, cv::noArray(), keypoints, descriptors);
}


	Mat img1 = imread("box_in_scene.jpg");
	Mat img2 = imread("box.jpg");
	if (!(img1.data && img2.data))
	{
		cout << "failed to open the images..." << endl;
		return -1;
	}

	//extract ORB feature points and descriptors
	vector<KeyPoint> keypoints1, keypoints2;
	Mat descriptors1, descriptors2;

	//compute the feature points
	features(img1, keypoints1, descriptors1);
	features(img2, keypoints2, descriptors2);

	//feature point matching
	vector<DMatch> matches;              //container for the match results
	BFMatcher matcher(NORM_HAMMING);     //brute-force matcher using Hamming distance
	matcher.match(descriptors1, descriptors2, matches);   //perform the matching

	//cv::Mat mask(descriptors1.rows, descriptors2.rows, CV_8U, cv::Scalar(0));
	//mask.row(0) = 255;
	//mask.at<uchar>(0, 0) = 0;
	//vector<vector<DMatch>> matches, matches1;          //containers for the match results
	//BFMatcher matcher(NORM_HAMMING);                   //brute-force matcher using Hamming distance
	//matcher.knnMatch(descriptors1, descriptors2, matches, 1000, mask, true);
	//matcher.knnMatch(descriptors1, descriptors2, matches1, 1000, mask, false);

	cout << "matches:" << matches.size() << endl;

	double min_dist = 1000, max_dist = 0;
	for (int i = 0; i < matches.size(); ++i)
	{
		double dist = matches[i].distance;
		if (dist < min_dist)
			min_dist = dist;
		if (dist > max_dist)
			max_dist = dist;
	}

	//print the minimum and maximum Hamming distance over all matches
	cout << "min_dist:" << min_dist << endl;
	cout << "max_dist:" << max_dist << endl;

	vector<DMatch> good_matches;
	//discard the match pairs whose Hamming distance is too large
	for (int i = 0; i < matches.size(); ++i)
	{
		if (matches[i].distance <= max(2 * min_dist, 20.0))
		{
			good_matches.push_back(matches[i]);
		}
	}
	//number of remaining matches
	cout << "good_min:" << good_matches.size() << endl;

	//draw the matching results
	Mat outimg1, outimg2;
	drawMatches(img1, keypoints1, img2, keypoints2, matches, outimg1);
	drawMatches(img1, keypoints1, img2, keypoints2, good_matches, outimg2);

	namedWindow("unfiltered result", WINDOW_NORMAL);
	namedWindow("min-Hamming-distance filtering", WINDOW_NORMAL);
	imshow("unfiltered result", outimg1);
	imshow("min-Hamming-distance filtering", outimg2);


RANSAC feature point optimization


findHomography homography transformation: optimizing feature points


The cv::findHomography function is used to compute a homography matrix in the OpenCV library. The homography matrix is a 3x3 matrix describing the projective transformation between two planes. In computer vision, homography matrices are commonly used in tasks such as image rectification, feature matching, and image stitching.

Here is a detailed explanation of the cv::findHomography function:

cv::Mat cv::findHomography(
    InputArray srcPoints,   // coordinates of the source points: a single-channel floating-point point set or a matrix containing the points
    InputArray dstPoints,   // coordinates of the target points, same data type and size as srcPoints
    int method,             // flag selecting the estimation method, commonly RANSAC or LMEDS; the default 0 means least squares
    double ransacReprojThreshold = 3,   // reprojection threshold used by RANSAC to separate inliers from outliers
    OutputArray mask = noArray(),       // output mask marking the computed inliers and outliers
    const int maxIters = 2000,          // maximum number of iterations for RANSAC and LMEDS
    const double confidence = 0.995     // confidence level for RANSAC and LMEDS
);

Parameter explanation:

  • srcPoints and dstPoints: the points in the source image and the corresponding points in the target image, usually obtained from a feature matching algorithm (such as SIFT, SURF, ORB, etc.). Both can be a single-channel floating-point point set or a matrix containing a point set, and must have the same data type and size.
  • method: flag selecting the homography estimation method. Two commonly used robust methods are cv::RANSAC and cv::LMEDS. RANSAC is based on random sample consensus, while LMEDS uses a least-median-of-squares estimator.
  • ransacReprojThreshold: the reprojection threshold of the RANSAC algorithm, used to separate inliers from outliers. When the reprojection error of a point is less than the threshold, the point is considered an inlier; otherwise it is an outlier. The default value is 3.
  • mask: the output mask matrix used to label the computed inliers and outliers. Inliers correspond to non-zero values in the mask matrix and outliers correspond to zero values. If you don't need this mask, pass cv::noArray().
  • maxIters: the maximum number of iterations for the RANSAC and LMEDS algorithms. The default value is 2000.
  • confidence: confidence level for the RANSAC and LMEDS algorithms, expressing how confident we want to be that the computed homography is correct. The default value of 0.995 corresponds to a 99.5% confidence level.

Return value:

  • The function returns a 3x3 double-precision floating-point matrix, the computed homography. If no homography matrix can be computed, an empty matrix is returned.

method

The method parameter of cv::findHomography specifies how the homography is computed. OpenCV provides four commonly used options, introduced below:

  1. 0: the regular method using all points, i.e. least squares (default)

  2. cv::RANSAC = 8(RANdom SAmple Consensus):
    RANSAC is a statistically based iterative algorithm for estimating mathematical model parameters. When calculating the homography matrix, the RANSAC method randomly selects a small set of point pairs, and then calculates the homography matrix based on these point pairs. Then, it calculates the reprojection error of other points, and based on the reprojection error and the threshold value, it judges whether the point belongs to the inner point or the outer point. The algorithm iterates until it finds a set of homography estimates with the largest number of inliers.

  3. cv::LMEDS = 4(Least Median of Squares):
    The LMEDS method is also an iterative algorithm for estimating mathematical model parameters. Unlike RANSAC, the LMEDS method uses median differences to estimate model parameters instead of mean squared errors. The median difference can better resist the interference of outliers. The LMEDS method randomly selects a small set of point pairs and computes a homography matrix from these point pairs. Then, it calculates the reprojection error of other points, and based on the median difference and a threshold, it decides whether the point is an in-point or an out-point. The algorithm iterates until it finds a set of homography estimates with the largest number of inliers.

  4. cv::RHO = 16(Randomized HOmography):
    The RHO method is an improved random sampling consensus algorithm for estimating mathematical model parameters. It is optimized on the basis of the RANSAC algorithm by randomly selecting fewer point pairs for estimation. This speeds up calculations while maintaining good accuracy. The RHO method is similar to the RANSAC method in computing the homography matrix, but uses a different sampling strategy.

mask

Elements corresponding to inliers have the non-zero value 1; elements corresponding to outliers have the value 0.

In the cv::findHomography function, the mask parameter is an output parameter that marks the computed inliers and outliers. It is a mask matrix corresponding to the input source and target points, marking which points are inliers and which are outliers.
The mask matrix has one entry per source/destination point pair, and its elements are either 0 or non-zero. Elements corresponding to inliers have the non-zero value 1, and elements corresponding to outliers have the value 0. By examining the element values in the mask matrix, you can determine which points are inliers and which are outliers.
When calling cv::findHomography, if you need the inlier/outlier information, pass a cv::Mat as the mask argument. For example:

cv::Mat mask;
cv::Mat homography = cv::findHomography(srcPoints, dstPoints, cv::RANSAC, 3.0, mask);

In the example above, mask is the output mask matrix used to mark the computed inliers and outliers. Elements of mask corresponding to inliers have non-zero values, and elements corresponding to outliers have the value 0.
Note that if you do not need the inlier/outlier information, you can set the mask parameter to cv::noArray(), i.e. not request the output mask matrix at all.
Once the mask matrix is obtained, inliers and outliers can be extracted according to it for subsequent processing and analysis.

What are inliers and outliers

When computing the homography matrix, inliers and outliers are determined based on the comparison of the reprojection error with a threshold.
Inliers refer to the points whose reprojection error is less than a given threshold, they are consistent with the calculated homography matrix, and can be considered as correctly matched point pairs. Inliers correspond to feature points or keypoints with good correspondence in the image.
Outliers refer to the points whose reprojection error is greater than or equal to a given threshold, they are inconsistent with the calculated homography matrix, and may be point pairs with matching errors or noise. Outliers correspond to feature points or keypoints in the image that do not have a correct correspondence.
In the cv::findHomography function, the reprojection threshold is given by the ransacReprojThreshold parameter; the function computes the reprojection error of each point and compares it with the threshold. If the reprojection error is less than the threshold, the point is marked as an inlier, otherwise it is marked as an outlier.
The computed inlier/outlier information can be obtained through the output mask matrix (the mask parameter). Elements of the mask corresponding to inliers are non-zero and elements corresponding to outliers are 0; by examining these values you can determine which points are inliers and which are outliers.
Distinguishing inliers from outliers is very important for subsequent image processing and analysis: it removes the interference of outliers and improves the accuracy and robustness of the algorithm.

What kind of algorithm is RANSAC, and how does it differ from least squares

RANSAC (RANdom SAmple Consensus) is a statistically based iterative algorithm for estimating mathematical model parameters. The RANSAC algorithm is suitable for situations where there are outliers in the data set, and can effectively resist the interference of outliers.

The basic idea of the RANSAC algorithm is to estimate the model parameters from a small, randomly sampled set of data points, then compute how well the remaining data points fit the estimated model and, according to a preset threshold (the reprojection threshold), classify each point as an inlier or an outlier. The algorithm repeats this process iteratively until it finds the model parameter estimate with the largest number of inliers.

Compared with RANSAC, least squares is a classic optimization method for fitting data and estimating parameters. The method of least squares solves for optimal parameters by minimizing the sum of squared errors between the data points and the model. Least squares assumes that there are no outliers in the data, and its goal is to find parameter values that minimize the error.

One of the main characteristics of the least squares method is that it is very sensitive to outliers, that is, a single outlier can have a large impact on the fitting results. Least squares can lead to inaccurate model fitting when there are outliers in the dataset. The RANSAC algorithm can obtain a more robust model parameter estimation in the presence of outliers through the strategy of random sampling and judgment of inside and outside points.

Therefore, the RANSAC algorithm and the least squares method differ in handling data with outliers. RANSAC selects the optimal model parameters through iteration and judgment of inside and outside points, which can better resist the interference of outliers, while the least squares method will be affected by outliers, which will easily cause the fitting results to deviate from the real model.

RANSAC implementation

The implementation steps of the RANSAC algorithm are as follows:

  1. Randomly selecting a small fraction of data points from a dataset as a sample is often referred to as the sampling step.
  2. Based on the selected samples, estimate the model parameters.
  3. For other unselected data points, the error between them and the estimated model is calculated and compared with the preset threshold. If the error is less than a threshold, the data point is marked as an inlier; otherwise, it is marked as an outlier.
  4. Count the number of inliers.
  5. Repeat the above steps several times, and select the model parameters with the most interior points as the final estimation result.

Below is a simple pseudocode example demonstrating the implementation of the RANSAC algorithm:

Input: data set D, number of iterations N, sample size K, threshold T

bestModel = null
bestInliersCount = 0

for i = 1 to N do
    // 1. randomly select a sample from the data set
    sample = randomly pick K data points from D
    
    // 2. estimate the model parameters
    model = fit model parameters to the sample
    
    inliersCount = 0
    inliers = empty set
    
    for each data point p in D do
        // 3. compute the error between the data point and the estimated model
        error = error between p and model
        
        if error < T then
            // 4. mark data points whose error is below the threshold as inliers
            inliersCount = inliersCount + 1
            inliers.add(p)
    
    if inliersCount > bestInliersCount then
        // keep the model parameters with the most inliers
        bestModel = model
        bestInliersCount = inliersCount
    
    if inliersCount > threshold (e.g. a certain percentage of the data set) then
        // stop iterating early once sufficient confidence is reached
        break
        
// return the model parameters with the most inliers
return bestModel

The above pseudo code is just a simple example and may need to be modified and optimized for the specific situation in a real implementation. The core idea of the RANSAC algorithm is to select the optimal model parameters through random sampling and inlier/outlier classification; it can be adjusted and improved according to the specific problem and the characteristics of the data set.

The following is a sample code that uses C++ to implement the RANSAC algorithm, taking the parameter estimation of a two-dimensional straight line as an example:

#include <iostream>
#include <vector>
#include <cmath>
#include <random>

struct Point {
    double x;
    double y;
};

struct Line {
    double a;
    double b;
};

Line estimateLineRANSAC(const std::vector<Point>& points, int iterations, double threshold) {
    std::random_device rd;
    std::default_random_engine gen(rd());
    std::uniform_int_distribution<int> dist(0, points.size() - 1);

    Line bestLine;
    int bestInliersCount = 0;

    for (int i = 0; i < iterations; i++) {
        // Step 1: Randomly select two points
        int index1 = dist(gen);
        int index2 = dist(gen);
        const Point& p1 = points[index1];
        const Point& p2 = points[index2];

        // Step 2: Estimate line parameters
        double a = (p2.y - p1.y) / (p2.x - p1.x);
        double b = p1.y - a * p1.x;

        int inliersCount = 0;

        // Step 3: Count inliers
        for (const Point& p : points) {
            double distance = std::abs(a * p.x - p.y + b) / std::sqrt(a * a + 1);
            if (distance < threshold) {
                inliersCount++;
            }
        }

        if (inliersCount > bestInliersCount) {
            // Update best line parameters
            bestLine.a = a;
            bestLine.b = b;
            bestInliersCount = inliersCount;
        }
    }

    return bestLine;
}

int main() {
    // Generate some sample points
    std::vector<Point> points = {
        {1.0, 1.2},   {2.0, 2.8},   {3.0, 3.6},   {4.0, 4.4},   {5.0, 5.2},
        {6.0, 6.8},   {7.0, 7.6},   {8.0, 8.4},   {9.0, 9.2},   {10.0, 10.8},
        {11.0, 11.6}, {12.0, 12.4}, {13.0, 13.2}, {14.0, 14.8}, {15.0, 15.6},
        {16.0, 16.4}, {17.0, 17.2}, {18.0, 18.8}, {19.0, 19.6}, {20.0, 20.4}};

    // Estimate line parameters using RANSAC
    int iterations = 1000;
    double threshold = 0.5;
    Line line = estimateLineRANSAC(points, iterations, threshold);

    // Print the estimated line parameters
    std::cout << "Estimated line: y = " << line.a << "x + " << line.b << std::endl;

    return 0;
}

In the sample code above, the Point structure is first defined to represent a two-dimensional point, and the Line structure to represent the line parameters. The estimateLineRANSAC function then implements the RANSAC algorithm to estimate the parameters of the line. In the main function, a set of sample points is generated and estimateLineRANSAC is called to estimate the line parameters. Finally, the estimated line parameters are printed.

The RANSAC algorithm in this sample code is used to estimate the parameters of a two-dimensional straight line. In practical applications, you can make corresponding modifications and adaptations according to specific problems and data types.

Please note that this is just a simple sample code, more parameter tuning and error handling may be required in real applications. In addition, the RANSAC algorithm in this example only considers the case of a single straight line, and the estimation of other models may need to be modified appropriately.

Code demo:

void features(Mat &img, vector<KeyPoint> &keypoints, Mat &descriptors)
{
	Ptr<ORB> orb = ORB::create(1000);
	orb->detectAndCompute(img, Mat(), keypoints, descriptors);

	//cv::Ptr<cv::SIFT> detector = cv::SIFT::create();
	//detector->detectAndCompute(img, cv::noArray(), keypoints, descriptors);
}

void ransac(vector<DMatch> matches, vector<KeyPoint> queryKeyPoint,
	vector<KeyPoint> trainKeyPoint, vector<DMatch> &matches_ransac)
{
	//containers for the matched point coordinates
	vector<Point2f> srcPoints(matches.size()), dstPoints(matches.size());
	//extract the coordinates of the matched point pairs from the keypoints
	for (int i = 0; i < matches.size(); ++i)
	{
		srcPoints[i] = queryKeyPoint[matches[i].queryIdx].pt;
		dstPoints[i] = trainKeyPoint[matches[i].trainIdx].pt;
	}

	//filter the matched point pairs with RANSAC
	vector<int> inliersMask(srcPoints.size());
	findHomography(srcPoints, dstPoints, RANSAC, 5, inliersMask);

	//Mat inliersMask;
	//findHomography(srcPoints, dstPoints, RANSAC, 5, inliersMask);

	for (int i = 0; i < inliersMask.size(); ++i)
	{
		if (inliersMask[i])
		{
			matches_ransac.push_back(matches[i]);
		}
	}
}

int main()
{
	Mat img1 = imread("box_in_scene.jpg");
	Mat img2 = imread("box.jpg");
	if (!(img1.data && img2.data))
	{
		cout << "failed to open the images..." << endl;
		return -1;
	}

	//extract ORB feature points and descriptors
	vector<KeyPoint> keypoints1, keypoints2;
	Mat descriptors1, descriptors2;

	//compute the feature points
	features(img1, keypoints1, descriptors1);
	features(img2, keypoints2, descriptors2);

	//feature point matching
	vector<DMatch> matches;              //container for the match results
	BFMatcher matcher(NORM_HAMMING);     //brute-force matcher using Hamming distance
	matcher.match(descriptors1, descriptors2, matches);   //perform the matching

	//cv::Mat mask(descriptors1.rows, descriptors2.rows, CV_8U, cv::Scalar(0));
	//mask.row(0) = 255;
	//mask.at<uchar>(0, 0) = 0;
	//vector<vector<DMatch>> matches, matches1;          //containers for the match results
	//BFMatcher matcher(NORM_HAMMING);                   //brute-force matcher using Hamming distance
	//matcher.knnMatch(descriptors1, descriptors2, matches, 1000, mask, true);
	//matcher.knnMatch(descriptors1, descriptors2, matches1, 1000, mask, false);

	cout << "matches:" << matches.size() << endl;

	double min_dist = 1000, max_dist = 0;
	for (int i = 0; i < matches.size(); ++i)
	{
		double dist = matches[i].distance;
		if (dist < min_dist)
			min_dist = dist;
		if (dist > max_dist)
			max_dist = dist;
	}

	//print the minimum and maximum Hamming distance over all matches
	cout << "min_dist:" << min_dist << endl;
	cout << "max_dist:" << max_dist << endl;

	vector<DMatch> good_matches;
	//discard the match pairs whose Hamming distance is too large
	for (int i = 0; i < matches.size(); ++i)
	{
		if (matches[i].distance <= max(2 * min_dist, 20.0))
		{
			good_matches.push_back(matches[i]);
		}
	}
	//number of remaining matches
	cout << "good_min:" << good_matches.size() << endl;

	vector<DMatch> good_ransac;
	ransac(good_matches, keypoints1, keypoints2, good_ransac);

	//draw the matching results
	Mat outimg1, outimg2, outimg3;
	drawMatches(img1, keypoints1, img2, keypoints2, matches, outimg1);
	drawMatches(img1, keypoints1, img2, keypoints2, good_matches, outimg2);
	drawMatches(img1, keypoints1, img2, keypoints2, good_ransac, outimg3);

	namedWindow("unfiltered result", WINDOW_NORMAL);
	namedWindow("min-Hamming-distance filtering", WINDOW_NORMAL);
	namedWindow("RANSAC filtering", WINDOW_NORMAL);
	imshow("unfiltered result", outimg1);
	imshow("min-Hamming-distance filtering", outimg2);
	imshow("RANSAC filtering", outimg3);

	waitKey(0);

	return 0;
}


Monocular Camera Calibration

Monocular camera calibration implementation – Zhang Zhengyou calibration method


findChessboardCorners Calibration board corner point extraction – checkerboard corner point search


The findChessboardCorners function is a function in the OpenCV library that detects and locates the corners of a checkerboard in a given image. It can be used for tasks such as camera calibration and pose estimation in computer vision and image processing applications. The following is a detailed explanation of the function:

bool findChessboardCorners(InputArray image,
                            Size patternSize,
                            OutputArray corners,
                            int flags = CALIB_CB_ADAPTIVE_THRESH + CALIB_CB_NORMALIZE_IMAGE
                            );

Parameters:

  • image: Input image, usually a grayscale image (single channel).
  • patternSize: The number of inner corner rows and columns of the checkerboard. In the Size object, patternSize.width is the number of corners per row and patternSize.height the number of corners per column.
  • corners: Output parameter, the coordinates of the detected corners: a vector of Point2f containing the corner coordinates.
  • flags (optional): Additional flags to modify the behavior of the function. Can be a combination of the following flags:
    • CALIB_CB_ADAPTIVE_THRESH = 1: Use an adaptive threshold for corner detection.
    • CALIB_CB_NORMALIZE_IMAGE = 2: Normalize the input image.
    • CALIB_CB_FILTER_QUADS = 4: Apply additional quadrilateral filtering to reject false corners.
      In OpenCV's findChessboardCorners function, the flags argument can be a combination of the following flags:
  1. CALIB_CB_ADAPTIVE_THRESH = 1: Use an adaptive threshold for corner detection. This means that the function automatically adjusts the threshold based on the local pixel values ​​around each pixel to enhance the detection of corners.
  2. CALIB_CB_NORMALIZE_IMAGE = 2: Normalize the input image. By normalizing the image, the function can improve the stability of corner detection under different lighting conditions.
  3. CALIB_CB_FILTER_QUADS = 4: Use a square filter to filter corner points. This flag can be used to filter out corners that do not conform to the checkerboard feature, thereby improving the accuracy of corner detection.
  4. CALIB_CB_FAST_CHECK = 8: Use the quick check strategy. This flag can speed up corner detection at the expense of some accuracy.
  5. CALIB_CB_ADAPTIVE_THRESH + CALIB_CB_NORMALIZE_IMAGE: A combination of two flags using adaptive thresholding and normalizing images.
  6. CALIB_CB_ADAPTIVE_THRESH + CALIB_CB_FILTER_QUADS: Use a combination of the two flags Adaptive Threshold and Square Filter.
  7. CALIB_CB_NORMALIZE_IMAGE + CALIB_CB_FILTER_QUADS: Use a combination of the normalize image and square filter two flags.
  8. CALIB_CB_ADAPTIVE_THRESH + CALIB_CB_NORMALIZE_IMAGE + CALIB_CB_FILTER_QUADS: Use a combination of the three flags Adaptive Threshold, Normalize Image, and Square Filter.

These flags can be combined according to actual needs to adjust the behavior of findChessboardCorners and obtain the best corner detection results.
Return value:

  • Returns true if the corners of the checkerboard were successfully found, otherwise false.

Function:
This function detects the inner corner points of a checkerboard in the given image and writes their coordinates to the corners parameter. It uses a corner detection algorithm to find the checkerboard in the image and reports whether the detection succeeded.

Before using this function, the input image usually needs to be preprocessed, for example converted to grayscale. Then, given the size of the checkerboard, the function tries to find a checkerboard of that size in the image.
If the function successfully finds the corners of the checkerboard, it stores their coordinates in the corners parameter and returns true; otherwise, if no corners are found, it returns false.
For tasks such as camera calibration, multiple images are generally used for corner detection to improve the accuracy and stability of the result.
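A minimal usage sketch (the image path and the 9x6 inner-corner count are placeholders for a real calibration image):

	Mat img = imread("chessboard.jpg");             // placeholder path
	Mat gray;
	cvtColor(img, gray, COLOR_BGR2GRAY);

	Size boardSize(9, 6);                           // inner corners per row and per column
	vector<Point2f> corners;
	bool found = findChessboardCorners(gray, boardSize, corners,
		CALIB_CB_ADAPTIVE_THRESH + CALIB_CB_NORMALIZE_IMAGE);

	if (found)
		cout << "detected " << corners.size() << " corners" << endl;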

findCirclesGrid Calibration board corner point extraction – circle center search


findCirclesGrid is a function in the OpenCV library that detects a grid of circles in a given image. It is commonly used in camera calibration and machine vision applications to detect circular calibration boards or other regular arrangements of circular blobs.

The following is a detailed explanation of the function:

bool findCirclesGrid(InputArray image, Size patternSize, OutputArray centers,
                     int flags = CALIB_CB_SYMMETRIC_GRID,
                     const Ptr<FeatureDetector>& blobDetector = SimpleBlobDetector::create(),
                     const Ptr<CirclesGridFinderParameters>& parameters = Ptr<CirclesGridFinderParameters>()
                     )

Parameters:

  • image: The input image, which can be a grayscale image or a color image.
  • patternSize: The expected size of the circle grid, i.e. the number of rows and columns of circles.
  • centers: Output parameter, the detected circle centers as a vector of two-dimensional points. Each element is a Point2f with the coordinates of one detected circle center.
  • flags: Flag parameter specifying the grid type and other options. Can be one of the following flags:
    • CALIB_CB_SYMMETRIC_GRID: Detect a symmetric grid of circles.
    • CALIB_CB_ASYMMETRIC_GRID: Detect an asymmetric grid of circles.
  • blobDetector: Optional parameter specifying the feature detector used to detect the circles. If omitted, the default SimpleBlobDetector is used.
  • parameters: Optional parameter specifying the parameters of the circle grid detection. If omitted, default parameters are used.
    Return value:
  • Returns true if the circle grid was detected successfully, otherwise false.
    Exceptions:
  • An exception may be thrown if the image is invalid or if a sufficient number of circles are not detected.
flags can be one of the following three options:

1. CALIB_CB_SYMMETRIC_GRID: the circle grid to detect is symmetric. In a symmetric grid every circle has the same number of circles around it; a chessboard-like arrangement of circles is an example.
2. CALIB_CB_ASYMMETRIC_GRID: the circle grid to detect is asymmetric. In an asymmetric grid the number of circles around each circle can differ; for example, in some calibration patterns there may be no circles around the central one.
3. CALIB_CB_CLUSTERING: use a clustering algorithm for the circle grid detection. Clustering can give more stable results when the image is noisy or only a few circle blobs are detected. When using this flag, the circle grid image should be converted to grayscale.
Example usage:

bool patternFound = findCirclesGrid(image, patternSize, centers, CALIB_CB_SYMMETRIC_GRID);

In the example above we use the CALIB_CB_SYMMETRIC_GRID flag to detect a symmetric circle grid. Choose the flag that suits your calibration pattern.
The fifth parameter of findCirclesGrid is blobDetector, an optional parameter specifying the feature detector used to detect the circles. blobDetector is a pointer of type Ptr<FeatureDetector> used to find the circle blobs in the image.

OpenCV provides several feature detectors, such as SimpleBlobDetector, that can be used to detect circular targets. You can use the default SimpleBlobDetector or create a custom detector.

Example usage:
Ptr<SimpleBlobDetector> blobDetector = SimpleBlobDetector::create();
bool patternFound = findCirclesGrid(image, patternSize, centers, CALIB_CB_SYMMETRIC_GRID, blobDetector);

In the example above we create a detector with the default SimpleBlobDetector settings and pass it to findCirclesGrid. You can also create a custom feature detector and pass it to the function.

Note that if you do not provide the blobDetector parameter, findCirclesGrid uses the default simple blob detector.
The sixth parameter of findCirclesGrid is parameters, an optional parameter that specifies the parameters of the circle grid detection. parameters is of type CirclesGridFinderParameters and controls the behavior of the detection algorithm.

The CirclesGridFinderParameters class provides, among others, the following parameters:
1. CirclesGridFinderParameters::densityNeighborhoodSize: density neighborhood size, the neighborhood used when estimating the local density of grid points, which decides whether circle centers are detected in that region.
2. CirclesGridFinderParameters::minDistanceToAddKeypoint: minimum distance for adding a keypoint, i.e. the minimum distance between a new keypoint and the keypoints already added.
3. CirclesGridFinderParameters::keypointScale: keypoint scale, used so that circle centers can be detected at different image scales.
4. CirclesGridFinderParameters::minGraphConfidence: minimum graph confidence, the minimum confidence required for the (circle grid) graph during computation.

Example usage:
CirclesGridFinderParameters parameters;      // default detection parameters; the fields above can be adjusted as needed
Ptr<SimpleBlobDetector> blobDetector = SimpleBlobDetector::create();
bool patternFound = findCirclesGrid(image, patternSize, centers, CALIB_CB_SYMMETRIC_GRID, blobDetector, parameters);

In the example above we construct a CirclesGridFinderParameters object and pass it, together with a blob detector, to findCirclesGrid so that the circle grid detection runs with these parameters.

find4QuadCornerSubpix corner position optimization


find4QuadCornerSubpix() is a function in the OpenCV library that refines roughly located calibration-board corner positions to sub-pixel accuracy in a given single-channel image. The C++ interface of this function is as follows:

void find4QuadCornerSubpix(InputArray img, InputOutputArray corners, Size region_size)

Parameter Description:

  • img: The input image. Must be a single-channel (grayscale) image of data type CV_8UC1.
  • corners: Input and output parameter containing the pixel coordinates of the corner points. On input it must hold an initial corner estimate, which on output is updated to sub-pixel-accurate corner coordinates. It is a matrix of type CV_32FC2 (e.g. a vector of Point2f) where each row contains the (x, y) coordinates of one corner point.
  • region_size: Defines the size of the sub-pixel search area. It is a Size object specifying the width and height of the search area.
    For example, Size(5, 5) indicates that the width and height of the search area are both 5 pixels. The size of the sub-pixel search area determines the range of pixels considered during the search. A larger search region can give a more accurate sub-pixel corner estimate, but also increases the computational cost. Usually the size of the search area should be adjusted to the specific application.

The main role of find4QuadCornerSubpix() is to refine roughly estimated corner coordinates to the sub-pixel level. Typically, after an initial corner estimate has been obtained with a corner detection function such as findChessboardCorners() or goodFeaturesToTrack(), this function can be used to improve the accuracy of the corner coordinates.
The function works as follows:

  1. First, the function converts the input image to a floating-point image and calculates the gradients in the horizontal and vertical directions.
  2. For each initial corner estimate, the function searches iteratively at the sub-pixel level for a more precise corner location. The size of the search area is defined by the region_size parameter.
  3. During the search, the function uses the image gradients to estimate a sub-pixel offset of the corner location, i.e. the offset from the initial corner position.
  4. The function iteratively applies the sub-pixel offsets to update the corner positions until the maximum number of iterations is reached or the estimate converges.
  5. Finally, the corrected corner coordinates are stored in the output parameter corners and can be read after the function call.
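A short sketch refining the corners returned by findChessboardCorners, continuing the gray, corners and found variables from the sketch in the previous section (the 5x5 search window is an illustrative choice):

	if (found)
	{
		// refine the rough corner estimates to sub-pixel accuracy
		find4QuadCornerSubpix(gray, corners, Size(5, 5));
	}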

drawChessboardCorners Draw the inner corners of the calibration board


drawChessboardCorners is a function in the OpenCV library for drawing the detected corners on a checkerboard image. Its function prototype is as follows:

void drawChessboardCorners(InputOutputArray image, Size patternSize, InputArray corners, bool patternWasFound)

The parameters are explained as follows:

  • image: input/output image, must be an 8-bit color image.
  • patternSize: The number of inner corners in each dimension of the checkerboard (for example, if the board has 7x6 inner corners, patternSize should be Size(7, 6)).
  • corners: The input array of corner points, usually the result returned by findChessboardCorners. It is a vector of Point2f containing the detected corner coordinates.
  • patternWasFound: When true, the corners are drawn connected by colored lines (the complete pattern was found); when false, the corners are drawn as unconnected markers.
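Continuing the same sketch, the detected (and refined) corners can be drawn for a quick visual check:

	drawChessboardCorners(img, boardSize, corners, found);   // corners are connected by colored lines when found == true
	imshow("corners", img);
	waitKey(0);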

calibrateCamera camera calibration function


calibrateCamera is the function used for camera calibration in OpenCV. Camera calibration is the process of determining the intrinsic and extrinsic parameters of a camera so that accurate measurement and analysis can be performed in images.

The function prototype is as follows:

double cv::calibrateCamera(
    InputArrayOfArrays objectPoints,   // 3D points in the world coordinate system
    InputArrayOfArrays imagePoints,    // corresponding 2D points in the images
    Size imageSize,                    // image size
    InputOutputArray cameraMatrix,     // output camera intrinsic parameter matrix
    InputOutputArray distCoeffs,       // output distortion coefficients
    OutputArrayOfArrays rvecs,         // output rotation vectors
    OutputArrayOfArrays tvecs,         // output translation vectors
    int flags = 0,                     // optional flags
    TermCriteria criteria = TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 30, DBL_EPSILON)              // termination criteria for the iteration
)

//double calibrateCamera(const vector<vector<Point3f>>& objectPoints,// 3D points in the world coordinate system
//                       const vector<vector<Point2f>>& imagePoints, // corresponding 2D points in the images
//                       const Size& imageSize,                      // image size
//                       Mat& cameraMatrix,                          // output camera intrinsic parameter matrix
//                       Mat& distCoeffs,                            // output distortion coefficients
//                       vector<Mat>& rvecs,                         // output rotation vectors
//                       vector<Mat>& tvecs,                         // output translation vectors
//                       int flags = 0,                              // optional flags
//                       TermCriteria criteria = TermCriteria(TermCriteria::COUNT + TermCriteria::EPS, 30, DBL_EPSILON) // termination criteria for the iteration
//                       );

Parameters:

  • objectPoints: 3D point coordinates in object space. It is a parameter of type vector<vector<Point3f>>, where each element is a vector of Point3f points; each Point3f represents a point in the object coordinate system.
  • imagePoints: the 2D point coordinates in the image plane corresponding to objectPoints. It is a parameter of type vector<vector<Point2f>>, where each element is a vector of Point2f points; each Point2f represents a point on the image plane.
  • imageSize: the dimensions (width and height) of the input images, passed as a Size.
  • cameraMatrix: the camera's intrinsic parameter matrix. It is an output parameter of type Mat containing the intrinsic parameters of the camera, such as focal length and principal point.
  • distCoeffs: the camera's distortion coefficients. It is an output parameter of type Mat containing the radial and tangential distortion coefficients.
  • rvecs: output parameter containing the rotation vector of each image. It is a parameter of type vector<Mat>, where each element is a rotation vector.
  • tvecs: output parameter containing the translation vector of each image. It is a parameter of type vector<Mat>, where each element is a translation vector.
  • flags: additional options for the calibration. It can be set using the following constants:
    • CALIB_USE_INTRINSIC_GUESS: use the input cameraMatrix as an initial guess for the intrinsic parameters.
    • CALIB_FIX_PRINCIPAL_POINT: fix the position of the principal point.
    • CALIB_FIX_ASPECT_RATIO: fix the aspect ratio of the focal length.
    • CALIB_ZERO_TANGENT_DIST: set the tangential distortion coefficients to zero.
    • CALIB_FIX_K1, CALIB_FIX_K2, CALIB_FIX_K3, CALIB_FIX_K4, CALIB_FIX_K5, CALIB_FIX_K6: fix the corresponding radial distortion coefficients.
    • CALIB_RATIONAL_MODEL: use the rational radial distortion model.
    • CALIB_THIN_PRISM_MODEL: use the thin prism distortion model for calibration.
    • CALIB_FIX_S1_S2_S3_S4: fix the thin prism model parameters.
    • CALIB_FIX_INTRINSIC: fix the camera's intrinsic parameters (used for stereo calibration).
    • CALIB_FIX_FOCAL_LENGTH: fix the focal length.
       CALIB_NINTRINSIC          = 18,
       CALIB_USE_INTRINSIC_GUESS = 0x00001,
       CALIB_FIX_ASPECT_RATIO    = 0x00002,
       CALIB_FIX_PRINCIPAL_POINT = 0x00004,
       CALIB_ZERO_TANGENT_DIST   = 0x00008,
       CALIB_FIX_FOCAL_LENGTH    = 0x00010,
       CALIB_FIX_K1              = 0x00020,
       CALIB_FIX_K2              = 0x00040,
       CALIB_FIX_K3              = 0x00080,
       CALIB_FIX_K4              = 0x00800,
       CALIB_FIX_K5              = 0x01000,
       CALIB_FIX_K6              = 0x02000,
       CALIB_RATIONAL_MODEL      = 0x04000,
       CALIB_THIN_PRISM_MODEL    = 0x08000,
       CALIB_FIX_S1_S2_S3_S4     = 0x10000,
       CALIB_TILTED_MODEL        = 0x40000,
       CALIB_FIX_TAUX_TAUY       = 0x80000,
       CALIB_USE_QR              = 0x100000, //!< use QR instead of SVD decomposition for solving. Faster but potentially less precise
       CALIB_FIX_TANGENT_DIST    = 0x200000,
       // only for stereo
       CALIB_FIX_INTRINSIC       = 0x00100,
       CALIB_SAME_FOCAL_LENGTH   = 0x00200,
       // for stereo rectification
       CALIB_ZERO_DISPARITY      = 0x00400,
       CALIB_USE_LU              = (1 << 17), //!< use LU instead of SVD decomposition for solving. much faster but potentially less precise
       CALIB_USE_EXTRINSIC_GUESS = (1 << 22)  //!< for stereoCalibrate
When the flags parameter is set to 0, no extra flags are applied and the calibration runs with the default settings. This means the camera's intrinsic parameters and distortion coefficients are all adjusted during the optimization to obtain the best calibration result; by default every intrinsic parameter and distortion coefficient is free to be optimized.
With the default setting of flags = 0, calibrateCamera tries to estimate the camera's intrinsic parameter matrix, the distortion coefficients, and the rotation and translation vectors of every image. This is done by minimizing the reprojection error, i.e. the difference between the projected points and the actual image points.
Setting flags to 0 is the common choice when you do not need to apply any particular constraint; the full calibration is performed and you obtain estimates of the camera's intrinsic parameters and distortion coefficients.
  • criteria: the termination criterion of the iterative optimization, passed as a TermCriteria value that combines a maximum number of iterations and a convergence threshold.
  • Return value: the function returns a double giving the overall reprojection error of the calibration.

When using the calibrateCamera function for camera calibration, a set of known object point coordinates and corresponding image point coordinates must be provided. These points can be measured on an object of known shape, such as a checkerboard. From the points of multiple images, the function estimates the camera's intrinsic parameter matrix, the distortion coefficients, and the rotation and translation vectors of each image.
Camera calibration optimizes the camera parameters by minimizing the reprojection error, i.e. the distance between the points obtained by reprojecting the 3D points onto the image plane and the actually detected image points.
After calibrateCamera has run, the intrinsic parameters and distortion coefficients of the camera are stored in cameraMatrix and distCoeffs, while the rotation and translation vectors of each image are stored in rvecs and tvecs.

Note: before performing camera calibration, make sure there are enough correspondences between the provided object points and image points and that their ordering is consistent.
The workflow of the calibrateCamera function is as follows:

  1. Collect the calibration images and the corresponding object point coordinates.
  2. Define a variable objectPoints of type vector<vector<Point3f>>, where each element is a vector of object points of type Point3f. These object points are known, usually obtained by measuring the actual object or by using a calibration board of known geometry (such as a checkerboard); a short sketch of this step is given below.
  3. Define a variable imagePoints of type vector<vector<Point2f>>, where each element is a vector of image points of type Point2f. These image points are obtained by detecting feature points (such as checkerboard corners) in each image.
  4. Initialize the camera intrinsic parameter matrix cameraMatrix and the distortion coefficient matrix distCoeffs.
  5. Call the calibrateCamera function, passing objectPoints, imagePoints, the image size imageSize, cameraMatrix, distCoeffs, rvecs and tvecs as arguments.
  6. The function performs the calibration, estimates the camera's intrinsic parameters and distortion coefficients, and computes the rotation and translation vectors for each image.
  7. It returns the calibration's reprojection error.
  8. After the function has run, cameraMatrix and distCoeffs contain the camera's intrinsic parameters and distortion coefficients, and rvecs and tvecs contain the rotation and translation vectors of each image.

Camera calibration is a complex process that requires sufficient data and accurate images of the calibration board. By accurately calibrating the camera, accurate measurements and calculations can be performed in the image, such as calculating the actual size of the object or performing 3D reconstruction and other applications.
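A sketch of step 2 above, using the variable names object_points and image_points that the calibration snippet further below also uses: for a checkerboard, the object points are usually a planar grid with Z = 0. The square size of 25 and the 9x6 board size are placeholders, and image_points is assumed to already hold the detected corners of each view (step 3):

    cv::Size boardSize(9, 6);                       // placeholder: inner corners per row and per column
    float squareSize = 25.0f;                       // placeholder: physical size of one square
    std::vector<cv::Point3f> boardGrid;             // one view's object points in the board plane (Z = 0)
    for (int r = 0; r < boardSize.height; ++r)
        for (int c = 0; c < boardSize.width; ++c)
            boardGrid.push_back(cv::Point3f(c * squareSize, r * squareSize, 0.0f));

    // the same grid is repeated for every view whose corners were found
    std::vector<std::vector<cv::Point3f>> object_points(image_points.size(), boardGrid);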
After calling the calibrateCamera function, you can use the returned results for further camera correction and image processing. Here are some possible next steps:

  1. Reprojection error analysis: by computing the reprojection error of the calibration, you can evaluate its quality. The reprojection error is obtained by reprojecting the calibrated object points onto the images and measuring the difference from the actual image points; a lower reprojection error indicates a more accurate calibration.
  2. Camera rectification: using the obtained camera intrinsic matrix cameraMatrix and distortion coefficient matrix distCoeffs, you can rectify other images. Applying distortion correction removes the distortions in the image, making the shape of objects closer to reality.
  3. 3D reconstruction: using the calibrated camera parameters, you can convert 2D points in the image to 3D points in the object coordinate system. By matching points across multiple images, 3D reconstruction and point cloud generation can be performed, enabling 3D measurement and scene reconstruction.
  4. Camera pose estimation: using the resulting rotation vectors rvecs and translation vectors tvecs, you can estimate the pose (position and orientation) of the camera in the different images. This is useful for applications such as vision-based navigation, pose estimation and augmented reality.
  5. Video correction: if you have a video stream, you can apply the camera correction to each frame for real-time distortion correction and image processing.

These are some typical next steps after camera calibration with calibrateCamera. The exact application depends on your needs and the image data being processed. Depending on the situation, you may need other functions and techniques in OpenCV to accomplish a specific task.

    // calibration flags
    int calibration_flags =
        cv::CALIB_FIX_ASPECT_RATIO |
        cv::CALIB_FIX_PRINCIPAL_POINT |
        cv::CALIB_ZERO_TANGENT_DIST |
        cv::CALIB_FIX_FOCAL_LENGTH;

    // run the camera calibration
    cv::Mat camera_matrix, distortion_coeffs;
    std::vector<cv::Mat> rvecs, tvecs;
    double ret = cv::calibrateCamera(
        object_points, image_points, image.size(), camera_matrix, distortion_coeffs,
        rvecs, tvecs, calibration_flags
    );

projectPoints model projection – project world coordinates to image coordinates using the intrinsic parameters and distortion coefficients


projectPoints is a function in OpenCV for projecting 3D points onto the 2D image plane. It is mainly used to compute where a point ends up under the camera projection model.

The following is a detailed explanation of the projectPoints function:

void cv::projectPoints(
    InputArray objectPoints, // input 3D point coordinates, of type CV_32FC3 or CV_64FC3
    InputArray rvec,         // rotation vector (Rodrigues representation of the rotation matrix), type CV_32F or CV_64F, size 3x1
    InputArray tvec,         // translation vector, type CV_32F or CV_64F, size 3x1
    InputArray cameraMatrix, // camera intrinsic parameter matrix, type CV_32F or CV_64F, size 3x3
    InputArray distCoeffs,   // distortion coefficients, type CV_32F or CV_64F, size 4x1 or 5x1
    OutputArray imagePoints, // output 2D point coordinates, of type CV_32FC2 or CV_64FC2
    OutputArray jacobian = noArray(), // optional output Jacobian matrix
    double aspectRatio = 0  // optional ratio between the y and x focal lengths
)
  • objectPoints is an input parameter representing the point coordinates in 3D space. Its type can be CV_32FC3 or CV_64FC3, i.e. each point has 3 single- or double-precision floating-point components.
  • rvec is the rotation vector (the Rodrigues representation of a rotation matrix); together with tvec it transforms points from the object (world) coordinate system to the camera coordinate system. It can be of type CV_32F or CV_64F and has a size of 3x1.
  • tvec is the translation vector of the same transformation. It can be of type CV_32F or CV_64F and has a size of 3x1.
  • cameraMatrix is the camera intrinsic parameter matrix, containing the intrinsic parameters of the camera such as focal lengths and principal point. It can be of type CV_32F or CV_64F and has a size of 3x3.
  • distCoeffs is the distortion coefficient vector, which models the lens distortion. It can be of type CV_32F or CV_64F and of size 4x1 or 5x1.
  • imagePoints is the output parameter representing the point coordinates projected onto the 2D image plane. Its type can be CV_32FC2 or CV_64FC2, i.e. each point has 2 single- or double-precision floating-point components.
  • jacobian is an optional output parameter: the Jacobian matrix of the image points with respect to the rotation vector, translation vector, intrinsic parameters and distortion coefficients.
  • aspectRatio is an optional fixed-aspect-ratio parameter for the focal lengths. If it is non-zero, the function assumes the aspect ratio fx/fy is fixed and adjusts the Jacobian accordingly. The default value is 0, which means no fixed aspect ratio is assumed.

The role of projectPoints is to project 3D point coordinates from the world coordinate system to 2D point coordinates on the camera image plane. It uses the camera's intrinsic parameter matrix, distortion coefficients, rotation vector, and translation vector to perform the projection.

undistort de-distortion function


The OpenCV library provides the undistort function for undistorting images. Its purpose is to convert a distorted image into an undistorted one according to the camera intrinsic parameters and the distortion coefficients.

The following is the C++ interface of the undistort function:

void cv::undistort(
    InputArray src,       // 输入图像
    OutputArray dst,      // 输出图像
    InputArray cameraMatrix,    // 相机内部参数矩阵
    InputArray distCoeffs,      // 畸变系数
    InputArray newCameraMatrix = noArray()  // 新的相机内部参数矩阵
)

A detailed description of these parameters follows:

  1. src: Input image, which can be a single-channel or multi-channel image.

  2. dst: The output image, with the same size and type as the input image.

  3. cameraMatrix: The internal parameter matrix of the camera, usually obtained by camera calibration. It is a 3x3 matrix of floats.

  4. distCoeffs: Distortion coefficient, usually obtained through camera calibration. It is a 1xN or Nx1 floating-point vector, where N represents the number of distortion coefficients.

  5. newCameraMatrix (optional): a new camera intrinsic parameter matrix. If it is provided, the function uses it to reproject the image, producing a new view. If it is not provided, the function uses the original camera matrix and the image keeps its original view.

The function converts pixel locations in the input image to undistorted pixel locations according to the camera intrinsics and distortion coefficients, and stores the result in the output image.

Note that undistort compensates for the radial and tangential lens distortion described by distCoeffs in a single call (it is essentially initUndistortRectifyMap followed by remap). When many frames share the same calibration, or when a rectified view is needed (for example for stereo), it is more efficient and flexible to compute the undistortion maps once with initUndistortRectifyMap and then apply them to each frame with remap.
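
A minimal sketch of that map-based approach, assuming cameraMatrix, distCoeffs and imageSize are already available from calibration and distortedFrame is an input image:

// Precompute the undistortion maps once, then reuse them for every frame.
cv::Mat map1, map2;
cv::initUndistortRectifyMap(cameraMatrix, distCoeffs,
                            cv::Mat(),        // no rectification (identity rotation)
                            cameraMatrix,     // keep the same intrinsics for the output image
                            imageSize, CV_16SC2, map1, map2);

cv::Mat undistorted;
cv::remap(distortedFrame, undistorted, map1, map2, cv::INTER_LINEAR);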


solvePnP pose estimation function – estimates rotation and translation vectors


solvePnP is a function in OpenCV for solving the camera pose problem. Given known 3D points and the corresponding 2D image points, it computes the camera's rotation and translation vectors, and thus determines the position and orientation of the camera with respect to the world coordinate system.

The following is a detailed description of the solvePnP function:

bool solvePnP(InputArray objectPoints, InputArray imagePoints, InputArray cameraMatrix,
              InputArray distCoeffs, OutputArray rvec, OutputArray tvec, bool useExtrinsicGuess = false, 
              int flags = SOLVEPNP_ITERATIVE);

Parameter Description:

  • objectPoints: The coordinates of the 3D point, which can be a single Mat object or a std::vector<Point3f>.
  • imagePoints: The coordinates of the corresponding 2D image point, which can be a single Mat object or a std::vector<Point2f>.
  • cameraMatrix: The camera intrinsic matrix, a 3x3 floating-point matrix.
  • distCoeffs: The camera distortion coefficients, which can be a single Mat object or a std::vector<double>: an input vector of distortion coefficients (k1,k2,p1,p2[,k3[,k4,k5,k6[,s1,s2,s3,s4[,τx,τy]]]]) with 4, 5, 8, 12 or 14 elements. If the vector is empty, the distortion coefficients are assumed to be zero.
  • rvec: The output rotation vector, a 3x1 floating-point matrix.
  • tvec: The output translation vector, a 3x1 floating-point matrix.
  • useExtrinsicGuess: Whether to use externally estimated rotation and translation vectors as initial values, defaults to false.
  • flags: The method used to solve the PnP problem, e.g. SOLVEPNP_ITERATIVE or SOLVEPNP_EPNP (see the full list below); the default is SOLVEPNP_ITERATIVE.
  • The return value of the function is bool type, indicating whether the solution is successful.

The working principle of solvePnP is to relate the 3D point coordinates and the corresponding 2D image point coordinates in the normalized camera coordinate system (homogeneous coordinates), and then compute the camera rotation vector and translation vector using an iterative method, EPnP, or one of the other solvers. These vectors can be further transformed into a rotation matrix and the world coordinates of the camera center.
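
As a minimal sketch of that last step (assuming rvec and tvec are the outputs of solvePnP), cv::Rodrigues converts the rotation vector into a 3x3 rotation matrix, from which the camera center in world coordinates can be derived:

cv::Mat R;
cv::Rodrigues(rvec, R);            // rotation vector -> 3x3 rotation matrix

// rvec/tvec map world (object) coordinates to camera coordinates: Xc = R * Xw + t,
// so the camera center expressed in world coordinates is C = -R^T * t.
cv::Mat camCenter = -(R.t() * tvec);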

Precautions:

  • The number of points in objectPoints and imagePoints must be the same, and at least 4 point correspondences are required.
  • The cameraMatrix and distCoeffs parameters must match the actual camera being used, otherwise the results may be inaccurate.
  • The choice of flags depends on the specific needs and scenario; in general SOLVEPNP_ITERATIVE (the default value) already covers most needs.

When using solvePnP in OpenCV, the flags parameter specifies the method and options used to solve the PnP problem. Here is an overview of the available flag values:

  1. SOLVEPNP_ITERATIVE: Solve the PnP problem using an iterative method. This is the default option and is suitable for most situations. It strikes a balance between computational accuracy and computational efficiency.
  2. SOLVEPNP_EPNP: Use the EPnP method to solve the PnP problem. EPnP is a non-iterative method that improves computational efficiency by using additional camera intrinsic matrix information. It performs well in camera pose recovery and is generally faster than iterative methods.
  3. SOLVEPNP_P3P: Use the P3P method to solve the PnP problem. P3P is a four-point based algorithm that requires matching pairs of at least four points. It is very efficient in calculation accuracy and calculation speed, but has a limit on the number of point pairs.
  4. SOLVEPNP_DLS: Use the DLS (Damped Least Squares) method to solve the PnP problem. DLS is a numerically stable method that handles singular cases of the problem by introducing a damping term. It performs well in the presence of measurement errors or multiple solutions.
  5. SOLVEPNP_UPNP: Solve the PnP problem using the UPnP method. UPnP is a method based on nonlinear optimization, which can handle large viewing angle changes and large number of point pairs. It has a good performance in calculation accuracy and calculation efficiency.
  6. SOLVEPNP_AP3P: Use the AP3P method, an algebraic solution to the Perspective-Three-Point problem. Like P3P it works from exactly four point correspondences and is numerically efficient.
  7. SOLVEPNP_IPPE: Use the IPPE (Infinitesimal Plane-based Pose Estimation) method. It is a non-iterative method for planar targets: the object points must be coplanar, and at least four points are required.
  8. SOLVEPNP_IPPE_SQUARE: Solve the PnP problem with the IPPE method assuming the target is a square (the object points must be the four corners of a square, e.g. a fiducial marker). This option is suited to pose estimation of square markers.
  9. SOLVEPNP_SQPNP: Use the SQPnP (Sequential Quadratic Programming for PnP) method. SQPnP solves a nonlinear minimization problem and is designed to give a globally optimal solution; it handles general point configurations and is robust to noise.
  10. SOLVEPNP_MAX_COUNT: This is not a solving method; it is a sentinel enumeration value equal to the number of available methods and is only used internally.

These flag values can be selected according to the specific requirements and scenario. In general, SOLVEPNP_ITERATIVE is a reasonable default choice, providing good accuracy and efficiency in most cases. Other methods such as EPnP, P3P, DLS, UPnP, AP3P, IPPE, etc. have better performance or applicability for specific problems.
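
As a sketch of selecting a specific solver via the flags argument (the variable names follow the demo below; EPnP is just one possible choice):

// Use the non-iterative EPnP solver instead of the default iterative method.
cv::Mat rvec, tvec;
bool ok = cv::solvePnP(objectPoints[0], imgsPoints[0], cameraMatrix, distCoeffs,
                       rvec, tvec, false, cv::SOLVEPNP_EPNP);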

Code demo:


int main()
{
    
    
	//读取所有图像
	vector<Mat> imgs;
	string imgName;
	ifstream fin("calibdata1.txt");
	while (getline(fin, imgName))
	{
    
    
		Mat img = imread(imgName);
		if (img.empty())
		{
    
    
			cout << "图像打开失败。。。" << endl;
			return -1;
		}
		imgs.push_back(img);
	}

	//定义角点大小和图像上的角点一致
	Size boardSize = Size(8, 6);
	vector<vector<Point2f>> imgsPoints;
	//获取n张图像像素角点
	for (int i = 0; i < imgs.size(); ++i)
	{
    
    
		Mat img1 = imgs[i];
		Mat gray1;
		cvtColor(img1, gray1, COLOR_BGR2GRAY);
		vector<Point2f> img1Points;
		//计算棋盘格角点
		bool patternFound = findChessboardCorners(gray1, boardSize, img1Points);

		//计算圆心
		//bool patternFound = findCirclesGrid(gray1, boardSize, img1Points);
		cout << patternFound << endl;
		//优化角点,精确到亚像素
		find4QuadCornerSubpix(gray1, img1Points, Size(5, 5));

		//绘制角点
		drawChessboardCorners(img1, boardSize, img1Points, patternFound);
		cout << "patternFound:" << patternFound << endl;
		namedWindow("img1", WINDOW_NORMAL);
		imshow("img1", img1);
		waitKey(0);



		imgsPoints.push_back(img1Points);
	}


	//生成棋盘格每个内角点三维坐标
	Size squareSize = Size(15, 15);
	vector<vector<Point3f>> objectPoints;

	vector<Point3f> tempPoints;
	for (int j = 0; j < boardSize.height; ++j)
	{
    
    
		for (int k = 0; k < boardSize.width; ++k)
		{
    
    
			Point3f realPoint;
			//假设标定板为世界坐标系的z平面,即z=0
			realPoint.x = k * squareSize.width;
			realPoint.y = j * squareSize.height;
			realPoint.z = 0;
			tempPoints.push_back(realPoint);
		}
	}
	for (int i = 0; i < imgsPoints.size(); ++i)
	{
    
    
		objectPoints.push_back(tempPoints);
	}

	//图像尺寸
	Size imageSize(imgs[0].cols,imgs[0].rows);

	//相机内参 , 畸变系数
	Mat cameraMatrix , distCoeffs;

	//每张图形的旋转向量
	vector<Mat> rvecs;
	//每张图像的平移向量
	vector<Mat> tvecs;
	double reprojError = calibrateCamera(objectPoints, imgsPoints, imageSize, cameraMatrix, distCoeffs, rvecs, tvecs, 0);

	cout << "重投影误差:" << reprojError << endl;
	cout << "相机内参矩阵:" << endl << cameraMatrix << endl;
	cout << "相机畸变系数:" << endl << distCoeffs << endl;
	cout << "旋转向量:" << endl;
	for (int i = 0; i < rvecs.size(); ++i)
	{
    
    
		cout << rvecs[i] << endl;
	}
	cout << "平移向量:" << endl;
	for (int i = 0; i < tvecs.size(); ++i)
	{
    
    
		cout << tvecs[i] << endl;
	}

	vector<Point2f> imgPoints;
	Mat Jac;
	//模型投影 -- 根据世界坐标 根据 内参和畸变系数 投影到图像坐标
	projectPoints(tempPoints, rvecs[0], tvecs[0], cameraMatrix, distCoeffs, imgPoints, Jac);

	//cout << "雅可比矩阵:" << endl << Jac << endl;

	//根据内参和畸变系数对图像去畸变
	for (int i = 0; i < imgs.size(); ++i)
	{
    
    
		Mat srcImg = imgs[i];
		Mat dstImg;
		undistort(srcImg, dstImg, cameraMatrix, distCoeffs);

		//显示每一个图像
		string srcS = "srcImg" + to_string(i);
		string dstS = "dstImg" + to_string(i);
		namedWindow(srcS, WINDOW_NORMAL);
		namedWindow(dstS, WINDOW_NORMAL);
		imshow(srcS, srcImg);
		imshow(dstS, dstImg);
	}


	//求世界坐标和当前图像坐标的相机的位姿
	//旋转向量
	Mat rvecs1;
	//平移向量
	Mat tvecs1;
	solvePnP(objectPoints[0], imgsPoints[0], cameraMatrix, distCoeffs, rvecs1, tvecs1);
	
	//solvePnP(objectPoints[0], imgsPoints[0], cameraMatrix, distCoeffs, rvecs[0], tvecs[0], true);
	cout << "单个旋转向量:" << rvecs1<<endl;
	cout << "单个平移向量:" << tvecs1 << endl;
	waitKey(0);
	

	return 0;
}


    重投影误差:0.446279
    相机内参矩阵:
    [1380.047335439607, 0, 585.4103520906031;
     0, 1379.989355210238, 475.9677047375;
     0, 0, 1]
    相机畸变系数:
    [0.2036744384549596, -0.508938131816573, -0.01576823311387853, -0.02936405545649798, 0.4166020102554809]
    旋转向量:
    [-0.09129932211443254;
     0.2662418807908897;
     -0.01557101796559691]
    
    平移向量:
    [137.7743795996079;
     -34.53398471523743;
     652.9504462173998]

    单个旋转向量:[-0.09127154819176657;
     0.266243519517998;
     -0.01557500307480123]
    单个平移向量:[137.7742898976761;
     -34.53403038353336;
     652.9500732855978]

moving object detection

Introduction to Optical Flow



difference method


The difference method (Differencing Method) is a commonly used image motion detection method, which detects motion in an image by calculating the difference between the current frame and the previous frame. The following is the general process of detecting image movement by difference method:

  1. Read video frames or an image sequence: first, read a video file or an image sequence as input data, e.g. with cv::VideoCapture.
  2. Convert to grayscale: each frame is usually converted to a grayscale image, which can be done with cv::cvtColor.
  3. Difference calculation: compute the difference between the current frame and the previous frame; cv::absdiff calculates the absolute difference image between two grayscale images.
  4. Thresholding: apply a thresholding operation to the difference image to separate moving regions from stationary regions; cv::threshold performs the thresholding.
  5. Motion region extraction: for the thresholded image, the connected components of the motion regions can be cleaned up with morphological operations such as dilation (cv::dilate) and erosion (cv::erode).
  6. Draw bounding boxes or mark motion regions: depending on your needs, you can draw bounding boxes or otherwise mark the motion regions on the original image to visualize the detected motion, e.g. with cv::rectangle.
  7. Loop processing: For video sequences, the above steps are repeated until all frames are processed.

absdiff absolute difference function


absdiff is a function in the OpenCV library that calculates the per-element absolute difference between two images. It subtracts the corresponding pixel values of the two input images and returns a new image in which each pixel holds the absolute value of that difference.
The following is a detailed description of the absdiff function:

void absdiff(InputArray src1, InputArray src2, OutputArray dst);

Parameter Description:

  • src1: The first input image, which can be a single-channel or multi-channel image; the data type can be CV_8U, CV_16U, CV_16S, CV_32F or CV_64F.

  • src2: The second input image; it must have the same size and number of channels as src1.

  • dst: The output image; it has the same size, number of channels and data type as the input images.

Precautions:

  • The input images must have the same dimensions and number of channels, otherwise an error is raised.

  • For integer types the result is saturated, so the output pixel values always represent the absolute difference within the valid range of the type.

Here is a simple sample code that demonstrates the basic flow of detecting image movement using the difference method:

int main()
{
    
    
	
	VideoCapture capture("vtest.avi");
	Mat prevFrame, prevGray;
	if (!capture.read(prevFrame))
	{
    
    
		cout << "请确认视频文件名称是否正确" << endl;
		return -1;
	}

	int fps = capture.get(CAP_PROP_FPS);
	int width = capture.get(CAP_PROP_FRAME_WIDTH);
	int height = capture.get(CAP_PROP_FRAME_HEIGHT);
	int num_of_frames = capture.get(CAP_PROP_FRAME_COUNT);

	cout << "fps:" << fps << endl;
	cout << "视频宽度:" << width << endl;
	cout << "视频高度:" << height << endl;
	cout << "视频总帧数:" << num_of_frames << endl;

	capture.read(prevFrame);
	cvtColor(prevFrame, prevGray, COLOR_BGR2GRAY);

	GaussianBlur(prevGray, prevGray, Size(0, 0), 15);

	Mat binary;
	Mat frame, gray;
	//形态学操作的矩形模板
	Mat k = getStructuringElement(MORPH_RECT, Size(7, 7));
	

	while (capture.read(frame)) //读取完结束
	{
    
    
		cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
		GaussianBlur(gray, gray, Size(0, 0), 15);
		//计算当前帧与前一帧的差值的绝对值
		cv::absdiff(prevGray, gray, binary);

		//二值化 --   使用大津法自动阈值
		double th = cv::threshold(binary, binary, 0, 255, cv::THRESH_BINARY | THRESH_OTSU);
		
		cout << "th:" << th << endl;
		//并进行开运算 - 消除小的噪点
		morphologyEx(binary, binary, MORPH_OPEN, k);


		cv::imshow("input", frame);
		cv::imshow("result", binary);
		cv::waitKey(0);

		//更新前一帧灰度图,保证每次都是相邻帧做差
		gray.copyTo(prevGray);
	}

	waitKey(0);
	
	return 0;
}

calcOpticalFlowFarneback dense optical flow function


calcOpticalFlowFarneback is a function in the OpenCV library for computing dense optical flow. Optical flow describes the motion of objects in an image; it is estimated by analyzing the pixel intensity changes between adjacent frames.

The following is a detailed explanation of the calcOpticalFlowFarneback function:

void calcOpticalFlowFarneback(InputArray prev, 
                            InputArray next, 
                            InputOutputArray flow,
                            double pyr_scale,
                            int levels, 
                            int winsize, 
                            int iterations, 
                            int poly_n,
                            double poly_sigma,
                            int flags);

Parameter description :

  • prev: The input previous frame image (single-channel grayscale image).
  • next: The image of the next input frame (the image corresponding to the previous frame, which is also a single-channel grayscale image).
  • flow: Calculated optical flow (float vector image with two channels).
  • pyr_scale: The scaling factor of the image pyramid, used to build the image pyramid. Usually set to 0.5.
  • levels: The number of layers of the image pyramid.
  • winsize: The size of the window in the optical flow algorithm. Larger windows can handle greater motion, but may result in a loss of motion detail. In general, the value is 13-21.
  • iterations: The number of iterations in each pyramid layer.
  • poly_n: The order of the polynomial expansion used to expand the pixel neighborhood. Commonly used values ​​are 5 or 7.
  • poly_sigma: Standard deviation of Gaussian function in polynomial expansion. Usually set to 1.1.
  • flags: Additional calculation flags, which can be a combination of the following values:
    • OPTFLOW_USE_INITIAL_FLOW: Use the flow parameter as the initial optical flow estimate.
    • OPTFLOW_FARNEBACK_GAUSSIAN: Use Gaussian filtering to smooth the image.

Function description :

The calcOpticalFlowFarneback function computes optical flow using Gunnar Farneback's algorithm. It uses an image pyramid and polynomial expansion to estimate, for every pixel of the previous frame, the motion vector to the corresponding position in the next frame.
The basic idea of optical flow is that, in two temporally adjacent frames, the pixels belonging to the same object are displaced in space. The function takes the previous frame and the next frame as input and estimates the displacement vector of each pixel from the change in pixel intensities.
The result is a two-channel floating-point image in which each pixel stores the displacement vector at that position. These displacement vectors can be processed further as desired, for example to visualize the motion as in the demo below.

cartToPolar converts coordinates from the Cartesian coordinate system to the polar coordinate system


cv::cartToPolar is a function in OpenCV that converts coordinates from the Cartesian coordinate system to the polar coordinate system. Its prototype is as follows:

void cv::cartToPolar(
    InputArray x,
    InputArray y,
    OutputArray magnitude,
    OutputArray angle,
    bool angleInDegrees = false
)

Function parameters:

  • x: The input array of x-coordinates, of type InputArray. It must be a floating-point array (typically single-channel, e.g. CV_32FC1 or CV_64FC1).
  • y: The input array of y-coordinates, of type InputArray, with the same size and type as x.
  • magnitude: Vector magnitude (also known as modulus or magnitude) in polar coordinates of the output, of type OutputArray. Has the same size and type as the input array.
  • angle: Vector angle (also known as phase angle) in polar coordinates of the output, of type OutputArray. Has the same size and type as the input array.
  • angleInDegrees: A Boolean value indicating whether the angle is expressed in degrees (true) or radians (false). The default is false, which means angles are expressed in radians.

The function interprets each pair (x(i), y(i)) as a 2D vector, for example the horizontal and vertical components of a dense optical flow field after split, and converts it into its magnitude and angle in polar coordinates.

Code demo:


int main()
{
    
    
	
	VideoCapture capture("vtest.avi");
	Mat prevFrame, prevGray;
	if (!capture.read(prevFrame))
	{
    
    
		cout << "请确认视频文件名称是否正确" << endl;
		return -1;
	}

	//将彩色图像转换成灰度图像
	cvtColor(prevFrame, prevGray, COLOR_BGR2GRAY);
	while (1) 
	{
    
    
		Mat nextFrame, nextGray;
		//所有图像处理完成后退出程序
		if (!capture.read(nextFrame))
		{
    
    
			break;
		}
		imshow("视频图像", nextFrame);
		//计算稠密光流
		cvtColor(nextFrame, nextGray, COLOR_BGR2GRAY);

		//Mat_是个模板类继承Mat  每个元素是Point2f类型
		//Mat_<Point2f> flow;//两个方向的运动速度

		//双通道
		Mat flow;
		calcOpticalFlowFarneback(prevGray, nextGray, flow, 0.5, 3, 15, 3, 5, 1.2, 0);

		vector<Mat> xyV;
		split(flow, xyV);
		//x方向移动速度
		Mat xV = xyV[0];
		//y方向移动速度
		Mat yV = xyV[1];

		//x方向移动速度
		//Mat xV = Mat::zeros(prevFrame.size(), CV_32FC1);
		//y方向移动速度
		//Mat yV = Mat::zeros(prevFrame.size(), CV_32FC1);

		//提取两个方向的速度
		//for (int row = 0; row < flow.rows; ++row)
		//{
    
    
		//	for (int col = 0; col < flow.cols; ++col)
		//	{
    
    
		//		const Point2f& flow_xy = flow.at<Point2f>(row, col);
		//		xV.at<float>(row, col) = flow_xy.x;
		//		yV.at<float>(row, col) = flow_xy.y;
		//	}
		//}

		//计算向量角度和赋值
		Mat magnitude, angle;
		//笛卡尔坐标转极坐标
		cartToPolar(xV, yV, magnitude, angle,true);
		//对于HSV颜色空间中的色调(H),OpenCV使用范围为0到179,而不是0到360。
		//在将HSV颜色转换为BGR颜色时,需要将H值除以2。
		angle /= 2.0;

		//cartToPolar(xV, yV, magnitude, angle, false);
		//angle = angle * 180 / CV_PI / 2.0;
		
		//把幅值归一化到0-255,便于显示结果
		normalize(magnitude, magnitude, 0, 255, NORM_MINMAX);

		//计算角度和幅值的绝对值 输出类型CV_8U整数
		convertScaleAbs(magnitude, magnitude);
		convertScaleAbs(angle, angle);

		//运动的幅值和角度生成HSV颜色空间的图像
		Mat HSV = Mat::zeros(prevFrame.size(), prevFrame.type());
		vector<Mat> result;
		split(HSV, result);

		//角度决定颜色
		result[0] = angle;
		//饱和度设置最深
		//在OpenCV中,当我们使用8位无符号整数(`CV_8U`)来表示像素值时,通常将其缩放到0到255范围内。
		//这是因为8位无符号整数的范围是0到255,所以将0到1的浮点数映射到这个范围内。
		result[1] = Scalar(255);
		//用幅值(模长)来充当亮度
		result[2] = magnitude;

		merge(result, HSV);

		Mat rgbImg;
		cvtColor(HSV, rgbImg, COLOR_HSV2BGR);

		imshow("运动检测结果", rgbImg);
		int ch = waitKey(0);
		if (ch == 27)
		{
    
    
			break;
		}

		//更新前一帧灰度图,下一次循环计算相邻帧之间的光流
		nextGray.copyTo(prevGray);
	}

	waitKey(0);
	
	return 0;
}


calcOpticalFlowPyrLK sparse optical flow function


calcOpticalFlowPyrLK is a function in OpenCV for computing sparse optical flow. Optical flow describes how pixels move between successive frames; calcOpticalFlowPyrLK estimates it with the pyramidal Lucas-Kanade method.

The following is a detailed explanation of the calcOpticalFlowPyrLK function, including its parameters and usage.

cv::calcOpticalFlowPyrLK(
    cv::InputArray prevImg,             // 先前的图像帧
    cv::InputArray nextImg,             // 下一个图像帧
    cv::InputArray prevPts,             // 先前图像中的特征点
    cv::InputOutputArray nextPts,       // 下一个图像中的特征点(输出)
    cv::OutputArray status,              // 跟踪状态(输出)
    cv::OutputArray err,                 // 跟踪误差(输出)
    cv::Size winSize = cv::Size(21, 21), // 窗口大小
    int maxLevel = 3,                    // 金字塔层数
    cv::TermCriteria criteria = cv::TermCriteria(
        cv::TermCriteria::COUNT + cv::TermCriteria::EPS, 30, 0.01),
    int flags = 0,                       // 额外标志
    double minEigThreshold = 1e-4        // 最小特征值阈值
);

Parameter explanation:

  • prevImg: The previous image frame, usually a grayscale image.
  • nextImg: The next image frame, corresponding to prevImg.
  • prevPts: The feature points in the previous image, usually corner points extracted with goodFeaturesToTrack or a similar function.
  • nextPts: The feature point in the next image, the function will output the position of the tracked feature point in this parameter.
  • status: The output status array, indicating the tracking status of the feature points. If the feature point is tracked successfully, the corresponding status value is 1, otherwise it is 0.
  • err: The output tracking error array, representing the tracking error of each feature point.
  • winSize: The window size used to calculate the optical flow.
  • maxLevel: The number of layers of the pyramid, used for multi-scale optical flow calculation.
  • criteria: Iteration termination criterion, used to control the stopping condition of the algorithm.
  • flags: Additional flag parameter controlling the optical flow computation. It can be set with the following constants:
    • cv::OPTFLOW_USE_INITIAL_FLOW: use the values already stored in nextPts as the initial estimate of the tracked positions. By default this flag is not set and the initial estimate is taken from prevPts.
    • cv::OPTFLOW_LK_GET_MIN_EIGENVALS: compute the minimum eigenvalue of the spatial gradient matrix for each feature point and store it in the err parameter instead of the patch intensity error. By default this flag is not set.
    Multiple flags can be combined with the bitwise OR operator |, e.g. cv::OPTFLOW_USE_INITIAL_FLOW | cv::OPTFLOW_LK_GET_MIN_EIGENVALS.
  • minEigThreshold: The minimum eigenvalue threshold, used to judge whether the feature points are reliable.

nextPts always contains the same number of entries as prevPts; whether a point was actually tracked is indicated by the status array:

  1. Successful tracking: the feature point was found in the next image. calcOpticalFlowPyrLK stores its new position in the corresponding entry of nextPts and sets the corresponding value in status to 1.
  2. Unsuccessful tracking: the feature point could not be located in the next image. The corresponding entry in nextPts is not meaningful and the corresponding value in status is 0. Such points should be filtered out, as the demo below does.

Code demo:


//绘制所有直线
void  draw_lines(Mat& image, vector<Point2f> pt1, vector<Point2f> pt2)
{
    
    
	RNG rng(1008);
	vector<Scalar> color_lut;
	if (color_lut.size() < pt1.size())
	{
    
    
		for (int t = 0; t < pt1.size(); ++t)
		{
    
    
			color_lut.push_back(Scalar(rng.uniform(0, 255), rng.uniform(0, 255), rng.uniform(0, 255)));

		}
	}
	for (int t = 0; t < pt1.size(); ++t)
	{
    
    
		arrowedLine(image, pt1[t], pt2[t], color_lut[t], 2, 8, 0);
	}
}
int main()
{
    
    
	
	VideoCapture capture("vtest.avi");
	Mat prevFrame, prevGray;
	if (!capture.read(prevFrame))
	{
    
    
		cout << "请确认视频文件名称是否正确" << endl;
		return -1;
	}

	int fps = capture.get(CAP_PROP_FPS);
	int width = capture.get(CAP_PROP_FRAME_WIDTH);
	int height = capture.get(CAP_PROP_FRAME_HEIGHT);
	int num_of_frames = capture.get(CAP_PROP_FRAME_COUNT);

	cout << "fps:" << fps << endl;
	cout << "视频宽度:" << width << endl;
	cout << "视频高度:" << height << endl;
	cout << "视频总帧数:" << num_of_frames << endl;

	cvtColor(prevFrame, prevGray, COLOR_BGR2GRAY);


	//角点检测
	vector<Point2f> points;
	goodFeaturesToTrack(prevGray, points, 2000, 0.01, 10);


	//稀疏光流检测相关参数设置
	vector<Point2f> prevPts;//前一帧图像角点坐标
	vector<Point2f> nextPts;//当前帧图像角点坐标
	vector<uchar> status;//角点检测到的状态
	vector<float> err;

	//初始状态的角点
	vector<Point2f> initPoints;
	initPoints.insert(initPoints.end(), points.begin(), points.end());

	prevPts.insert(prevPts.end(), points.begin(), points.end());

	while (true)
	{
    
    
		Mat nextFrame, nextGray;
		if (!capture.read(nextFrame))
		{
    
    
			break;
		}
		imshow("nextFrame", nextFrame);

		cvtColor(nextFrame, nextGray, COLOR_BGR2GRAY);
		//光流跟踪
		calcOpticalFlowPyrLK(prevGray, nextGray, prevPts, nextPts, status, err);

		size_t i, k;
		for (i = k = 0; i < nextPts.size(); ++i)
		{
    
    
			//距离与状态测量
			double dist = abs(prevPts[i].x - nextPts[i].x) + abs(prevPts[i].y - nextPts[i].y);

			//cout << "status:" << to_string( status[i] )<< "  " << nextPts[i].x << nextPts[i].y << endl;
			//筛选出有效特征点
			if (status[i] && dist > 2)
			{
    
    
				
				prevPts[k] = prevPts[i];
				initPoints[k] = initPoints[i];
				nextPts[k] = nextPts[i];
				++k;
				circle(nextFrame, nextPts[i], 3, Scalar(0, 255, 0), -1, 8);

			}
		}

		//重置数组大小  ---  更新移动角点数目
		nextPts.resize(k);
		prevPts.resize(k);
		initPoints.resize(k);

		draw_lines(nextFrame, initPoints, nextPts);
		imshow("result", nextFrame);

		char c = waitKey(50);
		if (c == 27)
		{
    
    
			break;
		}

		//更新角点坐标和前一帧图像
		std::swap(nextPts, prevPts);
		nextGray.copyTo(prevGray);

		//如果角点数目少于300,就重新检测角点
		if (initPoints.size() < 300)
		{
    
    
			goodFeaturesToTrack(prevGray, points, 2000, 0.01, 10);
			initPoints.insert(initPoints.end(), points.begin(), points.end());

			prevPts.insert(prevPts.end(), points.begin(), points.end());
			printf("total feature points:%d\n", prevPts.size());

		}
	}

	
	return 0;
}


Supervised Learning Algorithms

TrainData::create training data storage class – used by the other learning classes


In OpenCV, cv::ml::TrainData::create is a static function used to create cv::ml::TrainData objects. It provides an easy way to initialize and organize a dataset for training machine learning models. The signature of create is as follows:

cv::Ptr<cv::ml::TrainData> cv::ml::TrainData::create(
                                        const cv::Mat& samples, 
                                        int layout, 
                                        const cv::Mat& responses, 
                                        const cv::Mat& varIdx = cv::Mat(), 
                                        const cv::Mat& sampleIdx = cv::Mat(), 
                                        const cv::Mat& sampleWeights = cv::Mat(),
                                        const cv::Mat& varType = cv::Mat());

The parameters accepted by this function have the same meaning as those of the TrainData class constructor; the details are as follows:

  • samples: matrix containing the training samples. Each row represents a sample and each column a feature (or vice versa, see layout). The data type must be CV_32F.
  • layout: specifies the layout of the samples matrix, either cv::ml::ROW_SAMPLE or cv::ml::COL_SAMPLE. cv::ml::ROW_SAMPLE indicates that each row is a sample; cv::ml::COL_SAMPLE indicates that each column is a sample.
  • responses: a matrix containing the response values corresponding to the training samples. Each row or column corresponds to one sample's response. The data type is CV_32F or CV_32S.
  • varIdx: optional parameter specifying the feature indices to use. It is a 1-row or 1-column matrix of integers, where each element is the index of a sample feature. The data type is CV_32S.
  • sampleIdx: optional parameter specifying the sample indices to use. It is a 1-row or 1-column matrix of integers, where each element is the index of a sample. The data type is CV_32S.
  • sampleWeights: optional parameter specifying the weight of each sample. It is a 1-row or 1-column matrix of floating-point numbers, each element corresponding to a sample's weight. The data type is CV_32F.
  • varType: optional parameter specifying the type of each variable. It is a 1-row or 1-column matrix of integers, each element corresponding to the type of one variable.

The cv::ml::TrainData::create function returns a smart pointer (cv::Ptr<cv::ml::TrainData>) to the created object.
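
A minimal sketch of assembling a TrainData object from an in-memory feature matrix (the sample values below are made up purely for illustration):

// 4 samples with 2 features each; TrainData expects CV_32F samples.
cv::Mat samples = (cv::Mat_<float>(4, 2) << 1, 2,
                                            2, 1,
                                            8, 9,
                                            9, 8);
// One integer class label per sample (CV_32S).
cv::Mat responses = (cv::Mat_<int>(4, 1) << 0, 0, 1, 1);

cv::Ptr<cv::ml::TrainData> tdata =
    cv::ml::TrainData::create(samples, cv::ml::ROW_SAMPLE, responses);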

cv::ml::StatModel::train training function


cv::ml::StatModel::train is the model-training function of OpenCV's machine learning module; it is used to train a statistical model. The full definition of this overload is as follows:

bool cv::ml::StatModel::train(const cv::Ptr<cv::ml::TrainData>& trainData, int flags = 0)

The function accepts two parameters:

  1. trainData: Training data, usually a cv::Ptr<cv::ml::TrainData> object, which contains the sample data, label data and other information required for training.
  2. flags: Training flag, used to set some options and parameters in the training process.

The function returns a boolean indicating whether the training was successful or not.

The flags parameter is an integer used to set options and flags for the training process. Some commonly used flags are:

  • cv::ml::StatModel::RAW_OUTPUT: return the raw model output instead of a probability estimate. For some models this flag can be set to obtain the raw output.
  • cv::ml::StatModel::UPDATE_MODEL: update an existing model instead of retraining from scratch. Use this flag if you already have a trained model and want to update it with new data.
  • cv::ml::StatModel::COMPRESSED_INPUT: indicates that the input data is compressed. This can reduce memory consumption when the input data is large and memory is limited.
  • cv::ml::StatModel::PREPROCESSED_INPUT: indicates that the input data has already been preprocessed. If feature extraction or other preprocessing has already been applied, this flag skips the preprocessing step.

These flags can be combined with the bitwise OR operator (|) according to the model and training requirements, for example: int flags = cv::ml::StatModel::RAW_OUTPUT | cv::ml::StatModel::UPDATE_MODEL. If no flag options are needed, flags can be left at its default value of 0.

There is also an overload of train that takes the samples and responses directly:

bool cv::ml::StatModel::train(cv::InputArray samples, int layout, cv::InputArray responses)

The function accepts three parameters:

  1. samples: Input sample data, usually a two-dimensional cv::Mat in which each row represents a sample and each column a feature. The data type must be CV_32F.
  2. layout: Data layout parameter, indicating how the samples are arranged. There are two commonly used layouts:
    • cv::ml::ROW_SAMPLE: each row represents a sample.
    • cv::ml::COL_SAMPLE: each column represents a sample.
  3. responses: Response (label) data, usually a one-dimensional cv::Mat containing the output or label associated with each sample. The data type is CV_32F or CV_32S.

The function returns a boolean indicating whether the training was successful or not.

cv::ml::StatModel::predict prediction function


cv::ml::StatModel::predict is the prediction function of the machine learning model in OpenCV, which is used to predict the input samples. The details of this function are as follows:

float StatModel::predict(InputArray samples, OutputArray results, int flags = 0) const;

parameter:

  • samples: The input sample data, of type cv::InputArray. Can be a sample (single input sample) or a sample set (multiple input samples). The data type must be CV_32F.
  • results: The output prediction result, the type is cv::OutputArray. For a single input sample, the result is a float; for a sample set, the result is a vector of floats.
  • flags: Predicted flag, optional parameter. It can affect the behavior of predictions and the format of output results. The default value is 0, which means no special flags.
For predict, the flags parameter can include model-specific options such as cv::ml::StatModel::RAW_OUTPUT, which makes the method return the raw model output (for example the decision-function value of an SVM) instead of the class label. Multiple flags can be combined with the bitwise OR operator (|); if no options are needed, flags can be left at its default value of 0.

return value:

  • float: the prediction for the first (or only) input sample. Its exact meaning (class label, regression value, or raw decision value) depends on the model used and on the flags.

cv::Algorithm::save is used to save the state of the algorithm object to a file

cv::Algorithm::save is a function in OpenCV that saves the state of an algorithm object to a file. It is a member function of the base class cv::Algorithm.

void cv::Algorithm::save(const cv::String& filename) const;

parameter:

  • filename: The path and name of the saved file.

Function description:

  • The cv::Algorithm::save function saves the state of an algorithm object to a file so that it can be restored later.
  • The saved file contains all parameters and internal state of the algorithm object.
  • Saved files use OpenCV-specific XML or YAML formats.
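
For example, mirroring the demos later in these notes, a trained model (here an SVM held in a cv::Ptr named svm) can be written to a YAML file with a single call:

svm->save("svm_model.yml");   // serializes all parameters and the trained state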

cv::Algorithm::load loads an algorithm model from a file and returns a smart pointer to the loaded object

In OpenCV, the cv::Algorithm::load function is used to load an algorithm model from a file; it returns a smart pointer to the loaded object.

The prototype of the function is as follows:

cv::Ptr<cv::Algorithm> cv::Algorithm::load(const cv::String& filename, const cv::String& objname = cv::String())

Parameter Description:

  • filename: The path and name of the file to load.
  • objname(Optional): The name of the object to load. For most cases, this parameter can be ignored. In some specific cases, such as when the file contains multiple models, this parameter can be used to specify the object to load.
When a file contains more than one saved object, the objname parameter of cv::Algorithm::load selects which top-level node to read. For example, suppose a YAML file stores two trained SVM models under the node names "svm_a" and "svm_b" (these names are only illustrative; they must match the node names actually used when the file was written):

cv::Ptr<cv::ml::SVM> svmA = cv::Algorithm::load<cv::ml::SVM>("path/to/models.yml", "svm_a");
if (svmA.empty())
{
    // 加载svm_a失败,处理错误
}

cv::Ptr<cv::ml::SVM> svmB = cv::Algorithm::load<cv::ml::SVM>("path/to/models.yml", "svm_b");
if (svmB.empty())
{
    // 加载svm_b失败,处理错误
}

The value passed as objname must match the name of the node in the file, so check the file to see how the objects were named. For files that store a single object (the usual case, as in the demos below), objname is not needed, because the first top-level node is loaded by default.

This function returns a smart pointer (cv::Ptr) to the loaded algorithm object. It returns a null pointer if loading fails or the specified object cannot be found in the file.

Ptr<SVM> svm = Algorithm::load<SVM>("my_svm_model.xml");

Introduction to K-Nearest Neighbor Principle


cv::ml::KNearest::findNearest Find labels of K neighbors — K nearest neighbors (supervised learning)


cv::ml::KNearest::findNearest is a member function of the K-Nearest-Neighbors classifier in OpenCV. It classifies the given input samples and can also return the labels of the K closest neighbors.

Here is a detailed explanation of the function:

float cv::ml::KNearest::findNearest(
    InputArray samples,
    int k,
    OutputArray results = noArray(),
    OutputArray neighborResponses = noArray(),
    OutputArray dist = noArray()
)

Parameter Description:

  • samples: the input samples. Can be a single sample or a matrix containing multiple samples; each sample is a row of floating-point feature values.
  • k: the number of nearest neighbors to use.
  • results: optional output matrix storing the predicted result for each input sample. If the results are not needed, noArray() can be passed.
  • neighborResponses: optional output matrix storing the response label of each of the k nearest neighbors. If these labels are not needed, noArray() can be passed.
  • dist: optional output matrix storing the distance between each nearest neighbor and the input sample. If the distance information is not needed, noArray() can be passed.

return value:

  • float: if a single input sample is passed, the function returns its predicted value; when multiple samples are passed, the predictions are written to results.

The basic flow for using cv::ml::KNearest::findNearest is as follows:

  1. Create a K-Nearest-Neighbor classifier object and train it with the train function, providing the training data and labels.
  2. Prepare the input sample data to classify.
  3. Call the findNearest function, passing the input samples and any optional output parameters.
  4. Read the classification results, the neighbors' response labels, and the distance information from the output parameters as needed.
Typical settings on the KNearest object before training:

knn->setDefaultK(K);  // 设置K值,即选择K个最近邻居进行投票
knn->setIsClassifier(true);  // 设置为分类器
knn->setAlgorithmType(cv::ml::KNearest::BRUTE_FORCE);  // 设置算法类型

BRUTE_FORCE and KDTREE are the two algorithm types available for KNearest. They use different strategies and data structures when searching for the nearest neighbors.

  1. BRUTE_FORCE: the default algorithm type. It uses a simple, direct approach: for each query sample it iterates over every sample in the training set, computes the distance to the query, and lets the K closest samples vote on the decision. This works well for small data sets, but its cost grows with the size of the training set.

  2. KDTREE: uses a tree-based structure to accelerate the nearest-neighbor search. The training data is organized into a KD-tree (k-dimensional tree), in which each node splits the feature space with a hyperplane. During the search, comparing the query sample against these splitting hyperplanes prunes large parts of the search space and speeds up the lookup. KDTREE is aimed at larger data sets, where it is usually more efficient.

When using KNearest, choose the algorithm type according to the size and dimensionality of the data set: for small data sets the default BRUTE_FORCE is sufficient, while for large data sets KDTREE can provide higher search efficiency.

Introduction to the principle of support vector machine

A support vector machine (SVM) is a binary classification model. Its basic form is a linear classifier with the maximum margin in feature space, and the maximum margin is what distinguishes it from the perceptron. With the kernel trick, the SVM effectively becomes a non-linear classifier. The learning strategy of the SVM is margin maximization, which can be formalized as a convex quadratic programming problem; this is also equivalent to minimizing a regularized hinge loss. The learning algorithm of the SVM is therefore an optimization algorithm for convex quadratic programming. Put simply: given data points in a space, the SVM looks for the best way to split the space, and the data in it, into two parts.

    //SVM类型
	svm1->setType(cv::ml::SVM::C_SVC);
	//内核模型
	svm1->setKernel(cv::ml::SVM::CHI2);

cv::ml::SVM::Types

  1. cv::ml::SVM::C_SVC

    • C-Support Vector Classification (C-SVC) is a common support vector machine classification algorithm.
    • It divides the training data into different categories by finding the optimal hyperplane.
    • C-SVC uses the regularization parameter C to control the degree of penalty for misclassification. The larger the value of C, the lower the tolerance of the model to misclassification.
  2. cv::ml::SVM::NU_SVC

    • Nu-Support Vector Classification (Nu-SVC) is also a support vector machine classification algorithm.
    • It uses a parameter nu to control the choice of support vectors.
    • Compared with C-SVC, Nu-SVC is more adaptive in the selection of support vectors, which can deal with unbalanced data sets to a certain extent.
  3. cv::ml::SVM::ONE_CLASS

    • One-Class SVM is a support vector machine algorithm for anomaly detection or single-class classification problems.
    • Its goal is to confine the data to a hyperplane that separates normal samples from abnormal ones.
    • One-Class SVM does not depend on class labels, but only focuses on the distribution of data.
  4. cv::ml::SVM::EPS_SVR

    • ε-Support Vector Regression (ε-SVR) is a support vector machine regression algorithm.
    • It approximates the output value of the training data by finding the optimal hyperplane.
    • ε-SVR uses a regularization parameter C and a fault tolerance parameter ε to control the complexity and fault tolerance of the model.
  5. cv::ml::SVM::NU_SVR

    • Nu-Support Vector Regression (Nu-SVR) is also a support vector machine regression algorithm.
    • It uses the parameter nu to control the choice of support vectors.
    • Compared with ε-SVR, Nu-SVR is more adaptive in the selection of support vectors.
These different types represent different variants and application scenarios of SVMs in classification and regression tasks. Choosing the appropriate type according to the problem requirements and data characteristics can improve the performance and generalization ability of the model.

cv::ml::SVM::KernelTypes


  1. Linear kernel function ( cv::ml::SVM::KernelTypes::LINEAR):

    • The linear kernel function is the simplest kernel function.
    • It implements a linear map in feature space, suitable for linearly separable problems.
    • Linear kernel functions work better with high-dimensional data.
    • The mathematical formula of the linear kernel function is: K(x, y) = x^T * y, where x and y are the feature vectors of the input samples, respectively, and ^T represents the transpose operation.
  2. Polynomial kernel function ( cv::ml::SVM::KernelTypes::POLY):

    • A polynomial kernel function maps data to a high-dimensional feature space via a polynomial map.
    • It can handle nonlinear problems, but the degree of polynomial needs to be set by parameters.
    • Polynomial kernel functions perform well in some nonlinear problems, but are prone to overfitting at too high an order.
    • The mathematical formula of the polynomial kernel function is: K(x, y) = (gamma * x^T * y + coef0)^degree, where x and y are the feature vectors of the input samples, gamma is the kernel parameter, and coef0 is the constant term , degree is the order of the polynomial.
  3. Radial basis function (RBF) kernel function ( cv::ml::SVM::KernelTypes::RBF):

    • The RBF kernel function is a commonly used nonlinear kernel function.
    • It handles nonlinear problems by mapping data into an infinite-dimensional feature space.
    • The RBF kernel function has a smooth decision boundary and can be applied to complex classification problems.
    • An important parameter of the RBF kernel function is gamma, which is used to control the flexibility of the decision boundary.
    • The mathematical formula of the RBF kernel function is: K(x, y) = exp(-gamma * ||x - y||^2), where x and y are the feature vectors of the input samples, gamma is the kernel parameter, || .|| represents the Euclidean norm.
  4. Sigmoid kernel function ( cv::ml::SVM::KernelTypes::SIGMOID):

    • The Sigmoid kernel function is a nonlinear kernel function.
    • It simulates the Sigmoid activation function in neural networks.
    • The sigmoid kernel function may be useful in some specific problems, but in general it tends not to perform as well as other kernel functions.
    • The mathematical formula of the Sigmoid kernel function is: K(x, y) = tanh(gamma * x^T * y + coef0), where x and y are the feature vectors of the input samples, gamma is the kernel parameter, and coef0 is the constant term. tanh stands for the hyperbolic tangent function.
  5. Chi-square kernel function ( cv::ml::SVM::KernelTypes::CHI2):

    • The chi-square kernel function is based on the chi-square statistic and is often used for classification problems with histogram data.
    • Mathematical formula: K(x, y) = exp(-gamma * D(x, y))
    • where x and y are the feature vectors of the input samples, gamma is the parameter of the kernel function, and D(x, y) is the chi-square statistic.
  6. Histogram intersection kernel ( cv::ml::SVM::KernelTypes::INTER):

    • The histogram intersection kernel is a fast kernel, often used with histogram-like features.
    • Its mathematical formula is: K(x, y) = Σ min(x_i, y_i).

Code demo:


int main()
{
    
    

	system("color F0");
	{
    
    
		Mat img = imread("digits.png");
		Mat gray;
		cvtColor(img, gray, COLOR_BGR2GRAY);
		//分割为5000个cells  20×20
		Mat images = Mat::zeros(5000, 400, CV_8UC1);
		Mat labels = Mat::zeros(5000, 1, CV_8UC1);

		int index = 0;
		for (int row = 0; row < 50; row++)
		{
    
    
			//从图像中分割出20×20的图像作为独立数字图像
			int label = row / 5;
			int datay = row * 20;
			for (int col = 0; col < 100; col++)
			{
    
    
				int datax = col * 20;

				//Mat number = Mat::zeros(Size(20, 20), CV_8UC1);
				//for (int x = 0; x < 20; x++)
				//{
    
    
				//	for (int y = 0; y < 20; y++)
				//	{
    
    
				//		number.at<uchar>(x, y) = gray.at<uchar>(x + datay, y + datax);

				//	}
				//}


				//抠图 20*20
				Mat number = Mat(gray, Range(datay, datay + 20), Range(datax, datax + 20));
				Mat temp;
				number.copyTo(temp);
				//将二维图像数据转成行数据  1通道1行
				Mat tempRow = temp.reshape(1, 1);
				cout << "提取第" << index + 1 << "个数据" << endl;

				tempRow.copyTo(images(Range(index, index + 1), Range(0, 400)));

				labels.at<uchar>(index, 0) = label;
				++index;

			}
		}
		imwrite("所有数据按行排列结果.png", images);
		imwrite("标签.png", labels);

		//加载训练数据集
		images.convertTo(images, CV_32FC1);
		labels.convertTo(labels, CV_32SC1);
		Ptr<ml::TrainData>tdata = ml::TrainData::create(images, ml::ROW_SAMPLE, labels);

		//创建K近邻类
		Ptr<ml::KNearest> knn = KNearest::create();
		//每个类别拿出5个数据
		knn->setDefaultK(5);
		//进行分类
		knn->setIsClassifier(true);
		//设置算法 -KD树
		//knn->setAlgorithmType(cv::ml::KNearest::KDTREE);

		//训练数据
		//knn->train(images, cv::ml::ROW_SAMPLE, labels);
		knn->train(tdata);
		//保存训练结果
		knn->save("knn_model.yml");

		waitKey(0);



		//加载KNN分类器
		Mat data = imread("所有数据按行排列结果.png", IMREAD_ANYDEPTH);
		labels = imread("标签.png", IMREAD_ANYDEPTH);

		bool isss = data.isContinuous();
		data.convertTo(data, CV_32F);
		labels.convertTo(labels, CV_32SC1);

		knn = Algorithm::load<KNearest>("knn_model.yml");

		//查看分类结果
		Mat result;
		knn->findNearest(data, 5, result);

		//统计分类结果与真实结果相同的数目
		int count = 0;
		for (int row = 0; row < result.rows; row++)
		{
    
    
			if (labels.at<int>(row, 0) == result.at<float>(row, 0))
				++count;
		}
		//正确比率
		float rate = 1.0*count / result.rows;

		cout << "分类的正确性:" << rate << endl;

		//测试新图像是否能够识别数字
		Mat testImg1 = imread("handWrite01.png", IMREAD_GRAYSCALE);
		Mat testImg2 = imread("handWrite02.png", IMREAD_GRAYSCALE);

		imshow("testImg1", testImg1);
		imshow("testImg2", testImg2);
		//缩放到20*20的尺寸
		resize(testImg1, testImg1, Size(20, 20));
		resize(testImg2, testImg2, Size(20, 20));


		Mat testdata = Mat::zeros(2, 400, CV_8UC1);
		Rect rect;
		rect.x = 0;
		rect.y = 0;
		rect.height = 1;
		rect.width = 400;
		Mat oneData = testImg1.reshape(1, 1);
		Mat twoData = testImg2.reshape(1, 1);

		oneData.copyTo(testdata(rect));
		rect.y = 1;
		twoData.copyTo(testdata(rect));
		//数据类型转换
		testdata.convertTo(testdata, CV_32F);


		//进行估计识别
		Mat result2;
		float confidence = knn->findNearest(testdata, 5, result2);
		//查看预测结果 0 - 1.0,越高越好
		cout << "置信度:" << confidence << endl;
		int predict1 = result2.at<float>(0, 0);
		cout << "图像预测结果:" << predict1 << "  真实结果:" << 1 << endl;
		int predict2 = result2.at<float>(1, 0);
		cout << "图像预测结果:" << predict2 << "  真实结果:" << 2 << endl;

	}
	Mat samples, labls;
	FileStorage fread("point.yml", FileStorage::READ);
	fread["samples"] >> samples;
	fread["responses"] >> labls;
	fread.release();

	//不同种类坐标点拥有不同的颜色
	vector<Vec3b> colors;
	colors.push_back(Vec3b(0, 255, 0));
	colors.push_back(Vec3b(0, 0, 255));


	//创建空白图像用于显示坐标点
	Mat img(480, 640, CV_8UC3, Scalar(255, 255, 255));
	Mat img2;
	img.copyTo(img2);
	//在空白图像中绘制坐标点
	for (int i = 0; i < samples.rows; ++i)
	{
    
    
		Point2f point;
		point.x = samples.at<float>(i, 0);
		point.y = samples.at<float>(i, 1);
		Scalar color = colors[labls.at<int>(i, 0)];
		circle(img, point, 3, color, -1);
		circle(img2, point, 3, color, -1);
	}
	imshow("两类像素图像", img);

	// 创建SVM对象 --  解决二分类问题
	cv::Ptr<cv::ml::SVM> svm = cv::ml::SVM::create();

	// 设置参数
	//SVM类型
	svm->setType(cv::ml::SVM::C_SVC);
	//内核模型  -- CHI2(用于处理直方图类型数据)
	svm->setKernel(cv::ml::SVM::CHI2);
	//svm->setTermCriteria(TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 100, 0.01));
	//svm->setC(1);
	//svm->setGamma(0.50625);
	//svm->setDegree(3);

	// 训练SVM模型
	svm->train(TrainData::create(samples, ROW_SAMPLE, labls));


	// 保存模型
	svm->save("svm_model.yml");

	Ptr<SVM> svm1 = Algorithm::load<SVM>("svm_model.yml");
	// 设置参数
	//SVM类型
	svm1->setType(cv::ml::SVM::C_SVC);
	//内核模型
	svm1->setKernel(cv::ml::SVM::CHI2);

	//用模型对图像中全部像素点进行分类
	Mat imagePoint(1, 2, CV_32FC1);
	for (int y = 0; y < img2.rows; y += 2)
	{
    
    
		for (int x = 0; x < img2.cols; x += 2)
		{
    
    
			imagePoint.at<float>(0) = (float)x;
			imagePoint.at<float>(1) = (float)y;

			Mat result;  //保存0或1
			int colorIndex = (int)svm1->predict(imagePoint, result);
			img2.at<Vec3b>(y, x) = colors[(int)result.at<float>(0, 0)];
		}
	}
	imshow("图像所有像素点分类结果", img2);
	waitKey(0);

	return 0;
}

KNN nearest-neighbor classification result

SVM classification result

Unsupervised Learning Algorithms

The difference between clustering and classification: the biggest difference is that clustering is unsupervised, while classification is supervised learning.

Algorithm steps

The specific steps of the K-Means algorithm are as follows:

  1. First, choose a value k, i.e. the number of clusters into which we want to group the data.
  2. Randomly select K data points from the given dataset as the initial centroids.
  3. For each point in the dataset, compute its distance to each centroid (for example, the Euclidean distance); assign the point to the set belonging to the closest centroid.
  4. After all data points have been assigned in this round, there are K sets; recompute the centroid of each set.
  5. If the distance between each newly computed centroid and the previous one is smaller than a set threshold, the centroid positions have essentially stopped changing and the data has converged. In that case the clustering is considered to have reached the desired result and the algorithm terminates.
  6. Otherwise, if the new centroids move significantly, repeat steps 3-5 until the positions change little and the algorithm converges.

K-Means generally uses the Euclidean distance, d(x, y) = sqrt(Σ_i (x_i - y_i)^2), to measure the distance between a sample and a centroid.

Advantages and disadvantages of the algorithm

Advantages
1. The principle is very simple, it is easy to implement, and the algorithm converges quickly.
2. The clustering effect is good and easy to interpret: once the data has converged, the resulting clusters can be inspected directly.
3. Few constraints: the only parameter that has to be set is the number of clusters k, and the clustering can be tuned by adjusting k.

Disadvantage
1. The selection of k value is not easy to grasp. In many cases, the estimation of K value is very difficult, and sometimes it can be obtained through cross-validation.
2. The result obtained by the iterative method can only be a local optimal solution, but cannot obtain a global optimal solution.
3. Sensitive to noise and outliers. Outliers have a great influence on the determination of the centroid. Can be used to detect outliers.

kmeans K-means clustering algorithm

insert image description here

OpenCV is a popular computer vision library that provides various image processing and analysis functions. One of them is the k-means clustering algorithm, which can be used for image segmentation, feature extraction and other tasks. In C++, OpenCV provides the kmeans function to perform k-means clustering.

The following is a detailed explanation of OpenCV's kmeans function:

double cv::kmeans(
    InputArray data,                        // 输入数据,可以是N行M列的浮点型矩阵(N个样本,M维特征)
    int K,                                  // 聚类的数目
    InputOutputArray bestLabels,            // 输出的聚类标签
    TermCriteria criteria,                  // 算法终止的条件
    int attempts = 3,                       // 重复尝试次数,以获得最佳结果
    int flags = KMEANS_RANDOM_CENTERS,       // 初始化聚类中心的方法
    OutputArray centers = noArray()         // 输出的聚类中心
);

Parameter Description:

  • data: Input data, which can be a floating-point matrix with N rows and M columns, each row represents a sample, and each column represents a feature.
    Mat points(count, 2, CV_32F);
    Mat points(count, 1, CV_32FC2);
    Mat points(1, count, CV_32FC2);
    std::vector<cv::Point2f> points(sampleCount);
  • K: The number of clusters, that is, the number of expected cluster centers.
  • bestLabels: The output cluster labels, an N×1 integer matrix indicating the cluster to which each sample belongs. Labels start from 0; for example, with K=3 the possible labels are 0, 1 and 2.
  • criteria: The termination condition of the algorithm, specified through the cv::TermCriteria structure. It is commonly used to control the maximum number of iterations and the required accuracy by combining cv::TermCriteria::MAX_ITER and cv::TermCriteria::EPS. The accuracy is given by criteria.epsilon: the algorithm stops as soon as every cluster center moves by less than criteria.epsilon in an iteration.
  • attempts: The number of repeated attempts, the algorithm will be executed multiple times and return the best result.

attempts is an optional parameter of the kmeans function that specifies how many times the algorithm is run in order to keep the best result.

The initial cluster centers have an influence on the final result of k-means. To obtain a better clustering, the algorithm can be run several times with different initial centers, keeping the best result.
The attempts parameter specifies the number of runs (3 in the signature above), meaning the algorithm is executed that many times and the best of the results is returned. The value can be adjusted as needed.
In each attempt, different initial cluster centers are used. How they are chosen is controlled by the flags parameter, which can be cv::KMEANS_RANDOM_CENTERS (randomly selected data points) or cv::KMEANS_PP_CENTERS (k-means++ seeding).
Running several attempts and keeping the best result reduces the sensitivity of k-means to the choice of initial centers and improves the quality of the clustering, at the cost of extra computation time, so a trade-off between time and quality is needed.
  • flags: The method used to initialize the cluster centers. flags is an optional parameter of the kmeans function; the three commonly used values are:

1. `cv::KMEANS_RANDOM_CENTERS`:
   - Value: 0
   - Meaning: use randomly selected data points as the initial cluster centers.
   - Notes: a simple and fast initialization that picks K samples of the input data at random as the initial centers.

2. `cv::KMEANS_PP_CENTERS`:
   - Value: 2
   - Meaning: use the k-means++ algorithm to choose the initial cluster centers.
   - Notes: k-means++ iteratively picks, as the next center, a data point far from the centers already chosen, which usually gives a better initialization (and a better final clustering) than random selection.

3. `cv::KMEANS_USE_INITIAL_LABELS`:
   - Value: 1
   - Meaning: use the initial cluster labels supplied in bestLabels to initialize the cluster centers.
   - Notes: with this flag the function computes the center of each cluster from the provided initial labels and uses those as the initial centers. This can further improve the result when some prior knowledge about the clustering or some preprocessing is already available.
  • centers: The output cluster center, a floating-point matrix with K rows and M columns, representing the center point of each cluster.

The general steps for using the kmeans function are as follows:

  1. Prepare the input data as a floating-point matrix with N rows and M columns, where N is the number of samples and M is the number of features.
  2. Create an output matrix bestLabels that will hold the cluster label of each sample.
  3. Define the termination condition of the algorithm, for example by setting the maximum number of iterations and the convergence accuracy.
  4. Call the kmeans function, passing in the input data, the number of clusters, the label matrix and the termination criteria.
  5. Check the returned compactness value (the sum of squared distances from each sample to its cluster center), which can be used to compare the quality of different runs, as shown in the sketch after this list.
  6. If the cluster centers are needed, create an output matrix centers and pass it to kmeans as the last parameter.
  7. bestLabels and centers can then be used to group the samples by cluster label and to use the cluster centers for further analysis and processing.
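As a small illustration of the compactness value mentioned in step 5, the following sketch runs cv::kmeans with several values of K on some random 2-D samples and prints the returned compactness; the data layout (an N×2 CV_32F matrix) and the K range are assumptions for demonstration only.

#include <opencv2/opencv.hpp>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
	//构造一些二维样本点(N行2列,CV_32F);这里用均匀随机数据仅作演示,真实数据上可据此比较不同K
	RNG rng(12345);
	Mat data(300, 2, CV_32F);
	rng.fill(data, RNG::UNIFORM, Scalar(0), Scalar(500));

	//对不同的K值分别聚类,比较kmeans返回的紧凑度(compactness)
	for (int K = 2; K <= 5; ++K)
	{
		Mat labels, centers;
		double compactness = kmeans(data, K, labels,
			TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 10, 0.1),
			3, KMEANS_PP_CENTERS, centers);   //attempts=3,返回其中最优结果
		cout << "K=" << K << "  compactness=" << compactness << endl;
	}
	return 0;
}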
int main()
{
	
	//生成一个500*500的图像用于显示特征点和分类结果
	Mat img(500, 500, CV_8UC3, Scalar(255, 255, 255));
	RNG rng(10000);

	//设置三种颜色
	Scalar colorLut[3]=
	{
		Scalar(0,0,255),
		Scalar(0,255,0),
		Scalar(255,0,0)
	};

	//设置三个点集,并且每个点集中点的数目随机
	int number = 3;
	int Points1 = rng.uniform(20, 200);
	int Points2 = rng.uniform(20, 200);
	int Points3 = rng.uniform(20, 200);
	cout << "Points1: " << Points1 << endl;
	cout << "Points2: " << Points2 << endl;
	cout << "Points3: " << Points3 << endl;
	Mat Points(Points1 + Points2 + Points3, 1, CV_32FC2);

	int i = 0;
	for (; i < Points1; ++i)
	{
		Point2f pts;
		pts.x = rng.uniform(100, 200);
		pts.y = rng.uniform(100, 200);
		Points.at<Point2f>(i, 0) = pts;
	}

	for (; i < Points1+Points2; ++i)
	{
		Point2f pts;
		pts.x = rng.uniform(300, 400);
		pts.y = rng.uniform(100, 300);
		Points.at<Point2f>(i, 0) = pts;
	}

	for (; i < Points1 + Points2+Points3; ++i)
	{
		Point2f pts;
		pts.x = rng.uniform(100, 200);
		pts.y = rng.uniform(390, 490);
		Points.at<Point2f>(i, 0) = pts;
	}
	//每个点所属的种类
	Mat labels;
	//每类点的中心位置坐标
	Mat centers;
	kmeans(Points, number, labels, TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 10, 0.1), 3, KMEANS_PP_CENTERS, centers);


	//根据分类为每个点设置不同的颜色
	img = Scalar::all(255);

	for (int i = 0; i < Points1 + Points2 + Points3; i++)
	{
		int index = labels.at<int>(i);
		Point point = Points.at<Point2f>(i);
		circle(img, point, 2, colorLut[index], -1, 4);

	}
	for (int i = 0; i < centers.rows; i++)
	{
		int x = centers.at<float>(i, 0);
		int y = centers.at<float>(i, 1);
		cout << "第" << i + 1 << "类的中心坐标:x=" << x << " y=" << y << endl;
		circle(img, Point(x, y), 50, colorLut[i], 1, LINE_AA);
		circle(img, Point(x, y), 5,Scalar(0,0,0), -1);
	}
	imshow("K均值聚类分类结果", img);

	
	waitKey(0);

	//kmeans 用于图像分割处理
	Mat img2 = imread("fly.jpg");
	if (!img2.data)
	{
		cout << "请确认图像文件是否输入正确";
		return -1;

	}
	//k这里选值  0-5
	Vec3b colorLut2[5] =
	{
		Vec3b(0,0,255),
		Vec3b(0,255,0),
		Vec3b(255,0,0),
		Vec3b(0,255,255),
		Vec3b(255,0,255)
	};
	//图像尺寸,用于计算图像中像素点的数目
	int width = img2.cols;
	int  height = img2.rows;
	//初始化定义
	int sampleCount = width * height;
	 //将图像矩阵数据成每行数据的特征形式,用于k均值聚类处理
	Mat sample_data = img2.reshape(0, sampleCount);
	Mat data;
	sample_data.convertTo(data, CV_32F);
	//k均值聚类
	int number2 = 3;
	Mat labels2;
	kmeans(data,number2,labels2, TermCriteria(TermCriteria::EPS + TermCriteria::COUNT, 10, 0.1), 3, KMEANS_PP_CENTERS, centers);

	//图像显示分割结果
	Mat result = Mat::zeros(sample_data.size(), img2.type());

	for (int row = 0; row < height*width; row++)
	{
		int label = labels2.at<int>(row, 0);
		result.at<Vec3b>(row, 0) = colorLut2[label];
	}
	result = result.reshape(0, height);

	namedWindow("原图", WINDOW_NORMAL);
	imshow("原图", img2);

	namedWindow("kmeans", WINDOW_NORMAL);
	imshow("kmeans", result);
	waitKey(0);
	return 0;
}

insert image description here

deep neural network

cv::dnn::readNet read deep learning model

insert image description here

cv::dnn::readNet is the function used to load a deep learning model in OpenCV. It reads a trained model from file and returns a network object that can be used for inference.

The function prototype is as follows:

cv::dnn::Net cv::dnn::readNet(
    const cv::String& model,
    const cv::String& config = "",
    const cv::String& framework = ""
);

Parameter Description:

  • model: The path of the model file to load, usually the model's weights file (e.g. .caffemodel, .pb, .t7, etc.).
  • config: Specifies the configuration file path for the model. For some frameworks, the structural information of the model may be stored separately in a configuration file, and this parameter is used to specify the path of the configuration file. For other frameworks, this can be an empty string.
  • framework: The framework used to train the model. Normally OpenCV can detect the framework type automatically, so this parameter can be left as an empty string. "Caffe" can be specified if the model was trained with Caffe, and "TensorFlow" if it was trained with TensorFlow. Depending on the framework type, the function uses the corresponding backend to load the model.

The return value is a cv::dnn::Net object representing the loaded deep learning network model.

An example of loading a model using the cv::dnn::readNet function is as follows:

cv::dnn::Net net = cv::dnn::readNet("model.caffemodel", "model.prototxt", "Caffe");

This will load the model.caffemodel and model.prototxt files and build the deep learning network using the Caffe backend.

After loading the model, you can use the cv::dnn::Net object for inference, for example feeding images into the network through a forward pass and getting the network's output.

Please note that before using the cv::dnn::readNet function, you need to make sure that OpenCV's deep learning (dnn) module has been properly installed and configured.

cv::dnn::blobFromImage Image conversion to the input format accepted by the deep learning network (single image)

cv::dnn::blobFromImages(multiple images)

insert image description here
insert image description here

cv::dnn::blobFromImage is a function in OpenCV's deep learning module (dnn) that converts an image into the input format accepted by a deep learning network, namely a blob (Binary Large Object). Its detailed explanation is as follows:


Mat cv::dnn::blobFromImage(
    InputArray image,           // 输入单个图像
    double scalefactor=1.0,     // 缩放因子
    const Size& size=Size(),    // 目标大小
    const Scalar& mean=Scalar(),// 均值
    bool swapRB=false,          // 交换红蓝通道
    bool crop=false ,            // 裁剪图像
    int ddepth = CV_32F           // 输出 blob 的深度
)

Mat cv::dnn::blobFromImages(
    InputArrayOfArrays  images,           // 输入n张图像
    double scalefactor=1.0,     // 缩放因子
    const Size& size=Size(),    // 目标大小
    const Scalar& mean=Scalar(),// 均值
    bool swapRB=false,          // 交换红蓝通道
    bool crop=false ,            // 裁剪图像
    int ddepth = CV_32F           // 输出 blob 的深度
)
  • image: The input image, a cv::Mat or cv::UMat object representing the raw image to be processed.
  • scalefactor: Multiplier applied to the image's pixel values; the default 1.0 means no scaling. A value such as 1/255.0 is often used to normalize 8-bit pixel values into the range [0, 1].
  • size: Target size. The input image is resized to this size. If no target size is specified (a default-constructed cv::Size()), the function does not resize the image. For most current state-of-the-art networks this is 224×224, 227×227 or 299×299.
  • mean: The mean to be subtracted. It can be a triple of R, G and B means or a single value subtracted from every channel. When mean subtraction is performed, the channel order of the mean is R, G, B; if the input image's channel order is B, G, R, make sure swapRB = true so that the channels are swapped. If mean is not specified (a default-constructed cv::Scalar()), no mean subtraction is performed.
  • swapRB: Swap the red and blue channels. By default OpenCV stores image channels in B, G, R order, while the mean is given in R, G, B order; setting swapRB = true resolves this mismatch.
  • crop: Whether to crop the image after resizing. If crop is true, the input image is resized so that one side equals the corresponding target dimension and the other side is equal to or larger, and the result is then cropped from the center. If crop is false, the image is resized directly to the target size without cropping, and the aspect ratio is not preserved.
  • ddepth: Optional parameter specifying the depth of the output blob. The default CV_32F produces a blob of 32-bit floating-point values.

The function returns a 4-D matrix representing the converted blob (for a 4-D Mat the rows and cols members are not defined, so they are reported as -1).

The returned blob is a four-dimensional matrix with shape `(N, C, H, W)`, where:
- `N` is the number of images in the blob (usually 1).
- `C` is the number of channels, i.e. the number of color channels of the image (3 for an RGB image).
- `H` is the height (number of rows) of the blob.
- `W` is the width (number of columns) of the blob.

The purpose of this function is to preprocess the input image and convert it into a format suitable as input to a deep learning model. It performs the following operations:
1. Resizing: the image is resized according to the `size` parameter.
2. Mean subtraction: the mean given by the `mean` parameter is subtracted from each pixel.
3. Pixel value scaling: the pixel values are scaled by the `scalefactor` parameter.
4. Channel swapping: the color channels are swapped according to the `swapRB` parameter.

The image processed in this way is stored in the returned blob and can be used directly as the input of a deep learning model; typically the blob is passed to the model's `forward` function for inference or training.

Notice:

  1. When scalefactor, size, mean and swapRB are all used, the image is first resized (or cropped) to size, then the mean is subtracted (with the mean components swapped according to swapRB), and finally the pixel values are multiplied by scalefactor

  2. When performing mean subtraction, ddepth cannot select CV_8U, otherwise an error will be reported:

    OpenCV(4.1.2) D:\Build\OpenCV\opencv-4.1.2\modules\dnn\src\dnn.cpp:251: error: (-215:Assertion failed) mean_ == Scalar() && "Mean subtraction is not supported for CV_8U blob depth" in function 'cv::dnn::dnn4_v20190902::blobFromImages'

  3. When crop=True, scale first until one of the width and height is equal to the corresponding size, and the other is greater than or equal to the corresponding size, and then crop from the center

Use the blobFromImage function to convert the input image into the input format accepted by the deep learning network, which makes prediction or inference convenient. The transformed blob can then be passed as input to the cv::dnn::Net::forward function for prediction.

Note: Before using this function, you need to ensure that the corresponding deep learning model has been loaded and initialized with the cv::dnn::Net class.
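A minimal sketch of typical preprocessing with blobFromImage, assuming an image file named sample.jpg and a network that expects 224×224 RGB input with the mean values shown (both are placeholders, not tied to any particular model); it also shows how to inspect the dimensions of the returned 4-D blob:

#include <opencv2/opencv.hpp>
#include <iostream>

using namespace cv;
using namespace std;

int main()
{
	Mat img = imread("sample.jpg");
	if (img.empty())
	{
		cout << "请确认图像文件是否输入正确" << endl;
		return -1;
	}

	//缩放到224x224,减去均值,交换B/R通道,不裁剪,输出CV_32F
	Mat blob = dnn::blobFromImage(img, 1.0, Size(224, 224),
		Scalar(104, 117, 123), /*swapRB=*/true, /*crop=*/false, CV_32F);

	//返回的是 (N, C, H, W) 形状的四维矩阵
	cout << "N=" << blob.size[0] << " C=" << blob.size[1]
	     << " H=" << blob.size[2] << " W=" << blob.size[3] << endl;
	return 0;
}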

Net class

cv::dnn::Net::setInput The input data is set as the input of the neural network model

In OpenCV, the cv::dnn::Net::setInput function is used to set input data as the input of a neural network model. It has several overloaded forms, but its main role is to load input data into the network for inference.

The following is the C++ signature of the cv::dnn::Net::setInput function and the details of its parameters:

void cv::dnn::Net::setInput(
    InputArray blob,                  // 输入blob(四维矩阵)
    const String& name = "",          // 输入层名称(可选)
    double scalefactor = 1.0,         // 可选的数值缩放因子
    const Scalar& mean = Scalar()     // 可选的均值,逐通道相减
)

Parameter explanation:

  • blob: The blob (Binary Large Object) containing the input data, usually a 4-D Mat produced by blobFromImage or blobFromImages. A blob is a multidimensional array used in OpenCV to store data such as images and features.
  • name: (optional) The name of the input layer. If the model has several input layers, this parameter selects the layer whose input is being set.
  • scalefactor and mean: (optional) an additional numeric scale factor and per-channel mean subtraction applied to the blob before it is fed to the network.

The steps for using the cv::dnn::Net::setInput function are as follows:

  1. Create a cv::dnn::Net object representing the neural network model.
  2. Load the trained model files (such as a Caffe or TensorFlow model) into the cv::dnn::Net object.
  3. Prepare the input data by converting it into an appropriate format such as a blob.
  4. Call the cv::dnn::Net::setInput function to set this data as the input of the network model.

Here is an example showing how to use the cv::dnn::Net::setInput function to load an image into a neural network model for inference:

#include <opencv2/opencv.hpp>

int main()
{
    // 创建网络模型
    cv::dnn::Net net;

    // 加载训练好的模型文件
    net = cv::dnn::readNetFromCaffe("model.prototxt", "model.caffemodel");

    // 加载图像
    cv::Mat image = cv::imread("image.jpg");

    // 将图像转换为blob
    cv::Mat blob = cv::dnn::blobFromImage(image, 1.0, cv::Size(224, 224), cv::Scalar(104, 117, 123));

    // 设置输入数据
    net.setInput(blob);

    // 执行推理
    cv::Mat output = net.forward();

    // 处理输出结果...
    
    return 0;
}

In this example, we first create a cv::dnn::Net object and then load a Caffe model file. Next, we load an image with cv::imread and convert it to blob format with cv::dnn::blobFromImage. Finally, we call cv::dnn::Net::setInput to set the blob as the input of the network model. We can then call cv::dnn::Net::forward to perform inference and process the output.
insert image description here

cv::dnn::Net::forward forward propagation

cv::dnn::Net::forward is a function in OpenCV that performs the forward pass of a deep neural network (DNN). It passes the input data through the network and returns the network's output. The following is a detailed description of the function:

cv::Mat cv::dnn::Net::forward(const cv::String &outputName = cv::String())

Parameters :

  • outputName (optional): The name of the output layer whose result should be returned. If outputName is not provided, the output of the default (last) output layer is returned; to obtain the results of several output layers at once, use the overload of forward that takes a std::vector<cv::Mat> of output blobs.

return value :

  • cv::Mat: A cv::Mat object containing the output of the network.
For many networks the returned value is a four-dimensional matrix (a 4-D tensor).
Such a matrix typically represents the output of a convolutional neural network (CNN), also called the feature maps.
Its dimensions are usually defined as `(batch_size, num_channels, height, width)`, where:
- `batch_size` is the batch size, i.e. the number of samples fed in at once (1 means a single sample).
- `num_channels` is the number of channels of the feature maps, which can also be understood as their depth or the number of feature extractors.
- `height` and `width` are the height and width of the feature maps.
Such a 4-D matrix can hold the results for several input samples at once, each with multiple feature-map channels; the data for a particular sample and channel can be accessed by indexing the returned matrix accordingly.

For detection networks, the values along the last dimension are ordered as: image index, class label, confidence, and the 4 coordinates of the target position [xmin ymin xmax ymax].
They carry the following information:

  1. Category label (label): Indicates the category to which the detected target belongs. This is usually an integer value, corresponding to the different classes recognized by the model.
  2. Confidence: Indicates the confidence or probability score of the model for the detection result. In general, this value is between 0 and 1.
  3. The bounding box information of the target position (bounding box): Indicates the position of the detected target in the image. This is usually described by four coordinate values, the coordinates [xmin, ymin, xmax, ymax] of the upper left and lower right corners of the bounding box.

If you are using a specific object detection model, you can refer to that model's documentation or network structure to understand what each specific output dimension means. Different models may have different output formats and meanings.
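For such a detection-style output, one common way to read it is to wrap the 4-D result in a 2-D matrix and scan it row by row, as in the following sketch (it assumes the SSD-like [image_id, label, confidence, xmin, ymin, xmax, ymax] layout described above; the helper name parseDetections is made up for illustration):

#include <opencv2/opencv.hpp>
#include <vector>

//将forward()得到的四维检测结果解析为矩形框(示意:假设每行为
//[image_id, label, confidence, xmin, ymin, xmax, ymax] 的SSD风格布局)
static std::vector<cv::Rect> parseDetections(cv::Mat detect,
                                             const cv::Size& imgSize,
                                             float confThreshold = 0.5f)
{
	std::vector<cv::Rect> boxes;
	//detect.size[2] 行,每行 detect.size[3](此处假定为7)个值
	cv::Mat detMat(detect.size[2], detect.size[3], CV_32F, detect.ptr<float>());
	for (int i = 0; i < detMat.rows; ++i)
	{
		float confidence = detMat.at<float>(i, 2);
		if (confidence > confThreshold)
		{
			int x1 = (int)(detMat.at<float>(i, 3) * imgSize.width);
			int y1 = (int)(detMat.at<float>(i, 4) * imgSize.height);
			int x2 = (int)(detMat.at<float>(i, 5) * imgSize.width);
			int y2 = (int)(detMat.at<float>(i, 6) * imgSize.height);
			boxes.push_back(cv::Rect(x1, y1, x2 - x1, y2 - y1));
		}
	}
	return boxes;
}

The full face-detection demo later in this note does the same thing inline and additionally draws the rectangles on the corresponding images.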
insert image description here

function :

  • Performs the forward pass of the network, and returns the network output.

Before using the cv::dnn::Net::forward function, the network model must be loaded and built. This usually involves loading the model's configuration and weight files and using them to build the network. Once the network is loaded and built, the forward function can be used to perform the forward pass.

#include <iostream>
#include <opencv2/core/core.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/opencv.hpp>
#include <fstream>

#include <opencv2/ximgproc.hpp> // 需要添加OpenCV扩展库
#include <opencv2/ml.hpp>
#include <cmath>
#include <locale>
#include <codecvt>
#include <string>



using namespace cv;
using namespace std;
using namespace ml;
using namespace cv::dnn;


int main()
{
	
	system("color F0");
	string model = "age_net.caffemodel";
	string config = "age_deploy.prototxt";

	//加载模型
	Net net = dnn::readNet(model, config);
	if (net.empty())
	{
		cout << "请确认是否输入空的模型文件" << endl;
		return -1;
	}

	//获取各层信息
	vector<String> layerNames = net.getLayerNames();

	for (int i = 0; i < layerNames.size(); i++)
	{
		//读取每层网络的ID
		int ID = net.getLayerId(layerNames[i]);
		//读取每层网络的信息
		Ptr<Layer> layer = net.getLayer(ID);
		//权重 和 偏置  vector<Mat> ddd = layer->blobs;
		//Mat weights = net.getParam(layerNames[i], 0);
		//Mat biases = net.getParam(layerNames[i], 1);

		cout << "网络层数:" << ID << "网络层名称:" << layerNames[i] << endl;
		cout << "网络层类型:" << layer->type << endl;
		
		vector<Mat> ddd = layer->blobs;

	}

	waitKey(0);
	return 0;
}

网络层数:1网络层名称:conv1
网络层类型:Convolution
网络层数:2网络层名称:relu1
网络层类型:ReLU
网络层数:3网络层名称:pool1
网络层类型:Pooling
网络层数:4网络层名称:norm1
网络层类型:LRN
网络层数:5网络层名称:conv2
网络层类型:Convolution
网络层数:6网络层名称:relu2
网络层类型:ReLU
网络层数:7网络层名称:pool2
网络层类型:Pooling
网络层数:8网络层名称:norm2
网络层类型:LRN
网络层数:9网络层名称:conv3
网络层类型:Convolution
网络层数:10网络层名称:relu3
网络层类型:ReLU
网络层数:11网络层名称:pool5
网络层类型:Pooling
网络层数:12网络层名称:fc6
网络层类型:InnerProduct
网络层数:13网络层名称:relu6
网络层类型:ReLU
网络层数:14网络层名称:drop6
网络层类型:Dropout
网络层数:15网络层名称:fc7
网络层类型:InnerProduct
网络层数:16网络层名称:relu7
网络层类型:ReLU
网络层数:17网络层名称:drop7
网络层类型:Dropout
网络层数:18网络层名称:fc8
网络层类型:InnerProduct
网络层数:19网络层名称:prob
网络层类型:Softmax
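Continuing the layer-listing example above, here is a hedged sketch of actually running the same age model end to end: the 227×227 input size and the mean values are assumptions commonly used with this kind of Caffe age model, and the output is assumed to be the 1×N probability vector produced by the final Softmax layer (prob) listed above.

#include <opencv2/opencv.hpp>
#include <iostream>

using namespace cv;
using namespace cv::dnn;
using namespace std;

int main()
{
	//加载与上例相同的年龄识别模型
	Net net = readNet("age_net.caffemodel", "age_deploy.prototxt");
	Mat face = imread("face.jpg");
	if (net.empty() || face.empty())
	{
		cout << "请确认模型文件与图像文件是否正确" << endl;
		return -1;
	}

	//假设网络输入为227x227,均值为该类Caffe年龄模型常用的取值
	Mat blob = blobFromImage(face, 1.0, Size(227, 227),
		Scalar(78.4263377603, 87.7689143744, 114.895847746), false, false);
	net.setInput(blob);

	//前向传播,假定输出为最后的Softmax层(prob)给出的1xN概率向量
	Mat prob = net.forward("prob");
	Point classId;
	double confidence;
	minMaxLoc(prob.reshape(1, 1), 0, &confidence, 0, &classId);
	cout << "概率最大的类别索引:" << classId.x << "  概率:" << confidence << endl;

	return 0;
}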

Code demo:

int main()
{

	system("color F0");
	Mat img = imread("face.jpg");
	Mat img1 = imread("persons.png");
	Mat img2 = imread("master.jpg");
	if (img.empty() || img1.empty() || img2.empty())
	{
		cout << "请确定是否输入正确的图像文件" << endl;
		return -1;
	}
	
	//读取人脸识别模型
	string model = "face_model/res10_300x300_ssd_iter_140000_fp16.caffemodel";
	string config = "face_model/deploy.prototxt";

	//加载模型
	Net faceNet = dnn::readNet(model, config);
	if (faceNet.empty())
	{
		cout << "请确认是否输入空的模型文件" << endl;
		return -1;
	}

	vector<Mat> imgs;
	imgs.push_back(img);
	imgs.push_back(img1);
	imgs.push_back(img2);
	//对整幅图像进行人脸检测
	Mat blobImage = blobFromImages(imgs, 1.0, Size(300, 300), Scalar(), false, false);
	/*
	返回的blob是一个四维矩阵,其形状为 `(N, C, H, W)`,其中:
	- `N` 表示 blob 的数量,通常为 1。1 :表示1张图片,3:表示3张图片
	- `C` 表示通道数,即图像的颜色通道数,例如 RGB 图像的通道数为 3。
	- `H` 表示 blob 的高度(或行数)。
	- `W` 表示 blob 的宽度(或列数)。
	*/
	//std::cout << "size[0]:" << blobImage.size[0] << std::endl;//3张图片
	//std::cout << "size[1]:" << blobImage.size[1] << std::endl;//3个通道
	//std::cout << "size[2]:" << blobImage.size[2] << std::endl;//300高度
	//std::cout << "size[3]:" << blobImage.size[3] << std::endl;//300宽度
	faceNet.setInput(blobImage, "data");
	Mat detect = faceNet.forward("detection_out");

	std::cout << "size[0]:" << detect.size[0] << std::endl;//1
	std::cout << "size[1]:" << detect.size[1] << std::endl;//1
	std::cout << "size[2]:" << detect.size[2] << std::endl;//200
	std::cout << "size[3]:" << detect.size[3] << std::endl;//7

	//detect.ptr<float>(0) == detect.ptr<float>();
	
	//人脸概率 人脸矩形区域的位置  ; 提取detect中的数据
	Mat detectionMat(detect.size[2], detect.size[3], CV_32F, detect.ptr<float>(0));

	//对每个人脸区域进行性别检测
	int exBoundray = 25;//对每个人脸区域四个方向扩充的尺寸
	float confidenceThreshold = 0.5; //判定为人脸的概率阈值,阈值越大准确性越高

	//人脸数据个数
	for (int i = 0; i < detectionMat.rows; i++)
	{

		//检测为人脸的概率
		float confidence = detectionMat.at<float>(i, 2);
		//只检测概率大于阈值区域的性别
		if (confidence > confidenceThreshold)
		{
			//imgs[(int)detectionMat.at<float>(i, 1)]  获取目前判断的图片
			//网络检测人脸区域的大小
			//矩形的左上角
			int imgsSerial = (int)detectionMat.at<float>(i, 0);
			int topLx = static_cast<int>(detectionMat.at<float>(i, 3) * imgs[imgsSerial].cols);
			int topLy = static_cast<int>(detectionMat.at<float>(i, 4) * imgs[imgsSerial].rows);

			//矩形的右下角
			int bottomRx = static_cast<int>(detectionMat.at<float>(i, 5) * imgs[imgsSerial].cols);
			int bottomRy = static_cast<int>(detectionMat.at<float>(i, 6) * imgs[imgsSerial].rows);
			//矩形
			Rect faceRect(topLx, topLy, bottomRx - topLx, bottomRy - topLy);

			rectangle(imgs[imgsSerial], faceRect, Scalar(0, 0, 255), 2, 8, 0);

		
		}
	}
	//显示结果
	for (int i = 0; i < imgs.size(); i++)
	{
		imshow("检测结果"+to_string(i), imgs[i]);
	}

	waitKey(0);
	return 0;
}

insert image description here


Origin blog.csdn.net/weixin_43763292/article/details/131252738