OpenCV Practice Project: Image Stitching

1. Introduction

Image stitching is one of the most successful applications in computer vision; these days it is hard to find a phone or an image-processing API that does not include it. In this article, we will discuss how to use OpenCV for image stitching: given two images that share some common area, the goal is to "stitch" them into a single panoramic scene. Multiple images can also be given, but the task always reduces to repeatedly stitching two images that share a common area.

2. Steps

The two pictures to be stitched together are shown below.
(Figure: the two input images)

2.1 Feature detection and extraction

Given the pair of images above, we want to stitch them together to create a panoramic scene. Note that both images must share some common area. The two images shown here are ideal; in practice, even when two images overlap, they may differ due to scaling, rotation, or being taken by different cameras. In every case, the first step is to detect feature points in the images.

2.2 Keypoint detection

An initial and perhaps naive approach is to extract keypoints with an algorithm such as Harris corner detection and then try to match corresponding keypoints based on some similarity measure (e.g. Euclidean distance). Corners have a nice property: they are invariant to rotation, so once a corner is detected it remains a corner even if the image is rotated.
But what if we rotate and then scale the image? Now we run into trouble, because the corner detector operates at a fixed scale: if we zoom into the image, a previously detected corner may turn into a line!
To summarize, we need features that are invariant to both rotation and scale. That is exactly what the more powerful methods (such as SIFT, SURF and ORB) provide. A minimal sketch of the naive Harris approach is shown below.
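This helper is only an illustration of the naive approach and is not part of the stitching code; it assumes OpenCV 4 and a grayscale input image.

#include <opencv2/opencv.hpp>

// Sketch: Harris corner detection, the "naive" keypoint detector discussed above.
cv::Mat detectHarrisCorners(const cv::Mat &gray)
{
	CV_Assert(gray.type() == CV_8UC1);            // expects a single-channel 8-bit image
	cv::Mat response;
	cv::cornerHarris(gray, response, 2, 3, 0.04); // blockSize = 2, Sobel aperture = 3, k = 0.04

	double maxResponse;
	cv::minMaxLoc(response, nullptr, &maxResponse);
	return response > 0.01 * maxResponse;         // binary mask of strong corner responses
}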

2.3 Keypoints and Descriptors

Methods such as SIFT and SURF attempt to address the limitations of corner detection algorithms. Corner detectors typically use a fixed-size kernel to detect regions of interest (corners) in an image, and it is not hard to see that as the image is scaled, this kernel becomes effectively too small or too large. To address this limitation, SIFT uses the Difference of Gaussians (DoG): the idea is to apply the DoG to differently scaled versions of the same image, and then use neighboring pixel information to locate and refine the keypoints and compute the corresponding descriptors.
First, we need to load the two images: a query image and a training image. We then extract keypoints and descriptors from both images, which can be done in one step with OpenCV's detectAndCompute() function. Note that detectAndCompute() requires an instance of a keypoint detector and descriptor extractor object; this can be ORB, SIFT, SURF, etc. We also convert the images to grayscale before feeding them to detectAndCompute().
The code is as follows:

void detectAndDescribe(const cv::Mat &image, Extract_Features_Method method, std::vector<cv::KeyPoint> &keypoints, cv::Mat &descriptor)
{
	switch (method)
	{
	case Extract_Features_Method::METHOD_SIFT:
	{
		// Retain the 800 strongest SIFT features.
		cv::Ptr<cv::SIFT> detector = cv::SIFT::create(800);
		detector->detectAndCompute(image, cv::noArray(), keypoints, descriptor);
		break;
	}
	case Extract_Features_Method::METHOD_SURF:
	{
		// SURF lives in the opencv_contrib xfeatures2d module.
		int minHessian = 400;
		cv::Ptr<cv::xfeatures2d::SURF> detector = cv::xfeatures2d::SURF::create(minHessian);
		detector->detectAndCompute(image, cv::noArray(), keypoints, descriptor);
		break;
	}
	case Extract_Features_Method::METHOD_BRISK:
	{
		// The first argument of BRISK::create() is the AGAST detection threshold.
		int threshold = 30;
		cv::Ptr<cv::BRISK> detector = cv::BRISK::create(threshold);
		detector->detectAndCompute(image, cv::noArray(), keypoints, descriptor);
		break;
	}
	case Extract_Features_Method::METHOD_ORB:
	{
		// The first argument of ORB::create() is the maximum number of features.
		int nFeatures = 400;
		cv::Ptr<cv::ORB> detector = cv::ORB::create(nFeatures);
		detector->detectAndCompute(image, cv::noArray(), keypoints, descriptor);
		break;
	}
	default:
		break;
	}
}

We now have a set of keypoints and a descriptor matrix for each image. If SIFT is used as the feature extractor, each keypoint gets a 128-dimensional feature vector; if SURF is chosen, we obtain a 64-dimensional feature vector instead.
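For example, the dimensions can be checked after calling the detectAndDescribe() function above on a grayscale image (here img1, as in the complete code below):

	std::vector<cv::KeyPoint> keypoints;
	cv::Mat descriptors;
	detectAndDescribe(img1, Extract_Features_Method::METHOD_SIFT, keypoints, descriptors);
	// One row per keypoint; 128 columns for SIFT (64 for SURF).
	std::cout << descriptors.rows << " x " << descriptors.cols << std::endl;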

2.4 Feature matching

Now we want to compare the two sets of features and find as many pairs of matching feature points as possible. In OpenCV, feature matching is done with a Matcher object. Here, we explore two approaches: the brute-force matcher (BFMatcher) and KNN (k-nearest-neighbour) matching.
The brute-force (BF) matcher does exactly what its name suggests: given two sets of features (from image A and image B), it compares each feature of set A with all features of set B. By default, the BF matcher computes the Euclidean distance between two descriptors, so for every feature in set A it returns the closest feature in set B. For SIFT and SURF, OpenCV recommends Euclidean distance; for binary feature extractors such as ORB and BRISK, Hamming distance is recommended. To create a brute-force matcher in OpenCV we generally only need to specify two parameters: the first is the distance metric, the second a boolean that enables cross-checking.
The specific code is as follows:

auto createMatcher(Extract_Features_Method method, bool crossCheck)
{
	// SIFT and SURF produce floating-point descriptors, so the L2 (Euclidean) norm is used;
	// binary descriptors (BRISK, ORB) are compared with the Hamming norm.
	if (method == Extract_Features_Method::METHOD_SIFT || method == Extract_Features_Method::METHOD_SURF)
	{
		return cv::BFMatcher(cv::NORM_L2, crossCheck);
	}

	return cv::BFMatcher(cv::NORM_HAMMING, crossCheck);
}

The crossCheck boolean parameter indicates whether two features must match each other to be considered valid. In other words, for a pair of features (f1, f2) to be considered valid, f1 must be the closest match to f2 and f2 must also be the closest match to f1. This procedure yields a more robust set of matches and is described in the original SIFT paper.
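As a small usage sketch (assuming the descriptors describe1 and describe2 computed by detectAndDescribe() for the two images), cross-checked matching looks like this:

	// With crossCheck enabled, only mutual nearest-neighbour matches are returned.
	cv::BFMatcher matcher = createMatcher(Extract_Features_Method::METHOD_SIFT, true);
	std::vector<cv::DMatch> matches;
	matcher.match(describe1, describe2, matches);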
However, when multiple candidate matches should be considered, a KNN-based matching procedure can be used instead. Rather than returning the single best match for a given feature, KNN returns the k best matches; note that the value of k must be chosen by the user. As expected, KNN provides more candidate matches, but before going further we need to make sure that all of these candidate pairs are robust.

2.5 Ratio test

To ensure that the matches returned by KNN are reliable, the authors of the SIFT paper propose a technique called the ratio test. We iterate over the KNN matches and perform a distance test: for each feature, the match is kept only if the distance to its best match is below a chosen ratio of the distance to its second-best match; otherwise it is discarded. Again, the ratio value must be chosen manually.
Essentially, the ratio test serves the same purpose as the brute-force matcher's cross-check option: both ensure that a pair of detected features is really close enough to be considered similar. The two figures below show the matching results of the BF and KNN matchers on SIFT features; only 100 matches are drawn for clarity. A sketch of the KNN matching with the ratio test follows the figure.
(Figure: matches found by the brute-force and KNN matchers on SIFT features)
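The complete code in this article uses a plain match() call; the following is only a sketch of how the KNN matching with the ratio test described above could look, using the same createMatcher() helper (the 0.75 ratio is an assumed, commonly used value):

	// KNN matching with the ratio test. Cross-check must be disabled when k > 1.
	cv::BFMatcher matcher = createMatcher(Extract_Features_Method::METHOD_SIFT, false);
	std::vector<std::vector<cv::DMatch>> knnMatches;
	matcher.knnMatch(describe1, describe2, knnMatches, 2);   // k = 2: best and second-best match

	std::vector<cv::DMatch> goodMatches;
	for (const auto &pair : knnMatches)
	{
		// Keep the best match only if it is clearly better than the second-best one.
		if (pair.size() == 2 && pair[0].distance < 0.75f * pair[1].distance)
			goodMatches.push_back(pair[0]);
	}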

Note that even after these filtering steps it is impossible to fully guarantee that every feature match is correct. Still, the matcher gives us the best (most similar) pairs of features between the two images. Next, we use these points to compute the transformation matrix that aligns the matched points of the two images.
This transformation is called a homography. In a nutshell, a homography is a 3x3 matrix that maps points from one plane (image) to another, and it is used in many applications such as camera pose estimation, perspective correction and image stitching.
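As a small illustration (H below is assumed to be an already computed 3x3 homography; this fragment is not part of the stitching code), OpenCV can apply a homography to 2D points with perspectiveTransform():

	// A homography maps a point as p' ~ H * [x, y, 1]^T.
	std::vector<cv::Point2f> srcPoints = { {10.f, 20.f}, {100.f, 50.f} };
	std::vector<cv::Point2f> dstPoints;
	cv::perspectiveTransform(srcPoints, dstPoints, H);   // H: 3x3 cv::Mat (e.g. CV_64F)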

2.6 Estimating Homography

Random Sample Consensus (RANSAC) is an iterative algorithm for fitting parametric models that is designed to be robust to outliers.
Models like linear regression use least-squares estimation to fit the best model to the data. However, ordinary least squares is very sensitive to outliers and may fail when the number of outliers is large. RANSAC solves this problem by repeatedly estimating the parameters from randomly sampled subsets of the data. The figure below shows a comparison between linear regression and RANSAC; note that the dataset contains quite a few outliers.
(Figure: linear regression vs. RANSAC on data containing outliers)
We can see that the linear-regression model is easily affected by the outliers. That is because it tries to reduce the mean error, so it favours the model that minimises the total distance of all data points, including the outliers, to the model itself. In contrast, RANSAC fits the model only to the subset of points identified as inliers. This property is very important for our use case: here we will use RANSAC to estimate the homography matrix, and it turns out that the homography is very sensitive to the quality of the correspondences we pass to it. It is therefore important to have an algorithm (RANSAC) that can separate the points that clearly belong to the data distribution (inliers) from those that do not (outliers).
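OpenCV's findHomography() can run RANSAC internally and report which correspondences were treated as inliers. A minimal sketch (points1 and points2 are the matched point lists built in the complete code below; the 3.0-pixel reprojection threshold is an assumed value):

	// Estimate the homography with RANSAC; 'mask' marks which matches are inliers.
	std::vector<uchar> mask;
	cv::Mat H = cv::findHomography(points1, points2, mask, cv::RANSAC, 3.0);
	int inlierCount = cv::countNonZero(mask);
	std::cout << inlierCount << " of " << mask.size() << " matches are RANSAC inliers" << std::endl;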
After estimating the homography, we need to warp one of the images onto a common plane. Here, we apply a perspective transformation to one of the images; a perspective transformation can combine rotation, scaling, translation and shearing. We can use OpenCV's warpPerspective() function for this, which takes an image and a homography matrix as input.

3. Complete code

#include <opencv2/opencv.hpp>
#include <opencv2/xfeatures2d.hpp>   // SURF (opencv_contrib)
#include <iostream>
#include <string>
#include <vector>

using namespace cv;
using namespace std;

typedef enum
{
	METHOD_SIFT,
	METHOD_SURF,
	METHOD_BRISK,
	METHOD_ORB
} Extract_Features_Method;

void detectAndDescribe(const cv::Mat &image, Extract_Features_Method method, std::vector<cv::KeyPoint> &keypoints, cv::Mat &descriptor)
{
	switch (method)
	{
	case Extract_Features_Method::METHOD_SIFT:
	{
		// Retain the 800 strongest SIFT features.
		cv::Ptr<cv::SIFT> detector = cv::SIFT::create(800);
		detector->detectAndCompute(image, cv::noArray(), keypoints, descriptor);
		break;
	}
	case Extract_Features_Method::METHOD_SURF:
	{
		// SURF lives in the opencv_contrib xfeatures2d module.
		int minHessian = 400;
		cv::Ptr<cv::xfeatures2d::SURF> detector = cv::xfeatures2d::SURF::create(minHessian);
		detector->detectAndCompute(image, cv::noArray(), keypoints, descriptor);
		break;
	}
	case Extract_Features_Method::METHOD_BRISK:
	{
		// The first argument of BRISK::create() is the AGAST detection threshold.
		int threshold = 30;
		cv::Ptr<cv::BRISK> detector = cv::BRISK::create(threshold);
		detector->detectAndCompute(image, cv::noArray(), keypoints, descriptor);
		break;
	}
	case Extract_Features_Method::METHOD_ORB:
	{
		// The first argument of ORB::create() is the maximum number of features.
		int nFeatures = 400;
		cv::Ptr<cv::ORB> detector = cv::ORB::create(nFeatures);
		detector->detectAndCompute(image, cv::noArray(), keypoints, descriptor);
		break;
	}
	default:
		break;
	}
}

auto createMatcher(Extract_Features_Method method, bool crossCheck)
{
	// SIFT and SURF produce floating-point descriptors, so the L2 (Euclidean) norm is used;
	// binary descriptors (BRISK, ORB) are compared with the Hamming norm.
	if (method == Extract_Features_Method::METHOD_SIFT || method == Extract_Features_Method::METHOD_SURF)
	{
		return cv::BFMatcher(cv::NORM_L2, crossCheck);
	}

	return cv::BFMatcher(cv::NORM_HAMMING, crossCheck);
}

int main()
{
	string imgPath1 = "E:\\code\\Yolov5_Tensorrt_Win10-master\\pictures\\stich1.jpg";
	string imgPath2 = "E:\\code\\Yolov5_Tensorrt_Win10-master\\pictures\\stich2.jpg";

	// Feature detection works on grayscale images.
	Mat img1 = imread(imgPath1, IMREAD_GRAYSCALE);
	Mat img2 = imread(imgPath2, IMREAD_GRAYSCALE);

	// Extract keypoints and descriptors from both images.
	std::vector<cv::KeyPoint> keypoint1;
	cv::Mat describe1;
	detectAndDescribe(img1, Extract_Features_Method::METHOD_SIFT, keypoint1, describe1);

	std::vector<cv::KeyPoint> keypoint2;
	cv::Mat describe2;
	detectAndDescribe(img2, Extract_Features_Method::METHOD_SIFT, keypoint2, describe2);

	// Brute-force matching without cross-check.
	auto matcher = createMatcher(Extract_Features_Method::METHOD_SIFT, false);
	vector<DMatch> firstMatches;
	matcher.match(describe1, describe2, firstMatches);

	// Collect the matched point coordinates in both images.
	vector<cv::Point2f> points1, points2;
	for (vector<DMatch>::const_iterator it = firstMatches.begin(); it != firstMatches.end(); ++it)
	{
		points1.push_back(keypoint1.at(it->queryIdx).pt);
		points2.push_back(keypoint2.at(it->trainIdx).pt);
	}

	// Estimate the homography from image 1 to image 2 with RANSAC,
	// then invert it to map image 2 into the frame of image 1.
	vector<uchar> inliers;
	cv::Mat h12 = cv::findHomography(points1, points2, inliers, RANSAC, 1.0);
	Mat h21;
	invert(h12, h21, DECOMP_LU);

	// Warp the second (colour) image onto a canvas twice as wide as image 1,
	// then paste image 1 into the left part.
	Mat canvas;
	Mat img1_color = imread(imgPath1);
	Mat img2_color = imread(imgPath2);

	warpPerspective(img2_color, canvas, h21, Size(img1.cols * 2, img1.rows));
	imshow("warp", canvas);
	img1_color.copyTo(canvas(Range::all(), Range(0, img1.cols)));

	imshow("canvas", canvas);

	waitKey(0);

	return 0;
}

The resulting panoramic image is shown below. As we can see, the result contains content from both images. We can also see some issues caused by different lighting conditions and by edge effects at the image borders. Ideally, we could apply a post-processing step to normalise the brightness, such as histogram matching, which would make the result look more realistic and natural.
(Figure: the stitched panorama)
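As a rough sketch of the histogram-matching idea mentioned above (the helper matchHistogram below is only an illustration and not part of the project code), one image's grayscale histogram can be matched to a reference image by mapping through their cumulative histograms:

#include <opencv2/opencv.hpp>
#include <array>

// Sketch: map the grayscale histogram of src onto that of ref.
cv::Mat matchHistogram(const cv::Mat &src, const cv::Mat &ref)
{
	CV_Assert(src.type() == CV_8UC1 && ref.type() == CV_8UC1);

	// Cumulative distribution functions (CDFs) of both images.
	std::array<double, 256> cdfSrc{}, cdfRef{};
	for (int i = 0; i < src.rows; ++i)
		for (int j = 0; j < src.cols; ++j) cdfSrc[src.at<uchar>(i, j)]++;
	for (int i = 0; i < ref.rows; ++i)
		for (int j = 0; j < ref.cols; ++j) cdfRef[ref.at<uchar>(i, j)]++;
	for (int v = 1; v < 256; ++v) { cdfSrc[v] += cdfSrc[v - 1]; cdfRef[v] += cdfRef[v - 1]; }
	for (int v = 0; v < 256; ++v) { cdfSrc[v] /= src.total(); cdfRef[v] /= ref.total(); }

	// For each source level, find the reference level with the closest (not smaller) CDF.
	cv::Mat lut(1, 256, CV_8U);
	int r = 0;
	for (int v = 0; v < 256; ++v)
	{
		while (r < 255 && cdfRef[r] < cdfSrc[v]) ++r;
		lut.at<uchar>(v) = static_cast<uchar>(r);
	}

	cv::Mat dst;
	cv::LUT(src, lut, dst);
	return dst;
}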

This article is based on Lecture 71 of a Python computer-vision practice course; the Python code has been ported to C++.

Source: blog.csdn.net/wyw0000/article/details/130498696