Analysis of the HPatches dataset and evaluation metrics for feature point detection and matching

1. Introduction

Dataset download:
Dataset GitHub address
HPatches [4.2GB] : hpatches-release
HPatches full sequences [1.3GB] : hpatches-sequences-release

  Since the related papers on feature point detection and matching use the 1.3 GB full-sequences dataset, the other one is skipped for now. The dataset contains 116 folders, whose names fall into two categories: i_xxx and v_xxx. i_xxx corresponds to image pairs with illumination changes, and v_xxx corresponds to image pairs with viewpoint changes. The contents of a folder are as follows:
[Image: contents of an HPatches sequence folder]

  Image 1 is the reference image and the rest are target images; H_1_x is the homography matrix mapping image 1 to image x; the txt files record the illumination or viewpoint condition of each image. Some sample data are shown in Section 3.

  There are 57 i_xxx and 59 v_xxx sequences in the dataset, but papers usually test on only 52 i_xxx and 56 v_xxx: following D2-Net, sequences containing images with resolution above 1200*1600 are filtered out, because not all methods can handle images of that size.
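
  As a concrete illustration, here is a minimal sketch of enumerating the sequences and applying the resolution filter. It assumes the usual 1.ppm…6.ppm / H_1_x layout described above; the exact filter criterion used by D2-Net may differ, and the paths are illustrative.

import os
import numpy as np
from PIL import Image

def list_hpatches_pairs(root, max_pixels=1200 * 1600):
    """Enumerate (reference, target, homography, change_type) tuples."""
    pairs = []
    for seq in sorted(os.listdir(root)):
        seq_dir = os.path.join(root, seq)
        if not os.path.isdir(seq_dir):
            continue
        change_type = seq[0]  # 'i' = illumination change, 'v' = viewpoint change
        # skip over-size sequences (the exact criterion may differ from D2-Net's)
        sizes = [Image.open(os.path.join(seq_dir, f'{i}.ppm')).size for i in range(1, 7)]
        if any(w * h > max_pixels for w, h in sizes):
            continue
        for idx in range(2, 7):
            H = np.loadtxt(os.path.join(seq_dir, f'H_1_{idx}'))
            pairs.append((os.path.join(seq_dir, '1.ppm'),
                          os.path.join(seq_dir, f'{idx}.ppm'),
                          H, change_type))
    return pairs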

2. Evaluation

  Two evaluations are common in papers (Patch2Pix, DFM): Image Matching and Homography Estimation. Many papers only evaluate the first one (D2-Net, Sparse-NCNet, ASLFeat, DualRC-Net).
  Image Matching computes the MMA (mean matching accuracy) of the matches output by the model, while Homography Estimation estimates a homography from those matches and then measures its accuracy.

2.1 Image Matching

  Let's look at how MMA is computed, following the DualRC-Net code.

'''
matches is the (n, 4) array of feature point matches produced by the network,
each row being [x1, y1, x2, y2]
'''
# read in the ground-truth homography
H = np.loadtxt(H_file)
# project the query points onto the reference image
npts = matches.shape[0]
# hA/hA_ and hB/hB_ are scale factors for the two images
# (e.g. mapping coordinates between the network input size and the original resolution)
query = matches[:, :2] * (hA / hA_)
ref = matches[:, 2:] * (hB / hB_)
query_ = np.concatenate((query, np.ones((npts, 1))), axis=1)
projection = np.matmul(H, query_.T).T

# convert the projection from homogeneous to inhomogeneous coordinates
projection = projection / projection[:, 2:3]
projection = projection[:, :2]
# evaluate the result
result = np.linalg.norm(ref-projection, axis=1)

for thres in range(1, 11):
    idx = thres-1
    if change_type == 'v':
        MMA_v[idx] += np.sum(result <= thres) / result.shape[0]
    if change_type == 'i':
        MMA_i[idx] += np.sum(result <= thres) / result.shape[0]
    MMA[idx] += np.sum(result <= thres) / result.shape[0]
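
  In the loop above, each image pair adds its fraction of correct matches to the accumulators; the final MMA curve is obtained by dividing by the number of pairs. A minimal sketch of that final step (the pair counters num_i, num_v, num_all are not part of the snippet above and are assumed to be tracked alongside it):

# average the accumulated per-pair accuracies into the final curves
MMA_i = MMA_i / num_i      # illumination pairs only
MMA_v = MMA_v / num_v      # viewpoint pairs only
MMA = MMA / num_all        # all pairs
# MMA[t - 1] is now the mean matching accuracy at a t-pixel threshold, t = 1..10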

  To summarize: the source-image points are projected with the ground-truth homography to obtain their expected coordinates in the target image, the distance error to the actual matched coordinates is computed, and for each threshold the fraction of matches with error below that threshold is recorded.
  It can be seen that MMA is somewhat tied to the number of matches the model outputs. Matches are usually sorted by matching score and then filtered with a score threshold, or the top n are simply kept, as sketched below. Reporting too many matches lowers the accuracy, but keeping only a handful just to boost accuracy is not acceptable either, because a small number of points tends to concentrate in regions where local matching is easy, and the spatial distribution of matches affects the accuracy of the estimated homography. Ultimately, feature detection and matching are done for alignment, which is why some papers add the second evaluation, homography estimation.
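
  For instance, a common way to control the number of matches before evaluation is to keep only the top-k matches by confidence; a minimal sketch (the scores array and the value of k are illustrative and not part of the DualRC-Net snippet above):

import numpy as np

def select_top_k(matches, scores, k=1000):
    # matches: (n, 4) array of [x1, y1, x2, y2]; scores: (n,) matching confidences
    order = np.argsort(-scores)   # indices sorted by descending confidence
    keep = order[:k]
    return matches[keep], scores[keep]

# alternatively, filter by a confidence threshold instead of a fixed k:
# mask = scores >= 0.9
# matches, scores = matches[mask], scores[mask]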

[Figure: MMA curves (Illumination / Viewpoint / Overall) and feature/match counts from the Patch2Pix paper]
  The figure above is from the Patch2Pix paper. Illumination shows the results under illumination changes, Viewpoint shows the results under viewpoint changes, and Overall aggregates all data. The numbers of features and matches are given on the right, while many papers only report the MMA comparison on the left. The match counts are in fact sufficient, and since deep methods generally include an NMS step the points are not overly concentrated, so comparing only MMA is basically enough.
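
  For reference, a minimal matplotlib sketch of how MMA-vs-threshold curves like those in the figure could be drawn from the normalized MMA, MMA_i, MMA_v arrays computed earlier (the layout is only a rough approximation of the paper's figure):

import numpy as np
import matplotlib.pyplot as plt

thresholds = np.arange(1, 11)
fig, axes = plt.subplots(1, 3, figsize=(12, 3), sharey=True)
for ax, (title, curve) in zip(axes, [('Overall', MMA),
                                     ('Illumination', MMA_i),
                                     ('Viewpoint', MMA_v)]):
    ax.plot(thresholds, curve, marker='o')
    ax.set_title(title)
    ax.set_xlabel('threshold [px]')
    ax.set_ylim(0, 1)
axes[0].set_ylabel('MMA')
plt.tight_layout()
plt.show()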

  The following is the Patch2Pix code; the computation logic is exactly the same.

dist = eval_matches(matches[:, :2], matches[:, 2:], homography)
for thr in thres_range:
    if sname[0] == 'i':
        i_err[thr] += np.mean(dist <= thr)
    else:
        v_err[thr] += np.mean(dist <= thr)

def eval_matches(p1s, p2s, homography):
    # Compute the reprojection errors from im1 to im2 
    # with the given GT homography
    p1s_h = np.concatenate([p1s, np.ones([p1s.shape[0], 1])], axis=1)  # Homogeneous
    p2s_proj_h = np.transpose(np.dot(homography, np.transpose(p1s_h)))
    p2s_proj = p2s_proj_h[:, :2] / p2s_proj_h[:, 2:]
    dist = np.sqrt(np.sum((p2s - p2s_proj) ** 2, axis=1))
    return dist

2.2 Homography Estimation

  Let's look directly at the Patch2Pix code.

'''
For brevity, only the core statements are shown here.
The two homographies are used to transform the 4 corner points of the image,
and then the pixel error between the two sets of warped corners is computed.
In effect, the MMA over matched points becomes an MMA over the 4 corners.
'''
H_gt = np.loadtxt(os.path.join(seq_dir, 'H_1_{}'.format(im_idx)))
H_pred, inliers = pydegensac.findHomography(p1s, p2s, rthres)

im = Image.open(im1_path)
w, h = im.size
corners = np.array([[0, 0, 1],
                    [0, w - 1, 1],
                    [h - 1, 0, 1],
                    [h - 1, w - 1, 1]])
real_warped_corners = np.dot(corners, np.transpose(H_gt))
real_warped_corners = real_warped_corners[:, :2] / real_warped_corners[:, 2:]
warped_corners = np.dot(corners, np.transpose(H_pred))
warped_corners = warped_corners[:, :2] / warped_corners[:, 2:]
mean_dist = np.mean(np.linalg.norm(real_warped_corners - warped_corners, axis=1))
correctness = [float(mean_dist <= cthr) for cthr in corr_thres]
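
  If pydegensac is not installed, OpenCV's RANSAC estimator can be swapped in for the prediction step; a minimal sketch using the same variable names as above (note that cv2.findHomography may return None when estimation fails):

import cv2
import numpy as np

# p1s, p2s: (n, 2) matched keypoints in image 1 and image 2; rthres: inlier threshold in pixels
H_pred, inlier_mask = cv2.findHomography(p1s.astype(np.float64), p2s.astype(np.float64),
                                         cv2.RANSAC, rthres)
if H_pred is None:
    correctness = [0.0 for _ in corr_thres]   # count this pair as a failure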

  Looking again at the experimental results of Patch2Pix:
[Figure: homography estimation results and speed comparison from the Patch2Pix paper]
  They look similar to the MMA results, and a speed comparison is also given here; the comparative experiments in Patch2Pix are very thorough. (According to its GitHub, however, the model was trained on a 48 GB GPU, and most of the tests were run on one as well.)

3. Personal Thoughts

Illumination change data:

[Images: sample illumination-change sequences from HPatches]
Viewpoint change data:

[Images: sample viewpoint-change sequences from HPatches]

  Looking at some randomly selected data, a few characteristics of HPatches can be observed:
(1) The illumination-change images contain no viewpoint change, and the viewpoint-change images contain no illumination change; in practical scenarios, the two changes usually occur at the same time.
(2) Most images can be approximated as a single plane, so the transformation can be approximated by a homography matrix used as ground truth; in practical scenarios, images often contain regions at different depths, and some regions are blurred with unclear detail.

  It is precisely because the dataset is relatively simple that papers basically have to report MMA within 3 pixels. Whether such high precision is required in practical applications is unclear, and whether the homography matrix used as ground truth really gives perfect alignment is also hard to say. In addition, it is quite difficult for a model to distinguish differences between adjacent pixels in order to make matches more precise.


Source: blog.csdn.net/weixin_43605641/article/details/122329338