Feature, feature invariance, scale space, image pyramid

Feature

In computer vision, feature information must be extracted from images to complete a specific task. For example, in face detection we extract features to decide which regions of the image are faces and which are not; in face verification we extract features from two face regions and decide whether they belong to the same person, as shown in the figure below, where a deep neural network produces a final 128-dimensional feature for the recognition task.

Common features include: grayscale values, histograms, gradients, edges, textures, moments, SIFT, deep-learning features, and so on.

Take keypoint features as an example. Keypoints are specific points that can be detected stably, such as corners and local extrema (for instance, the eye corners or mouth corners in the face image above). The keypoints are first detected in the image, then the information in a neighborhood centered on each keypoint is extracted as the description of that feature point. The benefit of keypoint features is that the keypoints themselves can be reproduced stably, and descriptions that focus on keypoint neighborhoods tend to be robust to occlusion, deformation, and the like.
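As a toy illustration of "detect keypoints, then describe their neighborhoods", here is a minimal sketch (my own simplification, not a production detector): a pixel counts as a keypoint if it is the strict maximum of its 3x3 neighborhood, and its descriptor is simply the flattened surrounding patch.

```python
import numpy as np

def detect_local_extrema(img, threshold=0.1):
    # A pixel is a keypoint if it is the maximum of its 3x3 neighborhood
    # and stands out from the neighborhood minimum by at least `threshold`.
    h, w = img.shape
    keypoints = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            patch = img[y - 1:y + 2, x - 1:x + 2]
            if img[y, x] == patch.max() and img[y, x] > patch.min() + threshold:
                keypoints.append((x, y))
    return keypoints

def describe(img, kp, radius=4):
    # Descriptor = flattened neighborhood of size (2*radius+1)^2 around the keypoint.
    x, y = kp
    return img[y - radius:y + radius + 1, x - radius:x + radius + 1].ravel()
```

Real detectors (Harris, DoG extrema) refine this idea, but the two-step structure, locate stable points then describe their neighborhoods, is the same.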

Different features have different ranges of applicability: some are sensitive to illumination, some are sensitive to deformation, so you need to select features that suit the task and scene rather than treat any one feature as a universal tool. If the scene is simple (say, the lighting conditions are known and fixed, the faces are basically frontal, and makeup and expression do not change), recognition may be possible directly with grayscale or gradient template matching. But if the scene is complex, where illumination may change, expression may change, and the face angle is not fixed, then to complete the task the chosen features need good adaptability, which brings us to feature invariance.

Feature invariance

Take the figure above as an example. The two images show a toy vehicle in different poses, at different sizes, and under different image brightness; the corresponding local patches (the yellow tiles in the figure) differ between the two images in absolute position, size, orientation, and gray level. To achieve registration, the extracted local features must possess certain invariances so that the patches can still be matched.

  • Geometric invariance: translation, rotation, scale ......
  • Photometric invariance: brightness, exposure ......

Normalizing the input image during preprocessing (min-max normalization, mean-variance normalization, histogram equalization, etc.) already provides some robustness to illumination and brightness. Furthermore, when designing the feature-extraction algorithm, relying on relative and statistical information (such as gradients and histograms) rather than absolute gray (color) values reduces sensitivity to those absolute values and yields further robustness to brightness or illumination.
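The two normalizations mentioned above can be sketched with plain numpy (a minimal version, assuming a single-channel image; function names are my own, and the histogram-equalization variant normalizes the full CDF rather than the classic first-nonzero form):

```python
import numpy as np

def minmax_normalize(img):
    # Stretch intensities linearly to [0, 1].
    lo, hi = img.min(), img.max()
    if hi == lo:
        return np.zeros_like(img, dtype=float)
    return (img - lo) / (hi - lo)

def hist_equalize(img):
    # img: uint8 grayscale. Remap intensities through the normalized
    # cumulative histogram so the output histogram is roughly flat.
    hist = np.bincount(img.ravel(), minlength=256)
    cdf = hist.cumsum()
    cdf = (cdf - cdf.min()) / (cdf.max() - cdf.min())
    lut = np.round(cdf * 255).astype(np.uint8)
    return lut[img]
```

Either transform removes a global brightness offset or contrast change before features are computed, which is exactly the kind of photometric robustness discussed above.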

By using pixel values and relative position information within a local window, local features can generally be made translation invariant. To be rotation invariant, the window must first be aligned to a dominant orientation before the local feature is extracted, as with the small tilted yellow patches in the figure; the dominant orientation can be taken as the gradient direction that is most concentrated within the window.
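Estimating that dominant orientation can be sketched as follows (assumptions: a grayscale float patch, and the SIFT-style choice of a magnitude-weighted histogram of gradient angles; the function name is my own):

```python
import numpy as np

def dominant_orientation(patch, n_bins=36):
    # Build a magnitude-weighted histogram of gradient angles over the
    # window and return the center of the most populated bin (radians).
    gy, gx = np.gradient(patch.astype(float))   # per-axis derivatives
    mag = np.hypot(gx, gy)
    ang = np.arctan2(gy, gx)                    # angles in (-pi, pi]
    bins = ((ang + np.pi) / (2 * np.pi) * n_bins).astype(int) % n_bins
    hist = np.bincount(bins.ravel(), weights=mag.ravel(), minlength=n_bins)
    peak = int(hist.argmax())
    return (peak + 0.5) / n_bins * 2 * np.pi - np.pi
```

Rotating the window by the negative of this angle before description makes the extracted feature (approximately) rotation invariant.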

Scale space

Scale can be understood by analogy with the scale of a map. As shown below, if the observation unit is 100 meters (call it Figure 100), there is a corner at arrow A; if the observation unit is 5 meters (turning the 100-meter view into a 5-meter one, call it Figure 5), a depression B appears near A. This depression B is not visible in Figure 100. Why? Scale-space theory says it has been smoothed out: details are visible at low (fine) scales, while at high (coarse) scales the details are smoothed away and only more "macro" features remain. It is therefore clear that:

  • Features are tied to a scale: B is a keypoint at one scale (Figure 5) but may not be one at a larger scale (Figure 100)
  • Features must be extracted at the corresponding scale: since B is a keypoint only at the scale of Figure 5, its feature naturally has to be extracted at the scale of Figure 5

How can two images at different scales be matched? A is the same point in Figure 100 and Figure 5, but because the scales differ, its neighborhoods differ greatly, so the features extracted from the respective neighborhoods naturally differ too. To make them match, a scale space is built for Figure 5 to obtain its representation at different scales. Concretely, how?

Keep the size of Figure 5 unchanged and apply (Gaussian) smoothing repeatedly until the depression at B is smoothed away (so the image resembles Figure 100), i.e. until a scale close to that of Figure 100 is reached. Then extract a feature (e.g. SIFT) in A's neighborhood in Figure 100 (say 10x10) and in A's neighborhood in Figure 5 (say 200x200). The SIFT features extracted in the two images have the same length, because the neighborhood is divided into the same number of sub-regions and a gradient histogram is computed in each sub-region; in a sense, the neighborhood window is normalized before the feature is extracted. This way the two can be matched.
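The "keep the image size, keep smoothing" step can be sketched as follows (assuming `scipy` is available; the level count and the geometric sigma schedule with factor √2 are conventional choices, not prescribed by the text):

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def build_scale_space(img, n_levels=5, sigma0=1.0, k=np.sqrt(2)):
    # Level 0 is the original image; each further level is the same-size
    # image smoothed with a larger Gaussian sigma, i.e. observed at a
    # coarser scale. Fine detail (like the depression at B) fades out
    # as the levels progress.
    img = img.astype(float)
    levels = [img]
    for i in range(1, n_levels):
        levels.append(gaussian_filter(img, sigma=sigma0 * k ** (i - 1)))
    return levels
```

Keypoints can then be detected, and features extracted, at every level, so that point A's neighborhood at some level of Figure 5 looks like its neighborhood in Figure 100.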

The smoothing resembles the following process, where the bottom row is the original signal and each row above corresponds to one scale:

In general, without prior knowledge, keypoints are detected and features extracted at every scale in both images. Some keypoints and their features are then bound to come from the same scale; if those happen to match, image 1 and image 2 match. Conversely, if no keypoints match at all, image 1 and image 2 do not match.

To summarize: a scale space keeps the signal length unchanged (as in the figure above, from f(x) to f_t(x)) and uses (Gaussian) smoothing to obtain the signal's representation at different scales; observation and feature extraction then use a window whose size corresponds to the scale. Because features of the original signal are obtained at all scales, the features as a whole become scale invariant: features at every scale of the original signal are available.

Image pyramid

In a scale space, the pixel size of the observation window differs across scales. There is another option: keep the observation window size fixed and let the image size change.

Take face detection as an example. Once training is finished, the parameters and sizes of the filters and convolution kernels used by the method are fixed, so the extracted features can only detect faces whose pixel size falls within a certain range; faces outside that range go undetected. But without prior knowledge, the pixel size of the faces in an input image is unknown, and it may differ from image to image. What then? This is where the image pyramid comes in.

Building an image pyramid obtains the image's representation at different sizes (different resolutions). By repeating the smooth + downsample process (or by resizing via interpolation), face images at various pixel sizes are produced; as long as one of them contains a face at a size the network fits, that face can be detected.

To summarize: an image pyramid keeps the observation window fixed and obtains the input image's representation at different sizes (resolutions); the features extracted across those sizes are, as a whole, size (resolution) invariant. In practice, 2x downsampling is typical, i.e. the image's width and height are halved at each pyramid level.
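The smooth + 2x-downsample loop above can be sketched like this (a minimal version: a 2x2 box average stands in for Gaussian smoothing, serving as both anti-aliasing and downsampling; the `min_size` stopping rule and function name are my own choices):

```python
import numpy as np

def build_pyramid(img, min_size=16):
    # Repeatedly halve width and height until the next level would be
    # smaller than min_size. Each level averages 2x2 blocks of the
    # previous one: a crude smooth + downsample in one step.
    levels = [img.astype(float)]
    while min(levels[-1].shape) // 2 >= min_size:
        a = levels[-1]
        h, w = (a.shape[0] // 2) * 2, (a.shape[1] // 2) * 2
        a = a[:h, :w]  # crop odd edges so 2x2 blocks tile exactly
        down = (a[0::2, 0::2] + a[1::2, 0::2] +
                a[0::2, 1::2] + a[1::2, 1::2]) / 4.0
        levels.append(down)
    return levels
```

A fixed-size detector window is then slid over every level; a face too large for the window at the original resolution becomes the right size at some coarser level.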

 


Origin www.cnblogs.com/pacino12134/p/11370379.html