Image Feature Extraction and Description in OpenCV

Image Feature Extraction and Description

Main contents:

  • Image features
  • The principle of the Harris and Shi-Tomasi algorithms and how to implement corner detection
  • The principle of the SIFT/SURF algorithms and how to detect keypoints with SIFT/SURF
  • The principle of FAST corner detection and its applications
  • The principle of the ORB algorithm and how to implement feature point detection

Image Features

Most people have played jigsaw puzzles: you take the pieces of a complete picture and arrange them in the right way to rebuild the image. If the principle behind a jigsaw puzzle were written as a computer program, the computer could play jigsaw puzzles too.

When solving a puzzle, we look for unique features that are suitable for tracking and easy to compare. We search for such features in one image, find them, and can find the same features in other images, and then stitch the pieces together. These abilities are innate for us.

So what are these features? We want features that a computer can also understand.

If we look closely at an image and examine different regions, take the following figure as an example:

[Figure: image-20191008141826875.png]

Six small patches are shown at the top of the image. Find the positions of these patches in the original image. How many correct results can you find?

A and B are flat surfaces, and such regions appear in many places in the image, so it is hard to find the exact location of these patches.

C and D are a bit simpler. They are edges of the building. You can find their approximate location, but the exact location is still hard to pin down, because along an edge every position looks the same. So an edge is a better feature than a flat area, but still not good enough.

Finally, E and F are corners of the building. They can be found easily, because at a corner the result is very different no matter which direction you move the patch. So corners can be considered good features. To better understand this concept, let's take a simpler example.

[Figure: image-20191008141945745.png]

As shown in the image above, the area in the blue box is a plane that is difficult to find and track. No matter which direction you move the blue box, it's the same. For the area in the black box, it is an edge. It changes if moved vertically. But it doesn't change if you move horizontally. As for the corner point in the red box, no matter which direction you move, the results are different, which means it is unique. Therefore, we say that the corner point is a good image feature, which answers the previous question.

Corners are very important image features and play an essential role in the understanding and analysis of images. They are central to computer vision tasks such as 3D scene reconstruction, motion estimation, target tracking, target recognition, and image registration and matching. In the real world, corners correspond to object corners, road intersections, T-junctions, and so on.

So how do we find these corners? Next we use various algorithms in OpenCV to find the features of the image and describe them.

summary:

Image features: image features should be distinctive and easy to compare. Corners, blobs, and similar structures are generally considered good image features.

Feature detection: find features in an image

Feature description: describe the feature and its surrounding area

Harris and Shi-Tomasi algorithms

learning target

  • Understand the principle of Harris and Shi-Tomasi algorithm
  • Capable of using Harris and Shi-Tomasi for corner detection

Harris corner detection

The idea of Harris corner detection is to observe the image through a small local window. The characteristic of a corner is that moving the window in any direction causes an obvious change in the image gray level, as shown in the following figure:

[Figure: image-20191008144647540.png]

Transform the above idea into a mathematical form, that is, move the local window in all directions (u, v) and calculate the sum of all grayscale differences, the expression is as follows:

E(u, v)=\sum_{x, y} w(x, y)[I(x+u, y+v)-I(x, y)]^{2}

Where I(x,y) is the image gray level within the local window, I(x+u,y+v) is the gray level after translation, and w(x,y) is the window function, which can be a rectangular window or a Gaussian window that assigns a different weight to each pixel, as shown below:

[Figure: image-20191008153014984.png]

Corner detection maximizes E(u,v). Applying a first-order Taylor expansion gives:

[Figure: image-20220308154736341.png]

Among them, Ix and Iy are the derivatives along the x and y directions, which can be computed with the Sobel operator.

The derivation is as follows:

[Figure: image-20191015180016665.png]

The matrix M determines the value of E(u,v). Next we use M to find corners. M is a quadratic form in Ix and Iy and can be represented as an ellipse: the semi-axes of the ellipse are determined by the eigenvalues λ1 and λ2 of M, and its orientation by the eigenvectors, as shown in the following figure:

[Figure: image-20191008160908338.png]

The relationship between the eigenvalues of the elliptic function and corners, straight lines (edges), and flat regions in the image is shown in the figure below.

[Figure: image-20191008161040473.png]

There are three situations:

  • Straight lines (edges) in the image: one eigenvalue is large and the other is small, λ1 >> λ2 or λ2 >> λ1. The elliptic function value is large in one direction and small in the others.
  • Flat regions in the image: both eigenvalues are small and approximately equal; the elliptic function value is small in all directions.
  • Corners in the image: both eigenvalues are large and approximately equal; the elliptic function value is large in all directions.

The corner point calculation method given by Harris does not need to calculate specific eigenvalues, but calculates a corner point response value R to judge the corner point. The calculation formula of R is:

R=\operatorname{det} M-\alpha(\operatorname{trace} M)^{2}

In the formula, det M is the determinant of matrix M, trace M is the trace of matrix M, and α is a constant whose value ranges from 0.04 to 0.06. The eigenvalues are in fact implicit in det M and trace M, because:

[Figure: image-20191015181643847.png]

So how do we judge corners? As shown below:

[Figure: image-20191008161904372.png]

  • When R is a large positive number, the point is a corner
  • When R is negative with a large absolute value, the point lies on an edge
  • When |R| is small, the region is considered flat
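
To make the response computation concrete, here is a minimal sketch that evaluates R directly from Sobel derivatives; the Gaussian window size, k = 0.04, the threshold, and the chessboard image path are assumptions taken from the example further below, not the OpenCV implementation itself.

import cv2 as cv
import numpy as np

# Assumed input path; any grayscale image works
gray = cv.imread('./image/chessboard.jpg', cv.IMREAD_GRAYSCALE).astype(np.float32)

# Derivatives Ix, Iy via the Sobel operator
Ix = cv.Sobel(gray, cv.CV_32F, 1, 0, ksize=3)
Iy = cv.Sobel(gray, cv.CV_32F, 0, 1, ksize=3)

# Elements of M, summed over a Gaussian window w(x, y)
Ixx = cv.GaussianBlur(Ix * Ix, (5, 5), 1)
Iyy = cv.GaussianBlur(Iy * Iy, (5, 5), 1)
Ixy = cv.GaussianBlur(Ix * Iy, (5, 5), 1)

# R = det M - k * (trace M)^2, evaluated at every pixel
k = 0.04
det_M = Ixx * Iyy - Ixy ** 2
trace_M = Ixx + Iyy
R = det_M - k * trace_M ** 2

corners = R > 0.01 * R.max()   # large positive R -> corner
print('corner pixels:', corners.sum())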

The API used to implement Harris detection in OpenCV is:

dst=cv.cornerHarris(src, blockSize, ksize, k)

parameter:

  • src: input image with data type float32.
  • blockSize: the size of the neighborhood considered in corner detection.
  • ksize: the kernel size used for the Sobel derivatives.
  • k: free parameter in the corner response equation, typically in [0.04, 0.06].
import cv2 as cv
import numpy as np 
import matplotlib.pyplot as plt
# 1 Read the image and convert it to grayscale
img = cv.imread('./image/chessboard.jpg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)
# 2 Corner detection
# 2.1 The input image must be float32
gray = np.float32(gray)

# 2.2 The last parameter k lies between 0.04 and 0.06
dst = cv.cornerHarris(gray,2,3,0.04)
# 3 Threshold the response and draw the corners; choose the threshold according to the image
img[dst>0.001*dst.max()] = [0,0,255]
# 4 Display the image
plt.figure(figsize=(10,8),dpi=100)
plt.imshow(img[:,:,::-1]),plt.title('Harris corner detection')
plt.xticks([]), plt.yticks([])
plt.show()

[Figure: image-20191008164344988.png]

Advantages and disadvantages of Harris corner detection:

advantage:

  • Rotation invariance: when the ellipse rotates by some angle its shape (the eigenvalues) stays the same, so the Harris response is invariant to image rotation.
  • Partial invariance to affine changes of image gray level: since only derivatives of the image are used, the response is invariant to a shift of the image gray level and to a scaling of the image gray level.

shortcoming:

  • It is sensitive to scale and does not have geometric scale invariance.
  • Extracted corners are pixel-level

Shi-Tomasi corner detection

The Shi-Tomasi algorithm is an improvement to the Harris corner detection algorithm, and generally gets better corners than the Harris algorithm. The corner response function of the Harris algorithm is to subtract the determinant value of the matrix M from the trace of M, and use the difference to judge whether it is a corner. Later, Shi and Tomasi proposed an improved method, if the smaller of the two eigenvalues ​​of the matrix M is greater than the threshold, it is considered to be a corner point, namely:

R = \min \left(\lambda_{1}, \lambda_{2}\right)

[Figure: image-20191008171309192.png]

From this figure it can be seen that a point is considered a corner only when both λ1 and λ2 are greater than the minimum value λmin.

Implement Shi-Tomasi corner detection in OpenCV using API:

corners = cv.goodFeaturesToTrack(image, maxCorners, qualityLevel, minDistance)

parameter:

  • Image: input grayscale image
  • maxCorners: the maximum number of corners to return; if more corners are found, the strongest ones are returned.
  • qualityLevel: This parameter indicates the minimum acceptable corner quality level, between 0-1.
  • minDistance: The minimum Euclidean distance between corner points to avoid getting adjacent feature points.

return:

  • corners: the detected corner points. All corners below the quality level are rejected, the remaining corners are sorted by quality in descending order, corners that lie too close (within the minimum Euclidean distance) to a stronger corner are removed, and finally at most maxCorners corners are returned.
import numpy as np 
import cv2 as cv
import matplotlib.pyplot as plt
# 1 Read the image
img = cv.imread('./image/tv.jpg') 
gray = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
# 2 Corner detection
corners = cv.goodFeaturesToTrack(gray,1000,0.01,10)  
# 3 Draw the corners (cv.circle expects integer coordinates)
for i in corners:
    x,y = i.ravel()
    cv.circle(img,(int(x),int(y)),2,(0,0,255),-1)
# 4 Display the image
plt.figure(figsize=(10,8),dpi=100)
plt.imshow(img[:,:,::-1]),plt.title('Shi-Tomasi corner detection')
plt.xticks([]), plt.yticks([])
plt.show()

[Figure: image-20191008174257711.png]

summary

  1. Harris Algorithm

    Idea: Observe the image through a small local window of the image, and the feature of the corner point is that moving the window in any direction will cause obvious changes in the gray level of the image.

    API: cv.cornerHarris()

  2. Shi-Tomasi algorithm

    Improvements to the Harris algorithm to better detect corners

    API: cv.goodFeaturesToTrack()

SIFT/SURF algorithm

learning target

  • Understand the principle of SIFT/SURF algorithm,
  • Can use SIFT/SURF for key point detection

SIFT principle

In the previous two sections we introduced the Harris and Shi-Tomasi corner detection algorithms. These two algorithms are rotation invariant but not scale invariant. Take the following figure as an example: the corner can be detected in the small image on the left, but after the image is enlarged, the same window can no longer detect it.

[Figure: image-20191008181535222.png]

So let's introduce the scale-invariant feature transform, SIFT, a computer vision algorithm used to detect and describe local features in images. It looks for extremum points in scale space and extracts their position, scale, and rotation invariants. The algorithm was published by David Lowe in 1999 and refined and summarized in 2004. Its applications include object recognition, robot mapping and navigation, image stitching, 3D model building, gesture recognition, image tracking, and motion comparison.

The essence of the SIFT algorithm is to find key points (feature points) in different scale spaces and calculate the direction of the key points. The key points found by SIFT are some very prominent points that will not change due to factors such as illumination, affine transformation and noise, such as corner points, edge points, bright spots in dark areas, and dark points in bright areas .

basic process

Lowe decomposes the SIFT algorithm into the following four steps :

  1. Scale-space extrema detection: Searches for image locations at all scales. Potential keypoints that are invariant to scale and rotation are identified by a Gaussian difference function.
  2. Keypoint positioning: At each candidate location, a fine-fitting model is used to determine the location and scale. Keypoints are chosen according to their stability.
  3. Keypoint orientation determination: assign one or more orientations to each keypoint position based on the local gradient orientation of the image. All subsequent operations on the image data are transformed relative to the orientation, scale, and position of the keypoints, thereby ensuring invariance to these transformations.
  4. Keypoint description: In the neighborhood around each keypoint, the local gradient of the image is measured at a selected scale. These gradients serve as keypoint descriptors, which allow relatively large local shape deformations or lighting changes.

Let's follow Lowe's steps to introduce the implementation process of the SIFT algorithm:

Scale Space Extrema Detection

The same window cannot be used to detect extremum points at different scales: a small window is needed for small keypoints and a large window for large keypoints. To achieve this, scale-space filters are used.

The Gaussian kernel is the only kernel function that can generate a multi-scale space. - "Scale-space theory: A basic tool for analyzing structures at different scales".

The scale space L(x,y,σ) of an image is defined as the convolution operation of the original image I(x,y) with a variable-scale 2-dimensional Gaussian function G(x,y,σ), namely:

L(x, y, \sigma)=G(x, y, \sigma) * I(x, y)

where:

G(x, y, \sigma)=\frac{1}{2 \pi \sigma^{2}} e^{-\frac{x^{2}+y^{2}}{2 \sigma^{2}}}

σ is the scale space factor, which determines the degree of blurring of the image. On a large scale (with a large σ value), the general information of the image is represented, and on a small scale (with a small σ value), the detailed information of the image is represented.

When computing the discrete approximation of the Gaussian function, pixels farther than about 3σ from the center have negligible influence and can be ignored. Therefore, in practice, a Gaussian convolution kernel of size **(6σ+1)*(6σ+1)** is enough to cover all the relevant pixels.
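
As a small sketch of this rule of thumb, the kernel radius can be taken as about 3σ, giving a kernel of roughly (6σ+1)×(6σ+1); the σ value and the image path below are example assumptions.

import cv2 as cv
import numpy as np

sigma = 1.6                                   # example scale-space factor
ksize = 2 * int(np.ceil(3 * sigma)) + 1       # ~ 6*sigma + 1, must be odd

img = cv.imread('./image/tv.jpg', cv.IMREAD_GRAYSCALE)
blurred = cv.GaussianBlur(img, (ksize, ksize), sigma)
print('kernel size:', ksize)                  # 11 for sigma = 1.6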

Next we build the Gaussian pyramid of the image, which is obtained by blurring the image with the Gaussian function and downsampling it. During the construction of the Gaussian pyramid, the image is first doubled in size and the pyramid is built on this enlarged image. The image at this size is blurred with Gaussian kernels of increasing σ, and the resulting set of blurred images forms one octave. Then one image of that octave is downsampled: its width and height are halved, so its area becomes a quarter of the original. This image is the initial image of the next octave, on which the Gaussian blurring for that octave is carried out, and so on until all the octaves needed by the algorithm have been built. The whole process is shown in the figure below:

[Figure: image-20191009110944907.png]

Using LoG (Laplacian of Gaussian method), that is, the second derivative of the image, the key point information of the image can be detected at different scales, so as to determine the feature points of the image. However, LoG is computationally intensive and inefficient. So we obtain DoG (difference of Gaussian) to approximate LoG by subtracting images of two adjacent Gaussian scale spaces.

In order to calculate DoG, we build a Gaussian difference pyramid, which is built on the basis of the above-mentioned Gaussian pyramid. The establishment process is: in the Gaussian pyramid, the subtraction of two adjacent layers in each Octave constitutes a Gaussian difference pyramid. As shown below:

[Figure: image-20191009113953721.png]

The first layer of the first octave of the difference-of-Gaussian pyramid is obtained by subtracting the first layer from the second layer of the first octave of the Gaussian pyramid. By analogy, each difference image is generated octave by octave and layer by layer, and all the difference images together form the difference pyramid. In general, layer l of octave o of the DoG pyramid is obtained by subtracting layer l from layer l+1 of octave o of the Gaussian pyramid. The subsequent SIFT feature point extraction is carried out on the DoG pyramid.
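
The following is a minimal sketch of one octave of the Gaussian and DoG pyramids; the number of layers, the σ schedule, and the image path are simplified assumptions rather than Lowe's exact parameters.

import cv2 as cv
import numpy as np

def build_dog_octave(img, num_layers=6, sigma0=1.6, k=2 ** (1 / 3)):
    """Blur the same image with increasing sigma, then subtract adjacent layers."""
    gaussians = [cv.GaussianBlur(img, (0, 0), sigma0 * (k ** i))
                 for i in range(num_layers)]
    # DoG layer i = Gaussian layer i+1 - Gaussian layer i
    dogs = [cv.subtract(gaussians[i + 1], gaussians[i])
            for i in range(num_layers - 1)]
    return gaussians, dogs

gray = cv.imread('./image/tv.jpg', cv.IMREAD_GRAYSCALE).astype(np.float32)
gaussians, dogs = build_dog_octave(gray)
print(len(gaussians), 'Gaussian layers ->', len(dogs), 'DoG layers')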

After DoG is done, local maxima can be searched in different scale spaces. For a pixel in the image, it needs to be compared with the 8 neighbors around it and the adjacent 18 (2x9) points in the upper and lower layers in the scale space. If it's a local maximum, it might be a key point. Basically the keypoint is the best representation of the image in the corresponding scale space. As shown below:

[Figure: image-20191009115023016.png]

The search starts from the second layer of each octave: taking the second layer as the current layer, a 3×3 cube is taken around each point in its DoG image, with the layers above and below (the first and third layers) forming the top and bottom of the cube. In this way, the extremum points found have both position coordinates (DoG image coordinates) and scale coordinates (layer coordinates). After the second layer has been searched, the third layer becomes the current layer, and the process repeats. If S layers are to be searched per octave (typically S = 3), the DoG pyramid needs S+2 layers per octave, and the Gaussian pyramid built at the beginning needs S+3 layers per octave.
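
A sketch of the 26-neighbour extremum test described above, assuming dogs is a list of equally sized DoG layers (for example, as built in the earlier sketch):

import numpy as np

def is_scale_space_extremum(dogs, layer, y, x):
    """Compare a pixel with its 8 neighbours in the same DoG layer
    and the 9 neighbours each in the layer above and below (26 in total)."""
    value = dogs[layer][y, x]
    cube = np.stack([dogs[layer - 1][y - 1:y + 2, x - 1:x + 2],
                     dogs[layer][y - 1:y + 2, x - 1:x + 2],
                     dogs[layer + 1][y - 1:y + 2, x - 1:x + 2]])
    return value == cube.max() or value == cube.min()

# Example: test one interior pixel of the middle layer
print(is_scale_space_extremum(dogs, layer=1, y=50, x=50))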

key point positioning

Since DoG is sensitive to noise and edges, the local extremum points detected in the above Gaussian difference pyramid need further inspection before they can be accurately positioned as feature points.

The exact position of the extremum is obtained by fitting a Taylor series expansion in scale space. If the DoG response at the extremum point is less than a threshold (usually 0.03 or 0.04), the point is discarded. This threshold is called contrastThreshold in OpenCV.

The DoG response is very sensitive to edges, so edge responses have to be removed. Besides corner detection, the Harris matrix can also be used to detect edges: when one eigenvalue is much larger than the other, the point lies on an edge. Unstable keypoints in the DoG pyramid have a large principal curvature along the edge and a small curvature perpendicular to it. If the ratio of the two curvatures is higher than a threshold (called edgeThreshold in OpenCV, usually set to 10), the keypoint is considered to lie on an edge and is discarded.

The key points of low contrast and boundaries are removed, and the key points we are interested in are obtained.

Determine the direction of key points

After the above two steps, the key points of the image are completely found, and these key points are scale invariant. In order to achieve rotation invariance, it is also necessary to assign a direction angle to each key point, that is, to obtain a direction reference based on the neighborhood structure of the Gaussian scale image where the detected key point is located.

For any keypoint, we collect the gradient features (magnitude and orientation) of all pixels within radius r of the keypoint in the Gaussian pyramid image where it lies, with r = 3 × 1.5σ.

Where σ is the scale of the octave image where the key point is located, and the corresponding scale image can be obtained.

The formulas for calculating the magnitude and direction of the gradient are:

\begin{array}{c} m(x, y)=\sqrt{(L(x+1, y)-L(x-1, y))^{2}+(L(x, y+1)-L(x, y-1))^{2}} \\ \theta(x, y)=\arctan \left(\frac{L(x, y+1)-L(x, y-1)}{L(x+1, y)-L(x-1, y)}\right) \end{array}
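
A sketch of these two formulas using finite differences on a Gaussian-blurred image L; the blur σ, image path, and pixel location are example assumptions (np.arctan2 is used as the robust form of the arctangent).

import cv2 as cv
import numpy as np

L = cv.GaussianBlur(cv.imread('./image/tv.jpg', cv.IMREAD_GRAYSCALE).astype(np.float32),
                    (0, 0), 1.5)

def gradient_at(L, x, y):
    dx = L[y, x + 1] - L[y, x - 1]
    dy = L[y + 1, x] - L[y - 1, x]
    magnitude = np.sqrt(dx ** 2 + dy ** 2)
    theta = np.degrees(np.arctan2(dy, dx)) % 360   # orientation in [0, 360)
    return magnitude, theta

print(gradient_at(L, 100, 100))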

The calculation result of the neighborhood pixel gradient is shown in the figure below:

[Figure: image-20191009143818527.png]

After the keypoint gradients have been computed, a histogram is used to collect the gradient magnitudes and orientations of the pixels in the keypoint's neighborhood. Concretely, 360° is divided into 36 bins of 10° each; within the region of radius r, the pixels whose gradient orientation falls into a given bin are found, and their magnitudes are added together as the height of that bin. Because pixels at different distances from the center contribute differently, the magnitudes are also Gaussian-weighted, with parameter 1.5σ. For simplicity, only 8 orientation bins are drawn in the figure below.

[Figure: image-20191009144726492.png]

Each feature point must be assigned a main orientation, and may also have one or more auxiliary orientations; the auxiliary orientations are added to make image matching more robust. An auxiliary orientation is defined as follows: when the height of a bin is greater than 80% of the height of the main-orientation bin, the orientation represented by that bin becomes an auxiliary orientation of the feature point.

The peak of the histogram, i.e. the orientation represented by the highest bin, is the main orientation of the image gradients in the keypoint's neighborhood. Since each bin covers a range of angles, the discrete histogram is further interpolated to obtain a more precise orientation: a parabola is fitted to the discrete histogram, as shown in the figure below:

[Figure: image-20191009150008701.png]

Once the main orientation is obtained, each keypoint carries three attributes (x, y, σ, θ): position, scale, and orientation. From these we can define a SIFT feature region, usually drawn as a circle with an arrow or simply an arrow: the center marks the feature point position, the radius the keypoint scale, and the arrow the orientation. As shown below:

[Figure: image-20191025112522974.png]

Keypoint description

Through the above steps, each keypoint has been assigned a position, scale, and orientation. Next we build a descriptor for each keypoint, one that is both distinctive and invariant to certain variables such as illumination and viewpoint. The descriptor is based not only on the keypoint itself but also on the surrounding pixels that contribute to it. The main idea is to divide the image region around the keypoint into blocks, compute the gradient histogram inside each block, and generate a feature vector that abstracts the image information.

The descriptor depends on the scale at which the feature point was found, so it is computed on the Gaussian scale image where the keypoint lies. Centered on the feature point, its neighborhood is divided into d*d sub-regions (usually d = 4); each sub-region is a square with side length 3σ. Taking the trilinear interpolation needed in the actual computation into account, the neighborhood of the feature point covers a range of 3σ(d+1) × 3σ(d+1), as shown in the figure below:

[Figure: image-20191009161647267.png]

To keep the feature point rotation invariant, the coordinate axes are rotated to the keypoint's main orientation around the feature point, as shown below:

[Figure: image-20191009161756423.png]

The gradients of the pixels in each sub-region are computed and Gaussian-weighted with σ = 0.5d, and then the gradients in eight orientations are obtained for each seed point by interpolation. The interpolation is illustrated in the figure below:

[Figure: image-20191009162914982.png]

The gradient of each seed point is obtained by interpolating the 4 sub-regions that cover it. For example, the red point in the figure falls between row 0 and row 1 and contributes to both rows: its contribution factor to the seed point at row 0, column 3 is dr, and to row 1, column 3 it is 1−dr. Likewise, its contribution factors to the two adjacent columns are dc and 1−dc, and to the two adjacent orientations do and 1−do. The gradient magnitude finally accumulated in each orientation is:

\text{weight}=w \cdot dr^{k}(1-dr)^{(1-k)} \cdot dc^{m}(1-dc)^{1-m} \cdot do^{n}(1-do)^{1-n}

Where k, m, n are 0 or 1. The 4×4×8 = 128 gradient statistics collected in this way form the feature vector of the keypoint, and the feature vectors of all keypoints together give the SIFT feature description of the image.

Summary: SIFT has unmatched advantages in extracting invariant image features, but it is not perfect: its real-time performance is low, it sometimes finds few feature points, and it cannot accurately extract feature points from targets with smooth edges. Since the SIFT algorithm was introduced, people have kept optimizing and improving it; the best known of these improvements is the SURF algorithm.

Principle of SURF

The execution speed of keypoint detection and description using SIFT algorithm is relatively slow, and a faster algorithm is required. In 2006, Bay proposed the SURF algorithm, which is an enhanced version of the SIFT algorithm. It has a small amount of calculation and a fast operation speed. The extracted features are almost the same as SIFT. The comparison between it and the SIFT algorithm is as follows:

[Figure: image-20191016163330835.png]

The process of using SIFT to detect key points in OpenCV is as follows:

Instantiate sift

sift = cv.xfeatures2d.SIFT_create()

Use sift.detectAndCompute() to detect key points and calculate

kp,des = sift.detectAndCompute(gray,None)

parameter:

  • gray: image for key point detection, note that it is a grayscale image

return:

  • kp: key point information, including position, scale, direction information
  • des: key point descriptor, each key point corresponds to 128 feature vectors of gradient information

Plot the keypoint detection results on the image

cv.drawKeypoints(image, keypoints, outputimage, color, flags)

parameter:

  • image: original image
  • keypoints: keypoint information to be drawn on the image
  • outputimage: output image; it can be the original image
  • color: color setting; change the pen color via the (b, g, r) values, b = blue, g = green, r = red
  • flags: drawing flags
    1. cv2.DRAW_MATCHES_FLAGS_DEFAULT: create the output image matrix, draw matched pairs and feature points on the existing output image, and draw only the center point of each keypoint
    2. cv2.DRAW_MATCHES_FLAGS_DRAW_OVER_OUTIMG: do not create an output image matrix, but draw the matched pairs on the existing output image
    3. cv2.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS: draw each feature point as a keypoint graphic with size and orientation
    4. cv2.DRAW_MATCHES_FLAGS_NOT_DRAW_SINGLE_POINTS: single feature points (without matches) are not drawn

The SURF algorithm is used with exactly the same workflow as above, so it is not repeated here.

Use the SIFT algorithm to detect keypoints in the image of the CCTV building and draw them:

import cv2 as cv 
import numpy as np
import matplotlib.pyplot as plt
# 1 Read the image
img = cv.imread('./image/tv.jpg')
gray= cv.cvtColor(img,cv.COLOR_BGR2GRAY)
# 2 SIFT keypoint detection
# 2.1 Instantiate the SIFT object
sift = cv.xfeatures2d.SIFT_create()

# 2.2 Keypoint detection: kp holds the keypoint orientation, scale and position; des holds the descriptors
kp,des=sift.detectAndCompute(gray,None)
# 2.3 Draw the keypoint detection results on the image
cv.drawKeypoints(img,kp,img,flags=cv.DRAW_MATCHES_FLAGS_DRAW_RICH_KEYPOINTS)
# 3 Display the image
plt.figure(figsize=(8,6),dpi=100)
plt.imshow(img[:,:,::-1]),plt.title('SIFT detection')
plt.xticks([]), plt.yticks([])
plt.show()

[Figure: image-20191009181525538.png]

Summary

SIFT principle:

  • Scale-space extrema detection: build the Gaussian pyramid and the difference-of-Gaussian pyramid, and detect the extremum points.
  • Keypoint localization: remove extremum points affected by low contrast and edges.
  • Keypoint orientation assignment: determine the keypoint orientation with the gradient histogram.
  • Keypoint description: divide the image region around the keypoint into blocks, compute the gradient histogram inside each block, and generate a feature vector that describes the keypoint.

API: cv.xfeatures2d.SIFT_create()

SURF algorithm:

An improvement of the SIFT algorithm: the scale-space extremum detection, keypoint orientation assignment, and keypoint description steps are all modified to improve efficiency.

Fast and ORB algorithms

Learning objectives

  • Understand the principle of the FAST corner detection algorithm and be able to perform corner detection
  • Understand the principle of the ORB algorithm and be able to perform feature point detection

FAST algorithm

We have already introduced several feature detectors and they all work well, especially the SIFT and SURF algorithms, but from the point of view of real-time processing they are still too slow. To solve this problem, Edward Rosten and Tom Drummond proposed the FAST algorithm in 2006 and revised it in 2010.

FAST (Features from Accelerated Segment Test) is a corner detection algorithm. Its principle is to take a candidate point in the image and judge whether it is a corner based on the pixels in the surrounding neighborhood centered on that point. In plain terms, if enough pixels around a pixel differ sufficiently from that pixel's value, the pixel is considered a corner.

The basic flow of the FAST algorithm

  1. Select a pixel point p in the image to judge whether it is a key point. Ip is equal to the gray value of pixel point p.
  2. Draw a circle with r as the radius to cover M pixels around point p. Usually, if you set r=3, then M=16, as shown in the figure below:

[Figure: image17.jpg]

  3. Set a threshold t. If, among the 16 pixels, there are n consecutive pixels whose gray values are all higher than Ip + t or all lower than Ip − t, then pixel p is considered a corner, as shown by the dashed line in the figure above. n usually takes the value 12.

  4. Since every pixel in the image has to be examined when detecting feature points, but most of them are not feature points, running the full test on each pixel would waste a lot of time. A quick non-feature-point screening test is therefore used first: test the four compass points around the candidate, at positions 1, 9, 5, and 13 (points 1 and 9 are tested first; if they meet the threshold requirement, points 5 and 13 are tested as well). If p is a corner, at least 3 of these four points must meet the threshold requirement, otherwise p is rejected immediately. The candidates that survive are then tested with the full criterion (whether 12 contiguous points meet the threshold requirement). A sketch of this screening test is given after this list.
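
A rough sketch of this high-speed screening test on a single pixel, assuming gray is a grayscale numpy array; the real OpenCV implementation is far more optimized.

import numpy as np

def passes_fast_pretest(gray, y, x, t=30):
    """Check the four compass points (1, 9, 5, 13) on the radius-3 circle:
    a corner candidate must have at least 3 of them brighter than Ip+t
    or at least 3 darker than Ip-t."""
    Ip = int(gray[y, x])
    compass = [int(gray[y - 3, x]),   # point 1 (top)
               int(gray[y + 3, x]),   # point 9 (bottom)
               int(gray[y, x + 3]),   # point 5 (right)
               int(gray[y, x - 3])]   # point 13 (left)
    brighter = sum(v > Ip + t for v in compass)
    darker = sum(v < Ip - t for v in compass)
    return brighter >= 3 or darker >= 3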

Although this detector is very efficient, it has several disadvantages:

  • More candidate points are obtained
  • The selection of feature points is not optimal, because its effect depends on the problem to be solved and the distribution of corner points.
  • A large number of points are discarded when performing non-feature point discrimination
  • Many of the detected feature points are adjacent

The first 3 problems can be solved by machine learning methods, and the last problem can be solved by non-maximum suppression.

Corner detector for machine learning

  1. Choose a set of training images (preferably images related to the final application)
  2. Use the FAST algorithm to find out the feature points of each image, and store the 16 pixels around it to form a vector P for each feature point in the image.

[Figure: image-20191010114459269.png]

  3. The 16 pixels around each feature point each belong to one of the following three categories:

[Figure: image18.jpg]

  4. According to the classification of these pixels, the feature vector P is divided into three subsets: Pd, Ps, Pb.

  5. Define a new Boolean variable Kp, which is set to True if p is a corner and False otherwise.

  6. Use the feature vectors P as input and Kp as the target value to train an ID3 tree (decision tree classifier).

  7. Apply the trained decision tree to other images for fast detection.

non-maximum suppression

Many of the screened candidate corners are close together, and this effect needs to be eliminated by non-maximum suppression.

Define a score function V for every candidate corner. V can be computed as follows: compute the absolute difference between Ip and each of the 16 pixel values on the circle, and add the 16 absolute values together:

V=\sum_{i=1}^{16}\left|I_{p}-I_{i}\right|

Finally, compare the V values of adjacent candidate corners and discard the candidate with the smaller V value.
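
A sketch of the score function V for one candidate corner; the 16 circle offsets are listed explicitly here and are an assumption about the sampling pattern.

import numpy as np

# 16 offsets of the Bresenham circle of radius 3 used by FAST
CIRCLE = [(0, -3), (1, -3), (2, -2), (3, -1), (3, 0), (3, 1), (2, 2), (1, 3),
          (0, 3), (-1, 3), (-2, 2), (-3, 1), (-3, 0), (-3, -1), (-2, -2), (-1, -3)]

def fast_score(gray, y, x):
    """V = sum of |Ip - Ii| over the 16 circle pixels."""
    Ip = int(gray[y, x])
    return sum(abs(Ip - int(gray[y + dy, x + dx])) for dx, dy in CIRCLE)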

The idea of ​​the FAST algorithm is very close to our intuitive understanding of corner points, simplifying the complexity. The FAST algorithm is faster than other corner point detection algorithms, but it is not stable enough when the noise is high, which requires setting an appropriate threshold.

The FAST detection algorithm in OpenCV is implemented in the traditional way:

Instantiate fast

fast = cv.FastFeatureDetector_create(threshold, nonmaxSuppression)

parameter:

  • threshold: threshold t, with a default value of 10
  • nonmaxSuppression: Whether to perform non-maximum suppression, the default value is True

return:

  • Fast: the created FastFeatureDetector object

Using fast.detect to detect key points, there is no corresponding key point description

kp = fast.detect(grayImg, None)

parameter:

  • gray: image for key point detection, note that it is a grayscale image

return:

  • kp: key point information, including position, scale, direction information

Draw the key point detection results on the image, the same as in sift

cv.drawKeypoints(image, keypoints, outputimage, color, flags)

Example:

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
# 1 Read the image
img = cv.imread('./image/tv.jpg')
# 2 FAST corner detection
# 2.1 Create a FAST object with a threshold; note: it can work on a color image directly
fast = cv.FastFeatureDetector_create(threshold=30)

# 2.2 Detect keypoints on the image
kp = fast.detect(img,None)
# 2.3 Draw the keypoints on the image
img2 = cv.drawKeypoints(img, kp, None, color=(0,0,255))

# 2.4 Print the default parameters
print( "Threshold: {}".format(fast.getThreshold()) )
print( "nonmaxSuppression:{}".format(fast.getNonmaxSuppression()) )
print( "neighborhood: {}".format(fast.getType()) )
print( "Total Keypoints with nonmaxSuppression: {}".format(len(kp)) )


# 2.5 Disable non-maximum suppression
fast.setNonmaxSuppression(0)
kp = fast.detect(img,None)

print( "Total Keypoints without nonmaxSuppression: {}".format(len(kp)) )
# 2.6 Draw the result without non-maximum suppression
img3 = cv.drawKeypoints(img, kp, None, color=(0,0,255))

# 3 Plot the images
fig,axes=plt.subplots(nrows=1,ncols=2,figsize=(10,8),dpi=100)
axes[0].imshow(img2[:,:,::-1])
axes[0].set_title("with non-maximum suppression")
axes[1].imshow(img3[:,:,::-1])
axes[1].set_title("without non-maximum suppression")
plt.show()

result:

[Figure: image-20191010120822413.png]

ORB algorithm

The SIFT and SURF algorithms are patented, so we have to pay to use them, but ORB (Oriented FAST and Rotated BRIEF) does not require this. It can quickly create feature vectors for the keypoints in an image, and these feature vectors can be used to identify objects in the image.

ORB algorithm process

The ORB algorithm combines the FAST and BRIEF algorithms: it builds an image pyramid and adds an orientation to the FAST feature points, so that the keypoints are scale and rotation invariant. The specific process is described as follows:

  • Construct a scale pyramid. The pyramid has n layers and, unlike SIFT, each layer contains only one image. The scale of the s-th layer is: \sigma_{s}=\sigma_{0}^{s}

σ 0 is the initial scale, the default is 1.2, and the original image is at layer 0.

[Figure: image-20191010145652681.png]

The size of the s-th layer image is: SIZE=\left(H \cdot \frac{1}{\sigma_{s}}\right) \times\left(W \cdot \frac{1}{\sigma_{s}}\right)

  • Use the Fast algorithm to detect feature points on different scales, use the Harris corner response function, sort according to the response value of the corner points, and select the first N feature points as the feature points of this scale.
  • Calculate the main direction of the feature point, calculate the gray centroid position in the circular neighborhood with the feature point as the center radius r, and use the direction from the feature point position to the centroid position as the main direction of the feature point.

The calculation method is as follows:

m_{pq}=\sum_{x, y} x^{p} y^{q} I(x, y)

Centroid location:

C=\left(\frac{m_{10}}{m_{00}}, \frac{m_{01}}{m_{00}}\right)

Main direction:

\theta=\arctan \left(m_{01}, m_{10}\right)

  • In order to solve the invariance of rotation, the neighborhood of the feature points is rotated to the main direction and the feature descriptor is constructed using the Brief algorithm, so far the feature description vector of the ORB is obtained.
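
As a concrete illustration of the intensity-centroid orientation above, here is a small sketch for a single feature point; the square patch and its size are simplifying assumptions (ORB itself uses a circular patch).

import numpy as np

def patch_orientation(gray, y, x, r=15):
    """Main direction of a feature point from the intensity centroid of its patch."""
    patch = gray[y - r:y + r + 1, x - r:x + r + 1].astype(np.float64)
    ys, xs = np.mgrid[-r:r + 1, -r:r + 1]
    m00 = patch.sum()
    m10 = (xs * patch).sum()
    m01 = (ys * patch).sum()
    # theta = atan2(m01, m10); the centroid itself is (m10/m00, m01/m00)
    return np.degrees(np.arctan2(m01, m10))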

BRIEF algorithm

BRIEF is a feature descriptor extraction algorithm, not a feature point detection algorithm. It generates binary descriptors that are cheap to extract, and matching only requires a simple Hamming distance, which can be computed with XOR operations. Its low time cost, low memory cost, and good performance are its biggest advantages.

The steps of the algorithm are described as follows :

  1. Image filtering : When there is noise in the original image, it will affect the result, so the image needs to be filtered to remove part of the noise.
  2. Select point pairs : take the feature point as the center, take the S*S neighborhood window, randomly select N groups of point pairs in the window, generally N=128,256,512, and the default is 256. Regarding how to select random point pairs, five forms are provided , the result is shown in the figure below:
    • Evenly distributed sampling in the x and y directions
    • Both x and y follow an isotropic Gaussian(0, S²/25) distribution
    • x follows Gaussian(0, S²/25) and y follows Gaussian(0, S²/100)
    • x, y are randomly obtained from the grid
    • x is always at (0,0), y is randomly picked from the grid

[Figure: image-20191010153907973.png]

The two endpoints of a line segment in the figure are a set of point pairs, and the result of the second method is better.

  3. Construct the descriptor: suppose x and y are the two endpoints of a point pair, and p(x), p(y) are the pixel values at these two points. Then:

t(x, y)=\left\{\begin{array}{ll} 1 & \text{ if } p(x)>p(y) \\ 0 & \text{ otherwise } \end{array}\right.

This binary test is performed on every point pair to form the BRIEF descriptor of the keypoint, which is generally a binary string of 128 to 512 bits containing only 0s and 1s, as shown in the following figure:

[Figure: image-20191010161944491.png]
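
A toy sketch of building such a binary descriptor and comparing two descriptors with the Hamming distance; the uniform random point pairs and the patch size are assumptions for illustration and do not reproduce any of the five sampling schemes exactly.

import numpy as np

rng = np.random.default_rng(0)
S, N = 31, 256                                          # patch size and number of point pairs
pairs = rng.integers(-S // 2, S // 2 + 1, size=(N, 4))  # (x1, y1, x2, y2) offsets

def brief_descriptor(smoothed, y, x):
    """256-bit descriptor: bit i is 1 if p(x1, y1) > p(x2, y2) for pair i."""
    bits = [1 if smoothed[y + p[1], x + p[0]] > smoothed[y + p[3], x + p[2]] else 0
            for p in pairs]
    return np.array(bits, dtype=np.uint8)

def hamming(d1, d2):
    """Number of differing bits between two binary descriptors."""
    return int(np.count_nonzero(d1 != d2))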

To use the ORB algorithm in OpenCV:

Instantiate the ORB

orb = cv.ORB_create(nfeatures)

parameter:

  • nfeatures: the maximum number of feature points

Use orb.detectAndCompute() to detect key points and calculate

kp,des = orb.detectAndCompute(gray,None)

parameter:

  • gray: image for key point detection, note that it is a grayscale image

return:

  • kp: key point information, including position, scale, direction information
  • des: keypoint descriptors; each keypoint's BRIEF feature vector, a binary string

Plot the keypoint detection results on the image

cv.drawKeypoints(image, keypoints, outputimage, color, flags)

Example:

import numpy as np
import cv2 as cv
from matplotlib import pyplot as plt
# 1 Read the image
img = cv.imread('./image/tv.jpg')

# 2 ORB corner detection
# 2.1 Instantiate the ORB object
orb = cv.ORB_create(nfeatures=500)
# 2.2 Detect the keypoints and compute the feature descriptors
kp,des = orb.detectAndCompute(img,None)

print(des.shape)

# 3 Draw the keypoints on the image
img2 = cv.drawKeypoints(img, kp, None, color=(0,0,255), flags=0)

# 4. Plot the image
plt.figure(figsize=(10,8),dpi=100)
plt.imshow(img2[:,:,::-1])
plt.xticks([]), plt.yticks([])
plt.show()

[Figure: image-20191010162532196.png]

summary

  1. Fast Algorithm

    Principle: If a certain number of pixels around a pixel are different from the pixel value of the point, it is considered a corner point

    API: cv.FastFeatureDetector_create()

  2. ORB algorithm

    Principle: It is a combination of FAST algorithm and BRIEF algorithm

    API:cv.ORB_create()

LBP and HOG feature operators

Learning Objectives :

  1. Understand the principle of LBP features
  2. Learn about improved algorithms for LBP: circular LBP, rotated LBP and equivalence modes
  3. Understand the principle of HOG algorithm
  4. Familiar with the gamma transform of grayscale images
  5. Understand the extraction process of HOG features
  6. Understand the extraction method of LBP features
  7. Understand the extraction method of HOG features

LBP algorithm

LBP (Local Binary Pattern) refers to the local binary pattern, which is an operator used to describe the local features of the image. The LBP feature has significant advantages such as grayscale invariance and rotation invariance. It was proposed by T. Ojala, M.Pietikäinen, and D. Harwood in 1994. Due to the simple calculation of LBP features and better results, LBP features have been widely used in many fields of computer vision.

Characterization of LBP

The original LBP operator is defined within a 3*3 window. The center pixel of the window is taken as a threshold, and the gray values of the 8 neighboring pixels are compared with it: if a neighboring pixel value is greater than the center pixel value, that position is marked as 1, otherwise 0. In this way the 8 points in the 3*3 neighborhood produce an 8-bit binary number (usually converted to a decimal number, i.e. the LBP code, 256 possibilities in total), which is the LBP value of the window's center pixel and reflects the texture information of that region. As shown below:

[Figure: image-20191107142559806.png]

The LBP value is the result obtained by rotating clockwise from the upper left pixel, as shown in the figure below:

[Figure: image-20191107142804781.png]

It is defined as follows:

LBP\left(x_{c}, y_{c}\right)=\sum_{p=0}^{P-1} 2^{p} s\left(i_{p}-i_{c}\right)

Here (xc, yc) is the central pixel of the 3*3 neighborhood, ic is its gray value, and ip are the values of the other pixels in the neighborhood. s(x) is a sign function, defined as s(x) = 1 if x ≥ 0 and s(x) = 0 otherwise.

For an image of size W*H, the edge pixels cannot produce an 8-bit LBP value, so when the LBP values are converted into a grayscale image its size is (W-2)*(H-2).
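
A small sketch of the original 3*3 LBP code for a single pixel, following the formula above; the clockwise-from-top-left bit order is an assumption taken from the description of the figure.

import numpy as np

def lbp_3x3(gray, y, x):
    """Original LBP code of pixel (x, y): threshold the 8 neighbours
    against the centre and read them clockwise from the top-left."""
    c = gray[y, x]
    neighbours = [gray[y - 1, x - 1], gray[y - 1, x], gray[y - 1, x + 1],
                  gray[y, x + 1], gray[y + 1, x + 1], gray[y + 1, x],
                  gray[y + 1, x - 1], gray[y, x - 1]]     # clockwise from top-left
    bits = [1 if n >= c else 0 for n in neighbours]
    return sum(b << (7 - i) for i, b in enumerate(bits))  # 8-bit code, 0-255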

The LBP operator uses the relationship between the surrounding points and the point to quantify the point. After quantization, the influence of light on the image can be eliminated more effectively . As long as the change in illumination is not enough to change the size relationship between the pixel values ​​of two points, the value of the LBP operator will not change. **So to a certain extent, the recognition algorithm based on LBP solves the problem of illumination change.** However, when the image illumination changes unevenly, the size relationship between pixels is destroyed, and the corresponding LBP mode also changes.

After the original LBP was proposed, researchers have continuously proposed various improvements and optimizations.

Circular LBP operator

The original LBP feature uses the gray values in a fixed neighborhood, so when the scale of the image changes, the LBP encoding changes as well and no longer correctly reflects the texture around a pixel; researchers therefore improved it. The biggest defect of the basic LBP operator is that it covers only a small area within a fixed radius (a 3*3 neighborhood), which cannot capture the texture of larger-scale structures in larger images well. Therefore, the LBP operator was extended.

The extended operator LBP_P^R computes feature values for neighborhoods of different radii and different numbers of sampling points, where P is the number of surrounding sampling points and R is the radius of the neighborhood; at the same time, the original square neighborhood is extended to a circle. The following figure shows three examples of the extended LBP, in which R can be a non-integer:

[Figure: image-20191107144658710.png]

For a sampling point that does not fall on an integer position, its gray value can be computed by bilinear interpolation from the gray values of the nearest integer-position pixels.

The calculation formula of this operator is the same as that of the original LBP description operator, the difference lies in the selection of the neighborhood.

Rotation invariant LBP features

It can be seen from the definition of LBP that the LBP operator is not rotation invariant: rotating the image produces different LBP values. Therefore, Maenpaa et al. extended the LBP operator and proposed a rotation-invariant LBP operator: the circular neighborhood is rotated continuously to obtain a series of the originally defined LBP values, and the minimum of them is taken as the LBP value of the neighborhood. That is:

LBP_{P, R}^{ri}=\min \left\{ROR\left(LBP_{P}^{R}, i\right) \mid i=0,1, \ldots, P-1\right\}

Among them, ROR(x,i) refers to rotating the LBP operator i times in the clockwise direction. As shown below:

[Figure: image-20191107150512440.png]

The figure above illustrates the process of obtaining the rotation-invariant LBP; the number under each operator is its LBP value. After rotation-invariant processing, the 8 LBP patterns shown in the figure all yield the rotation-invariant LBP value 15; that is, the rotation-invariant LBP pattern corresponding to all eight patterns is 00001111.
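
A sketch of the rotation-invariant value of an 8-bit LBP code, taking the minimum over all circular bit rotations as in the formula above:

def rotation_invariant_lbp(code, P=8):
    """Minimum value over all circular rotations of a P-bit LBP code."""
    mask = (1 << P) - 1
    rotations = [((code >> i) | (code << (P - i))) & mask for i in range(P)]
    return min(rotations)

print(rotation_invariant_lbp(0b11110000))   # -> 15, i.e. 00001111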

Uniform Pattern LBP Features

Uniform Pattern is also known as the equivalent or uniform mode. Since an LBP feature has many different binary forms, an LBP operator with P sampling points in a circular region of radius R produces 2^P patterns. Obviously, the number of binary patterns grows exponentially with the number of sampling points in the neighborhood: for example, 20 sampling points in a 5×5 neighborhood give 2^20 = 1,048,576 binary patterns. So many patterns are unfavorable for texture extraction, classification, recognition, and storage. For example, when the LBP operator is used for texture classification or face recognition, the statistical histogram of LBP patterns is usually used to represent the image; too many pattern types make the amount of data too large and the histogram too sparse. Therefore, the original LBP patterns need to be reduced in dimensionality so that the image information is best represented with less data.

To solve the problem of too many binary patterns and improve the statistics, Ojala proposed a "uniform pattern" to reduce the dimensionality of the pattern types of the LBP operator. Ojala et al. argued that in real images the vast majority of LBP patterns contain at most two transitions from 1 to 0 or from 0 to 1. Therefore, the "uniform pattern" is defined as follows: when the circular binary number corresponding to an LBP has at most two 0-to-1 or 1-to-0 transitions, that binary number is called a uniform pattern class. For example, 00000000 (0 transitions), 00000111 (only one 0-to-1 transition), and 10001111 (first 1 to 0, then 0 to 1, two transitions in total) are all uniform pattern classes. Patterns other than the uniform pattern classes fall into another class, called the mixed pattern class, for example 10010111 (four transitions in total).

The LBP values shown in the figure below belong to the uniform pattern class:

[Figure: image-20191107151414795.png]

The pattern in the figure below contains four transitions and therefore belongs to the non-uniform pattern class.

[Figure: image-20191107151519089.png]

With this improvement, the number of pattern types is greatly reduced without losing any information. The number of patterns is reduced from the original 2^P to P(P−1)+2, where P is the number of sampling points in the neighborhood. For the 8 sampling points of a 3×3 neighborhood, the number of binary patterns drops from 256 to 58; that is, the values are divided into 59 categories: the 58 uniform patterns each form a category, and all other values form the 59th category. The histogram thus changes from 256 bins to 59 bins. This reduces the dimensionality of the feature vector and lessens the influence of high-frequency noise.

Concrete implementation of the uniform patterns: with 8 sampling points there are 2^8 = 256 LBP values, which correspond exactly to the gray levels 0-255, so the original LBP feature image is an ordinary grayscale image. For the uniform-pattern LBP feature, the 256 LBP values are divided into 59 categories according to the number of 0-1 transitions: 0 transitions — 2 patterns, 1 transition — 0, 2 transitions — 56, 3 transitions — 0, 4 transitions — 140, 5 transitions — 0, 6 transitions — 56, 7 transitions — 0, 8 transitions — 2. Over these 9 transition counts, the patterns with no more than 2 transitions are the uniform pattern classes (58 in total); they are coded 1-58 from small to large, i.e. their gray values in the LBP feature image are 1-58. The mixed pattern classes other than the uniform ones are all coded 0, i.e. their gray value in the LBP feature image is 0, so the uniform-pattern LBP feature image looks dark overall.
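
A sketch of the transition count used to decide whether an 8-bit LBP code is a uniform pattern (at most two 0-1 transitions in the circular binary string):

def transitions(code, P=8):
    """Number of 0-1 or 1-0 changes in the circular P-bit pattern."""
    bits = [(code >> i) & 1 for i in range(P)]
    return sum(bits[i] != bits[(i + 1) % P] for i in range(P))

def is_uniform(code, P=8):
    return transitions(code, P) <= 2

print(is_uniform(0b00000111), is_uniform(0b10010111))   # True, False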

Implementation

The calculation of LBP features is implemented in OpenCV, but a separate interface for calculating LBP features is not provided. So we use skimage to demonstrate the algorithm to you.

skimage is scikit-image, a digital image processing package developed in Python on top of scipy; it processes images as numpy arrays. Installation:

pip install scikit-image

The full name of the skimage package is scikit-image SciKit (toolkit for SciPy), which extends scipy.ndimage and provides more image processing functions. It is written in the python language, developed and maintained by the scipy community. The skimage package consists of many submodules, and each submodule provides different functions. The feature module performs feature detection and extraction.

The APIs used are:

skimage.feature.local_binary_pattern(image, P, R, method='default')

parameter:

  • image: input grayscale image
  • P, R: the number of sampling points and the radius used in the LBP computation
  • method: algorithm type: {'default', 'ror', 'uniform', 'nri_uniform', 'var'}

default: the original LBP feature; ror: circular, rotation-invariant LBP operator; nri_uniform: non-rotation-invariant uniform (equivalent) pattern LBP; var: rotation-invariant variance measure of local contrast

Example:

We extract LBP features in the following figure:

[Figure: face.jpeg]

import cv2 as cv
from skimage.feature import local_binary_pattern
import matplotlib.pyplot as plt
# 1. Read the image
img = cv.imread("face.jpeg")
face = cv.cvtColor(img,cv.COLOR_BGR2GRAY)
# 2. Feature extraction
# 2.0 Required parameters
# Radius used by the LBP operator
radius = 1  
# Number of neighborhood sampling points
n_points = 8 * radius 

# 2.1 Original LBP feature
lbp = local_binary_pattern(face, 8, 1)

# 2.2 Circular LBP feature
clbp = local_binary_pattern(face,n_points,radius,method="ror")

# 2.3 Rotation-invariant (variance) LBP feature
varlbp = local_binary_pattern(face,n_points,radius,method="var")

# 2.4 Uniform pattern (equivalent) LBP feature
uniformlbp = local_binary_pattern(face,n_points,radius,method="nri_uniform")

fig,axes=plt.subplots(nrows=2,ncols=2,figsize=(10,8))
axes[0,0].imshow(lbp,'gray')
axes[0,0].set_title("Original LBP feature")
axes[0,0].set_xticks([])
axes[0,0].set_yticks([])
axes[0,1].imshow(clbp,'gray')
axes[0,1].set_title("Circular LBP feature")
axes[0,1].set_xticks([])
axes[0,1].set_yticks([])
axes[1,0].imshow(varlbp,'gray')
axes[1,0].set_title("Rotation-invariant LBP feature")
axes[1,0].set_xticks([])
axes[1,0].set_yticks([])
axes[1,1].imshow(uniformlbp,"gray")
axes[1,1].set_title("Uniform pattern LBP feature")
axes[1,1].set_xticks([])
axes[1,1].set_yticks([])
plt.show()

The test results are as follows:

[Figure: image-20191107161947626.png]

HOG algorithm

The HOG (Histogram of Oriented Gradients) feature detection algorithm was first proposed by the French researcher Dalal et al. at CVPR 2005. It is an image descriptor used for human (pedestrian) detection that describes the distribution of gradient strength and gradient direction. The main idea is that even when the exact position of an edge is unknown, the distribution of edge directions can still represent the outline of a pedestrian target well.

Feature extraction process

The main idea of HOG is that, in an image, the appearance and shape of a local target can be well described by the density distribution of gradient or edge directions (that is, statistics of the gradients, which are mainly located at edges).

Several steps of the HOG feature detection algorithm: color space normalization —> gradient calculation —> gradient direction histogram —> overlapping block histogram normalization —> HOG feature . As shown below:

[Image: note picture/image-20191107164141468.png]

The overall process is briefly described as follows:

  1. Grayscale the input image (the target you want to detect or the scanning window), that is, convert the color image to a grayscale image
  2. Color space normalization: Gamma correction method is used to standardize (normalize) the color space of the input image, the purpose is to adjust the contrast of the image, reduce the influence of local shadows and illumination changes in the image, and at the same time suppress the interference of noise
  3. Gradient calculation: Calculate the gradient (including size and direction) of each pixel of the image; mainly to capture contour information and further weaken the interference of light
  4. Gradient direction histogram: Divide the image into small cells (such as 8*8 pixels/cell), and count the gradient histogram of each cell (the number of different gradients) to form the descriptor of each cell
  5. Overlapped histogram normalization: Every few cells form a block (for example, 3*3 cells/block), and the feature descriptors of all cells in a block are concatenated to obtain the HOG feature descriptor of the block.
  6. HOG feature: concatenate the HOG feature descriptors of all blocks in the image to get the HOG feature descriptor of the whole image (the target to be detected); this is the final feature vector used for classification

Below we introduce the content of each step in detail:

Color Space Normalization

In order to reduce the influence of lighting, the whole image first needs to be normalized. Local surface exposure contributes a large share of the texture intensity of an image, so this compression can effectively reduce local shadows and illumination changes. Because color information matters little here, the image is usually converted to grayscale first and then gamma-corrected.

Gamma correction can effectively reduce the impact of local shadows and lighting in the image, thereby reducing the sensitivity of the algorithm to lighting and enhancing the robustness of the algorithm.

Gamma correction is given by the following formula: $Y(x, y) = I(x, y)^{\gamma}$

Among them, I(x, y) is the gray value of the image at the pixel point (x, y) before gamma correction, and Y(x, y) is the gray value at the normalized pixel point (x, y). Gamma correction is shown in the figure:

[Image: note picture/image-20191107171015588.png]

As can be seen from the figure above:

  1. When γ<1 (the dotted line in the figure above): in the low gray-value region the dynamic range is expanded (for x in [0, 0.2], y spans [0, 0.5]), so the contrast there is enhanced; in the high gray-value region the dynamic range is compressed (for x in [0.8, 1], y spans [0.9, 1]), so the contrast there is reduced; at the same time the overall gray value of the image becomes larger.
  2. When γ>1 (the solid line in the figure above): in the low gray-value region the dynamic range is compressed (for x in [0, 0.5], y spans [0, 0.2]), so the contrast there is reduced; in the high gray-value region the dynamic range is expanded, so the contrast there is enhanced; at the same time the overall gray value of the image becomes smaller.

The left image in the figure below is the original image, the middle image is the correction result of γ=1/2.2, and the right image is the correction result of γ=2.2.

[Image: note picture/image-20191107171040083.png]

In HOG feature extraction, γ is usually set to 0.5. With this value the gray values of the image are stretched, and the lower the gray value, the larger the stretch; dark, poorly lit images therefore benefit the most, since their brightness is greatly enhanced.
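To make the formula concrete, here is a minimal gamma-correction sketch with NumPy and OpenCV (the file name "face.jpeg" is reused from the LBP example above as an assumption):

import cv2 as cv
import numpy as np

# read the image as grayscale, as is usually done before gamma correction in HOG
gray = cv.imread("face.jpeg", cv.IMREAD_GRAYSCALE)

gamma = 0.5
# scale to [0, 1], apply Y = I^gamma, then scale back to [0, 255]
corrected = np.power(gray / 255.0, gamma)
corrected = np.uint8(corrected * 255)

cv.imshow("original", gray)
cv.imshow("gamma corrected (gamma = 0.5)", corrected)
cv.waitKey(0)
cv.destroyAllWindows()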

Image gradient calculation

Edges arise from abrupt changes in local image features such as grayscale, color and texture. Where neighboring pixels change little and the region is flat, the gradient magnitude is small; where they change sharply, the gradient magnitude is large. The gradient corresponds to the first-order derivative of the image, so image gradients are computed with first-order difference operators, which capture contours, silhouettes and some texture information while further weakening the influence of illumination. Dalal studied many operators, as shown in the following table:

[Image: note picture/image-20191115084424182.png]

The experiments finally show that the simple masks $[-1,0,1]$ and $[1,0,-1]^{T}$ work best, which gives the gradients:

$$\begin{aligned} G_{x}(x, y) &= I(x+1, y)-I(x-1, y) \\ G_{y}(x, y) &= I(x, y+1)-I(x, y-1) \end{aligned}$$

In the formula, $G_x(x, y)$, $G_y(x, y)$ and $I(x, y)$ are the horizontal gradient, the vertical gradient and the gray value at pixel $(x, y)$, respectively. The magnitude and direction of the gradient are computed as follows:

$$G(x, y)=\sqrt{G_{x}^{2}(x, y)+G_{y}^{2}(x, y)}, \qquad \alpha(x, y)=\tan^{-1}\left(\frac{G_{y}(x, y)}{G_{x}(x, y)}\right)$$

Computing the gradient with this operator is not only effective but also computationally cheap.
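The following minimal sketch computes these gradients with OpenCV; cv.Sobel with ksize=1 applies exactly the 1-D [-1, 0, 1] masks, and the image path "face.jpeg" is again an illustrative assumption:

import cv2 as cv
import numpy as np

gray = cv.imread("face.jpeg", cv.IMREAD_GRAYSCALE).astype(np.float32)

# horizontal and vertical gradients with the 1-D central-difference masks
gx = cv.Sobel(gray, cv.CV_32F, 1, 0, ksize=1)
gy = cv.Sobel(gray, cv.CV_32F, 0, 1, ksize=1)

# gradient magnitude and direction (in degrees, range [0, 360))
mag, angle = cv.cartToPolar(gx, gy, angleInDegrees=True)

# HOG uses unsigned gradients, so fold the direction into [0, 180)
angle = angle % 180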

Gradient histogram calculation

Dalal's results show that the best detection is obtained with unsigned gradient directions and 9 channels (bins). Each bin then covers 180/9 = 20°, representing the angles 0, 20, 40, ..., 160. In the gradient direction matrix the angles range over 0–180 degrees rather than 0–360 degrees. These are called "unsigned" gradients because a gradient and its negative are represented by the same number; in other words, a gradient direction and the direction obtained by rotating it 180 degrees are considered the same, as shown in the following figure:

[Image: note picture/image-20191107174042837.png]

Suppose the image is divided into multiple cells, as follows:

[Image: note picture/image-20191107175236497.png]

Each cell unit contains 8*8 pixels, as shown below:

[Image: note picture/image-20191107175558831.png]

We project the gradient direction obtained in the previous step into 9 channels, and use the gradient magnitude as the weight of the projection.

When projecting a pixel's gradient direction, a weighting scheme determines how much of its magnitude goes to each bin, as shown below:

[Image: note picture/image-20191107180300619.png]

Another detail: when the angle falls in (160, 180), remember that 0° and 180° represent the same direction, so the magnitude is split between the 160 bin and the 0 bin. For example, an angle of 165° is projected into the 160 and 0 bins in the ratio (180−165) : (165−160) = 3 : 1, i.e. the nearer 160 bin receives 3/4 of the magnitude and the 0 bin receives 1/4:

[Image: note picture/image-20191107180534928.png]

By traversing all the pixels in the entire cell, you can get the gradient direction histogram of the cell unit:

[Image: note picture/image-20191107180617264.png]
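To make the binning and interpolation concrete, here is a small sketch that builds the 9-bin histogram of one 8×8 cell, splitting each pixel's magnitude between its two nearest bins; mag and angle are assumed to be the arrays from the gradient sketch above, and the function name is illustrative, not a library API:

import numpy as np

def cell_histogram(cell_mag, cell_angle, nbins=9):
    """9-bin histogram of one cell; angles are unsigned, in [0, 180)."""
    bin_width = 180 / nbins                 # 20 degrees per bin
    hist = np.zeros(nbins)
    for m, a in zip(cell_mag.ravel(), cell_angle.ravel()):
        pos = a / bin_width                 # e.g. 165 deg -> 8.25
        left = int(np.floor(pos)) % nbins
        right = (left + 1) % nbins          # wraps 160 -> 0 for angles in (160, 180)
        frac = pos - np.floor(pos)
        hist[left] += m * (1 - frac)        # the nearer bin gets the larger share
        hist[right] += m * frac
    return hist

# e.g. the histogram of the top-left 8x8 cell:
# h = cell_histogram(mag[:8, :8], angle[:8, :8])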

Overlapping block histogram normalization

Because local exposure varies across the image and the foreground-background contrast differs, the range of gradient values is very wide, so normalizing the histograms plays a very important role in improving detection results.

In the previous step we created a gradient direction histogram for each cell. In this step we normalize the gradient histograms within each block, where a block is composed of 2×2 = 4 cells, as shown in the figure below:

[Image: note picture/hog-3180128.gif]

The gradient histogram should be 4*9=36 dimensional in each block.

Before explaining how the histogram is normalized, let us see how a vector of length 3 is normalized. Given a vector [128, 64, 32], its length (L2 norm) is $\sqrt{128^2 + 64^2 + 32^2} \approx 146.64$. Dividing each element by 146.64 gives the normalized vector [0.87, 0.43, 0.22].

We concatenate the gradient histograms of a block into a 36×1 vector and normalize it to obtain the block's feature. Because neighboring blocks overlap, the features of each cell appear multiple times, in different blocks.
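A minimal sketch of this L2 normalization (cell_hists stands for the four 9-bin cell histograms of one block; the small epsilon guarding against division by zero is an added assumption, not part of the description above):

import numpy as np

def normalize_block(cell_hists, eps=1e-6):
    """Concatenate the 2x2 = 4 cell histograms of a block and L2-normalize them."""
    v = np.concatenate(cell_hists)              # 4 * 9 = 36 dimensions
    return v / np.sqrt(np.sum(v ** 2) + eps)

# the toy example from the text:
print(np.array([128, 64, 32]) / np.linalg.norm([128, 64, 32]))
# prints approximately [0.873, 0.436, 0.218]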

Collect HOG features

In the previous step we obtained the normalized gradient direction histogram of one block. Now we only need to traverse all the blocks in the detection image and concatenate their histograms to obtain the gradient direction histogram of the whole image; this is the HOG feature vector we want.

[Image: note picture/image-20191108110820778.png]

As shown in the figure above, blocks can overlap with each other. Suppose the detection image is 64×128. In the x direction there are (64 − 8×2)/8 + 1 = 7 blocks, where 64 is the width of the detection image, the first 8 is the cell width, 2 is the number of cells per block in that direction, and the second 8 is the block stride. Similarly, in the y direction there are (128 − 8×2)/8 + 1 = 15 blocks, where 128 is the height of the detection image, the first 8 is the cell height, 2 is the number of cells per block, and the second 8 is the block stride. So there are 7×15 = 105 blocks in total, and the gradient histogram of each block is 36-dimensional, so the HOG feature vector of the 64×128 detection image has dimension 105×36 = 3780.
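As a quick check of this arithmetic, here is a minimal sketch in Python (the variable names are illustrative assumptions, not part of any API):

win_w, win_h = 64, 128                                   # detection window size
cell, block_cells, stride = 8, 2, 8                      # cell size, cells per block side, block stride

blocks_x = (win_w - cell * block_cells) // stride + 1    # 7 blocks horizontally
blocks_y = (win_h - cell * block_cells) // stride + 1    # 15 blocks vertically
dim = blocks_x * blocks_y * block_cells * block_cells * 9
print(blocks_x, blocks_y, dim)                           # 7 15 3780

Displayed on the image, the HOG features look as follows: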

[Image: note picture/image-20191108111824266.png]

From the figure above we can see that the dominant directions of the histograms capture the human shape, especially the torso and legs. After obtaining the normalized HOG features, a classifier can be used to detect pedestrians, for example a support vector machine (SVM) that separates person from background, as shown in the following figure (a small detection sketch follows the figure):

[Image: note picture/image-20191115090149728.png]
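As an aside, OpenCV already ships a HOG descriptor combined with a pretrained linear SVM people detector. The sketch below uses that built-in detector rather than an SVM trained here, and the image path "xingren.jpeg" is borrowed from the later example as an assumption:

import cv2 as cv

img = cv.imread("xingren.jpeg")

# HOG descriptor with OpenCV's pretrained linear SVM people detector
hog = cv.HOGDescriptor()
hog.setSVMDetector(cv.HOGDescriptor_getDefaultPeopleDetector())

# detectMultiScale returns bounding boxes and confidence weights
rects, weights = hog.detectMultiScale(img, winStride=(8, 8), padding=(8, 8), scale=1.05)
for (x, y, w, h) in rects:
    cv.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv.imshow("pedestrians", img)
cv.waitKey(0)
cv.destroyAllWindows()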

Advantages and disadvantages of HOG features

HOG features have the following advantages:

  • HOG represents the structural features of the edge, so it can describe the local shape information
  • The quantization of position and orientation space suppresses, to a certain extent, the influence of translation and rotation
  • Normalizing histograms over local areas partially offsets the influence of illumination changes

It also has many disadvantages:

  • Descriptor generation is lengthy and high-dimensional, resulting in slow speed and poor real-time performance
  • Hard to deal with occlusion
  • Very sensitive to noise due to the nature of the gradient

Implementation

OpenCV provides an API for calculating HOG features. The process of implementing HOG feature extraction is:

  1. Instantiate the HOG feature extraction operator, the API used is:
hog = cv2.HOGDescriptor(winSize,blockSize,blockStride,cellSize,nbins)

parameter:

  • winSize: The size of the detection window
  • blockSize: the size of the block block
  • blockStride: the sliding step of the block block
  • cellSize: the size of the cell unit
  • nbins: the number of gradient orientation bins, usually taken as 9, i.e. a 9-bin gradient direction histogram is computed in each cell

return:

  • hog: Hog feature detection object after instantiation
  2. Search the entire image and calculate the HOG features of the image; the API used is:
hogDes = hog.compute(img, winStride, padding)

parameter:

  • img: input image
  • winStride: the sliding step of the detection window
  • padding: Filling, filling points around the image to process the boundary.

return:

  • hogDes: the HOG feature descriptor of the entire image. When padding is the default (0, 0), the dimension of the feature vector is [(img_size − window_size) / window_stride + 1] * (feature dimension of each detection window).

Example:

import cv2 as cv
import numpy as np
import matplotlib.pyplot as plt

# 1. Read the image and convert it to grayscale
img = cv.imread('xingren.jpeg')
gray = cv.cvtColor(img, cv.COLOR_BGR2GRAY)

# 2. HOG feature extraction
# 2.1 Parameter settings
winSize = (64, 128)
blockSize = (16, 16)
blockStride = (8, 8)
cellSize = (8, 8)
nbins = 9

# 2.2 Instantiate the HOG descriptor
hog = cv.HOGDescriptor(winSize, blockSize, blockStride, cellSize, nbins)

# 2.3 Compute the HOG feature descriptor
hogDes = hog.compute(img, winStride=(8, 8))

# 2.4 Print the size of the descriptor
print(hogDes.size)

The output is 578340. The image size is (128×256), the window size is (64×128), the block size is (16, 16), the block stride is (8, 8), and the cell size is (8, 8). The feature dimension of each window is 3780 and the window stride is (8, 8), so the dimension of the image's feature vector is:

$$\left(\frac{128-64}{8}+1\right) \times \left(\frac{256-128}{8}+1\right) \times 3780 = 578340$$
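As a quick sanity check (a sketch reusing the hog object and image from the example above), hog.getDescriptorSize() gives the per-window feature dimension, and the window-count arithmetic reproduces the total:

print(hog.getDescriptorSize())        # 3780 features per 64x128 window
windows_x = (128 - 64) // 8 + 1       # 9 window positions horizontally
windows_y = (256 - 128) // 8 + 1      # 17 window positions vertically
print(windows_x * windows_y * 3780)   # 578340, matching hogDes.size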

summary

  1. LBP algorithm:

    Original LBP feature: in a 3×3 window, take the gray value of the center pixel as the threshold and compare the gray values of the 8 neighboring pixels with it. If a neighboring pixel value is greater than the center value, that position is marked 1, otherwise 0. Comparing the 8 points of the 3×3 neighborhood in this way yields an 8-bit binary number, which is the LBP value.

    Circular LBP operator: computes LBP feature values over neighborhoods of different radii and different numbers of sampling points

    Rotation-invariant LBP operator : Continuously rotate the circular neighborhood to obtain a series of initially defined LBP values, and take the minimum value as the LBP value of the neighborhood

    Uniform Pattern LBP feature : When the cyclic binary number corresponding to an LBP has at most two transitions from 0 to 1 or from 1 to 0, the binary corresponding to the LBP is called an equivalent pattern class. Patterns other than the equivalence pattern class are grouped into another class, called the mixed-mode class.

    API:

    skimage.feature.local_binary_pattern()

  2. HOG algorithm

    Idea: In an image, the appearance and shape of a local target can be described by the gradient or the direction density distribution of the edge.

    The steps of the HOG feature detection algorithm:

    Color space normalization—>Gradient calculation—>Gradient direction histogram—>Overlapping block histogram normalization—>HOG feature

    A brief description is as follows:

    1) Grayscale the input image, that is, convert the color image into a grayscale image

    2) Color space normalization: The Gamma correction method is used to standardize (normalize) the color space of the input image. The purpose is to adjust the contrast of the image, reduce the influence of local shadows and illumination changes in the image, and suppress noise at the same time. interference

    3) Gradient calculation: Calculate the gradient (including size and direction) of each pixel of the image; mainly to capture contour information and further weaken the interference of light

    4) Gradient direction histogram: Divide the image into small cells (for example, 6*6 pixels/cell), and count the gradient histogram (number of different gradients) of each cell to form the descriptor of each cell

    5) Normalization of overlapping histograms: Every few cells form a block (for example, 3*3 cells/block), and the feature descriptors of all cells in a block are concatenated to obtain the HOG feature descriptor of the block.

    6) HOG feature: The HOG feature descriptor of the image can be obtained by concatenating the HOG feature descriptors of all blocks in the image image, and the final feature vector that can be used for classification is obtained.

  3. API:

    1) Instantiate the HOG object:

    hog = cv.HOGDescriptor()

    2) Calculate the HOG feature descriptor

    hogdes = hog.compute()
