Computer Image Processing—HOG Feature Extraction Algorithm

1. Experiment introduction

1. Experimental content

This experiment introduces the HOG feature extraction algorithm.

2. Experimental points

  • HOG algorithm
  • Reasons why the HOG algorithm works
  • Create HOG descriptor
  • Number of elements in the HOG descriptor
  • Visualizing HOG Descriptors
  • Understanding Histograms

3. Experimental environment

  • Python 3.6.6
  • numpy
  • matplotlib
  • cv2
  • copy

2. Experimental steps

Introduction

As seen with the ORB algorithm, we can match keypoints in an image to detect objects. Algorithms of this type are useful for detecting objects that have many consistent internal features and are unaffected by the background. For example, they achieve good results in face detection because faces have many consistent internal features, such as eyes, nose, and mouth, that are not affected by the image background. However, they do not work well for more general object recognition, such as pedestrian detection in images. The reason is that a person's internal features are not as consistent as those of a face, because everyone's body shape and style are different (see image below). This means that each person has a different set of internal features, so we need something that describes a person more fully.


Fig. 1. - Pedestrians.

One option is to try to detect pedestrians by their silhouettes. Detecting objects by their contours (boundaries) in an image is very challenging because we have to deal with differences in contrast between background and foreground. For example, suppose you want to detect a pedestrian who is walking in front of a white building, wearing a white coat and black pants. We can see in the image below that since the background of the image is mostly white, the black pants will have very high contrast, but since the coat is also white, its contrast will be very low. In this case, detecting the edges of the pants is easy, but detecting the edges of the coat is very difficult. This is why HOG is needed. HOG stands for Histograms of Oriented Gradients, and it was first introduced in 2005 by Navneet Dalal and Bill Triggs.


Fig. 2. - High and Low Contrast.

The HOG algorithm works by creating histograms of the distribution of gradient directions in an image, which are then normalized in a very specific way. This particular normalization allows HOG to detect the edges of objects effectively, even in low-contrast situations. These normalized histograms are put into a feature vector (called the HOG descriptor), which can be used to train machine learning algorithms, such as support vector machines (SVM), to detect objects based on their boundaries (edges) in the image. Due to its great success and reliability, HOG has become one of the most widely used object detection algorithms in computer vision.

This tutorial covers:

  • How the HOG algorithm works
  • How to create a HOG descriptor using OpenCV
  • How to visualize HOG descriptors

1 HOG algorithm

As the name suggests, the HOG algorithm is based on creating a histogram from the gradient orientations of the image. The HOG algorithm is implemented through the following series of steps:

  1. Given an image of a particular object, set a detection window (region of interest) that covers the entire object in the image (see image below).

  2. Compute the gradient magnitude and direction for each pixel in the detection window.

  3. Divide the detection window into connected cells of pixels, all of the same size (see image below). The size of the cell is a free parameter, which is usually chosen to match the scale of the features to be detected. For example, in a 64 x 128 pixel detection window, square cells 6 to 8 pixels wide are suitable for detecting human limbs.

  4. Create a histogram for each cell by first grouping the gradient directions of all pixels in the cell into a specific number of direction (angle) bins, then summing the gradient magnitudes in each angle bin (see image below). The number of bins in the histogram is a free parameter, usually set to 9 angle bins.

  5. Group adjacent cells into blocks (see image below). The number of cells in each block is a free parameter and all blocks must be the same size. The distance between each block (called stride) is a free parameter, but it is usually set to half the block size, in which case overlapping blocks will be obtained (see animation). Experience shows that the algorithm handles overlapping blocks better.

  6. Use the cells contained in each block to normalize the histograms of the cells in that block (see figure below). If there are overlapping blocks, most cells will be normalized with respect to several different blocks (see animation). Therefore, the same cell may have several different normalizations.

  7. Collect all normalized histograms from all blocks into one feature vector called the HOG descriptor.

  8. Use the HOG descriptors obtained from many images containing the same object to train a machine learning algorithm, for example using SVM, to detect these objects in the images. For example, an SVM can be trained to detect pedestrians in images using HOG descriptors from many images of pedestrians. Training is done using positive examples that contain objects and negative examples that do not.

  9. Once the SVM is trained, a sliding window approach is used to try to detect and localize objects in the image. Detecting objects in an image requires finding parts of the image that are similar to the HOG patterns learned by the SVM.
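
The sliding-window search in the last step can be sketched as follows. This is only an illustration of how candidate window positions are enumerated: the function name `sliding_windows`, the 8-pixel step, and the 128 x 256 image size are assumptions made for the example, and the scoring of each window by a trained SVM is left out.

```python
def sliding_windows(image_size, win_size=(64, 128), step=(8, 8)):
    """Yield the (x, y) top-left corner of every window that fits in the image.

    All sizes are (width, height) in pixels. The step is a hypothetical choice.
    """
    img_w, img_h = image_size
    win_w, win_h = win_size
    for y in range(0, img_h - win_h + 1, step[1]):
        for x in range(0, img_w - win_w + 1, step[0]):
            yield x, y


# Enumerate every 64 x 128 window in a 128 x 256 image
corners = list(sliding_windows((128, 256)))
print(len(corners), corners[0], corners[-1])  # → 153 (0, 0) (64, 128)
```

In a full detector, the HOG descriptor of each window would be computed and passed to the trained classifier, and windows with a high score would be reported as detections.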


Fig. 3. - HOG Diagram.

Vid. 1. - HOG Animation.

2 Why the HOG algorithm works

As we learned above, HOG creates histograms by adding up gradient magnitudes in specific directions within local regions of the image called "cells". Doing so ensures that stronger gradients contribute more to the magnitude of their respective angle bins, while minimizing the effect of weak and randomly oriented gradients caused by noise. In this way, the histogram tells us the main gradient direction for each cell.
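
To make the magnitude weighting concrete, here is a minimal sketch in plain Python. It is not OpenCV's implementation: the function name `cell_histogram` is hypothetical, gradients are taken as simple central differences, border pixels of the cell are skipped, and each gradient votes only for the single bin its angle falls into.

```python
import math


def cell_histogram(cell, num_bins=9):
    """Magnitude-weighted histogram of unsigned gradient directions for one cell.

    `cell` is a list of rows of grayscale intensities. Border pixels are
    skipped so that central differences are always defined.
    """
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    for r in range(1, len(cell) - 1):
        for c in range(1, len(cell[0]) - 1):
            gx = cell[r][c + 1] - cell[r][c - 1]
            gy = cell[r + 1][c] - cell[r - 1][c]
            magnitude = math.hypot(gx, gy)
            # Unsigned gradient: fold the angle into the range [0, 180)
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % num_bins] += magnitude
    return hist


# A 6 x 6 cell that is dark in the top half and bright in the bottom half,
# i.e. it contains a horizontal edge
cell = [[0] * 6] * 3 + [[100] * 6] * 3
hist = cell_histogram(cell)
print(hist)
```

All of the gradient magnitude lands in the bin that covers 80 to 100 degrees, because the intensity only changes in the vertical direction; pixels in flat regions have zero magnitude and contribute nothing.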

2.1 Dealing with contrast

Now consider the problem that gradient magnitudes can vary greatly due to changes in local lighting and in the contrast between background and foreground.

To take into account the difference in background-foreground contrast, the HOG algorithm tries to detect edges locally. To do this, it defines groups of cells called blocks and normalizes the histograms using that local group of cells. Through this local normalization, which is called block normalization, the HOG algorithm can detect the edges in each block very reliably.

In addition to using block normalization, the HOG algorithm also uses overlapping blocks to improve its performance. By using overlapping blocks, each cell contributes several independent components to the final HOG descriptor, where each component corresponds to the cell normalized with respect to a different block. This may seem redundant, but experience shows that the performance of the HOG algorithm is significantly improved by normalizing each cell several times with respect to different local blocks.
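
The effect of block normalization on contrast can be illustrated with a small sketch. It assumes an L2-Hys style scheme (L2 normalization, clipping at 0.2, renormalization); the function name `l2_hys`, the small constant `eps`, and the example values are assumptions made for the illustration, not OpenCV's code.

```python
import math


def l2_hys(block, clip=0.2, eps=1e-6):
    """Normalize the concatenated histograms of one block.

    Steps: L2-normalize, clip each value at `clip`, then L2-normalize again.
    `eps` is a small assumed constant that avoids division by zero.
    """
    norm = math.sqrt(sum(v * v for v in block) + eps ** 2)
    clipped = [min(v / norm, clip) for v in block]
    norm = math.sqrt(sum(v * v for v in clipped) + eps ** 2)
    return [v / norm for v in clipped]


# The same edge pattern seen at low contrast and at ten times higher contrast
low_contrast = [1.0, 8.0, 2.0, 0.5]
high_contrast = [10 * v for v in low_contrast]

a = l2_hys(low_contrast)
b = l2_hys(high_contrast)
print(max(abs(x - y) for x, y in zip(a, b)))  # practically zero
```

After normalization the two blocks are practically identical, which is why a low-contrast edge (the white coat against the white building) can be described as reliably as a high-contrast one.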

Load images and import resources

The first step in building a HOG descriptor is to load the required packages into Python and load our image.

We first load the image of the triangle tiles using OpenCV. Since the cv2.imread() function loads the image as BGR, we convert the image to RGB so it can be displayed with the correct colors. As usual, we also convert the BGR image to grayscale for analysis.

import cv2
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

# Set the default figure size
plt.rcParams['figure.figsize'] = [17.0, 7.0]

# Load the image
image = cv2.imread('./images/triangle_tile.jpeg')

# Convert the original image to RGB
original_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)

# Convert the original image to grayscale
gray_image = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Print the shapes of the original and grayscale images
print('The original image has shape: ', original_image.shape)
print('The gray scale image has shape: ', gray_image.shape)

# Display the images
plt.subplot(121)
plt.imshow(original_image)
plt.title('Original Image')
plt.subplot(122)
plt.imshow(gray_image, cmap='gray')
plt.title('Gray Scale Image')
plt.show()
The original image has shape:  (250, 250, 3)
The gray scale image has shape:  (250, 250)


3 Create HOG descriptor

We will use OpenCV's HOGDescriptor class to create the HOG descriptor. The parameters of the HOG descriptor are set using the HOGDescriptor() function. The parameters of the HOGDescriptor() function and their default values are as follows:

cv2.HOGDescriptor(win_size = (64, 128), block_size = (16, 16), block_stride = (8, 8), cell_size = (8, 8), nbins = 9, win_sigma = DEFAULT_WIN_SIGMA, threshold_L2hys = 0.2, gamma_correction = true, nlevels = DEFAULT_NLEVELS)

The official parameters are explained as follows:

  • win_size (Size)
    The size of the detection window in pixels (width, height). Defines the region of interest. Must be an integer multiple of the cell size.

  • block_size (Size)
    Block size in pixels (width, height). Defines how many cells are in each block. Must be an integer multiple of the cell size and must be smaller than the detection window. The smaller the block, the finer the detail that can be obtained.

  • block_stride (Size)
    Block stride in pixels (horizontal, vertical). Must be an integer multiple of the cell size. block_stride defines the distance between adjacent blocks, for example, 8 pixels horizontally and 8 pixels vertically. A longer block_stride makes the algorithm run faster (because fewer blocks are evaluated), but the algorithm may not perform as well.

  • cell_size (Size)
    Cell size in pixels (width, height). Determines the size of the cell. The smaller the cells, the finer the detail that can be obtained.

  • nbins (int)
    The number of bins in the histograms. Determines the number of angle bins used to make the histograms. With more bins, you can capture more gradient directions. HOG uses unsigned gradients, so the angle bins take values between 0 and 180 degrees.

  • win_sigma (double)
    Gaussian smoothing window parameter. Applying a Gaussian spatial window to each pixel before computing the histograms smooths the pixels near block edges, which improves the performance of the HOG algorithm.

  • threshold_L2hys (double)
    Shrinkage threshold for the L2-Hys normalization method (an L2 norm with Lowe-style clipping). L2-Hys is used to normalize the blocks and consists of an L2 norm, followed by clipping, followed by renormalization. Clipping limits the maximum value of the descriptor vector for each block to the given threshold (0.2 by default). After clipping, the descriptor vector is renormalized as described in *IJCV*, 60(2):91-110, 2004.

  • gamma_correction (bool)
    Flag that specifies whether gamma correction preprocessing is required. Performing gamma correction slightly improves the performance of the HOG algorithm.

  • nlevels (int)
    The maximum number of detection window increases.

We can see that the cv2.HOGDescriptor() function supports a wide range of parameters. The first few parameters (block_size, block_stride, cell_size, and nbins) are probably the ones you are most likely to change. The other parameters can generally be left at their default values to obtain good results.
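
The size constraints listed above can be checked before constructing the descriptor. The helper below is a hypothetical sketch (the name `check_hog_geometry` is not part of OpenCV). Besides the multiples-of-cell-size rules from the list, it also checks that blocks tile the window exactly for the chosen stride, which is an extra assumption of this sketch.

```python
def check_hog_geometry(win_size, block_size, block_stride, cell_size):
    """Return a list of human-readable problems; an empty list means the sizes fit.

    All arguments are (width, height) tuples in pixels.
    """
    problems = []
    for axis, name in enumerate(("x", "y")):
        win, block = win_size[axis], block_size[axis]
        stride, cell = block_stride[axis], cell_size[axis]
        if win % cell:
            problems.append(f"{name}: window {win} is not a multiple of cell {cell}")
        if block % cell:
            problems.append(f"{name}: block {block} is not a multiple of cell {cell}")
        if stride % cell:
            problems.append(f"{name}: stride {stride} is not a multiple of cell {cell}")
        if block > win:
            problems.append(f"{name}: block {block} is larger than window {win}")
        elif (win - block) % stride:
            problems.append(f"{name}: blocks do not tile window {win} with stride {stride}")
    return problems


# The default sizes are consistent
print(check_hog_geometry((64, 128), (16, 16), (8, 8), (8, 8)))  # → []

# A 250 x 250 window is not a multiple of a 6 x 6 cell
for problem in check_hog_geometry((250, 250), (12, 12), (6, 6), (6, 6)):
    print(problem)
```

This is why a 250 x 250 image combined with 6 x 6 cells needs a detection window trimmed to 246 x 246.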

In the code below, we use the cv2.HOGDescriptor() function to set the cell size, block size, block stride, and number of bins for the histograms of the HOG descriptor. We then use the .compute(image) method to compute the HOG descriptor (feature vector) for the given image.

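# Specify the parameters for the HOG descriptor

# Cell size in pixels (width, height). It must be smaller than the detection window
# and must be chosen so that the resulting block size is smaller than the detection window.
cell_size = (6, 6)

# Number of cells per block in each direction (x, y). Must be chosen so that the
# resulting block size is smaller than the detection window.
num_cells_per_block = (2, 2)

# Block size in pixels (width, height). Must be an integer multiple of the cell size.
# The block size must be smaller than the detection window.
block_size = (num_cells_per_block[0] * cell_size[0],
              num_cells_per_block[1] * cell_size[1])

# Number of cells that fit in the image in the x and y directions
x_cells = gray_image.shape[1] // cell_size[0]
y_cells = gray_image.shape[0] // cell_size[1]

# Horizontal distance between blocks in units of cell size. Must be an integer and must be
# chosen so that (x_cells - num_cells_per_block[0]) / h_stride is an integer.
h_stride = 1

# Vertical distance between blocks in units of cell size. Must be an integer and must be
# chosen so that (y_cells - num_cells_per_block[1]) / v_stride is an integer.
v_stride = 1

# Block stride in pixels (horizontal, vertical). Must be an integer multiple of the cell size.
block_stride = (cell_size[0] * h_stride, cell_size[1] * v_stride)

# Number of gradient orientation bins
num_bins = 9


# Size of the detection window (region of interest) in pixels (width, height).
# It must be an integer multiple of the cell size and should cover the entire image.
# Because the detection window must be an integer multiple of the cell size,
# the resulting window may be slightly smaller than the image, depending on the cell size.
# This is perfectly fine.
win_size = (x_cells * cell_size[0] , y_cells * cell_size[1])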

# Print the shape of the grayscale image for reference
print('\nThe gray scale image has shape: ', gray_image.shape)
print()

# Print the parameters of the HOG descriptor
print('HOG Descriptor Parameters:\n')
print('Window Size:', win_size)
print('Cell Size:', cell_size)
print('Block Size:', block_size)
print('Block Stride:', block_stride)
print('Number of Bins:', num_bins)
print()

# Set the parameters of the HOG descriptor using the variables defined above
hog = cv2.HOGDescriptor(win_size, block_size, block_stride, cell_size, num_bins)

# Compute the HOG descriptor for the grayscale image
hog_descriptor = hog.compute(gray_image)
The gray scale image has shape:  (250, 250)

HOG Descriptor Parameters:

Window Size: (246, 246)
Cell Size: (6, 6)
Block Size: (12, 12)
Block Stride: (6, 6)
Number of Bins: 9

4 Number of elements in the HOG descriptor

The HOG descriptor (feature vector) is one long vector formed by concatenating the normalized histograms of all cells in all blocks of the detection window. The size of the HOG feature vector is therefore given by the total number of blocks in the detection window times the number of cells per block times the number of orientation bins:

\begin{equation} \mbox{total_elements} = (\mbox{total_number_of_blocks})\mbox{ } \times \mbox{ } (\mbox{number_cells_per_block})\mbox{ } \times \mbox{ } (\mbox{number_of_bins}) \end{equation}

If we have no overlapping blocks (i.e. the case where block_stride equals block_size), the total number of blocks can be easily calculated by dividing the size of the detection window by the block size. However, in the general case we have to take into account the fact that there are overlapping blocks. To find the total number of blocks in general (i.e. for any block_stride and block_size), we can use the formula given below:

\begin{equation} \mbox{Total}_i = \left( \frac{\mbox{block_size}_i}{\mbox{block_stride}_i} \right)\left( \frac{\mbox{window_size}_i}{\mbox{block_size}_i} \right) - \left [\left( \frac{\mbox{block_size}_i}{\mbox{block_stride}_i} \right) - 1 \right]; \mbox{ for } i = x,y \end{equation}

where $\mbox{Total}_x$ is the total number of blocks along the detection window width and $\mbox{Total}_y$ is the total number of blocks along the detection window height. This formula for $\mbox{Total}_x$ and $\mbox{Total}_y$ takes into account the extra blocks produced by the overlap. After calculating $\mbox{Total}_x$ and $\mbox{Total}_y$, we get the total number of blocks in the detection window by multiplying $\mbox{Total}_x \times \mbox{Total}_y$. The above formula can be greatly simplified because block_size, block_stride, and window_size are all defined in terms of cell_size. By making all the appropriate substitutions and cancellations, the above formula simplifies to:

\begin{equation} \mbox{Total}_i = \left(\frac{\mbox{cells}_i - \mbox{num_cells_per_block}_i}{N_i}\right) + 1\mbox{ }; \mbox{ for } i = x,y \end{equation}

where $\mbox{cells}_x$ is the total number of cells along the detection window width and $\mbox{cells}_y$ is the total number of cells along the detection window height. $N_x$ is the horizontal block stride in units of cell_size, and $N_y$ is the vertical block stride in units of cell_size.

Let's calculate the number of elements in the HOG feature vector and check that it matches the shape of the HOG descriptor calculated above:

# Calculate the total number of blocks along the width of the detection window
tot_bx = np.uint32(((x_cells - num_cells_per_block[0]) / h_stride) + 1)

# Calculate the total number of blocks along the height of the detection window
tot_by = np.uint32(((y_cells - num_cells_per_block[1]) / v_stride) + 1)

# Calculate the total number of elements in the feature vector
tot_els = (tot_bx) * (tot_by) * num_cells_per_block[0] * num_cells_per_block[1] * num_bins

# Print the total number of elements the HOG feature vector should have
print('\nThe total number of elements in the HOG Feature Vector should be: ',
      tot_bx, 'x',
      tot_by, 'x',
      num_cells_per_block[0], 'x',
      num_cells_per_block[1], 'x',
      num_bins, '=',
      tot_els)

# Print the shape of the HOG descriptor to see that it matches the above
print('\nThe HOG Descriptor has shape:', hog_descriptor.shape)
print()
The total number of elements in the HOG Feature Vector should be:  40 x 40 x 2 x 2 x 9 = 57600

The HOG Descriptor has shape: (57600, 1)
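
As a further check, the general and simplified block-count formulas can be compared in plain Python without OpenCV. The function name `descriptor_length` is hypothetical. The sketch evaluates both formulas per axis, asserts that they agree, and multiplies out the descriptor length for the default parameters listed earlier and for the parameters used in this tutorial.

```python
def descriptor_length(win_size, block_size, block_stride, cell_size, num_bins):
    """Number of elements in the HOG descriptor; sizes are (width, height) in pixels."""
    total = num_bins
    for i in range(2):  # i = 0 is x, i = 1 is y
        cells = win_size[i] // cell_size[i]
        cells_per_block = block_size[i] // cell_size[i]
        stride_in_cells = block_stride[i] // cell_size[i]

        # General formula, written in pixels
        ratio = block_size[i] / block_stride[i]
        general = ratio * (win_size[i] / block_size[i]) - (ratio - 1)

        # Simplified formula, written in cells
        simplified = (cells - cells_per_block) // stride_in_cells + 1

        assert general == simplified
        total *= simplified * cells_per_block
    return total


# Default parameters: 7 x 15 blocks, 2 x 2 cells per block, 9 bins
print(descriptor_length((64, 128), (16, 16), (8, 8), (8, 8), 9))    # → 3780

# Parameters used in this tutorial: 40 x 40 blocks, 2 x 2 cells per block, 9 bins
print(descriptor_length((246, 246), (12, 12), (6, 6), (6, 6), 9))   # → 57600
```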

5 Visualizing HOG descriptors

OpenCV has no easy way to visualize HOG descriptors, so we have to do some manipulation before we can visualize one. We'll start by reshaping the HOG descriptor to make our calculations easier. We will then calculate the average histogram for each cell and finally convert the histogram bins into vectors. Once we have the vectors, we can plot the corresponding vectors for each cell in the image.

The code below produces an interactive plot. The figure contains:

  • the grayscale image,
  • the HOG descriptor (feature vector),
  • a zoomed-in portion of the HOG descriptor, and
  • the histogram of the selected cell.

You can click anywhere on the grayscale image or the HOG descriptor image to select a specific cell. After you click on either image, a magenta rectangle will appear showing the cell you selected. The zoom window will show a magnified version of the HOG descriptor around the selected cell, and the histogram plot will show the corresponding histogram for the selected cell. There are also buttons at the bottom of the interactive window that provide additional functions, such as panning and saving the figure. The home button restores the figure to its default view.

Note: If you run this notebook in a Udacity workspace, there is a delay of about 2 seconds in the interactive plot. This means that if you click on the image to zoom in, it takes about 2 seconds for the plot to refresh.

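# Use the interactive notebook backend so that mouse clicks reach the figure.
# (Following this with %matplotlib inline would switch back to static images.)
%matplotlib notebook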
import copy
import matplotlib.patches as patches

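# Set the default figure size
plt.rcParams['figure.figsize'] = [9.8, 9]

# Reshape the feature vector to [blocks_y, blocks_x, num_cells_per_block_x, num_cells_per_block_y, num_bins].
# blocks_x and blocks_y are transposed so that the first index (blocks_y) refers to the row number
# and the second index refers to the column number. This will be useful later when we plot the
# feature vector, so that the feature vector indexing matches the image indexing.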
hog_descriptor_reshaped = hog_descriptor.reshape(tot_bx,
                                                 tot_by,
                                                 num_cells_per_block[0],
                                                 num_cells_per_block[1],
                                                 num_bins).transpose((1, 0, 2, 3, 4))

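# Print the shape of the feature vector for reference
print('The feature vector has shape:', hog_descriptor.shape)

# Print the shape of the reshaped feature vector
print('The reshaped feature vector has shape:', hog_descriptor_reshaped.shape)

# Create an array that will hold the average gradients for each cell
ave_grad = np.zeros((y_cells, x_cells, num_bins))

# Print the shape of the ave_grad array for reference
print('The average gradient array has shape: ', ave_grad.shape) 

# Create an array that will count the number of histograms per cell
hist_counter = np.zeros((y_cells, x_cells, 1))

# Add up all the histograms for each cell and count the number of histograms per cell
for i in range(num_cells_per_block[0]):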
    for j in range(num_cells_per_block[1]):
        ave_grad[i:tot_by + i,
                 j:tot_bx + j] += hog_descriptor_reshaped[:, :, i, j, :]
        
        hist_counter[i:tot_by + i,
                     j:tot_bx + j] += 1

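# Calculate the average gradient for each cell
ave_grad /= hist_counter
   
# Calculate the total number of vectors we have in all the cells.
len_vecs = ave_grad.shape[0] * ave_grad.shape[1] * ave_grad.shape[2]

# Create an array with num_bins angles, in radians, between 0 and 180 degrees.
deg = np.linspace(0, np.pi, num_bins, endpoint = False)

# Each cell has a histogram with num_bins bins. For each cell, plot each bin as a vector
# (with its magnitude equal to the height of the bin in the histogram and its angle
# corresponding to the bin in the histogram).
# To do this, create rank 1 arrays that will hold the (x, y) coordinates of all the vectors
# in all the cells in the image, and rank 1 arrays that will hold all the (U, V) components
# of all the vectors in all the cells in the image.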
U = np.zeros((len_vecs))
V = np.zeros((len_vecs))
X = np.zeros((len_vecs))
Y = np.zeros((len_vecs))

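# Set the counter to zero
counter = 0

# Use the cosine and sine functions to calculate the vector components (U, V) from their magnitudes.
# Remember that the cosine and sine functions take angles in radians.
# Calculate the vector positions and magnitudes from the average gradient array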
for i in range(ave_grad.shape[0]):
    for j in range(ave_grad.shape[1]):
        for k in range(ave_grad.shape[2]):
            U[counter] = ave_grad[i,j,k] * np.cos(deg[k])
            V[counter] = ave_grad[i,j,k] * np.sin(deg[k])
        
            X[counter] = (cell_size[0] / 2) + (cell_size[0] * i)
            Y[counter] = (cell_size[1] / 2) + (cell_size[1] * j)
        
            counter = counter + 1

# Create the bins, in degrees, used to plot the histogram.
angle_axis = np.linspace(0, 180, num_bins, endpoint = False)
angle_axis += ((angle_axis[1] - angle_axis[0]) / 2)

# Create a figure with 4 subplots arranged in 2 x 2
fig, ((a,b),(c,d)) = plt.subplots(2,2)

# Set the title of each subplot
a.set(title = 'Gray Scale Image\n(Click to Zoom)')
b.set(title = 'HOG Descriptor\n(Click to Zoom)')
c.set(title = 'Zoom Window', xlim = (0, 18), ylim = (0, 18), autoscale_on = False)
d.set(title = 'Histogram of Gradients')

# Plot the grayscale image
a.imshow(gray_image, cmap = 'gray')
a.set_aspect(aspect = 1)

# Plot the feature vector (HOG descriptor)
b.quiver(Y, X, U, V, color = 'white', headwidth = 0, headlength = 0, scale_units = 'inches', scale = 5)
b.invert_yaxis()
b.set_aspect(aspect = 1)
b.set_facecolor('black')

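# Define the interactive zoom function
def onpress(event):
    
    # Do nothing unless the left mouse button was pressed
    if event.button != 1:
        return
    
    # Only accept clicks on subplots a and b
    if event.inaxes in [a, b]:
        
        # Get the coordinates of the mouse click
        x, y = event.xdata, event.ydata
        
        # Select the cell closest to the click coordinates
        cell_num_x = np.uint32(x / cell_size[0])
        cell_num_y = np.uint32(y / cell_size[1])
        
        # Set the corner coordinates of the rectangle patch
        edgex = x - (x % cell_size[0])
        edgey = y - (y % cell_size[1])
        
        # Create a rectangle patch that matches the cell selected above
        rect = patches.Rectangle((edgex, edgey),
                                  cell_size[0], cell_size[1],
                                  linewidth = 1,
                                  edgecolor = 'magenta',
                                  facecolor='none')
        
        # A single patch can only be used in a single plot.
        # Create copies of the patch to use in the other subplots
        rect2 = copy.copy(rect)
        rect3 = copy.copy(rect)
        
        # Update all the subplots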
        a.clear()
        a.set(title = 'Gray Scale Image\n(Click to Zoom)')
        a.imshow(gray_image, cmap = 'gray')
        a.set_aspect(aspect = 1)
        a.add_patch(rect)

        b.clear()
        b.set(title = 'HOG Descriptor\n(Click to Zoom)')
        b.quiver(Y, X, U, V, color = 'white', headwidth = 0, headlength = 0, scale_units = 'inches', scale = 5)
        b.invert_yaxis()
        b.set_aspect(aspect = 1)
        b.set_facecolor('black')
        b.add_patch(rect2)

        c.clear()
        c.set(title = 'Zoom Window')
        c.quiver(Y, X, U, V, color = 'white', headwidth = 0, headlength = 0, scale_units = 'inches', scale = 1)
        c.set_xlim(edgex - cell_size[0], edgex + (2 * cell_size[0]))
        c.set_ylim(edgey - cell_size[1], edgey + (2 * cell_size[1]))
        c.invert_yaxis()
        c.set_aspect(aspect = 1)
        c.set_facecolor('black')
        c.add_patch(rect3)

        d.clear()
        d.set(title = 'Histogram of Gradients')
        d.grid()
        d.set_xlim(0, 180)
        d.set_xticks(angle_axis)
        d.set_xlabel('Angle')
        d.bar(angle_axis,
              ave_grad[cell_num_y, cell_num_x, :],
              180 // num_bins,
              align = 'center',
              alpha = 0.5,
              linewidth = 1.2,
              edgecolor = 'k')

        fig.canvas.draw()

# Connect mouse click events on the figure to the onpress function
fig.canvas.mpl_connect('button_press_event', onpress)
plt.show()
The feature vector has shape: (57600, 1)
The reshaped feature vector has shape: (40, 40, 2, 2, 9)
The average gradient array has shape:  (41, 41, 9)


6 Understanding Histograms

Let's analyze a few static screenshots of the figure above to see whether the histogram of the selected cell makes sense. Let's start by looking at a cell inside a triangle, away from the edges:


Fig. 4. - Histograms Inside a Triangle.

In this case, since the triangle is almost all one color, there shouldn't be any dominant gradient in the selected cell. This is indeed the case, as we can clearly see in the zoom window and the histogram. There are many gradients, but none of them clearly dominates the others.

Now let's look at the cells near the horizontal edges:


Fig. 5. - Histograms Near a Horizontal Edge.

Remember that edges are areas in the image where there is a sudden change in intensity. In these cases there is a high intensity gradient in a particular direction. This is exactly what we see in the corresponding histogram and zoom window for the selected cell. In the zoom window it can be seen that the dominant gradient is upwards, almost at 90 degrees, as this is the direction of the sharp change in intensity. Therefore, we should expect the 90-degree region in the histogram to be stronger than the other regions. That's actually what we're seeing.

Now let's look at the cells near the vertical edges:


Fig. 6. - Histograms Near a Vertical Edge.

In this case, we expect the dominant gradient in the cell to be horizontal, close to 180 degrees, since this is the direction in which the intensity changes sharply. Therefore, we should expect the 170-degree bin in the histogram to dominate the other bins. This is what we see in the histogram, but we also see that there is another dominant gradient in the cell, the one in the 10-degree bin. This is because the HOG algorithm uses unsigned gradients, which means that 0 degrees and 180 degrees are considered the same. Therefore, when the histogram is created, gradients with angles between 170 and 180 degrees contribute proportionally to both the 10-degree bin and the 170-degree bin. This results in two dominant gradients in cells near vertical edges instead of just one.

Finally, let's look at a cell near a diagonal edge.


Fig. 7. - Histograms Near a Diagonal Edge.

To understand what we're seeing, let's first remember that a gradient consists of an x component and a y component, just like a vector. The direction of the gradient is therefore given by the vector sum of its components. On vertical edges, the gradients are horizontal because they only have an x component, as we saw above. On horizontal edges, the gradients are vertical because they only have a y component, as we saw above. At a diagonal edge, the gradient is also diagonal, since both the *x* and *y* components are non-zero. Since the diagonal edges in the image are close to 45 degrees, we should expect to see a dominant gradient direction in the 50-degree bin. That is what we see in the histogram. However, as in the vertical-edge case, there are two dominant gradients instead of one. The reason is that when the histogram is created, an angle that lies between two bin centers contributes proportionally to both adjacent bins. For example, a gradient with an angle of 40 degrees is exactly halfway between the 30-degree and 50-degree bins, so its magnitude is divided evenly between them. This results in two dominant gradients in cells near diagonal edges instead of just one.
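
The proportional voting described above can be sketched in a few lines. This is an illustration, not OpenCV's code: the function name `vote` is hypothetical, and the bin centers are assumed to lie at 10, 30, ..., 170 degrees, with the last bin wrapping around to the first because the gradients are unsigned.

```python
import math


def vote(angle_deg, magnitude, num_bins=9):
    """Split one gradient's magnitude between the two nearest bin centers."""
    bin_width = 180.0 / num_bins
    hist = [0.0] * num_bins
    # Position of the angle, measured in bin widths from the first bin center
    pos = (angle_deg % 180.0) / bin_width - 0.5
    lower = math.floor(pos)
    frac = pos - lower
    # The modulo makes 170 degrees and 10 degrees neighbors (unsigned gradients)
    hist[lower % num_bins] += magnitude * (1 - frac)
    hist[(lower + 1) % num_bins] += magnitude * frac
    return hist


diagonal = vote(40, 1.0)   # halfway between the 30 and 50 degree bins
vertical = vote(175, 1.0)  # between the 170 degree bin and the 10 degree bin
print(diagonal[1], diagonal[2])  # → 0.5 0.5
print(vertical[8], vertical[0])  # → 0.75 0.25
```

A 40-degree gradient is split evenly between the 30-degree and 50-degree bins, and a 175-degree gradient puts a quarter of its magnitude into the 10-degree bin, which matches the two-peak histograms seen near the diagonal and vertical edges.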

Now that you know how to implement HOG, you will find a notebook called Examples in the workspace. There you can set your own parameters for the HOG descriptors of various images. Have fun!

Origin blog.csdn.net/chenyu128/article/details/131176778