OpenCV Image Feature Extraction (5): The HOG Feature Detection Algorithm

1. Overview of the Histogram of Oriented Gradients (HOG)

 

The Histogram of Oriented Gradients (HOG) feature is built by computing normalized local histograms of gradient orientations over a dense grid. The basic idea of the method is that the appearance and shape of local objects can be described well by the distribution of local intensity gradients or edge directions, even without precise knowledge of the corresponding gradient or edge positions. In practice, the image is divided into small cells, and a one-dimensional histogram of gradient directions (or edge directions) is accumulated over the pixels of each cell.

For better invariance to illumination and shadow, the histograms are contrast-normalized. This is achieved by grouping cells into larger blocks and normalizing all cells within each block; the normalized block descriptor is called the HOG descriptor. The HOG descriptors of all blocks in the detection window are combined to form the final feature vector, which is then used for pedestrian detection with an SVM classifier: the detection window is divided into overlapping blocks, HOG descriptors are computed for these blocks, and the resulting feature vector is fed into a linear SVM for target/non-target binary classification. The detection window scans the entire image at all positions and scales, and non-maximum suppression is performed on the output pyramid to detect objects.

=========================================================================

2. The general pipeline of the Histogram of Oriented Gradients (HOG) algorithm:

1) Grayscale conversion. The image (the target to be detected, or the scanning window) can be regarded as a three-dimensional array of R, G, and B color components. For color images, the RGB components are converted into a grayscale image with the conversion formula:

                                    Gray = 0.299\cdot R + 0.587\cdot G + 0.114\cdot B
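In OpenCV this conversion is a single call; a minimal sketch (the input path is a placeholder, and COLOR_BGR2GRAY applies exactly the Rec. 601 weights shown above):

#include <opencv2/opencv.hpp>

int main() {
	// Load a color image (placeholder path) and convert it to grayscale.
	cv::Mat color = cv::imread("input.jpg");
	if (color.empty()) return -1;
	cv::Mat gray;
	// COLOR_BGR2GRAY computes Gray = 0.299*R + 0.587*G + 0.114*B per pixel.
	cv::cvtColor(color, gray, cv::COLOR_BGR2GRAY);
	cv::imwrite("gray.jpg", gray);
	return 0;
}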

2) Gamma correction. Use gamma correction to standardize (normalize) the color space of the input image. The purpose is to adjust the contrast of the image, reduce the influence of local shadows and illumination changes, and at the same time suppress noise. When the image illumination is uneven, gamma correction can be used to raise or lower the overall brightness. In practice the correction can be carried out in two ways, square root or logarithmic; here we use the square-root method, with the following formula (where γ = 0.5):

                                              Y(x,y)= I(x,y)^{\gamma}
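A minimal sketch of this square-root correction, assuming an 8-bit grayscale input: scale the pixels to [0, 1], raise them to the power γ, and rescale back to 8 bits.

#include <opencv2/opencv.hpp>

// Apply gamma correction Y = I^gamma to an 8-bit grayscale image.
cv::Mat gammaCorrect(const cv::Mat& gray, double gamma = 0.5) {
	cv::Mat f, corrected;
	gray.convertTo(f, CV_32F, 1.0 / 255.0);       // scale to [0, 1]
	cv::pow(f, gamma, corrected);                 // element-wise I^gamma
	corrected.convertTo(corrected, CV_8U, 255.0); // back to 8-bit range
	return corrected;
}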

3) Gradient computation. Calculate the gradient of every pixel of the image (both magnitude and direction). This mainly captures contour information while further weakening the interference of illumination. Compute the X-direction gradient dx and the Y-direction gradient dy of the image, and from them derive the gradient magnitude (mag) and direction (angle). Before computing the gradients, the image may first be smoothed with a Gaussian blur; dx and dy can then be obtained with the Sobel operator or another first-order derivative operator.

Sobel's vertical and horizontal operators:

                             \text{Vertical direction}=\begin{bmatrix} -1 & -2 & -1 \\ 0 & 0 & 0 \\ 1 & 2 & 1 \end{bmatrix}

                            \text{Horizontal direction}=\begin{bmatrix} -1 & 0 & 1 \\ -2 & 0 & 2 \\ -1 & 0 & 1 \end{bmatrix}

First convolve the original image with the horizontal-direction operator to obtain the gradient component gradscalx in the x direction (horizontal, with right as the positive direction), then convolve the original image with the vertical-direction operator to obtain the gradient component gradscaly in the y direction (vertical, with up as the positive direction). In the simplest case (a centered [-1, 0, 1] difference instead of the full Sobel kernel), the gradients of a pixel are:

                               G_{x}(x,y)=H(x+1,y)-H(x-1,y)

                               G_{y}(x,y)=H(x,y+1)-H(x,y-1)

Here G_x(x, y), G_y(x, y), and H(x, y) denote the horizontal gradient, the vertical gradient, and the pixel value at pixel (x, y) of the input image, respectively. The gradient magnitude and gradient direction at pixel (x, y) are then:

                               G(x,y)= \sqrt{G_{x}(x,y)^{2}+G_{y}(x,y)^{2}}

                               \alpha(x,y)=\tan^{-1}\left( \frac{G_{y}(x,y)}{G_{x}(x,y)} \right)
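In OpenCV this step can be sketched with Sobel and cartToPolar; the Gaussian blur is the optional smoothing mentioned above, and the 3×3 kernel size is an assumption:

#include <opencv2/opencv.hpp>

// Compute per-pixel gradient magnitude and direction of a grayscale image.
void computeGradients(const cv::Mat& gray, cv::Mat& mag, cv::Mat& angle) {
	cv::Mat blurred, dx, dy;
	cv::GaussianBlur(gray, blurred, cv::Size(3, 3), 0); // optional smoothing
	cv::Sobel(blurred, dx, CV_32F, 1, 0);               // X-direction gradient
	cv::Sobel(blurred, dy, CV_32F, 0, 1);               // Y-direction gradient
	// mag = sqrt(dx^2 + dy^2), angle = atan2(dy, dx) in degrees (0..360)
	cv::cartToPolar(dx, dy, mag, angle, true);
}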

Divide the image into small 8×8 cells and build a gradient-orientation histogram in each one: every cell of 8×8 = 64 pixels is given 9 histogram bins (BINs) by angle, and each pixel in the cell casts a vote into the histogram, weighted by its gradient magnitude and mapped to a fixed angle range. This yields the cell's gradient-orientation histogram, i.e. the cell's 9-dimensional feature vector. Unsigned gradients are used, so opposite directions share a bin; for example, 20°–40° and 200°–220° fall into the same histogram bin.

                        

4) Block construction and descriptor length. Divide the image into cells and construct a gradient-orientation histogram for each cell, quantizing the (unsigned) gradient direction into 9 orientation bins over 0°–180°. In general, a block (Block) is composed of several cells, and a cell is composed of several pixels. Assume the standard pedestrian-detection parameters: 2×2 cells/block, 8×8 pixels/cell, 9 histogram channels (9 bins). The HOG descriptor of one cell then has length 9, the feature vector of one block has length 2×2×9 = 36, and the HOG vector of the whole detection window has length 105×4×9 = 3780 (the count of 105 blocks per window is derived below).

5) Cell descriptors. Counting the weighted gradient votes in each cell's histogram forms the descriptor of that cell. When accumulating the histograms, trilinear interpolation can be used: each pixel's vote is treated as a point in the three-dimensional space (x, y, θ) and distributed among the neighboring cells and orientation bins, which reduces aliasing artifacts.

As can be seen from the picture below, the original 720×475 image is cropped to a 64×128-pixel patch, which can then be divided into 128 cells of 8×8 pixels, and a gradient histogram is computed for each cell. A cell of 8×8 pixels thus obtains a compact, compressed representation.

For a color patch, each pixel has one gradient per channel, so an 8×8 patch of a three-channel image contains 8×8×3 = 192 pixel values. At each pixel, the channel with the largest gradient magnitude is kept, which leaves two numbers per pixel (magnitude and direction), i.e. 8×8×2 = 128 numbers. Later we will see how these 128 numbers are compressed into an array of just 9 numbers, a 9-bin histogram. Besides being compact, the histogram representation is also more robust to noise: an individual gradient may be noisy, but a histogram accumulated over the whole patch is much less sensitive to it.

 

The image above is 64×128 pixels and is divided into cells of 8×8 pixels, so the whole image contains (64/8)×(128/8) = 8×16 = 128 cells.

Using the two tables of gradient magnitude and gradient direction, the bin position is selected according to the gradient direction, and the amount added to the bin is the gradient magnitude. The pixel in the blue circle has gradient direction 80 and magnitude 2, so 2 is added to the fifth bin of the histogram. The other highlighted pixel has gradient direction 10 and magnitude 4; since the angle 10 lies exactly halfway between the bin centers 0 and 20, the magnitude is split proportionally into two equal parts, placed in the 0 bin and the 20 bin.

If the angle is greater than 160 degrees (i.e. between 160 and 180 degrees), the angle wraps around, since 0 and 180 degrees are equivalent. So in the example below, a pixel with an angle of 165 degrees contributes proportionally to both the 0-degree bin and the 160-degree bin.

It can be seen that many votes are distributed in the bins near 0 and 180 degrees, which means that many gradient directions in this cell point either up or down. Adding the votes of all pixels in each 8×8 cell into the 9 bins constructs the 9-bin histogram; the histogram corresponding to the cell above is shown in the figure below.
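A minimal sketch of this voting scheme for a single cell, assuming unsigned angles folded into [0°, 180°) and bin centers at 0°, 20°, ..., 160°; it mirrors the examples above (80° goes entirely into the fifth bin, 10° splits evenly between the 0° and 20° bins, and 165° splits between the 160° and 0° bins):

#include <opencv2/opencv.hpp>
#include <array>
#include <cmath>

// Build the 9-bin orientation histogram of one 8x8 cell.
// mag and angle are CV_32F patches of equal size; angles are in degrees.
std::array<float, 9> cellHistogram(const cv::Mat& mag, const cv::Mat& angle) {
	std::array<float, 9> hist{};
	for (int y = 0; y < mag.rows; y++) {
		for (int x = 0; x < mag.cols; x++) {
			float a = std::fmod(angle.at<float>(y, x), 180.0f); // unsigned gradient
			float m = mag.at<float>(y, x);
			int bin = static_cast<int>(a / 20.0f);   // lower of the two nearest bins
			float frac = (a - bin * 20.0f) / 20.0f;  // fractional distance into the bin
			hist[bin] += m * (1.0f - frac);          // proportional vote, lower bin
			hist[(bin + 1) % 9] += m * frac;         // upper bin; 180 wraps back to 0
		}
	}
	return hist;
}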

 

6) Block descriptors. Several cells form a block (for example 2×2 cells/block), and the feature descriptors of all the cells in a block are concatenated to obtain the block's HOG feature descriptor. Several parameters are very important here, namely winSize (64,128), blockSize (16,16), blockStride (8,8), cellSize (8,8), and nbins (9), which are illustrated by the following schematic diagrams.

a) window size winSize (64,128)

b) block size blockSize (16,16)

c) cell size cellSize (8,8)

Calculation of the HOG feature dimension

HOGDescriptor* hog = new HOGDescriptor(Size(64, 128), Size(16, 16), Size(8, 8), Size(8, 8), 9);

Here Size(64, 128) is the window size, Size(16, 16) the block size, the first Size(8, 8) the block stride, the second Size(8, 8) the cell size, and 9 the number of gradient-histogram bins per cell.

It can be seen that a block contains A = (blockSize.width/cellSize.width) × (blockSize.height/cellSize.height) = (16/8)×(16/8) = 4 cells, so a block contributes 9A = 36 histogram entries. A window contains B = ((winSize.width − blockSize.width)/blockStride.width + 1) × ((winSize.height − blockSize.height)/blockStride.height + 1) = ((64−16)/8 + 1) × ((128−16)/8 + 1) = 7×15 = 105 blocks, so a window contains 9AB = 9×4×105 = 3780 histogram entries in total.
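This arithmetic can be cross-checked against OpenCV itself; a small sketch comparing the hand computation with HOGDescriptor::getDescriptorSize():

#include <opencv2/opencv.hpp>
#include <iostream>

int main() {
	cv::HOGDescriptor hog(cv::Size(64, 128), cv::Size(16, 16),
	                      cv::Size(8, 8), cv::Size(8, 8), 9);
	// Cells per block: (16/8) * (16/8) = 4.
	int cellsPerBlock = (hog.blockSize.width / hog.cellSize.width) *
	                    (hog.blockSize.height / hog.cellSize.height);
	// Blocks per window: ((64-16)/8 + 1) * ((128-16)/8 + 1) = 7 * 15 = 105.
	int blocksPerWindow =
	    ((hog.winSize.width - hog.blockSize.width) / hog.blockStride.width + 1) *
	    ((hog.winSize.height - hog.blockSize.height) / hog.blockStride.height + 1);
	std::cout << cellsPerBlock * blocksPerWindow * hog.nbins     // 4 * 105 * 9
	          << " == " << hog.getDescriptorSize() << std::endl; // 3780 == 3780
	return 0;
}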

The 2×2 cells are combined into one large block (Block), and adjacent blocks overlap by half a block, because the 8-pixel stride is half the 16-pixel block size. The histograms of the cells in each block are merged into one large histogram vector, so each block yields a 36-dimensional descriptor. The descriptor of each block is then normalized; the common normalization schemes are L2-norm and L1-norm, with the following formulas (ε is a small regularization constant):

                                         L2\text{-norm}: f=\frac{v}{\sqrt{\lVert v \rVert_{2}^{2}+\epsilon^{2}}}

                                         L1\text{-norm}: f=\frac{v}{\lVert v \rVert_{1}+\epsilon}
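A minimal sketch of the L2 normalization of one 36-dimensional block vector; ε guards against division by zero, and the value 1e-3 is an arbitrary choice:

#include <vector>
#include <cmath>

// L2-normalize a block descriptor: f = v / sqrt(||v||_2^2 + eps^2).
void l2Normalize(std::vector<float>& v, float eps = 1e-3f) {
	float sumSq = 0.0f;
	for (float x : v) sumSq += x * x;
	float norm = std::sqrt(sumSq + eps * eps);
	for (float& x : v) x /= norm;
}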

7) Concatenate the HOG feature descriptors of all blocks in the image (the target you want to detect) to obtain the HOG feature descriptor of the image. This is the final feature vector available for classification.

=========================================================================

Code:

#include"stdafx.h"
#include <opencv2/opencv.hpp>
#include <iostream>
#include "math.h"
#include <opencv2/features2d.hpp>
#include <opencv2/highgui/highgui.hpp>
#include <opencv2/imgproc/imgproc.hpp>

using namespace cv;
using namespace std;
//using namespace cv::features2d;

int main(int argc, char** argv) {
	Mat src = imread("F:/photo/h1.jpg");
	if (src.empty()) {
		printf("could not load image...\n");
		return -1;
	}
	namedWindow("input image", WINDOW_AUTOSIZE);
	imshow("input image", src);

	/*Mat dst, dst_gray;
	resize(src, dst, Size(64, 128));
	cvtColor(dst, dst_gray, COLOR_BGR2GRAY);
	HOGDescriptor detector(Size(64, 128), Size(16, 16), Size(8, 8), Size(8, 8), 9);

	vector<float> descriptors;
	vector<Point> locations;
	detector.compute(dst_gray, descriptors, Size(0, 0), Size(0, 0), locations);
	printf("number of HOG descriptors : %d", descriptors.size());
	*/
	HOGDescriptor hog = HOGDescriptor();
	hog.setSVMDetector(hog.getDefaultPeopleDetector());

	vector<Rect> foundLocations;
	hog.detectMultiScale(src, foundLocations, 0, Size(8, 8), Size(32, 32), 1.05, 2);
	Mat result = src.clone();
	for (size_t t = 0; t < foundLocations.size(); t++) {
		rectangle(result, foundLocations[t], Scalar(0, 0, 255), 2, 8, 0);
	}
	namedWindow("HOG SVM Detector Demo", WINDOW_AUTOSIZE);
	imshow("HOG SVM Detector Demo", result);

	waitKey(0);
	return 0;
}

Image processing effect:

Pedestrian recognition:

Grayscale pedestrian recognition:

Reference: Histogram of Oriented Gradients explained using OpenCV

Original article: blog.csdn.net/weixin_44651073/article/details/128099635