Detailed explanation of the HOG feature algorithm for feature detection and use of the OpenCV interface

1. Introduction to HOG features

A feature descriptor is a representation of an image or an image patch that simplifies an image by extracting useful information and discarding irrelevant information.

  • In general, a feature descriptor converts an image of size W x H x 3 (channels) into a feature vector/array of length n. For the HOG feature descriptor, the size of the input image is 64 x 128 x 3, and the length of the output feature vector is 3780.
  • In the HOG feature descriptor, distributions (histograms) of gradient directions are used as features. Gradients (x and y derivatives) of an image are useful because the magnitude of the gradient is large around edges and corners (areas of sudden changes in intensity), which we know contain more information about the shape of an object than flat areas.
  • The HOG (short for Histogram of Oriented Gradients) feature detection algorithm was first proposed by Dalal and Triggs (INRIA) at CVPR 2005 as an image descriptor for human (pedestrian) detection. It describes the distribution of gradient strengths and orientations. The main idea is that even when the exact position of an edge is unknown, the distribution of edge directions can still represent the outline of a pedestrian well.
  • The HOG feature detection algorithm consists of several steps: image preprocessing -> gradient computation -> gradient orientation histograms -> overlapping block histogram normalization -> HOG feature vector. Each step is introduced below.

2. Implementation of the HOG algorithm

2.1 Image preprocessing

  • Image Scaling
    The HOG feature descriptor for pedestrian detection is computed on a 64×128 patch of the image. Of course, the source image can be of any size; typically, patches at multiple scales are analyzed at many image locations. The only constraint is that the patches being analyzed have a fixed aspect ratio. In our case, the patches need a 1:2 aspect ratio. For example, they can be 100x200, 128x256, or 1000x2000, but not 101x205.

  • Grayscale
    For a color image, the three RGB components can be converted into a single grayscale value using the conversion formula:
    Gray = 0.299 R + 0.587 G + 0.114 B

  • Gamma correction
    When the illumination of an image is uneven, gamma correction can be used to increase or decrease the overall brightness. In practice, the gamma normalization can be done in two different ways, square root or logarithmic; here we use the square root, with the formula (where γ = 0.5):
    Y(x, y) = I(x, y)^γ = √I(x, y)
    A short code sketch covering all three preprocessing steps follows this list.
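
The three preprocessing steps can be sketched in a few lines of Python with OpenCV. This is only a minimal illustration of the formulas above, not the exact code of any particular pipeline; the filename 'pedestrian.png' is a placeholder:

# Minimal preprocessing sketch: scale to the 64x128 detection window,
# convert to grayscale, and apply square-root gamma correction.
import cv2
import numpy as np

img = cv2.imread('pedestrian.png')            # placeholder input image
img = cv2.resize(img, (64, 128))              # dsize is (width, height): 1:2 aspect ratio
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)  # Gray = 0.299 R + 0.587 G + 0.114 B
gamma = np.sqrt(gray / 255.0)                 # I(x, y) ** 0.5 on [0, 1] intensities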

2.2 Calculating the gradient image

  • Note: the following steps omit grayscale conversion and gamma correction, and the HOG features are computed on the color pedestrian image directly. To calculate the HOG descriptor, we first need to calculate the horizontal and vertical gradients; after all, we want to compute a histogram of gradients. They are easily obtained by filtering the image with the following 1-D kernels:
    horizontal: [-1, 0, 1]    vertical: [-1, 0, 1]ᵀ

We can also achieve the same result by using the Sobel operator in OpenCV with kernel size 1:

// C++ gradient calculation.
// Read image
Mat img = imread("bolt.png");
img.convertTo(img, CV_32F, 1/255.0);
 
// Calculate gradients gx, gy
Mat gx, gy;
Sobel(img, gx, CV_32F, 1, 0, 1);
Sobel(img, gy, CV_32F, 0, 1, 1);

# Python gradient calculation
import cv2
import numpy as np

# Read image
im = cv2.imread('bolt.png')
im = np.float32(im) / 255.0

# Calculate gradients gx, gy
gx = cv2.Sobel(im, cv2.CV_32F, 1, 0, ksize=1)
gy = cv2.Sobel(im, cv2.CV_32F, 0, 1, ksize=1)

Next, we can find the magnitude and direction of the gradient using the following formulas:
    g = √(gx² + gy²),    θ = arctan(gy / gx)

If you are using OpenCV, the calculation can be done using the function cartToPolar like this:

// C++ Calculate gradient magnitude and direction (in degrees)
Mat mag, angle;
cartToPolar(gx, gy, mag, angle, 1); // last argument: angleInDegrees = true

The Python code is as follows:

# Python Calculate gradient magnitude and direction (in degrees)
mag, angle = cv2.cartToPolar(gx, gy, angleInDegrees=True)

At each pixel, the gradient has a magnitude and a direction. For color images, the gradients of the three channels are evaluated separately. The gradient magnitude at a pixel is the maximum of the magnitudes over the three channels, and the angle is the angle of the channel with the maximal gradient.
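
If the image is read in color, the mag and angle arrays computed above have shape H x W x 3. Continuing from the Python code above, a small NumPy sketch of this per-pixel channel selection might look like this:

# Keep, at each pixel, the channel with the largest gradient magnitude,
# together with the corresponding angle.
idx = mag.argmax(axis=2)                                   # index of strongest channel
mag_max = np.take_along_axis(mag, idx[..., None], axis=2)[..., 0]
ang_max = np.take_along_axis(angle, idx[..., None], axis=2)[..., 0]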

2.3 Calculating the gradient histograms in 8×8 cells

  • Why 8×8 patches? Why not 32×32? It is a design choice driven by the scale of the features we are looking for. HOG was originally used for pedestrian detection. In a 64×128 pedestrian image, 8×8 cells are large enough to capture interesting features (such as the face or the top of the head).
  • The next step is to create a gradient histogram in each of these 8×8 cells. The histogram contains 9 bins, corresponding to the angles 0, 20, 40, …, 160. The figure below illustrates the process: we examine the magnitude and direction of the gradient for one such 8×8 patch.
  • The bin is selected based on the gradient direction, and the vote (the value that goes into the bin) is based on the gradient magnitude. Let's first focus on the pixel outlined in blue: its angle (direction) is 80 degrees and its magnitude is 2, so it adds 2 to the 5th bin (the 80-degree bin). The pixel outlined in red has a gradient angle of 10 degrees and a magnitude of 4; since 10 degrees is exactly halfway between 0 and 20, its vote is split evenly between the two bins.
    (Figure: gradient direction and magnitude grids for the 8×8 patch, with one pixel highlighted in blue and one in red.)
  • There is one more detail to pay attention to. What if the angle is greater than 160 degrees, i.e. between 160 and 180? The angle wraps around, making 0 and 180 equivalent. So a pixel with an angle of 165 degrees contributes proportionally to both the 160-degree bin and the 0-degree bin.
  • The contributions of all pixels in the 8×8 cell are summed up to create the 9-bin histogram. For the patch above, the histogram has a lot of weight near 0 and 180 degrees, which is just another way of saying that most gradients in the patch point either up or down. A code sketch of this voting scheme follows below.
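
As a minimal sketch of the voting scheme just described (assuming cell_mag and cell_ang hold the 8×8 magnitudes and angles of one cell, with angles already folded into [0, 180), e.g. via angle % 180):

# 9-bin histogram for one 8x8 cell with proportional vote splitting.
import numpy as np

def cell_histogram(cell_mag, cell_ang, n_bins=9, bin_width=20.0):
    hist = np.zeros(n_bins)
    for m, a in zip(cell_mag.ravel(), cell_ang.ravel()):
        lo = int(a // bin_width) % n_bins      # left bin: 0, 20, ..., 160
        frac = (a % bin_width) / bin_width     # how far past the left bin
        hist[lo] += m * (1.0 - frac)           # e.g. 80 deg puts all of m in the 80 bin
        hist[(lo + 1) % n_bins] += m * frac    # 165 deg wraps back into the 0 bin
    return hist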

2.4 16×16 block normalization

  • Ideally, we want the descriptor to be independent of lighting changes. In other words, we want to "normalize" the histograms so that they are not affected by changes in lighting.

  • A 16×16 block contains 4 cells and therefore 4 histograms, which can be concatenated into a 4 x 9 = 36 x 1 element vector. This vector is normalized the same way a 3×1 vector is normalized: by dividing by its L2 norm. The window is then shifted by 8 pixels, a normalized 36×1 vector is computed over the new block, and the process is repeated across the detection window. Each block is normalized separately, so every block is represented by its own 36-dimensional feature vector; a minimal sketch of the normalization follows below.
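
A minimal sketch of this block normalization (cell_hists stands for the four 9-bin histograms of one 16×16 block; the small eps constant is an assumption added to guard against division by zero):

# L2-normalize one 16x16 block (four 9-bin cell histograms).
import numpy as np

def normalize_block(cell_hists, eps=1e-6):
    v = np.concatenate(cell_hists)        # 4 x 9 = 36-dimensional vector
    return v / (np.linalg.norm(v) + eps)  # divide by the L2 norm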

2.5 HOG features

  • The HOG feature is the concatenation of all the normalized oriented-gradient block histograms into a single feature vector.

  • To compute the final feature vector for the entire image patch, the 36×1 vectors are concatenated into one giant vector. What is the size of this vector? Let's calculate it:

    • How many positions can a 16×16 block take? There are 7 horizontal positions ((64 − 16)/8 + 1) and 15 vertical positions ((128 − 16)/8 + 1), for a total of 7 x 15 = 105 positions.
    • Each 16x16 block is represented by a 36x1 vector. So when we concatenate them all into one giant vector, we obtain a 36 × 105 = 3780 dimensional vector.
  • The computed descriptor is usually visualized by drawing the 9-bin histogram of each 8×8 cell on top of the image, where the dominant orientations trace the outline of the person. The descriptor length can also be checked directly with OpenCV, as shown below.
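OpenCV's cv2.HOGDescriptor uses exactly this configuration by default (64×128 window, 16×16 blocks, 8-pixel block stride, 8×8 cells, 9 bins), so its output should have 3780 elements. The filename is again a placeholder:

# Sanity check: OpenCV's default HOG descriptor has 3780 elements.
import cv2

hog = cv2.HOGDescriptor()                    # winSize=(64,128), blockSize=(16,16),
                                             # blockStride=(8,8), cellSize=(8,8), nbins=9
img = cv2.imread('pedestrian.png')           # placeholder input image
d = hog.compute(cv2.resize(img, (64, 128)))
print(d.size)                                # 3780 = 105 blocks x 36 values per block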

3. Using HOG features in OpenCV
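
The complete C++ program below (adapted from OpenCV's train_HOG sample) ties the pieces together: it loads positive and negative training images, computes their HOG descriptors, trains a linear SVM, and runs the resulting sliding-window detector on test images or video.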

#include "opencv2/imgproc.hpp"
#include "opencv2/highgui.hpp"
#include "opencv2/ml.hpp"
#include "opencv2/objdetect.hpp"
#include "opencv2/videoio.hpp"
#include <iostream>
#include <time.h>
using namespace cv;
using namespace cv::ml;
using namespace std;
vector< float > get_svm_detector( const Ptr< SVM >& svm );
void convert_to_ml( const std::vector< Mat > & train_samples, Mat& trainData );
void load_images( const String & dirname, vector< Mat > & img_lst, bool showImages );
void sample_neg( const vector< Mat > & full_neg_lst, vector< Mat > & neg_lst, const Size & size );
void computeHOGs( const Size wsize, const vector< Mat > & img_lst, vector< Mat > & gradient_lst, bool use_flip );
void test_trained_detector( String obj_det_filename, String test_dir, String videofilename );
vector< float > get_svm_detector( const Ptr< SVM >& svm )
{
    // get the support vectors
    Mat sv = svm->getSupportVectors();
    const int sv_total = sv.rows;
    // get the decision function
    Mat alpha, svidx;
    double rho = svm->getDecisionFunction( 0, alpha, svidx );
    CV_Assert( alpha.total() == 1 && svidx.total() == 1 && sv_total == 1 );
    CV_Assert( (alpha.type() == CV_64F && alpha.at<double>(0) == 1.) ||
               (alpha.type() == CV_32F && alpha.at<float>(0) == 1.f) );
    CV_Assert( sv.type() == CV_32F );
    vector< float > hog_detector( sv.cols + 1 );
    memcpy( &hog_detector[0], sv.ptr(), sv.cols*sizeof( hog_detector[0] ) );
    hog_detector[sv.cols] = (float)-rho;
    return hog_detector;
}
/*
* Convert training/testing set to be used by OpenCV Machine Learning algorithms.
* TrainData is a matrix of size (#samples x max(#cols,#rows) per samples), in 32FC1.
* Transposition of samples is made if needed.
*/
void convert_to_ml( const vector< Mat > & train_samples, Mat& trainData )
{
    //--Convert data
    const int rows = (int)train_samples.size();
    const int cols = (int)std::max( train_samples[0].cols, train_samples[0].rows );
    Mat tmp( 1, cols, CV_32FC1 ); //< used for transposition if needed
    trainData = Mat( rows, cols, CV_32FC1 );
    for( size_t i = 0 ; i < train_samples.size(); ++i )
    {
        CV_Assert( train_samples[i].cols == 1 || train_samples[i].rows == 1 );
        if( train_samples[i].cols == 1 )
        {
            transpose( train_samples[i], tmp );
            tmp.copyTo( trainData.row( (int)i ) );
        }
        else if( train_samples[i].rows == 1 )
        {
            train_samples[i].copyTo( trainData.row( (int)i ) );
        }
    }
}
void load_images( const String & dirname, vector< Mat > & img_lst, bool showImages = false )
{
    vector< String > files;
    glob( dirname, files );
    for ( size_t i = 0; i < files.size(); ++i )
    {
        Mat img = imread( files[i] ); // load the image
        if ( img.empty() )
        {
            cout << files[i] << " is invalid!" << endl; // invalid image, skip it.
            continue;
        }
        if ( showImages )
        {
            imshow( "image", img );
            waitKey( 1 );
        }
        img_lst.push_back( img );
    }
}
void sample_neg( const vector< Mat > & full_neg_lst, vector< Mat > & neg_lst, const Size & size )
{
    Rect box;
    box.width = size.width;
    box.height = size.height;
    srand( (unsigned int)time( NULL ) );
    for ( size_t i = 0; i < full_neg_lst.size(); i++ )
        if ( full_neg_lst[i].cols > box.width && full_neg_lst[i].rows > box.height )
        {
            box.x = rand() % ( full_neg_lst[i].cols - box.width );
            box.y = rand() % ( full_neg_lst[i].rows - box.height );
            Mat roi = full_neg_lst[i]( box );
            neg_lst.push_back( roi.clone() );
        }
}
void computeHOGs( const Size wsize, const vector< Mat > & img_lst, vector< Mat > & gradient_lst, bool use_flip )
{
    HOGDescriptor hog;
    hog.winSize = wsize;
    Mat gray;
    vector< float > descriptors;
    for( size_t i = 0 ; i < img_lst.size(); i++ )
    {
        if ( img_lst[i].cols >= wsize.width && img_lst[i].rows >= wsize.height )
        {
            Rect r = Rect(( img_lst[i].cols - wsize.width ) / 2,
                          ( img_lst[i].rows - wsize.height ) / 2,
                          wsize.width,
                          wsize.height);
            cvtColor( img_lst[i](r), gray, COLOR_BGR2GRAY );
            hog.compute( gray, descriptors, Size( 8, 8 ), Size( 0, 0 ) );
            gradient_lst.push_back( Mat( descriptors ).clone() );
            if ( use_flip )
            {
                flip( gray, gray, 1 );
                hog.compute( gray, descriptors, Size( 8, 8 ), Size( 0, 0 ) );
                gradient_lst.push_back( Mat( descriptors ).clone() );
            }
        }
    }
}
void test_trained_detector( String obj_det_filename, String test_dir, String videofilename )
{
    cout << "Testing trained detector..." << endl;
    HOGDescriptor hog;
    hog.load( obj_det_filename );
    vector< String > files;
    glob( test_dir, files );
    int delay = 0;
    VideoCapture cap;
    if ( videofilename != "" )
    {
        if ( videofilename.size() == 1 && isdigit( videofilename[0] ) )
            cap.open( videofilename[0] - '0' );
        else
            cap.open( videofilename );
    }
    obj_det_filename = "testing " + obj_det_filename;
    namedWindow( obj_det_filename, WINDOW_NORMAL );
    for( size_t i=0;; i++ )
    {
        Mat img;
        if ( cap.isOpened() )
        {
            cap >> img;
            delay = 1;
        }
        else if( i < files.size() )
        {
            img = imread( files[i] );
        }
        if ( img.empty() )
        {
            return;
        }
        vector< Rect > detections;
        vector< double > foundWeights;
        hog.detectMultiScale( img, detections, foundWeights );
        for ( size_t j = 0; j < detections.size(); j++ )
        {
            Scalar color = Scalar( 0, foundWeights[j] * foundWeights[j] * 200, 0 );
            rectangle( img, detections[j], color, img.cols / 400 + 1 );
        }
        imshow( obj_det_filename, img );
        if( waitKey( delay ) == 27 )
        {
            return;
        }
    }
}
int main( int argc, char** argv )
{
    const char* keys =
    {
        "{help h|     | show help message}"
        "{pd    |     | path of directory contains positive images}"
        "{nd    |     | path of directory contains negative images}"
        "{td    |     | path of directory contains test images}"
        "{tv    |     | test video file name}"
        "{dw    |     | width of the detector}"
        "{dh    |     | height of the detector}"
        "{f     |false| indicates if the program will generate and use mirrored samples or not}"
        "{d     |false| train twice}"
        "{t     |false| test a trained detector}"
        "{v     |false| visualize training steps}"
        "{fn    |my_detector.yml| file name of trained SVM}"
    };
    CommandLineParser parser( argc, argv, keys );
    if ( parser.has( "help" ) )
    {
        parser.printMessage();
        exit( 0 );
    }
    String pos_dir = parser.get< String >( "pd" );
    String neg_dir = parser.get< String >( "nd" );
    String test_dir = parser.get< String >( "td" );
    String obj_det_filename = parser.get< String >( "fn" );
    String videofilename = parser.get< String >( "tv" );
    int detector_width = parser.get< int >( "dw" );
    int detector_height = parser.get< int >( "dh" );
    bool test_detector = parser.get< bool >( "t" );
    bool train_twice = parser.get< bool >( "d" );
    bool visualization = parser.get< bool >( "v" );
    bool flip_samples = parser.get< bool >( "f" );
    if ( test_detector )
    {
        test_trained_detector( obj_det_filename, test_dir, videofilename );
        exit( 0 );
    }
    if( pos_dir.empty() || neg_dir.empty() )
    {
        parser.printMessage();
        cout << "Wrong number of parameters.\n\n"
             << "Example command line:\n" << argv[0] << " -dw=64 -dh=128 -pd=/INRIAPerson/96X160H96/Train/pos -nd=/INRIAPerson/neg -td=/INRIAPerson/Test/pos -fn=HOGpedestrian64x128.xml -d\n"
             << "\nExample command line for testing trained detector:\n" << argv[0] << " -t -fn=HOGpedestrian64x128.xml -td=/INRIAPerson/Test/pos";
        exit( 1 );
    }
    vector< Mat > pos_lst, full_neg_lst, neg_lst, gradient_lst;
    vector< int > labels;
    clog << "Positive images are being loaded..." ;
    load_images( pos_dir, pos_lst, visualization );
    if ( pos_lst.size() > 0 )
    {
        clog << "...[done] " << pos_lst.size() << " files." << endl;
    }
    else
    {
        clog << "no image in " << pos_dir <<endl;
        return 1;
    }
    Size pos_image_size = pos_lst[0].size();
    if ( detector_width && detector_height )
    {
        pos_image_size = Size( detector_width, detector_height );
    }
    else
    {
        for ( size_t i = 0; i < pos_lst.size(); ++i )
        {
            if( pos_lst[i].size() != pos_image_size )
            {
                cout << "All positive images should be same size!" << endl;
                exit( 1 );
            }
        }
        pos_image_size = pos_image_size / 8 * 8;
    }
    clog << "Negative images are being loaded...";
    load_images( neg_dir, full_neg_lst, visualization );
    clog << "...[done] " << full_neg_lst.size() << " files." << endl;
    clog << "Negative images are being processed...";
    sample_neg( full_neg_lst, neg_lst, pos_image_size );
    clog << "...[done] " << neg_lst.size() << " files." << endl;
    clog << "Histogram of Gradients are being calculated for positive images...";
    computeHOGs( pos_image_size, pos_lst, gradient_lst, flip_samples );
    size_t positive_count = gradient_lst.size();
    labels.assign( positive_count, +1 );
    clog << "...[done] ( positive images count : " << positive_count << " )" << endl;
    clog << "Histogram of Gradients are being calculated for negative images...";
    computeHOGs( pos_image_size, neg_lst, gradient_lst, flip_samples );
    size_t negative_count = gradient_lst.size() - positive_count;
    labels.insert( labels.end(), negative_count, -1 );
    CV_Assert( positive_count < labels.size() );
    clog << "...[done] ( negative images count : " << negative_count << " )" << endl;
    Mat train_data;
    convert_to_ml( gradient_lst, train_data );
    clog << "Training SVM...";
    Ptr< SVM > svm = SVM::create();
    /* Default values to train SVM */
    svm->setCoef0( 0.0 );
    svm->setDegree( 3 );
    svm->setTermCriteria( TermCriteria(TermCriteria::MAX_ITER + TermCriteria::EPS, 1000, 1e-3 ) );
    svm->setGamma( 0 );
    svm->setKernel( SVM::LINEAR );
    svm->setNu( 0.5 );
    svm->setP( 0.1 ); // for EPSILON_SVR, epsilon in loss function?
    svm->setC( 0.01 ); // From paper, soft classifier
    svm->setType( SVM::EPS_SVR ); // C_SVC; // EPSILON_SVR; // may be also NU_SVR; // do regression task
    svm->train( train_data, ROW_SAMPLE, labels );
    clog << "...[done]" << endl;
    if ( train_twice )
    {
        clog << "Testing trained detector on negative images. This might take a few minutes...";
        HOGDescriptor my_hog;
        my_hog.winSize = pos_image_size;
        // Set the trained svm to my_hog
        my_hog.setSVMDetector( get_svm_detector( svm ) );
        vector< Rect > detections;
        vector< double > foundWeights;
        for ( size_t i = 0; i < full_neg_lst.size(); i++ )
        {
            if ( full_neg_lst[i].cols >= pos_image_size.width && full_neg_lst[i].rows >= pos_image_size.height )
                my_hog.detectMultiScale( full_neg_lst[i], detections, foundWeights );
            else
                detections.clear();
            for ( size_t j = 0; j < detections.size(); j++ )
            {
                Mat detection = full_neg_lst[i]( detections[j] ).clone();
                resize( detection, detection, pos_image_size, 0, 0, INTER_LINEAR_EXACT);
                neg_lst.push_back( detection );
            }
            if ( visualization )
            {
                for ( size_t j = 0; j < detections.size(); j++ )
                {
                    rectangle( full_neg_lst[i], detections[j], Scalar( 0, 255, 0 ), 2 );
                }
                imshow( "testing trained detector on negative images", full_neg_lst[i] );
                waitKey( 5 );
            }
        }
        clog << "...[done]" << endl;
        gradient_lst.clear();
        clog << "Histogram of Gradients are being calculated for positive images...";
        computeHOGs( pos_image_size, pos_lst, gradient_lst, flip_samples );
        positive_count = gradient_lst.size();
        clog << "...[done] ( positive count : " << positive_count << " )" << endl;
        clog << "Histogram of Gradients are being calculated for negative images...";
        computeHOGs( pos_image_size, neg_lst, gradient_lst, flip_samples );
        negative_count = gradient_lst.size() - positive_count;
        clog << "...[done] ( negative count : " << negative_count << " )" << endl;
        labels.clear();
        labels.assign(positive_count, +1);
        labels.insert(labels.end(), negative_count, -1);
        clog << "Training SVM again...";
        convert_to_ml( gradient_lst, train_data );
        svm->train( train_data, ROW_SAMPLE, labels );
        clog << "...[done]" << endl;
    }
    HOGDescriptor hog;
    hog.winSize = pos_image_size;
    hog.setSVMDetector( get_svm_detector( svm ) );
    hog.save( obj_det_filename );
    test_trained_detector( obj_det_filename, test_dir, videofilename );
    return 0;
}
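
To train and test a pedestrian detector on the INRIA person dataset, the program's own help text suggests invocations like the following (the binary name train_HOG is an assumption; substitute whatever name you compiled the program under):

./train_HOG -dw=64 -dh=128 -pd=/INRIAPerson/96X160H96/Train/pos -nd=/INRIAPerson/neg -td=/INRIAPerson/Test/pos -fn=HOGpedestrian64x128.xml -d
./train_HOG -t -fn=HOGpedestrian64x128.xml -td=/INRIAPerson/Test/pos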

