Depth Articles - Image Processing Methods (III): HOG Features and Bag-of-Words


Previous: Depth Articles - Image Processing Methods (II): IoU and GIoU Performance Evaluation

Next: Depth Articles - Image Processing Methods (IV): Image Pyramids

 

This section covers HOG features and the bag-of-words model; the next section covers image pyramids.

 

Five. HOG (Histogram of Oriented Gradients) Features

HOG is a feature descriptor used in image processing and computer vision for object detection. It builds its features by computing histograms of gradient orientations over local regions of the image. Combined with an SVM classifier, HOG has been widely used in image recognition, with particular success in pedestrian detection. A reminder: the HOG + SVM pedestrian detection method was presented by the French researcher Dalal at CVPR 2005; although many detection algorithms have been proposed since, a great number are still based on the HOG + SVM idea.

1. The main idea

    In an image, the appearance and shape of a local object can be well described by the distribution of gradient or edge directions. (Essence: gradient statistics, where the gradients live mainly at local edges.)

 

2. The specific method

    First, divide the image into small connected regions, called cell units; then collect a histogram of the gradient directions (or edge directions) of the pixels in each cell; finally, combine these histograms to form the feature descriptor.

 

3. HOG feature extraction algorithm implementation

   HOG features are extracted from an image (the object to be detected, or a scanning window) as follows:

   (1). Convert to grayscale (viewing the image as a three-dimensional function of x, y, and z, where z is the gray level).

   (2). Standardize (normalize) the color space of the input image using gamma correction. This adjusts the image contrast, reduces the influence of local shadows and illumination changes, and suppresses noise. Normalization is needed first because local surface exposure contributes heavily to the texture intensity of the image; gamma compression effectively reduces the shadow and illumination variation in image detail. Since color information plays little role here, the image is usually converted to grayscale first. Gamma compression formula:  \large I(x,\; y) = I(x,\; y)^{\gamma} , taking for example  \large \gamma = \frac{1}{2}
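A minimal sketch of the gamma compression step (assuming a grayscale image already scaled to [0, 1]; the function name is my own):

```python
import numpy as np

def gamma_compress(gray, gamma=0.5):
    """Gamma compression I(x, y) = I(x, y)^gamma for a grayscale
    image with values in [0, 1]; gamma = 1/2 is the square root."""
    return np.power(gray, gamma)

# Dark pixels are boosted more than bright ones, flattening the
# dynamic range caused by uneven illumination.
img = np.array([[0.04, 0.81]])
print(gamma_compress(img))  # approximately [[0.2 0.9]]
```

With gamma < 1 the mapping compresses bright regions and lifts dark ones, which is exactly the illumination-flattening effect described above.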

   (3). Compute the gradient (both magnitude and direction) of every pixel, mainly to capture contour information while further reducing the interference of illumination. Compute the image gradients along the horizontal and vertical axes, and from them the gradient direction at each pixel position; the derivative operation captures the contours and some texture information, and further weakens the influence of illumination.

         The gradient of the image at pixel  \large (x,\; y)  is:

         \large G_{x}(x,\; y) = H(x + 1,\; y) - H(x - 1,\; y)

         \large G_{y}(x,\; y) = H(x,\; y + 1) - H(x,\; y - 1)

         where  \large H(x,\; y)  is the pixel value. The gradient magnitude and direction at  \large (x,\; y)  are then:

         \large G(x,\; y) = \sqrt{G_{x}(x,\; y)^{2} + G_{y}(x,\; y)^{2}}

         \large \alpha(x,\; y) = \arctan \frac{G_{y}(x,\; y)}{G_{x}(x,\; y)}

         The most common method is: first convolve the original image with the  \large [-1,\, 0,\, 1]  gradient operator to obtain the horizontal gradient component  \large G_{x}  (rightward positive), then convolve the original image with  \large [-1,\, 0,\, 1]^{T}  to obtain the vertical gradient component  \large G_{y}  (upward positive). The formulas above then give the gradient magnitude and direction of each pixel.
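A minimal NumPy sketch of this step (central differences implement the [-1, 0, 1] operator; function and variable names are my own):

```python
import numpy as np

def hog_gradients(gray):
    """Per-pixel gradient magnitude and orientation using the
    [-1, 0, 1] operator horizontally and its transpose vertically."""
    gray = gray.astype(np.float64)
    gx = np.zeros_like(gray)
    gy = np.zeros_like(gray)
    gx[:, 1:-1] = gray[:, 2:] - gray[:, :-2]   # H(x+1, y) - H(x-1, y)
    gy[1:-1, :] = gray[2:, :] - gray[:-2, :]   # H(x, y+1) - H(x, y-1)
    magnitude = np.hypot(gx, gy)
    # Fold orientation into [0, 180) degrees ("unsigned" gradients).
    orientation = np.rad2deg(np.arctan2(gy, gx)) % 180.0
    return magnitude, orientation
```

On a horizontal intensity ramp, for example, the gradient is purely horizontal, so the interior orientations come out as 0 degrees.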

   (4). Divide the image into small cells (for example, 6 × 6 pixels per cell).

   (5). Compute the gradient histogram of each cell (counting the occurrences of the different gradient directions) to form each cell's descriptor.

          The purpose of building a gradient histogram for each cell is to encode the local image region while keeping only weak sensitivity to the pose and appearance of the human subject in the image. Cells may be rectangular or radial (star-shaped). Each pixel's gradient magnitude is projected, as a weight, onto the histogram bin corresponding to its gradient direction.
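A simplified sketch of the voting step (real implementations additionally interpolate each vote bilinearly between the two nearest bins; all names here are my own):

```python
import numpy as np

def cell_histogram(magnitude, orientation, n_bins=9):
    """9-bin orientation histogram of one cell: every pixel votes for
    the bin covering its orientation, weighted by gradient magnitude.
    Orientations are 'unsigned' angles in [0, 180) degrees."""
    bin_width = 180.0 / n_bins  # 20 degrees per bin
    bins = (orientation // bin_width).astype(int) % n_bins
    hist = np.zeros(n_bins)
    np.add.at(hist, bins.ravel(), magnitude.ravel())
    return hist
```

For instance, a 2 × 2 cell with unit magnitudes and orientations 5°, 25°, 45°, 5° votes into bins 0, 1, 2, 0, giving the histogram [2, 1, 1, 0, 0, 0, 0, 0, 0].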

   (6). Group every few cells into a block (for example, 3 × 3 cells per block); concatenating the descriptors of all cells within a block gives the HOG descriptor of that block.

          Cells are grouped into larger blocks, and the gradient histograms are normalized within each block. Local illumination changes and foreground-background contrast changes make the range of gradient magnitudes very large, so the gradient strengths must be normalized; normalization further compresses illumination, shadows, and edges.

          Cells are combined into large, spatially connected blocks (intervals), and concatenating the feature vectors of all cells in a block gives that block's HOG features. Blocks overlap one another, which means each cell's features appear several times, normalized differently, in the final feature vector. The block descriptors (features) after normalization are what we call HOG features.

           There are two main block geometries: rectangular blocks (R-HOG) and circular blocks (C-HOG). An R-HOG block is roughly a square grid, characterized by three parameters: the number of cells per block, the number of pixels per cell, and the number of histogram channels per cell. For example, the parameters used here for pedestrian detection are: 3 × 3 cells/block, 6 × 6 pixels/cell, 9 histogram channels. The number of features per block is then 3 × 3 × 9 = 81.

   (7). Concatenating the HOG descriptors of all blocks in the image yields the HOG descriptor of the whole image (the object to be detected). That is, the HOG features of all overlapping blocks in the detection window are collected and combined into the final feature vector for classification.
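A quick sanity check on step (7): the final vector length is just (number of overlapping block positions) × (values per block). The sketch below assumes a dense scan with a stride of one cell; the function name is my own:

```python
def hog_feature_length(win_w, win_h, cell=8, block_cells=2,
                       stride_cells=1, n_bins=9):
    """Length of the final HOG descriptor: number of overlapping block
    positions x cells per block x histogram bins per cell."""
    cells_x, cells_y = win_w // cell, win_h // cell
    blocks_x = (cells_x - block_cells) // stride_cells + 1
    blocks_y = (cells_y - block_cells) // stride_cells + 1
    return blocks_x * blocks_y * block_cells * block_cells * n_bins

# Dalal & Triggs's classic 64 x 128 pedestrian window (8 x 8-pixel
# cells, 2 x 2-cell blocks, 9 bins) gives 7 x 15 block positions and
# 36 values per block:
print(hog_feature_length(64, 128))  # 3780
```

With this section's example parameters (6 × 6-pixel cells, 3 × 3-cell blocks, 9 bins), a hypothetical 30 × 30 window would yield 3 × 3 block positions of 81 values each, i.e. 729 features.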

 

4. Performance improvement

    These local histograms are contrast-normalized over a larger range (a block, also called an interval). The method is: first compute the density of the histograms within the block, then normalize each cell in the block according to this density. After normalization, better robustness to changes in illumination and shadow is obtained.
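A sketch of block normalization with L2 and L1 variants (the small epsilon guards against division by zero; names are my own):

```python
import numpy as np

def normalize_block(block_vec, eps=1e-5, method="L2"):
    """Contrast-normalize the concatenated cell histograms of a block."""
    if method == "L2":
        return block_vec / np.sqrt(np.sum(block_vec ** 2) + eps ** 2)
    if method == "L1":
        return block_vec / (np.sum(np.abs(block_vec)) + eps)
    raise ValueError(method)

# Scaling the input (e.g. a global illumination change) leaves the
# normalized descriptor essentially unchanged.
v = np.array([3.0, 4.0])
print(normalize_block(v))       # ~[0.6, 0.8]
print(normalize_block(10 * v))  # ~[0.6, 0.8]
```

The invariance shown in the usage lines is exactly why normalization improves robustness to illumination and shadow.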

 

5. Advantages

    Compared with other feature description methods, HOG has several advantages. First, since HOG operates on local cells of the image, it remains largely invariant to geometric and photometric deformations of the image, because such deformations appear only over larger spatial regions. Second, under coarse spatial sampling, fine orientation sampling, and strong local photometric normalization, a pedestrian need only roughly maintain an upright posture; minor body movements can be ignored without affecting the detection result. HOG features are therefore particularly well suited to detecting humans in images.

 

6. General idea of HOG + SVM pedestrian detection

   (1). Extract HOG features from the positive and negative samples.

   (2). Feed the features to an SVM classifier for training, obtaining a model.

   (3). Generate a detector from the model.

   (4). Run the detector on the negative samples to obtain hard examples.

   (5). Extract HOG features from the hard examples, combine them with the features from step (1), and retrain to obtain the final detector.
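The five steps above can be sketched end to end with scikit-learn, using random vectors as stand-ins for real HOG descriptors (all data here is synthetic and illustrative; a real pipeline would extract HOG features from image windows):

```python
import numpy as np
from sklearn.svm import LinearSVC

rng = np.random.default_rng(0)

# Stand-ins for real HOG descriptors: 200 positive (pedestrian) and
# 200 negative (background) 3780-dimensional feature vectors.
X_pos = rng.normal(1.0, 1.0, (200, 3780))   # step (1): positive features
X_neg = rng.normal(-1.0, 1.0, (200, 3780))  # step (1): negative features
X = np.vstack([X_pos, X_neg])
y = np.array([1] * 200 + [0] * 200)

clf = LinearSVC(C=0.01).fit(X, y)           # steps (2)/(3): train detector

# Step (4): scan windows known to contain no pedestrians; anything the
# detector scores as positive is a false alarm, i.e. a hard example.
X_scan = rng.normal(-1.0, 1.0, (500, 3780))
hard = X_scan[clf.decision_function(X_scan) > 0]

# Step (5): add the hard examples as extra negatives and retrain.
X2 = np.vstack([X, hard])
y2 = np.concatenate([y, np.zeros(len(hard), dtype=int)])
clf = LinearSVC(C=0.01).fit(X2, y2)
```

The retraining pass is the classic "hard negative mining" loop: the detector's own false positives are the most informative negatives to learn from.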

 

Six. Bag-of-Words (BoW)

1. The bag-of-words model is a method commonly used in information retrieval to represent documents. It assumes that a document can be treated as a mere collection of words, ignoring word order, grammar, and syntax; the occurrence of each word in the document is independent and does not depend on whether any other word appears. In other words, any word appearing at any position in the document is chosen independently, uninfluenced by the document's semantics.

   For example, the following two documents:

    (1): Bob likes to play basketball, Jim likes too.

    (2): Bob also likes to play football games.

    Based on these two text documents, construct a dictionary:

dictionary = {1: "Bob", 2: "likes", 3: "to", 4: "play",
              5: "basketball", 6: "also", 7: "football",
              8: "games", 9: "Jim", 10: "too"}

    The dictionary contains 10 distinct words. Using the dictionary indices, each of the two documents can be represented by a 10-dimensional vector of non-negative integers, where each entry is the number of times the corresponding word occurs in the document:

    (1):  [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]

    (2):  [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

    Each element of the vector is the number of times the corresponding dictionary word appears in the document (below, these are called word histograms). As the construction shows, however, the vector does not express the order in which the words appear in the original sentence; this is one of the shortcomings of the bag-of-words model, though for many applications it simply does not matter.
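The two example vectors can be reproduced in a few lines of Python (tokenizing on letters only, so punctuation is dropped):

```python
from collections import Counter
import re

docs = ["Bob likes to play basketball, Jim likes too.",
        "Bob also likes to play football games."]

dictionary = ["Bob", "likes", "to", "play", "basketball",
              "also", "football", "games", "Jim", "too"]

def bow_vector(doc, dictionary):
    """Word-count histogram of `doc` over `dictionary`; order is ignored."""
    counts = Counter(re.findall(r"[A-Za-z]+", doc))
    return [counts[word] for word in dictionary]

for doc in docs:
    print(bow_vector(doc, dictionary))
# [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]
# [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]
```

Note that any reordering of the words in a document would produce exactly the same vector, which is the order-blindness discussed above.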

 

2. Where the bag-of-words model applies

   Now imagine a large document collection D containing M documents, and let the N distinct words extracted from all of them form a dictionary. Using the bag-of-words model, each document can be represented as an N-dimensional vector, and numeric vectors are easy for a computer to process. In this way, a computer can classify massive document collections.

    Next, consider applying the bag-of-words model to image representation. An image can be regarded as a document, i.e. as a collection of "visual words"; as before, there is no order among the visual words.

  

    Similarly, the dictionary can be as follows:

dictionary = {1: "nose", 2: "eyes", 3: "hair", 4: "necklace",
              5: "mouth", 6: "chin", 7: "shoulders", 8: "face"}

 

     However, the "words" in an image are not as readily available as those in a text document. The independent visual words must first be extracted from the image, which typically takes three steps:

      (1). Feature detection

      (2). Feature description

      (3). Codebook generation

      Observation shows that although different instances of the same target class differ from one another, common elements can still be found among them. Faces, for example, vary considerably from person to person, yet smaller parts such as the eyes, mouth, and nose show little variation; such common elements shared across instances can be extracted as visual words for recognizing that target class.

       SIFT is the most widely used algorithm for extracting locally invariant image features. SIFT feature points extracted from images can serve as visual words; a word list (codebook) is built from them, and each image is then represented with the words of that list.

 

3. Three steps of applying the bag-of-words model

   (1). Use the SIFT algorithm to extract visual words from the images of every category, and pool all visual words together.

   (2). Build the word list with the k-means algorithm. k-means clusters samples based on an indirect measure of the similarity between them. Using the distances between the extracted SIFT descriptor vectors, k-means can merge visually similar words into single basic words, constructing a word list containing k words.

   (3). Represent each image with the words of the word list. With SIFT, a number of feature points can be extracted from each image; each feature point is replaced by the most similar word in the list, and the number of occurrences of each word in the image is counted, so that the image is represented as a k-dimensional numeric vector.
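Steps (2) and (3) can be sketched with scikit-learn's k-means, using random 128-dimensional vectors as stand-ins for SIFT descriptors (real descriptors would come from e.g. OpenCV's cv2.SIFT_create(); all names and numbers here are illustrative):

```python
import numpy as np
from sklearn.cluster import KMeans

k = 8  # vocabulary size
rng = np.random.default_rng(0)

# Stand-ins for SIFT output: each image yields a variable number of
# 128-dimensional local descriptors.
descriptors_per_image = [rng.normal(size=(n, 128)) for n in (60, 45, 80)]

# Step (2): cluster all descriptors; the cluster centers are the words.
all_desc = np.vstack(descriptors_per_image)
codebook = KMeans(n_clusters=k, n_init=10, random_state=0).fit(all_desc)

# Step (3): map each image's descriptors to the nearest word and count
# occurrences, giving a fixed-length k-dimensional vector per image.
def bow_image_vector(desc, codebook, k):
    words = codebook.predict(desc)
    hist = np.bincount(words, minlength=k)
    return hist / hist.sum()  # normalize for varying descriptor counts

vectors = [bow_image_vector(d, codebook, k) for d in descriptors_per_image]
print(vectors[0].shape)  # (8,)
```

Normalizing each histogram by its total makes images with different numbers of feature points directly comparable.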

 

 


 


Origin blog.csdn.net/qq_38299170/article/details/104434459