Bag of Words (BoW) and MATLAB programming

I recently needed to use bag of words, so having just come into contact with it, I am reposting these blog posts to learn from. I will certainly revise this later and add new content once I have run the data through.

Original 1: Visual bag-of-words (BoW) model study notes and MATLAB programming
Original 2: BoW principles and code analysis
Reference 1: Visual bag-of-words technical presentation
Reference 2: Understanding the bag-of-words model

Main text

A brief introduction to the BoW model

The bag-of-words model was originally used in text classification, where a document is represented as a feature vector. The basic idea is to take a piece of text, ignore its word order, grammar, and syntax, and treat it simply as a collection of words, with each word occurring independently of the others. Put simply, every document is treated as a bag (it is filled with words, hence the name "bag of words"); we then look at which words the bag contains and classify the document accordingly. If a document contains many words such as pig, horse, cattle, sheep, valley, land, and tractor, and few words such as bank, building, car, and park, we tend to judge it to be a document describing the countryside rather than the town. For example, consider the following two documents:

Document 1: Bob likes to play basketball, Jim likes too.
Document 2: Bob also likes to play football games.

Based on these two documents, we construct a dictionary:

Dictionary = {1:"Bob", 2:"like", 3:"to", 4:"play", 5:"basketball", 6:"also", 7:"football", 8:"games", 9:"Jim", 10:"too"}

The dictionary contains 10 different words in total. Using the dictionary index numbers, each of the two documents above can be represented by a 10-dimensional vector, whose integer entries count the number of times each word occurs in the document:

 Document 1: [1, 2, 1, 1, 1, 0, 0, 0, 1, 1]

 Document 2: [1, 1, 1, 1, 0, 1, 1, 1, 0, 0]

Each element of the vector gives the number of times the corresponding dictionary word appears in the document (below we will call this the word histogram). As the construction shows, however, these vectors do not express the order in which the words appear in the original sentences; this is one of the shortcomings of the bag-of-words model, but it does not really matter here.
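As a minimal sketch of this counting process (not from the original post: the variable names are my own, the text is lower-cased with punctuation stripped, and the token "likes" is used so the counts match the vectors above), the two histograms can be reproduced in a few lines of MATLAB:

    docs = {'bob likes to play basketball jim likes too', ...
            'bob also likes to play football games'};
    dictionary = {'bob','likes','to','play','basketball', ...
                  'also','football','games','jim','too'};
    histograms = zeros(numel(docs), numel(dictionary));
    for d = 1:numel(docs)
        words = strsplit(docs{d});             % split on whitespace
        for w = 1:numel(words)
            idx = strcmp(dictionary, words{w}); % locate the word in the dictionary
            histograms(d, idx) = histograms(d, idx) + 1;
        end
    end
    disp(histograms)  % row 1: 1 2 1 1 1 0 0 0 1 1; row 2: 1 1 1 1 0 1 1 1 0 0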

Why use the BoW model to describe images

Although SIFT features can describe an image, each SIFT descriptor is a 128-dimensional vector, and an image typically contains hundreds or thousands of SIFT vectors, so computing similarities directly is very expensive. The common practice is to cluster these vectors with a clustering algorithm and let each cluster represent one visual word of the BoW model; every SIFT vector in an image is then mapped to a visual word, so the image is represented as a sequence of codewords. Each image is thereby described by a single codebook vector, which greatly improves the efficiency of similarity computation.

Steps for constructing the BoW codebook

1. Assume there are M images in the training set, and preprocess them. Preprocessing includes image enhancement, segmentation, converting the images to a unified format and a unified scale, and so on.

2. Extract SIFT features. Extract SIFT features from each image (the number of SIFT features extracted per image is not fixed). Each SIFT feature is represented by a 128-dimensional descriptor vector; assume N SIFT features are extracted in total from the M images.

3. Cluster the N SIFT features from step 2 with K-means. K-means is a clustering method based on the similarity between samples; K is a parameter of the algorithm, and it partitions the N objects into K clusters so that similarity is high within a cluster and low between clusters. This yields K cluster centers (in the BoW model we call these centers the visual words), and the codebook length is also K. For each image, compute the distance from each of its SIFT features to the K visual words and map each feature to the nearest visual word (adding 1 to that visual word's frequency). After this step, each image becomes a vector of word frequencies over the sequence of visual words.
4. Construct the codebook and normalize the codebook vectors. Because the number of SIFT features extracted from each image is not fixed, normalization is needed: convert each image's feature counts to frequencies. This prevents classification errors caused by differing numbers of extracted features; in the original example, the normalization works out to multiplying the histogram by 1/12 (i.e. dividing by the total number of features). A test image goes through the same pipeline: preprocess it, extract its SIFT features, map them onto the codebook to get its codebook vector, normalize it, and finally compute the distance from this vector to the codebook vectors of the training images; the training image with the smallest distance is taken as the match for the test image.

For example, with the visual word set {eye, nose, mouth} (k = 3), each image in the training set becomes a 3-dimensional word-frequency vector:

First image: ...
Second image: ...
......
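To make the normalization in step 4 concrete (these counts are invented purely for illustration; the original example's numbers were lost in translation): if an image had 12 SIFT features assigned as counts [6, 2, 4] over {eye, nose, mouth}, the normalized histogram would be 1/12 × [6, 2, 4] = [0.5, 0.167, 0.333], and it is this frequency vector that gets compared between images.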

Of course, when extracting SIFT features, the image may first be divided into many small patches, with SIFT features then extracted from each patch.

To sum up, the whole process does three things: first, extract SIFT features from each of the n images; second, run k-means clustering on all the extracted SIFT features to obtain k cluster centers, which form the visual dictionary; finally, for each SIFT feature point in each image, compute its distance to every word in the dictionary and add 1 to the count of the nearest word, which yields the image's codebook vector. The third step is really a counting process, so the elements of a BoW vector are non-negative. Yunchao Gong published a paper at NIPS 2012 on binary image-coding schemes for fast retrieval that was designed precisely for features with such non-negative elements.
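The three steps can be sketched in MATLAB roughly as follows (a minimal sketch only: `allDescriptors` and `imgDescriptors` are assumed inputs I have named myself, and `kmeans`/`pdist2` come from the Statistics and Machine Learning Toolbox):

    % allDescriptors: N-by-128 matrix of SIFT descriptors pooled from all images
    % imgDescriptors{i}: descriptors of image i (one row per feature point)
    k = 300;                                    % codebook size (number of visual words)
    [~, vocab] = kmeans(allDescriptors, k);     % vocab is k-by-128: the visual dictionary

    numImages = numel(imgDescriptors);
    bow = zeros(numImages, k);
    for i = 1:numImages
        D = pdist2(imgDescriptors{i}, vocab);   % distance of each feature to each word
        [~, nearest] = min(D, [], 2);           % index of the nearest visual word
        bow(i, :) = histcounts(nearest, 1:k+1); % count assignments per word
        bow(i, :) = bow(i, :) / sum(bow(i, :)); % normalize counts to frequencies
    end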

MATLAB code implementing BoW, together with a database of pictures, is available at this link: https://github.com/lipiji/PG_BOW_DEMO

[Repost] Here are the small insights I gained from reading this code:

Walking through the code, the construction process of the visual-word bag-of-words model is:
1. Extract feature points from each picture, e.g. with the SIFT algorithm; each feature point will be mapped to a visual word. In the code, each image is 200×200; with a step of 8, the image is divided into 16×16 patches, so there are 576 patches per image. SIFT keypoints are extracted on each patch, one keypoint per patch, so there are 576 keypoints; that is, each picture eventually becomes 576 128-dimensional vectors (a SIFT feature point is 128-dimensional), i.e. a matrix of size 576×128. The pictures include training and test samples, 360 pictures in total, so altogether the data is 360×576×128 (see the short sketch after this list for where 576 comes from).
2. Use K-means clustering to build the vocabulary. The code finds 300 cluster centers, so the vocabulary is 300×128. The number of cluster centers is usually chosen between a few hundred and a few thousand; in general, the more data, the more cluster centers. The clustering is run on the training data, which is 240×576×128 (the test data is 120×576×128), with a maximum of 100 iterations; after clustering completes there are 300 cluster centers, each a 128-dimensional vector.
3. Use the resulting cluster centers, i.e. the vocabulary, to compute histogram statistics for the 360 pictures. The key is to find, for each of the 576 keypoints in each picture, which cluster center is closest (most similar), and add 1 to that center's count, with the cluster centers indexed from 1 to 300. The final BoW data is therefore a matrix of size 300×360, of which 300×240 is the training data and 300×120 is the test data. Note that since the number of keypoints is the same for every image here, normalization is not particularly critical; but if the number of keypoints differed from image to image, normalization would be essential, i.e. the word counts would have to be converted to frequencies by dividing by the total number of points.
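As a quick check on the numbers in step 1 (the grid parameters here are reconstructed from the text above, not copied from the demo code), the dense patch grid can be enumerated like this:

    imgSize = 200; patchSize = 16; step = 8;
    starts = 1:step:imgSize - patchSize + 1;  % 24 start positions per dimension
    numPatches = numel(starts)^2              % 24^2 = 576 patches per image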

The above is the basic process of building BoW. The author's program also contains a spatial-pyramid BoW routine, CompilePyramid. Pyramid BoW and plain BoW count the word frequency of each feature in the same way; the difference lies in how those word frequencies are aggregated. Plain BoW counts word frequencies globally, while pyramid BoW, as the name implies, is hierarchical: in the code, the image is first divided into 4×4 blocks, and the word frequencies over the 300 cluster centers are counted in each block and weighted by 2^-1; then the image is divided into 2×2 blocks and the per-block word frequencies are weighted by 2^-2; finally the global histogram of the whole image is computed and also weighted by 2^-2. All the histograms are then concatenated, so the final pyramid data is 6300×360 ((16 + 4 + 1) × 300 = 6300). The source data thus contains the results of both bag-of-words representations, plain BoW and pyramid BoW. Classifying with an SVM: BoW + RBF kernel gives 77.5% accuracy, and pyramid BoW + RBF kernel gives 82.5%; the author also defines his own kernel function, called the "inter" kernel, with which BoW reaches 81.6667%, and pyramid BoW + inter kernel is the highest at 90.8333%.
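The "inter" kernel above is presumably a histogram intersection kernel; a minimal sketch of one (my own reconstruction, which may differ from the actual PG_BOW_DEMO implementation) is:

    function K = intersectionKernel(X, Y)
    % X: m-by-d histograms, Y: n-by-d histograms
    % K(i,j) = sum over bins of min(X(i,:), Y(j,:))
    m = size(X, 1);
    n = size(Y, 1);
    K = zeros(m, n);
    for i = 1:m
        K(i, :) = sum(min(repmat(X(i, :), n, 1), Y), 2)';
    end
    end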

With MATLAB's continued development, bag of words / bag of features has been built in, and the functions bagOfFeatures and bagOfWords can now be called directly:

https://ww2.mathworks.cn/help/vision/ref/bagoffeatures.html?s_tid=doc_ta
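A minimal usage sketch of the built-in function (assuming a recent Computer Vision Toolbox release; the image folder path is a placeholder):

    % Build a visual vocabulary from a folder of labeled training images
    imds = imageDatastore('path/to/images', 'IncludeSubfolders', true, ...
                          'LabelSource', 'foldernames');
    bag = bagOfFeatures(imds);                        % extract features, run k-means
    featureVector = encode(bag, readimage(imds, 1));  % BoW histogram of one image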


Origin: blog.csdn.net/qq_32642107/article/details/90181036