Visual SLAM: Fourteen Lectures (Second Edition), Notes XI

Chapter 11: Loop Closure Detection

The front end provides feature extraction and initial values for the trajectory and the map, while the back end is responsible for optimizing all of this data. But if, like visual odometry, we only consider constraints between adjacent frames, errors inevitably accumulate: drift from earlier estimates propagates into every later one, so the whole SLAM system accumulates error and its long-term result becomes unreliable. Loop closure detection addresses this by recognizing previously visited places.

(Figure: (a) the real trajectory; (b) the front end's estimate using only adjacent-frame constraints, where the pose graph still drifts after optimization; (c) the pose graph after a detected loop is added: recognizing the same scene features eliminates the accumulated error.)

1. Methods

a. The naive approach: the simplest method is to run feature matching between arbitrary pairs of images and decide, from the number of correct matches, whether the two images are associated. But this blindly assumes that "any two images might form a loop", so the number of detections is enormous: for N frames we would have to test C(N,2) pairs, which is impractical for most real-time systems. An alternative is to test randomly selected historical frames against the current frame, e.g. pick 5 frames at random out of the previous n. This keeps the computation per step constant, but as the number of frames N grows, the chance that blind sampling actually hits a loop drops substantially, so the detection efficiency is low.

b. Appearance-based (Appearance) loop detection. This is independent of the front-end and back-end estimates: it judges the loop relationship purely from the similarity of two images. Appearance-based loop detection has become the mainstream practice in visual SLAM and is used in real systems. The core issue is how to compute the similarity between images. One could design a similarity score s(A, B), but scoring raw grayscale pixel values is unstable, so such a function does not reflect the similarity between two images well.

c. Visual SLAM has always taken human vision as its reference standard, yet we do not yet understand how the human brain accomplishes this kind of place recognition.

Borrowing terminology from medicine: a false positive (False Positive) is also called perceptual aliasing, and a false negative (False Negative) is called perceptual variation. We want the algorithm to agree with human judgment, so we hope TP and TN are as high as possible. For a particular algorithm, we then count the occurrences of TP, TN, FP, and FN on a dataset and compute two statistics: precision and recall (Precision & Recall).

Precision = TP / (TP + FP),  Recall = TP / (TP + FN)

Together, these two indicators give a good measure of an algorithm. The naive similarity scores discussed above achieve neither high precision nor high recall.
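As a quick illustration (the counts below are made up, not from any experiment), the two statistics follow directly from the four outcome counts:

```python
def precision_recall(tp, fp, fn):
    """Precision = TP/(TP+FP): of the loops we reported, how many were real.
    Recall = TP/(TP+FN): of the real loops, how many we reported."""
    return tp / (tp + fp), tp / (tp + fn)

# Hypothetical detector: 8 true loops found, 2 false alarms, 4 loops missed.
p, r = precision_recall(tp=8, fp=2, fn=4)
print(f"precision={p:.2f}, recall={r:.2f}")  # precision=0.80, recall=0.67
```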

2. The Bag-of-Words Model (an intersection with machine learning)

Bag-of-Words (BoW) describes an image by "what kinds of features appear in it". For example, one image contains a person and a car; another contains a car and two dogs.

Concepts such as "person", "car", and "dog" correspond to the "words" of BoW. Many words put together form a "dictionary".

To describe an image, we determine which words from the dictionary appear in it; that is, we describe the whole image by the occurrence of words. This converts an image into a vector description.

Finally, we compare the similarity of the descriptions obtained in the previous step.

a. The dictionary: a dictionary consists of many words, and each word represents a concept. A word is a combination of features of a certain type, so generating a dictionary is essentially a clustering problem. This is where BoW intersects with machine learning: clustering is unsupervised learning, letting the machine find regularities in the data by itself. The K-means algorithm does this job well.

b. K-means: in simple terms, given N data points, we want to group them into k classes. The steps are:

1) Randomly select k center points: c1, c2, ..., ck.

2) For each sample, compute its distance to every center point and assign it to the class of the nearest center.

3) Recompute the center point of each class.

4) If every center point changes only a little, the algorithm has converged; exit. Otherwise, return to step 2.
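The four steps above can be sketched in plain Python on 1-D data (the data and k below are illustrative; a real BoW dictionary clusters high-dimensional descriptors such as ORB or SIFT):

```python
import random

def kmeans(points, k, max_iters=50, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)                  # step 1: k random centers
    for _ in range(max_iters):
        # step 2: assign every sample to its nearest center
        clusters = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: abs(p - centers[j]))
            clusters[nearest].append(p)
        # step 3: recompute each center as the mean of its cluster
        new_centers = [sum(c) / len(c) if c else centers[j]
                       for j, c in enumerate(clusters)]
        # step 4: if the centers stopped moving, we have converged
        if new_centers == centers:
            break
        centers = new_centers
    return sorted(centers)

data = [0.8, 1.0, 1.2, 9.5, 10.0, 10.5]
print(kmeans(data, k=2))  # two centers, near 1.0 and 10.0
```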

c. A simple and practical tree-structured dictionary

We use a k-ary tree to express the dictionary. The idea is relatively simple, similar to hierarchical clustering, and is a direct extension of k-means. Suppose we have N feature points and want to build a tree of depth d in which each node branches into k children. The procedure is as follows:

1) At the root node, cluster all samples into k classes with K-means (in practice, k-means++ is used to make the clustering more uniform). This gives the first layer.

2) For each node of the first layer, gather the samples belonging to that node and run K-means again, obtaining the next layer.

3) And so on, until we reach the leaf layer. The leaves are the so-called words.

In this way, the words are constructed at the leaf layer, while the intermediate nodes of the tree structure are used for fast lookup. This brings two advantages: first, such a tree can hold up to k^d words; second, lookup is fast, since finding a word takes only d levels of comparisons, i.e. time logarithmic in the number of words.
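The construction and lookup can be sketched on the same toy 1-D features (a real vocabulary clusters binary or float descriptors; this minimal version embeds a tiny k-means and only shows the recursive structure and the d-level descent):

```python
import random

def kmeans(points, k, iters=30, seed=0):
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    for _ in range(iters):
        clusters = [[] for _ in range(k)]
        for p in points:
            clusters[min(range(k), key=lambda j: abs(p - centers[j]))].append(p)
        centers = [sum(c) / len(c) if c else centers[j]
                   for j, c in enumerate(clusters)]
    return [c for c in clusters if c]  # drop empty clusters

def build_vocab(points, k, depth):
    """Cluster each node's samples into k children; leaves (at most k**depth)
    are the words."""
    node = {"center": sum(points) / len(points), "children": []}
    if depth > 0 and len(points) >= k:
        clusters = kmeans(points, k)
        if len(clusters) > 1:
            node["children"] = [build_vocab(c, k, depth - 1) for c in clusters]
    return node

def word_of(tree, feature):
    """Look up a feature's word: descend d levels, comparing k centers per level."""
    node = tree
    while node["children"]:
        node = min(node["children"], key=lambda ch: abs(feature - ch["center"]))
    return node["center"]  # the leaf (word) this feature falls into

features = [0.9, 1.0, 1.1, 4.9, 5.0, 5.1, 9.9, 10.0, 10.1]
tree = build_vocab(features, k=3, depth=2)
print(word_of(tree, 1.05), word_of(tree, 9.8))
```

Features near each other descend to the same leaf, so two images of the same place end up counting the same words.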

 

3. Similarity Calculation

We can assess the distinctiveness or importance of each word and give words different weights for better results. In text retrieval, a common practice is TF-IDF (Term Frequency-Inverse Document Frequency), i.e. term frequency times inverse document frequency. The idea of the TF part is that a word appearing frequently in one image has high discriminative power for that image. Conversely, the idea of IDF is that the lower a word's frequency of occurrence in the dictionary, the more discriminative it is when classifying images.

IDF_i = log(n / n_i), where n is the total number of features (in the database used to build the dictionary) and n_i is the number of features in the leaf node of word w_i.

TF_i = n_i / n: the TF part is the frequency with which a word appears in a single image. Here the word w_i appears n_i times in one image, and the image contains n words in total.

So the weight η_i of word w_i equals the product of TF and IDF:

η_i = TF_i × IDF_i
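A small sketch of the weighting and a resulting similarity score (the word ids and counts are made up; the L1-style score form 2·Σ(|v_Ai| + |v_Bi| - |v_Ai - v_Bi|) follows the book's choice as I recall it, so treat its details as an assumption):

```python
import math
from collections import Counter

def idf(total_features, features_per_word):
    """IDF_i = log(n / n_i): n features in the training set, n_i in word w_i's leaf."""
    return {w: math.log(total_features / n_i)
            for w, n_i in features_per_word.items()}

def bow_vector(image_words, idf_w):
    """v_A with entries eta_i = TF_i * IDF_i, where TF_i = n_i / n for one image."""
    counts = Counter(image_words)
    n = len(image_words)
    return {w: (c / n) * idf_w.get(w, 0.0) for w, c in counts.items()}

def score(va, vb):
    """L1-style similarity: s = 2 * sum_i(|va_i| + |vb_i| - |va_i - vb_i|)."""
    keys = set(va) | set(vb)
    return 2 * sum(abs(va.get(w, 0.0)) + abs(vb.get(w, 0.0))
                   - abs(va.get(w, 0.0) - vb.get(w, 0.0)) for w in keys)

# Hypothetical vocabulary statistics: word -> number of training features in its leaf.
idf_w = idf(total_features=1000, features_per_word={"w1": 10, "w2": 100, "w3": 500})
va = bow_vector(["w1", "w1", "w2"], idf_w)  # words seen in image A
vb = bow_vector(["w1", "w2", "w3"], idf_w)  # words seen in image B
print(score(va, va) > score(va, vb))        # an image matches itself best
```

Note how the rare word "w1" (10 of 1000 features) carries far more weight than the common "w3", which is exactly the IDF intuition.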

4. Experimental Analysis and Discussion

In machine learning, when the code is error-free but the results are unsatisfactory, we first ask "is the network large enough, are the layers deep enough, are there enough data samples", since even a good model is no match for bad data. In SLAM we start from the same principle:

a. Increasing the size of the dictionary often noticeably improves the discrimination of the image similarity scores.

b. Keyframe handling: if keyframes are selected too close together, the similarity between adjacent keyframes will inevitably be too high, making loops against historical data harder to detect. The frames used for loop detection are therefore best kept somewhat sparse: different from one another, yet still covering the whole environment.

c. Verification after bag-of-words loop detection. One method is to establish a loop caching mechanism and check consistency over time: a single detected loop does not constitute a good constraint, but a loop that keeps being detected over a period of time is likely correct. Another method is spatial consistency checking: run feature matching between the two frames of a detected loop and estimate the camera motion, then put that motion into the pose graph and check whether it differs widely from the previous estimates.
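The temporal consistency idea can be sketched as a small buffer check (the window length and boolean per-frame encoding are made up for illustration): a loop candidate is accepted only once it has persisted for several consecutive frames.

```python
def temporally_consistent(detections, window=3):
    """Accept a loop only once it has been detected `window` frames in a row.

    detections: per-frame booleans (did BoW report a loop with the same place?).
    Returns per-frame booleans: is the loop accepted at this frame?
    """
    streak, accepted = 0, []
    for detected in detections:
        streak = streak + 1 if detected else 0  # reset on any miss
        accepted.append(streak >= window)
    return accepted

# A single re-detection after a gap is not enough; only the third
# consecutive hit is accepted.
print(temporally_consistent([True, True, True, False, True]))
```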

d. Relation to machine learning

The bag-of-words model is itself an unsupervised machine learning process: building the dictionary is equivalent to clustering the feature descriptors, and the tree is merely a data structure for looking up the clusters quickly. On object recognition, bag-of-words methods have not performed as well as neural networks, so it remains to be seen whether loop detection will likewise be overtaken by deep learning in the future.

 

Origin: www.cnblogs.com/Lei-HongweiNO11/p/11615861.html