visual bag of words 词袋用于SLAM重定位

DBOW

Binary features
Image database
Loop detection

A. Data base query
B. Matching grouping
C. Temporal consistency
D. Efficient geometrical consistency

结果分析

ORBSLAM2 中的Relocalization

根据词袋计算得到候选
下一步计算位姿

使用方法

1. 创建图像数据集并且提取对应的特征点数据
2. 创建Vocabulary
3. 新图像匹配
4. 增加新图片进数据库

VINS 的重定位模块
使用FBOW匹配特征点

1. 首先整理关键帧图像数据库
2. 提取新输入图片的向量
3. 从数据库中匹配得到匹配值最高的图片
4. 选取candidate进行特征点匹配

ORBSLAM2 implementation DOW2

DBOW

bags of binary words for fast place recognition in image sequence (https://ieeexplore.ieee.org/document/6202705) 这篇文章参考了前文，是ORBSLAM2中重定位模块的基础。首先按照文章的结构介绍这一篇文章。

Binary features

文章中使用FAST特征点和BRIEF描述子。
FAST和BRIEF特征点计算速度快。
使用二进制的BRIEF描述子计算二进制距离可以使用位运算符（xor），计算hamming距离。

Image database

来自原文章

预先离线收集图像，建立vocabulary。主要包括Inverse index 和Direct index。Inverse index是针对每一个词袋的单词，并且记录每个单词在每个关键帧的权重。Direct index是对每个层的记录，记录了在一张图片在当前层有哪些节点包含了图片中的特征word。
使用k means和k medians clustering。使用tf-idf作为权重。
Inverse index是为了以后计算中更快的访问每个单词的权重。
Direct index是为了更加方便的存储每张图片的每个节点的特征信息。
对于一张新图片，对每个特征点在词袋中寻找hamming距离最近的word。最终encoding成一个t维的向量（t为单词的数量）。

Loop detection

A. Data base query

简单的来说就是寻找队列中距离最近的一个关键帧。
文章中使用一个归一化的score。具体定义可以参见原文章。

B. Matching grouping

如果当前的图片和候选人X很“接近”，那么当前图片和X邻近的的一些关键帧也会同样接近。这不是我们想要的。
所以文章提出讲候选图片先grouping据类成一个个island。每个island中只会有一个最终的结果。

C. Temporal consistency

如果当前图片和某一个候选人很接近，那么上一帧应该和这个候选人很接近才对。
所以下一步的检验是候选人帧和上一帧的匹配检测。
最终得到最终候选人。

D. Efficient geometrical consistency

在获取最终的候选人之后要进行几何验证。
首先是使用之前的Direct index找到可能相似的特征匹配候选。
使用RANSAC fundament matrix匹配。

结果分析

BRIEF特征点找到的匹配大多数是中距离和远距离的。因为BRIEF没有scale信息。
SURF特征点点对近距离的特征效果更好。

ORBSLAM2 中的Relocalization

根据词袋计算得到候选

这一个流程基本和前面的ABC思想一致。
在所有关键帧中找与当前帧有公用单词的帧。
提取中存在足够过共有单词的关键帧。
计算相似分数，并且根据co-visibility累积分数。
返回所有高分的候选。

下一步计算位姿

类似上面的C步骤，使用Direct Index寻找特征点匹配。
使用PnP RANSAC，寻找更加准确的匹配点，并且设置位姿初始值。
Pose optimization （bundle adjustment）优化位姿，得到最终结果。

使用方法

下面以FBOW为例子，介绍BOW的整个使用步骤。

1. 创建图像数据集并且提取对应的特征点数据

提取方式有很多，这里就不赘述了。

2. 创建Vocabulary

根据上面提取的特征点数据，将描述子的格式转变为一个vector。然后选择适当的创建参数，最后开始创建。整个过程需要一段时间完成。但是这是一个一次性的工作，一旦建立完成，就不需要再重复了。
这一步的图片数据库选择（图片的分布，数据量大小等），会影响之后的匹配结果，所以一定要仔细思考慎重选择。

    // vSIFTdescritpors is a set of sift feature points
    std::vector<cv::Mat> vSIFTdescritpors;

    // the vocabulary creation options
    fbow::VocabularyCreator::Params params;
    params.k = 10;
    params.L = 6;
    params.nthreads = 4;
    params.maxIters = 11;
    params.verbose = true;

    // create the creator
    fbow::VocabularyCreator voc_creator;
    std::cout << "Creating a " << params.k << "^" << params.L << " vocabulary..." << std::endl;

    // record time
    auto t_start=std::chrono::high_resolution_clock::now();
  
    // create the sift feature vocabulary
    try{
        voc_creator.create(voc,vSIFTdescritpors,"sift", params);
    }catch(std::exception &ex){
        std::cerr << ex.what() << std::endl;
    }

    // end of creation
    auto t_end = std::chrono::high_resolution_clock::now();
    std::cout << "time = " << double(std::chrono::duration_cast<std::chrono::milliseconds>(t_end-t_start).count()) << " msecs ";
    std::cout << " nblocks = " << voc.size() << std::endl;
    std::cerr << "Saving " << save_path << std::endl;
  
    // save the created vocabulary to file for further use
    voc.saveToFile(save_path);

3. 新图像匹配

同样，先提取新图片的特征点，转换为FBOW需要的格式，再调用 " voc.transform(desCV);" 将描述子转换为FBOW的单词出现的二进制向量。最后使用“vv.score(vv, keyframesBOW[i]);”将这个向量分别与词袋中的其他图片比较，得到新图片与旧图片的分数比较。最后可以排序这些分数，得到我们想要的结果。

void Ulocalization::MatchImage(FeatureDescriptors &descriptors)
{

    std::cout << " [FBOW] Register new image. " << std::endl;
 
    // transform the descriptor type
    cv::Mat desCV; 
    eigen2cv(descriptors, desCV);

    fbow::fBow vv;
    int avgScore = 0;
    int counter = 0;

    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    vv = voc.transform(desCV);

    std::map<double, int> score;

    // rank the scores
    int numTop = 5;
    int bestIdx = 0;
    double bestScore = 0;
    for (size_t i = 0; i<keyframesBOW.size(); ++i){
        // calculate the score
        double score1 = vv.score(vv, keyframesBOW[i]);
        // TODO: rank the socre 
        // or we can just take the highest score as result
        if (score1 > bestScore){
            bestScore  = score1;
            bestIdx = i;
        }
    }

    
    std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

    double ttrack= std::chrono::duration_cast<std::chrono::duration<double> >(t2 - t1).count();

    std::cout << "          using time : " << ttrack << " seconds. " << std::endl;
}

4. 增加新图片进数据库

也可以增加新图片进之前建立的数据库Vocabulary，代码见下面。

void Ulocalization::AddImagesToVoc()
{
    std::cout << " [FBOW] Add the dataset images to the directionary. " << std::endl;

    std::chrono::steady_clock::time_point t1 = std::chrono::steady_clock::now();

    keyframesBOW.clear();

    for (const auto idx : framesIds ){
        cv::Mat desCV; //  = new cv::Mat();
        FeatureDescriptors descriptors = database->ReadDescriptors(idx);
        eigen2cv(descriptors, desCV);
        fbow::fBow vv = voc.transform(desCV);
        keyframesBOW.push_back(vv);
    }

    std::chrono::steady_clock::time_point t2 = std::chrono::steady_clock::now();

    double ttrack= std::chrono::duration_cast<std::chrono::duration<double> >(t2 - t1).count();

    std::cout << " [FBOW] done, using time : " << ttrack << " seconds. " << std::endl << std::endl;
}

VINS 的重定位模块

在这里插入图片描述

VINS的重定位线程的调用如上图的左半边所示。主要是启动一个loop_closure的thread线程。在这个线程中，会从bufffer中调取keyframe关键帧，输入到LoopClosure方法中去寻找回环。最终调用的函数是在TemplatedLoopClosure中的 detectloop函数。

将描述符转换为bowvector和featurevector（这是dbow的类型）。

BowVector bowvec;
FeatureVector featvec;

if(m_params.geom_check == GEOM_DI)
  m_database->getVocabulary()->transform(descriptors, bowvec, featvec,
    m_params.di_levels);
else
  m_database->getVocabulary()->transform(descriptors, bowvec);

如果查询id足够大。将转换后的两个向量查询到单词包数据库中查找匹配项。
* 无论如何，将这两个向量添加到单词包数据库中。

    QueryResults qret;
    m_database->query(bowvec, qret, m_params.max_db_results, max_id);

    // update database
    m_database->add(bowvec, featvec); // returns entry_id

删除得分较低的候选人，然后将匹配指数初始化为得分最高的候选人，如果存在的话。
* 并将查询结果转换为一个“island”向量（即按id的升序将所有candidate分块）
* 找到得分最高的island。

          // compute islands
          vector<tIsland> islands;
          computeIslands(qret, islands);

const tIsland& island = 
              *std::max_element(islands.begin(), islands.end());

通过isgeometricallyconsistent_DI检查几何一致性。
* 使用得分最高的岛（我们在上面找到）来检查

         if(m_params.geom_check == GEOM_DI) {
              // all the DI stuff is implicit in the database
              detection = isGeometricallyConsistent_DI(island.best_entry, 
                keys, descriptors, featvec, cur_pts, old_pts);
          } else {
              detection = true;
              //printf("don't check detec true\n");
          }

保存关键点、描述子（用于几何检查）和这个bowvector（用于nss规范化）。

  // update record
  // m_image_keys and m_image_descriptors have the same length
  if(m_image_keys.size() == entry_id) {
    //printf("image size = entry_id %d\n", entry_id);
    m_image_keys.push_back(keys);
    m_image_descriptors.push_back(descriptors);
  } else {
    //printf("image size != entry_id %d\n", entry_id);
    m_image_keys[entry_id] = keys;
    m_image_descriptors[entry_id] = descriptors;
  }
  
  // store this bowvec if we are going to use it in next iteratons
  if(m_params.use_nss && (int)entry_id + 1 > m_params.dislocal) {
    m_last_bowvec = bowvec;
  }

使用FBOW匹配特征点

1. 首先整理关键帧图像数据库

    keyframesBOW.clear();
    keyframesBOW.reserve(keyframeSize);

    for (int i = 0; i < keyframeSize; i ++){
        fbow::fBow vv = vocFBOW.transform(vKeyFrames[i].descriptors);
        keyframesBOW.push_back(vv);
    }

2. 提取新输入图片的向量

    fbow::fBow vv;
    vv = vocFBOW.transform(newImage.descriptors);

3. 从数据库中匹配得到匹配值最高的图片

    BOWResult *bowresults = new BOWResult[keyframeSize];

    for (size_t i = 0; i<keyframesBOW.size(); ++i){
        double score = vv.score(vv, keyframesBOW[i]);
        bowresults[i].index = i;
        bowresults[i].score = score;
    }

    // rank the scores
    // TODO: we can delete the low scores, then loop, which will be faster than sort
    std::sort(bowresults, bowresults + keyframeSize);

其中BOWResult定义如下：

struct BOWResult
{
    int index;
    double score;
};

// compare the score of the result
// define the operator for using std::sort
bool operator<(const BOWResult &a, const BOWResult &b) {
  return a.score > b.score;
}

4. 选取candidate进行特征点匹配

这里尝试使用词袋的数模型来加速匹配，但是感觉没有得到理想的效果。如果有大神路过，发现我的逻辑问题，请一定留步指导一下！谢谢！

我的想法主要是对每个特征点的词袋向量循环，找到在同一个枝叶下的特征点，然后再比较他们的描述子。如果描述子的距离足够近，那就认为他们是匹配的。代码和具体的解释如下：

std::vector<FBOWMatch> OfflineMap::FindMatchBow(GKeyFrame &newImagekf, const fbow::fBow &newimage, const int oldIndex)
{
    fbow::fBow &oldimage = keyframesBOW[oldIndex];
    // 准备循环的开始结尾flag
    fbow::fBow::const_iterator newimage_it = newimage.begin();
    fbow::fBow::const_iterator oldimage_it = oldimage.begin();
    const fbow::fBow::const_iterator newimage_end = newimage.end();
    const fbow::fBow::const_iterator oldimage_end = oldimage.end();

    // 用来记录当前的id位置，以便找到对应的描述子
    int index_new = 0;
    int index_old = 0;

    //double score = 0;
    int count = 0;
    std::vector<FBOWMatch> fbowMatches;
    while(newimage_it != newimage_end && oldimage_it != oldimage_end){
        //这里可以得到他们的词袋向量，可以用来计算他们的匹配度（相乘即可）
        //const auto& vi = newimage_it->second;
        //const auto& wi = oldimage_it->second;

        if(newimage_it->first == oldimage_it->first){
            // 这里是一些debug输出
/*
            std::cout << " [MATCH] find features in the same node. " << index_new << "th feature in new image in "
                      << newimage_it->first << "th node, and " << index_old << "th feature in old image in "
                      << oldimage_it->first << "th node." << std::endl;
*/
            // in the same node -> matched, vi * wi is their match score
            // 这里计算score，可以反应出在词袋中两个点的近似程度，fbow
            // 的源码就是使用这种方法来计算分数的
            //score = vi * wi;
            //std::cout << "       score is : " << score << std::endl;

            // 找到对应的两个描述子并且比较他们的距离。
            // 这里计算距离的函数是另外定义的距离。
            const cv::Mat &desNew = vKeyFrames[oldIndex].descriptors.row(index_old);
            const cv::Mat &desOld = newImagekf.descriptors.row(index_new);
            const int dist =  DescriptorDistance(desNew,desOld);
            //std::cout << "      descriptor distance is : " << dist << std::endl;

            // 如果距离足够小，则认为是良好的匹配点
            if(dist < TH_HIGH){
                FBOWMatch fbowMatch(index_new, index_old, dist);
                fbowMatches.push_back(fbowMatch);
                count++;
            }

            // 如果两个向量不在用一个节点，则继续寻找下一对的匹配
            // move v1 and v2 forward
            ++newimage_it; ++index_new;
            ++oldimage_it; ++index_old;
        }else if(newimage_it->first < oldimage_it->first){
            // move v1 forward
            while(newimage_it != newimage_end && newimage_it->first < oldimage_it->first){
                ++newimage_it;  ++index_new;
            }
        }else{
            // move v2 forward
            while(oldimage_it!=oldimage_end && oldimage_it->first<newimage_it->first){
                ++oldimage_it; ++index_old;
            }
        }
    }

    // 最后输出结果
    std::cout << " [MATCH] found " << count << " matches." << std::endl;
    return fbowMatches;
}

ORBSLAM2 implementation DOW2

Finally I checked the implementation of DBOW2 in ORBSLAM, the code is as following:

int SearchByBoW(GKeyFrame* pKF,GKeyFrame &F, vector<GMapPoint*> &vpMapPointMatches)
{

    int HISTO_LENGTH = 30;
    int TH_LOW = 50;
    bool mbCheckOrientation = true;
    float mfNNratio = 0.9;

    // the mappoint associated with the keyframe's keypoints, and the BOW vector
    const vector<GMapPoint*> vpMapPointsKF = pKF->GetMapPointMatches();
    const DBoW2::FeatureVector &vFeatVecKF = pKF->mFeatVec;

    // initialize the match results
    vpMapPointMatches = vector<GMapPoint*>(F.keypoints.size(),static_cast<GMapPoint*>(NULL));


    int nmatches=0;

    vector<int> rotHist[HISTO_LENGTH];
    for(int i=0;i<HISTO_LENGTH;i++)
        rotHist[i].reserve(500);
    const float factor = 1.0f/HISTO_LENGTH;

    // We perform the matching over ORB that belong to the same vocabulary node (at a certain level)
    DBoW2::FeatureVector::const_iterator KFit = vFeatVecKF.begin();
    DBoW2::FeatureVector::const_iterator Fit = F.mFeatVec.begin();
    DBoW2::FeatureVector::const_iterator KFend = vFeatVecKF.end();
    DBoW2::FeatureVector::const_iterator Fend = F.mFeatVec.end();

    // loop through the feature vectors of the two keyframes
    while(KFit != KFend && Fit != Fend){
        if(KFit->first == Fit->first){
            const vector<unsigned int> vIndicesKF = KFit->second;
            const vector<unsigned int> vIndicesF = Fit->second;

            // for each vector pair, loop through all their elements
            for(size_t iKF=0; iKF<vIndicesKF.size(); iKF++) {
                // find the corresponding mappoint and its descriptor
                const unsigned int realIdxKF = vIndicesKF[iKF];
                GMapPoint* pMP = vpMapPointsKF[realIdxKF];

                if(!pMP)
                    continue;

                const cv::Mat &dKF= pKF->descriptors.row(realIdxKF);

                int bestDist1 = 256;
                int bestIdxF  = -1 ;
                int bestDist2 = 256;

                for(size_t iF=0; iF<vIndicesF.size(); iF++){
                    const unsigned int realIdxF = vIndicesF[iF];
                    // if already matched
                    if(vpMapPointMatches[realIdxF])
                        continue;

                    const cv::Mat &dF = F.descriptors.row(realIdxF);
                    const int dist =  DescriptorDistance(dKF,dF);

                    if(dist < bestDist1){
                        bestDist2 = bestDist1;
                        bestDist1 = dist;
                        bestIdxF = realIdxF;
                    } else if(dist < bestDist2) {
                        bestDist2 = dist;
                    }
                }

                // check the best match (with the lowest score)
                // need to be standout through all the keypoints -> bestdist < w*secondbestdist
                if(bestDist1<=TH_LOW){
                    if(static_cast<float>(bestDist1) < mfNNratio*static_cast<float>(bestDist2)){
                        vpMapPointMatches[bestIdxF]=pMP;
                        const cv::KeyPoint &kp = pKF->keypoints[realIdxKF];

                        if(mbCheckOrientation){
                            float rot = kp.angle-F.keypoints[bestIdxF].angle;
                            if(rot < 0.0)
                                rot += 360.0f;
                            int bin = round(rot*factor);
                            if(bin == HISTO_LENGTH)
                                bin = 0;
                            assert(bin >= 0 && bin < HISTO_LENGTH);
                            rotHist[bin].push_back(bestIdxF);
                        }
                        nmatches++;
                    }
                }
            }
            KFit++;
            Fit++;
        } else if(KFit->first < Fit->first) {
            KFit = vFeatVecKF.lower_bound(Fit->first);
        } else {
            Fit = F.mFeatVec.lower_bound(KFit->first);
        }
    }

    if(mbCheckOrientation){
        int ind1=-1;
        int ind2=-1;
        int ind3=-1;

        ComputeThreeMaxima(rotHist,HISTO_LENGTH,ind1,ind2,ind3);

        for(int i=0; i<HISTO_LENGTH; i++){
            if(i==ind1 || i==ind2 || i==ind3)
                continue;
            for(size_t j=0, jend=rotHist[i].size(); j<jend; j++){
                vpMapPointMatches[rotHist[i][j]]=static_cast<GMapPoint*>(NULL);
                nmatches--;
            }
        }
    }

    return nmatches;
}

We can easily find the differences between its implentation and mine, I only used the vector score result, however the ORBSLAM2 used its inverse index to offer more possibility to find match.

So I went to check the fbow source code. I found I class called fbow2 within it, which could offer the inverse index (used in the upper function), however when I try to find its usage, I failed. As a result, fbow does not fully support to find corresponding using bow vector(or maybe I have missed some part, if it is the case, please let me kown, thanks).

hhhliuye

发布了16 篇原创文章 · 获赞 4 · 访问量 2203

私信关注

SLAM BOW词袋重定位