Review: Place Recognition in Loop Closure Detection

Updated August 2018

Before We Start

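The key to loop closure detection is re-recognizing a place that has been visited before; this field is called place recognition.
This post mainly summarizes place recognition under large viewpoint changes and scene changes.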

Concepts in Place Recognition

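A place recognition module consists of three parts: image processing, representation of the known map, and belief generation (deciding whether the image belongs to the map/environment).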

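scene recognition: identifying the category of a scene (to a large extent, essentially a classification problem); can images of the same scene category come from different places?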

Image retrieval is also known as content-based image retrieval (CBIR).

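place recognition: a place denotes a region, such as a kitchen; the task is to recognize whether a place has been visited before (essentially an image retrieval problem).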

object recognition: an object is a thing with a clear boundary, such as a table or a monitor.

topological information: the nodes and the relations between nodes. It only concerns whether two nodes are connected, not the positional relationship between them.

metric information: distance, direction, or both, attached to the map edges (node edges). It concerns the physical positions of nodes and the relative physical positions between them.
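The difference between the two kinds of information can be sketched with a toy map. This is only an illustration: the room names, the 2D coordinates in meters, and the helper names `connected` and `distance` are all assumptions made up for this example.

```python
import math
from collections import deque

# Topological information: only which nodes are connected (hypothetical rooms).
topo_edges = {
    "kitchen": {"hallway"},
    "hallway": {"kitchen", "office"},
    "office": {"hallway"},
}

# Metric information: each node also has a physical position (assumed 2D, meters).
positions = {"kitchen": (0.0, 0.0), "hallway": (3.0, 0.0), "office": (3.0, 4.0)}


def connected(graph, start, goal):
    """Breadth-first search: a purely topological query, no coordinates needed."""
    seen, queue = {start}, deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            return True
        for nxt in graph[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return False


def distance(a, b):
    """Euclidean distance: a metric query that requires node positions."""
    (xa, ya), (xb, yb) = positions[a], positions[b]
    return math.hypot(xb - xa, yb - ya)
```

A topological map can answer `connected(topo_edges, "kitchen", "office")` but has nothing to say about `distance("kitchen", "office")`; the latter needs the metric layer.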

Main Difficulties and Main Approaches

The main difficulties in loop closure recognition:
- viewpoint change (little overlap)
- condition change (appearance different)
- high-efficiency

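Current methods that focus on handling large viewpoint changes and scene changes fall into three categories:
- using better learned features;
- using high level + local level (learned features + traditional features; semantic features + intermediate network layer outputs);
- adding 3D information to obtain a synthesized/rendered image at the estimated viewpoint.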

Image Processing

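Place descriptions fall into two categories:
- selectively picking parts of the image content to describe (BoW);
- treating the whole image as a single vector (HOG, Gist).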

Other libraries: FAB-MAP (SIFT, SURF, more than 10000), SeqSLAM

Among these, robust features include U-SURF, edge-based features, and CNN features.

Local feature methods:
- bag of words
- Fisher vector
- VLAD (Vector of Locally Aggregated Descriptors; less computation)
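As a rough illustration of the aggregation idea behind VLAD, the sketch below assigns each local descriptor to its nearest visual word and sums the residuals per word. It is a minimal pure-Python version under simplifying assumptions: the centroids are given (in practice they come from k-means on training descriptors), only a final L2 normalization is applied, and the function name `vlad` is chosen for this example rather than taken from any library.

```python
import math


def vlad(descriptors, centroids):
    """Aggregate local descriptors into one VLAD vector of length k * d."""
    k, d = len(centroids), len(centroids[0])
    residuals = [[0.0] * d for _ in range(k)]
    for desc in descriptors:
        # Hard-assign the descriptor to its nearest centroid (visual word).
        nearest = min(
            range(k),
            key=lambda i: sum((a - b) ** 2 for a, b in zip(desc, centroids[i])),
        )
        # Accumulate the residual (descriptor minus centroid) for that word.
        for j in range(d):
            residuals[nearest][j] += desc[j] - centroids[nearest][j]
    # Concatenate the per-word residual sums and L2-normalize the result.
    flat = [v for row in residuals for v in row]
    norm = math.sqrt(sum(v * v for v in flat))
    return [v / norm for v in flat] if norm > 0 else flat
```

A bag-of-words histogram would only count how many descriptors fall on each word; VLAD keeps the residuals as well, so the image vector has `k * d` entries instead of `k`.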

Global feature methods:
- GIST (2006)
- CNN: VGG-16 [18], NetVLAD [1], and PoseNet

Bag-of-words methods (e.g., BoW, FAB-MAP) are not much affected by viewpoint changes (pose invariant).

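Global features (semantic features or whole-image features) are more easily affected by viewpoint (i.e., the pose of the observer/camera) than local features, while the latter are more easily affected by illumination.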

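Overall, however:
Local features are more robust to translation and rotation.
Global features are more robust to illumination changes, image blur, and scale changes.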

Hence there are methods that combine global and local features.

Map Representation (Mapping)

There are three ways to represent the map: pure image retrieval, topological maps, and topological-metric maps.

Belief Generation

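There are roughly two approaches: one based on pure image retrieval, and one based on geometric position (retrieving keyframe images close to the current localization estimate).
In general, both spatial geometric consistency (i.e., the image match is the best one) and temporal consistency (i.e., several consecutive frames are all geometrically consistent) must be ensured.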
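The temporal consistency idea can be sketched as a simple filter over per-frame retrieval results. This is an assumption-laden illustration, not any particular system's check: the function name, the `window` and `max_gap` parameters, and the convention that each query frame yields one best-matching database index (or `None`) are all invented for this example. Spatial verification (e.g., geometric checking of feature matches) is not covered here.

```python
def temporally_consistent(candidates, window=3, max_gap=2):
    """Accept a loop closure only if the last `window` query frames all
    matched, and consecutive matches point to nearby database frames.

    candidates: best-match database index per consecutive query frame,
                or None where retrieval found no match.
    """
    recent = candidates[-window:]
    # Require a full window of frames, each with a match.
    if len(recent) < window or any(c is None for c in recent):
        return False
    # Consecutive matches must stay within `max_gap` database frames.
    return all(abs(b - a) <= max_gap for a, b in zip(recent, recent[1:]))
```

For instance, matches `[10, 11, 12]` pass the check, while `[10, 50, 12]` are rejected because one frame jumps to a distant part of the database.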

On Changing Environments

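Machine learning can be used for recognition under scene changes: all possible appearance variations are taken into account, such as morning versus afternoon, illumination, and viewpoint changes, with a neural network representing the mapping of the whole image before and after the change.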

Mid-level features are more robust to appearance changes, while high-level features are more robust to viewpoint changes.

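Remember and forget as the environment changes (gradually lowering the weight of old parts that rarely appear).
Store multiple representations of the same environment.
In summary, the same map/place can have multiple representations: some are long-term representations that do not change over time, while others are short-term representations that are adjusted over time. Segmentation can also be used to separate static and dynamic regions: some regions change very frequently but may have only a few modes (e.g., a door opens and closes often but has only a few states). Other parts, in the long run, alternate between two states, present and absent (e.g., a car in a parking space: the space is either empty or occupied).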
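One way to picture the remember-and-forget idea is a per-feature weight that decays every visit and is boosted when the feature is seen again. The sketch below is purely hypothetical: the function name, the decay factor, the boost, and the pruning threshold are illustrative values, not taken from any of the methods discussed here.

```python
def update_weights(weights, observed, decay=0.9, boost=1.0, prune_below=0.05):
    """Update map feature weights after one visit to a place.

    weights:  dict mapping feature id -> current weight
    observed: set of feature ids seen during this visit
    """
    updated = {}
    for fid, w in weights.items():
        w *= decay  # every feature fades a little on each visit
        if fid in observed:
            w += boost  # re-observed features are reinforced
        if w >= prune_below:  # features that faded too far are forgotten
            updated[fid] = w
    for fid in observed:
        if fid not in weights:
            updated[fid] = boost  # newly seen features enter the map
    return updated
```

Under this scheme a long-term element that is observed on most visits keeps a high weight, while a one-off element fades and is eventually dropped.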

These CNN features yield a high matching quality but are rather high-dimensional, i.e., comparisons are computationally expensive.
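To make the cost concrete: a brute-force search compares the query descriptor against every database descriptor, so the work grows with both the database size N and the descriptor dimension D. The sketch below shows such a search with cosine similarity; the function names are chosen for this example, and real systems typically use vectorized or approximate nearest-neighbor search instead.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two descriptors; costs O(D) per pair."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def best_match(query, database):
    """Brute-force retrieval: O(N * D) for N database descriptors."""
    scores = [cosine_similarity(query, d) for d in database]
    idx = max(range(len(scores)), key=scores.__getitem__)
    return idx, scores[idx]
```

With descriptors of several thousand dimensions and a large database, this linear scan is where the expense comes from, which motivates lower-dimensional or compressed descriptors.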

Traditional image description methods developed in the past exploit techniques from the image retrieval field. Recently, the rapid advances in related fields such as object detection and image classification have inspired a new technique to improve visual place recognition systems, i.e., convolutional neural networks (CNNs).

Implementation Efficiency

In terms of time, a query generally takes about 1 s, e.g., the ETH paper Semantic Visual Localization, as well as DenseVLAD (1.4 s) and Camera Pose Voting (3.7 s); PoseNet is very fast at 0.005 s.

Open Source

The times listed refer to retrieval time.

C LANGUAGE:
- DisLoc (BoW paradigm and Hamming embedding; improved in 2016)
- DBoW2
- FAB-MAP,OpenFABMAP
- DenseVLAD(24/7 place recognition by view synthesis. Based on RootSIFT)
- ACG-Localizer

MATLAB:
- 2018 lostX (LoST): Visual Place Recognition using Visual Semantics for Opposite Viewpoints across Day and Night
- SeqSLAM, OpenSeqSLAM (C++), Fast SeqSLAM: appearance-robust methods like SeqSLAM are invariant to challenging environmental conditions, but at the cost of viewpoint dependence and velocity sensitivity
- bcnn: Bilinear CNN Models for Fine-grained Visual Recognition, 8 frames/sec
- Lightweight, Viewpoint-Invariant Visual Place Recognition in Changing Environments (BoW + VLAD)

ConvNet methods:
- 2018 calc: Convolutional Autoencoder for Loop Closure (HOG features; Caffe; 1.47 ms)
- 2017 place-recognition: Deep Learning Features at Scale for Visual Place Recognition; PyTorch; only 10% increase
- 2017 vpr_relocalization; 30-80 ms
- NetVLAD (MATLAB), NetVLAD (TensorFlow). NetVLAD is faster than DenseVLAD (0.8 h for the former, 2.5 h for the latter). A later, faster improvement: Appearance-invariant place recognition by discriminatively training a convolutional neural network
- places365
- retrieval-2016-icmr
- 2016 deep-retrieval (Caffe implementation that extracts illumination- and viewpoint-invariant features)

2017 Light-weight place recognition and loop detection using road markings
IGP: mean time per query ~10 s (100K vocabulary, 6044 database images, and 800 query images); 75% of the successful queries have an error below 8 meters. Partial implementation of the 2017 paper Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization
InLoc_demo
CVPR2018:
Zero-Shot Visual Recognition using Semantics-Preserving Adversarial Embedding Networks

Learned descriptors

2016-LIFT: Learned Invariant Feature Transform
Comparisons of various feature methods (both traditional and learned): local-feature-evaluation, hpatches-benchmark, IROS2015_On the Performance of ConvNet Features for Place Recognition
Other summary posts: https://github.com/willard-yuan/awesome-cbir-papers
VLFeat toolbox (a popular open source library of computer vision algorithms: BoVW, FV, and VLAD)

Other open source code

2016 tinghuiz: A deep learning framework for synthesizing novel views of objects and scenes(caffe)

Open datasets

Google street view(front-end view of car, including depth map data)
Specific PlacEs Dataset
AirSim (front view + top-down view)
Datasets mentioned in other surveys (e.g., 2017_Place recognition-An Overview of Vision Perspective)
place_recognition: Viewpoint-tolerant Place Recognition combining 2D and 3D information for UAV navigation
Oxford Robotcar dataset
Nordland dataset(train)

Leading Researchers

Torsten Sattler: author of Efficient & Effective Prioritized Matching for Large-Scale Image-Based Localization, InLoc, Image-based localization using LSTMs for structured feature correlation, evaluations of traditional versus deep learning features, and more.

Other Materials

Traditional approaches focus either on the use of 2D-3D matches, known as structure-based pose estimation, or solely on 2D-2D matches (structure-less pose estimation).

The F1 score (F1-measure) is the harmonic mean of precision P and recall R, that is,

\frac{2}{F_1} = \frac{1}{P} + \frac{1}{R}
Max-F1 is the largest F1 value among all the F1 values tested.
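
Solving the harmonic-mean relation above for F_1 gives the closed form that is usually computed in practice. The numbers in the worked part (P = 0.8, R = 0.5) are made up purely for illustration:

```latex
F_1 = \frac{2PR}{P + R}, \qquad
P = 0.8,\ R = 0.5 \;\Rightarrow\;
F_1 = \frac{2 \cdot 0.8 \cdot 0.5}{0.8 + 0.5} = \frac{0.8}{1.3} \approx 0.615
```

Because it is a harmonic mean, F_1 is pulled toward the smaller of the two values: here it lands closer to R = 0.5 than the arithmetic mean of 0.65 would.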

Semantic Segmentation Frameworks Used in Papers

SegNet
RefineNet

Evaluation Metrics for Loop Closure Detection

Recall-Precision curve（PR curves）
AUC: the area under the recall-precision curve (the area under a PR curve, AUCPR)
mean average precision
largest precision when recall is 100%
max F1 (F1: the harmonic mean of recall and precision; the maximum is taken over the F1 values from multiple tests)
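
The metrics above can all be derived from a list of match scores with ground-truth labels. The following is a minimal sketch under stated assumptions: each candidate loop closure has a similarity score and a 0/1 label, ties between scores are ignored, the area is approximated by a simple step-wise sum (one common way to compute average precision), and the function names are chosen for this example.

```python
def pr_curve(scores, labels):
    """Sweep a threshold down the ranked candidates.

    Returns a list of (recall, precision) points, one per candidate.
    """
    total_pos = sum(labels)
    if total_pos == 0:
        raise ValueError("need at least one true loop closure")
    ranked = sorted(zip(scores, labels), key=lambda t: -t[0])
    points, tp = [], 0
    for i, (_, is_true) in enumerate(ranked, start=1):
        tp += is_true
        points.append((tp / total_pos, tp / i))
    return points


def max_f1(points):
    """Largest F1 over all thresholds on the curve."""
    return max(2 * p * r / (p + r) if p + r > 0 else 0.0 for r, p in points)


def average_precision(points):
    """Step-wise area under the PR curve."""
    area, prev_recall = 0.0, 0.0
    for recall, precision in points:
        area += (recall - prev_recall) * precision
        prev_recall = recall
    return area


def precision_at_full_recall(points):
    """Highest precision among the points where recall reaches 100%."""
    return max(p for r, p in points if r == 1.0)
```

Mean average precision is then the mean of `average_precision` over all queries (or test sequences).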