Re-ID: Reidentification by Relative Distance Comparison, Paper Analysis

Note: I just finished reading this paper and wrote up its main ideas along the way. Hats off to the great Wei-Shi Zheng, awesome work!

Paper link: Person Re-identification

Paper Analysis

Matching people across non-overlapping camera views at different locations and different times.
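A classic simplified RE-ID setting consists of a probe set (e.g., three images p1, p2, p3 of three different people) and a gallery set (g1, g2, g3). The p and g images come from different camera views. Each p in the probe set is a person we want to find, and the gallery set acts as a small database in which we search for the people from the probe set. Suppose (p1, g1), (p2, g2), (p3, g3) are the correct matches. To find p1, we compute the distance between p1 and each of g1, g2, g3 and rank the gallery images by that distance. Ideally, g1 should be ranked first (rank 1). Rank-n accuracy is the proportion of queries whose correct match is found at rank n or better, i.e., within the top n.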
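The ranking and rank-n accuracy described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions:

- The probe-to-gallery distances are already computed into a matrix `dist[i][j]`.
- Probe i's correct match is gallery i, as with (p1, g1), (p2, g2), (p3, g3).
- The function name `rank_n_accuracy` and the toy numbers are hypothetical.

```python
from typing import List


def rank_n_accuracy(dist: List[List[float]], n: int) -> float:
    """Fraction of probes whose true match appears within the top n ranks.

    Assumes dist[i][j] is the distance between probe i and gallery j, and
    that probe i's correct match is gallery i.
    """
    hits = 0
    for i, row in enumerate(dist):
        # Rank gallery indices by ascending distance to probe i.
        ranking = sorted(range(len(row)), key=lambda j: row[j])
        # 1-based rank at which the true match (gallery i) appears.
        rank_of_true_match = ranking.index(i) + 1
        if rank_of_true_match <= n:
            hits += 1
    return hits / len(dist)


# Toy distances: rows are probes p1..p3, columns are gallery images g1..g3.
dist = [
    [0.2, 0.5, 0.9],  # p1: g1 is closest, true match at rank 1
    [0.3, 0.4, 0.8],  # p2: g1 is closer than g2, true match at rank 2
    [0.7, 0.6, 0.1],  # p3: g3 is closest, true match at rank 1
]

for n in (1, 2, 3):
    print(f"rank-{n} accuracy: {rank_n_accuracy(dist, n):.3f}")
```

Evaluating `rank_n_accuracy` for n = 1, 2, 3, ... yields the cumulative match characteristic (CMC) curve that Re-ID papers usually report. In this toy case, p2's true match g2 is only ranked second. Rank-1 accuracy is therefore 2/3, while rank-2 accuracy is 1.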
Challenges
- In a busy, uncontrolled environment monitored by cameras from a distance, person verification relying on biometrics such as face and gait is infeasible and unreliable. (In such complex environments, pedestrians' biometric cues are not clearly visible.)
- As the transition time between disjoint cameras varies greatly and unpredictably from individual to individual, it is hard to impose accurate temporal and spatial constraints. (Transition times differ, so precise spatio-temporal constraints cannot be applied.)
- The visual appearance features, extracted mainly from the clothing and shapes of people, are intrinsically limited for matching people. (That is, these features are not very distinctive.) In addition, a person's appearance often undergoes large variations across non-overlapping camera views due to significant changes in view angle, lighting, background clutter, and occlusion. (As a result, different people can look more similar across camera views than the same person does.)
Two steps in tackling RE-ID
- A feature representation is computed from both the query image and each of the gallery images.
- The distance between each pair of potential matches is measured, and this distance is then used to determine whether a gallery image contains the same person as the query image.
Most existing work concentrates on the first step: how to extract reliable and discriminative features. For the second step, it simply applies standard distance measures. The limitation of this is:
- Under severe changes in viewing conditions that can cause significant appearance variations, applying a standard distance measure is undesirable. It essentially treats all features equally, without selectively discarding bad features in each individual matching circumstance.
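To see what "treats all features equally" means, here is a sketch of the two non-learned measures the paper uses as baselines (listed under "RDC vs Baseline method" below): the L1-norm distance and the Bhattacharyya distance.

The 4-bin histograms are made up for illustration. The Bhattacharyya variant shown is -ln(sum_k sqrt(p_k q_k)) on normalized histograms. Other variants exist, and this sketch does not claim to reproduce the paper's exact one.

```python
import math
from typing import Sequence


def l1_distance(x: Sequence[float], y: Sequence[float]) -> float:
    # Every dimension contributes |x_k - y_k| with the same fixed weight of 1.
    return sum(abs(a - b) for a, b in zip(x, y))


def bhattacharyya_distance(p: Sequence[float], q: Sequence[float]) -> float:
    # Assumes p and q are normalized histograms (non-negative, summing to 1).
    # Bhattacharyya coefficient BC = sum_k sqrt(p_k * q_k) lies in [0, 1].
    bc = sum(math.sqrt(a * b) for a, b in zip(p, q))
    # D_B = -ln(BC): 0 for identical histograms; clamp BC to avoid log(0).
    return -math.log(max(bc, 1e-12))


# Hypothetical 4-bin color histograms. On bins 0-1 the true match is closer to
# the query than the wrong match, but a lighting change between cameras
# shifted mass into bin 2 of the true match.
query = [0.4, 0.4, 0.1, 0.1]
true_match = [0.35, 0.35, 0.3, 0.0]
wrong_match = [0.3, 0.5, 0.1, 0.1]

for name, dist_fn in (("L1", l1_distance), ("Bhattacharyya", bhattacharyya_distance)):
    d_true = dist_fn(query, true_match)
    d_wrong = dist_fn(query, wrong_match)
    print(f"{name}: true match {d_true:.3f}, wrong match {d_wrong:.3f}")
```

In this toy example, the lighting-induced difference in bins 2 and 3 dominates both fixed measures, so both rank the wrong match closer than the true match. A learned distance can, in principle, down-weight such unreliable dimensions.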
This paper therefore focuses on the second step:
- We focus on the second step of person reidentification. That is, given a set of features extracted from each person image, we seek to quantify and differentiate these features by learning the optimal distance measure that is most likely to give correct matches.
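Conventionally, one would model the problem as follows: treat all images of the same person as one class, and learn a distance measure that minimizes intraclass distances and maximizes interclass distances. However, four important characteristics of person re-identification expose the weaknesses of this formulation:
- The intraclass variation can be large and, more importantly, can vary significantly for different classes, as it is caused by large and unpredictable viewing condition changes. (Under drastic changes in viewing conditions, intraclass variation can exceed interclass variation.)
- The interclass variation also varies drastically across different pairs of classes, and there are often severe overlaps between classes in the feature space due to the similar appearance of different people. (Interclass variation fluctuates a lot, and classes can overlap, e.g., when different people share similar features.)
- The training set for learning the model consists of images of matched people across different camera views. In order to capture the large intra- and interclass variations, the number of classes is necessarily large, typically on the order of hundreds. This represents a large-scale learning problem that challenges existing machine learning algorithms. (Capturing intra- and interclass variation requires a large dataset with as many classes as possible, which in turn makes this a large-scale learning problem.)
- Annotating a large number of matched people across camera views is not only tedious, but also inherently limited in its usefulness. (Annotation is tedious, and its usefulness is inherently limited.)
Beyond these four characteristics, RE-ID also faces an undersampling problem, because each class has only a handful of images. Undersampling leads to overfitting, and the conventional formulation above makes the overfitting even more severe.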
- Conventional approach: the true match is not only ranked higher but also has as small a distance to the query image as possible compared to that of wrong matches.
This paper therefore proposes the RDC (Relative Distance Comparison) model:
- The model aims to learn an optimal distance in the sense that for a given query image, the true match is desired to be ranked higher than the wrong matches among the gallery image set. The model cares less about how large the absolute distance between the pair of images for the true match.
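The relative comparison idea can be sketched as follows, based on my reading of the paper:

- Each training comparison pairs two absolute feature differences for the same query: x_p, from a true-match pair, and x_n, from a wrong-match pair.
- The distance is a quadratic form f(x) = x^T W W^T x.
- The loss sum log(1 + exp(f(x_p) - f(x_n))) is the negative log-likelihood of the true match being closer.

The loss depends only on f(x_p) - f(x_n), not on how large the distances themselves are. The paper has its own optimization procedure for W. This sketch instead runs plain, unconstrained gradient descent, and the toy difference vectors, learning rate, and step count are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]  # d x k; column j is the projection direction w_j
Triplet = Tuple[Vector, Vector]  # (x_p, x_n) absolute-difference vectors


def softplus(z: float) -> float:
    # Numerically stable log(1 + exp(z)).
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def sigmoid(z: float) -> float:
    # Numerically stable 1 / (1 + exp(-z)).
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def project(W: Matrix, x: Vector) -> List[float]:
    # W^T x: one coordinate per projection direction w_j.
    return [sum(W[i][j] * x[i] for i in range(len(x))) for j in range(len(W[0]))]


def rdc_distance(W: Matrix, x: Vector) -> float:
    # f(x) = x^T (W W^T) x = ||W^T x||^2 for an absolute-difference vector x.
    return sum(p * p for p in project(W, x))


def rdc_loss(W: Matrix, triplets: List[Triplet]) -> float:
    # Sum of -log P(f(x_p) < f(x_n)) = log(1 + exp(f(x_p) - f(x_n))).
    return sum(softplus(rdc_distance(W, xp) - rdc_distance(W, xn)) for xp, xn in triplets)


def train_rdc(triplets: List[Triplet], W: Matrix, lr: float = 0.1, steps: int = 500) -> Matrix:
    # Plain gradient descent on an unconstrained W (NOT the paper's optimizer).
    W = [row[:] for row in W]
    d, k = len(W), len(W[0])
    for _ in range(steps):
        grad = [[0.0] * k for _ in range(d)]
        for xp, xn in triplets:
            s = sigmoid(rdc_distance(W, xp) - rdc_distance(W, xn))
            pp, pn = project(W, xp), project(W, xn)
            for i in range(d):
                for j in range(k):
                    # d f(x) / d W[i][j] = 2 * (w_j . x) * x_i
                    grad[i][j] += s * 2.0 * (pp[j] * xp[i] - pn[j] * xn[i])
        for i in range(d):
            for j in range(k):
                W[i][j] -= lr * grad[i][j] / len(triplets)
    return W


# Hypothetical difference vectors: dimension 0 is identity-related, while
# dimension 1 mostly reflects viewpoint/lighting changes between cameras.
triplets: List[Triplet] = [
    ([0.1, 0.9, 0.6], [0.8, 0.6, 0.4]),
    ([0.2, 0.8, 0.3], [0.7, 0.5, 0.6]),
    ([0.1, 0.7, 0.6], [0.9, 0.4, 0.5]),
]
identity = [[1.0 if i == j else 0.0 for j in range(3)] for i in range(3)]  # Euclidean start

W = train_rdc(triplets, identity)
for xp, xn in triplets:
    print(f"true: {rdc_distance(W, xp):.3f}  wrong: {rdc_distance(W, xn):.3f}")
print(f"loss: {rdc_loss(identity, triplets):.3f} -> {rdc_loss(W, triplets):.3f}")
```

Starting from the Euclidean distance (W = identity), the first comparison is violated because the viewpoint-driven dimension 1 dominates. After training, every true match is closer than its wrong match and the loss drops.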
However, this model faces two problems:
- large space complexity
- local optimum learning problem
Hence the authors propose an ensemble RDC model:
- Rather than learning a batch mode RDC, we propose learning a set of weak RDC models, each computed using a small subset of the data, and then combining them to build a stronger RDC using ensemble learning.
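A rough sketch of the ensemble structure: split the comparisons into small random subsets, fit one weak model per subset, and combine the models. Two parts are explicitly assumptions, not the paper's method:

- `fit_weak_model` is a hypothetical placeholder (a simple diagonal heuristic) standing in for a real RDC solver run on each subset.
- The weak models are combined by equal-weight averaging of their distances.

The toy comparisons are synthetic.

```python
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Matrix = List[List[float]]  # d x k projection matrix W of one weak model
Comparison = Tuple[Vector, Vector]  # (x_p, x_n) absolute-difference vectors


def weak_distance(W: Matrix, x: Vector) -> float:
    # f_W(x) = ||W^T x||^2, the same quadratic form a single RDC model uses.
    return sum(sum(W[i][j] * x[i] for i in range(len(x))) ** 2 for j in range(len(W[0])))


def fit_weak_model(subset: List[Comparison]) -> Matrix:
    # HYPOTHETICAL PLACEHOLDER, not the RDC solver: a diagonal W that keeps only
    # dimensions where wrong matches differ more than true matches in this subset.
    d = len(subset[0][0])
    W = [[0.0] * d for _ in range(d)]
    for k in range(d):
        pos = sum(xp[k] for xp, _ in subset) / len(subset)
        neg = sum(xn[k] for _, xn in subset) / len(subset)
        W[k][k] = max(neg - pos, 0.0)
    return W


def train_ensemble(comparisons: List[Comparison], n_models: int,
                   subset_size: int, seed: int = 0) -> List[Matrix]:
    # Each weak model sees only a small random subset of the comparisons.
    rng = random.Random(seed)
    return [fit_weak_model(rng.sample(comparisons, subset_size)) for _ in range(n_models)]


def ensemble_distance(models: List[Matrix], x: Vector) -> float:
    # ASSUMPTION: combine weak models by equal-weight averaging of their distances.
    return sum(weak_distance(W, x) for W in models) / len(models)


# Synthetic comparisons: dimension 0 separates true from wrong matches,
# dimension 1 is inflated for true matches (e.g., by a viewpoint change).
data_rng = random.Random(42)
comparisons: List[Comparison] = [
    ([data_rng.uniform(0.0, 0.3), data_rng.uniform(0.5, 1.0)],
     [data_rng.uniform(0.6, 1.0), data_rng.uniform(0.0, 0.4)])
    for _ in range(40)
]

models = train_ensemble(comparisons, n_models=5, subset_size=8)
correct = sum(ensemble_distance(models, xp) < ensemble_distance(models, xn)
              for xp, xn in comparisons)
print(f"{correct}/{len(comparisons)} comparisons ranked correctly")
```

Each weak model only ever touches `subset_size` comparisons. This keeps per-model memory small, which matches the space-complexity motivation given above for preferring an ensemble over a single batch-mode RDC.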
The authors then compare RDC with other RE-ID methods:
- RDC vs Baseline method
  - RDC：perform distance learning
  - Baseline method: based on the non-learning L1-norm distance and Bhattacharyya distance
  - The experiments demonstrate the importance of performing distance learning.
- RDC vs AdaBoost
  - RDC performs ranking-based soft discriminant feature selection, while AdaBoost performs large-margin-based discriminant selection.
  - RDC is able to evaluate the importance of different combinations of features (second-order information), while AdaBoost assumes different features are independent and selects them individually. (My understanding of "second-order information": it covers not only each feature itself but also the relationships between features.)
- RDC vs PLS
  - PLS is a regression method, and it can only be learned on the gallery dataset. Since there are only limited samples per person for training PLS and people's appearance varies greatly, PLS is sensitive to the learned data and may not generalize well to new data.
  - RDC models are learned using an independent training set consisting of people different from those in the gallery set.
  - My personal understanding: PLS can only use the gallery set as its training set, and because of undersampling it does not handle new data well. RDC instead has an independent training set made up of people who are not in the gallery set.
- RDC vs Related Distance Learning Methods (Xing's method, LMNN, ITM, MCC)
  - RDC vs LMNN
    - Among these methods, only LMNN exploits relative distance comparison, but it uses it as an optimization constraint rather than as the main objective function. Moreover, it quantifies each relative distance comparison with a hard rather than a soft margin measure.
    - In other words, LMNN treats relative distance comparison as an optimization constraint instead of the main learning objective, and it also adopts large-margin learning (i.e., the flawed formulation described at the beginning).
  - RDC vs MCC
    - MCC is not a relative distance comparison-based method
    - MCC gives the most comparable results to RDC when the training set is large. However, its performance degrades dramatically when the size of the training data decreases.
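    - My understanding: even when training data is plentiful, RDC still outperforms MCC (its closest competitor), and RDC's advantage grows as the training set shrinks.
  - In short, all four related distance learning methods suffer from undersampling.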
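- RDC vs Related Ranking Methods
  - RDC vs primal RankSVM method
    - Both outperform the non-learning-based methods and the four distance-learning-based methods, but RDC performs even better.
    - Primal RankSVM has a parameter that must be set, and its performance is very sensitive to how this parameter is tuned, especially under undersampling. Careful tuning is therefore essential, but it greatly increases the computational cost.
    - The reasons RDC does better:
      - Its logistic function-based modeling enforces a softer constraint on relative distance comparison, and it exploits second-order rather than first-order feature quantification.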
  - RDC vs RankBoost
    - Without access to special hardware, RankBoost was only tractable for the smallest training dataset, which means RankBoost is intractable for a high-dimensional feature space.
    - RankBoost also needs to learn an optimal weak classifier at each iteration, which causes a high computational cost.
    - The weak ranker in RankBoost, based on a single feature, is too weak.
    - All features are treated independently.
    - In short, RankBoost can only handle low-dimensional data, is computationally expensive, and treats all features independently, so its performance is far weaker than that of RDC.
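The two technical reasons given above for RDC's advantage can be illustrated numerically:

- A soft logistic constraint instead of a hard margin. In this sketch, z = f(x_p) - f(x_n). The hinge loss max(0, 1 + z) stands in for a hard large-margin penalty such as LMNN's, and the logistic loss log(1 + exp(z)) is RDC-style.
- Second-order instead of first-order feature weighting. The first-order score is an illustrative per-feature linear weighting w . x.

The weights `w` and the positive semi-definite matrix `M` (eigenvalues 1.9 and 0.1) are made-up examples.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def hinge(z: float, margin: float = 1.0) -> float:
    # Hard large-margin penalty: exactly zero once the wrong match is at
    # least `margin` farther away than the true match, linear otherwise.
    return max(0.0, margin + z)


def logistic(z: float) -> float:
    # Soft penalty log(1 + exp(z)) = -log P(f(x_p) < f(x_n)), computed stably.
    return max(z, 0.0) + math.log1p(math.exp(-abs(z)))


def first_order_score(w: Vector, x: Vector) -> float:
    # Each feature weighted on its own: sum_k w_k * x_k (no feature pairs).
    return sum(wk * xk for wk, xk in zip(w, x))


def second_order_distance(M: List[List[float]], x: Vector) -> float:
    # Quadratic form x^T M x: off-diagonal M[i][j] weights feature pairs x_i * x_j.
    n = len(x)
    return sum(M[i][j] * x[i] * x[j] for i in range(n) for j in range(n))


for z in (-3.0, -0.5, 0.0, 2.0):
    print(f"z={z:+.1f}  hinge={hinge(z):.3f}  logistic={logistic(z):.3f}")

w = [0.7, 0.4]
M = [[1.0, -0.9], [-0.9, 1.0]]
for x in ([1.0, 0.0], [0.0, 1.0], [1.0, 1.0]):
    print(x, f"first-order={first_order_score(w, x):.2f}",
          f"second-order={second_order_distance(M, x):.2f}")
```

The hinge loss is exactly zero once a comparison clears the margin and has a kink at z = -1, whereas the logistic loss is smooth and never exactly zero.

For the features, a first-order score is additive: score([1, 1]) = score([1, 0]) + score([0, 1]). The quadratic form's cross term instead lets a joint change in both features count for less (0.2) than either change alone (1.0). An additive score with non-negative per-feature contributions cannot do this.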
Finally, the authors suggest future research directions for RE-ID:
- It would be interesting to investigate how information on groups of people can assist person reidentification as contextual information. (Can information about groups of people serve as context for re-identification?)
- How to detect a group of people in practical scenarios is still an open problem. (How can groups of people be detected in real-world scenes?)
- In the current work, no attempt has been made to remove the background information from a person image, which could typically have a negative effect on the performance of person reidentification.

All of the above reflects my personal views. Criticism and suggestions are welcome; let's discuss!
