Neural Person Search Machines (NPSM)

Abstract

Setting: person search in the wild

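Idea: starting from the whole image, recursively shrink the person search region until the target person is precisely localized, fully exploiting information from the query and from contextual cues at every recursive search step.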

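NPSM implements this recursive localization for person search. Benefiting from its neural search mechanism, NPSM can automatically and selectively shrink its focus from a loose region to a tighter one that contains the target. In this process, NPSM uses an internal primitive memory component to memorize the query representation, which modulates the attention and makes the model more robust to other (distracting) regions.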

Benchmark datasets: CUHK-SYSU Person Search dataset and PRW dataset.

Evaluation protocols: mAP and top-1

1. Introduction

Person search = person detection + re-identification

Distracting factors: large appearance variance across multiple cameras, low resolution, cluttered background, unfavorable camera settings, etc.

Pioneer work: “Joint Detection and Identification Feature Learning for Person Search”. In this paper, Xiao et al. adopted an end-to-end person search model based on the proposed Online Instance Matching (OIM) loss function to jointly train the person detection and re-identification networks. The basic pipeline is: first detect candidate persons, then compare the query with each candidate, and finally take the top-ranked candidate as the search result.
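The compare-and-rank step of this baseline can be sketched as below. This is a minimal illustration, not the OIM model itself: it assumes a detector and a re-identification network have already produced candidate boxes and feature vectors (the inputs are hypothetical), and it uses cosine similarity as the comparison.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity between two feature vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def rank_candidates(query: Sequence[float],
                    candidates: Sequence[Tuple[Box, Sequence[float]]]
                    ) -> List[Tuple[float, Box]]:
    """Score every detected candidate against the query, best match first."""
    scored = [(cosine(query, feat), box) for box, feat in candidates]
    scored.sort(key=lambda t: t[0], reverse=True)
    return scored


if __name__ == "__main__":
    cands = [((0, 0, 10, 20), [1.0, 0.0]), ((30, 0, 40, 20), [0.6, 0.8])]
    print(rank_candidates([0.0, 1.0], cands)[0])  # top-1 (score, box)
```

Every candidate is scored independently of where it sits in the image, which is exactly the query-blind behaviour NPSM tries to move away from.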

Drawback: the pipeline above assumes that, within an image, the target person appears at only a single location.

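The method proposed in this paper more closely resembles how the human neural system processes complex visual information: both keep shrinking the search region and look for the target person within an ever smaller area. Such a coarse-to-fine search process is intuitively useful for existing person search solutions.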

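Building such a person search model requires solving two problems: 1) integrating the query person's information into the search process as memory; 2) at each recursive step of shrinking the search range (the coarse-to-fine search process), selecting a subregion under the guidance of that memory.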

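The proposed NPSM builds on Conv-LSTM: it captures the contextual information of each person and stores the query person in an external memory that helps the model select the target region. Experiments show that this method outperforms the compared methods.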

Contributions of this paper:

1) Reformulate person search as a process of recursively focusing on the target region.

2) Propose a new method that uses contextual information to reduce the impact of distracting factors. (Original wording: "We coin a novel method more robust to distracting factors benefiting from contextual information"; I am not sure this reading is correct.)

3) NPSM stores the query person information in the primitive memory to guide the model in recursively selecting effective regions.

2. Related Work

This section reviews related work on person search, person detection, re-identification and LSTM-based attention methods, and highlights the advantages of the proposed NPSM model.

1) On the person search problem: previous work starts from person detection + person re-identification, whereas this paper starts from gradually removing interference or irrelevant target persons for the query person.

2) Existing LSTM-based attention methods all adopt a blind attention mechanism, whereas NPSM is a query-aware model.

3. Proposed Neural Person Search Machines

3.1 Architecture Overview

The architecture has two components: the Primitive Memory and the Neural Search Networks (NSN).

3.2 Person Search with NPSM

1. In the figure, q denotes the query feature; Res50 Part1 corresponds to conv1 through conv4_3 of ResNet-50, and Res50 Part2 corresponds to conv4_4 through conv5_3.

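The query image is passed through Res50 Part1 to extract its feature map q. The cells and gates of the NSN are defined by Conv-LSTM equations (given in the paper).

In these equations, * denotes convolution, ⊙ denotes the Hadamard product (the element-wise product of two matrices), and w denotes 2-D convolution kernels. The input at time t is the feature map of the region shrunk at the previous step. The input gate i_t, forget gate f_t, output gate o_t, memory cell c_t and hidden state h_t at time t are all 3-D tensors that preserve spatial dimensions.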
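The gate computation can be illustrated with a minimal single-channel Conv-LSTM step. Because the exact equations are not reproduced in these notes, this sketch assumes the standard Conv-LSTM form with the query map q added as an extra convolved input to every gate: i_t = σ(w_qi * q + w_xi * x_t + w_hi * h_{t-1} + b_i), likewise for f_t and o_t, a candidate cell g_t computed the same way with tanh, then c_t = f_t ⊙ c_{t-1} + i_t ⊙ g_t and h_t = o_t ⊙ tanh(c_t). The names x_t and g_t and the weight layout are hypothetical; the paper's exact formulation may differ.

```python
import math
from typing import Callable, Dict, List, Tuple

Map = List[List[float]]  # single-channel 2-D map (H x W)


def conv2d_same(x: Map, k: Map) -> Map:
    """2-D convolution (cross-correlation, as in CNNs) with zero 'same' padding."""
    H, W, kh, kw = len(x), len(x[0]), len(k), len(k[0])
    out = [[0.0] * W for _ in range(H)]
    for i in range(H):
        for j in range(W):
            for u in range(kh):
                for v in range(kw):
                    ii, jj = i + u - kh // 2, j + v - kw // 2
                    if 0 <= ii < H and 0 <= jj < W:
                        out[i][j] += k[u][v] * x[ii][jj]
    return out


def add(*maps: Map) -> Map:
    """Element-wise sum of equally sized maps."""
    return [[sum(vals) for vals in zip(*rows)] for rows in zip(*maps)]


def hadamard(a: Map, b: Map) -> Map:
    """Element-wise (Hadamard) product, the ⊙ operator."""
    return [[u * v for u, v in zip(ra, rb)] for ra, rb in zip(a, b)]


def apply(f: Callable[[float], float], x: Map) -> Map:
    return [[f(v) for v in row] for row in x]


def sigmoid(v: float) -> float:
    return 1.0 / (1.0 + math.exp(-v))


def conv_lstm_step(x_t: Map, q: Map, h_prev: Map, c_prev: Map,
                   w: Dict[str, Tuple[Map, Map, Map]],
                   b: Dict[str, float]) -> Tuple[Map, Map]:
    """One Conv-LSTM step with the query map q as an extra input (assumed form).

    w[g] = (kernel on q, kernel on x_t, kernel on h_prev) and b[g] is a scalar
    bias, for g in "i" (input), "f" (forget), "o" (output), "c" (cell candidate).
    """
    def pre(g: str) -> Map:
        wq, wx, wh = w[g]
        z = add(conv2d_same(q, wq), conv2d_same(x_t, wx), conv2d_same(h_prev, wh))
        return apply(lambda v: v + b[g], z)

    i_t = apply(sigmoid, pre("i"))
    f_t = apply(sigmoid, pre("f"))
    o_t = apply(sigmoid, pre("o"))
    g_t = apply(math.tanh, pre("c"))
    c_t = add(hadamard(f_t, c_prev), hadamard(i_t, g_t))  # memory cell
    h_t = hadamard(o_t, apply(math.tanh, c_t))             # hidden state
    return h_t, c_t
```

Because q enters every gate at every step, the query acts as a fixed memory that biases which spatial positions the cell keeps, which is the intuition behind the query-aware attention.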

2. This part describes how the subregion at each recursive time step is generated by shrinking the bigger region of the previous time step.

2.1 How are the candidate subregions at each step selected?

The overall idea is simple: first obtain many proposal bounding boxes, then aggregate them into C subregions by clustering.

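Detailed steps: define the candidate region as R = (min(θx1), min(θy1), max(θx2), max(θy2)), where (θx1, θy1, θx2, θy2) are the corner coordinates of a proposal bounding box and the min/max are taken over all proposals in the region. Candidate subregions are then separated out according to the relationships among the boxes contained in R: the Euclidean distance measures how far apart proposal bounding boxes are, proposals that are relatively close to each other are aggregated into one candidate subregion, and this finally yields C subregions.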
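The two steps above (enclosing region R, then grouping nearby proposals) can be sketched as follows. The notes only say that proposals close in Euclidean distance are merged; using plain k-means on box centers with a first-C initialization is an assumed, illustrative choice, and the function names are hypothetical.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def enclosing_region(boxes: Sequence[Box]) -> Box:
    """R = (min x1, min y1, max x2, max y2) over the given proposal boxes."""
    return (min(b[0] for b in boxes), min(b[1] for b in boxes),
            max(b[2] for b in boxes), max(b[3] for b in boxes))


def center(b: Box) -> Tuple[float, float]:
    return ((b[0] + b[2]) / 2.0, (b[1] + b[3]) / 2.0)


def candidate_subregions(boxes: Sequence[Box], C: int, iters: int = 10) -> List[Box]:
    """Cluster proposals by the Euclidean distance between their centers
    (plain k-means, an assumed choice) and return one enclosing box per
    non-empty cluster, i.e. at most C candidate subregions."""
    pts = [center(b) for b in boxes]
    C = min(C, len(pts))
    cents = pts[:C]  # simple deterministic initialization
    groups: List[List[int]] = []
    for _ in range(max(1, iters)):
        groups = [[] for _ in range(C)]
        for idx, p in enumerate(pts):
            nearest = min(range(C), key=lambda c: math.dist(p, cents[c]))
            groups[nearest].append(idx)
        # move each centroid to the mean of its members (keep it if empty)
        cents = [(sum(pts[i][0] for i in g) / len(g), sum(pts[i][1] for i in g) / len(g))
                 if g else cents[c] for c, g in enumerate(groups)]
    return [enclosing_region([boxes[i] for i in g]) for g in groups if g]
```

For example, two pairs of overlapping proposals far apart on the x-axis end up as two subregions, each the tight box around one pair, while enclosing_region over all four gives the parent region R.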

2.2 如何从candidate subregion中选择target region？

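Each candidate subregion is first passed through Res50-Part1 to extract a feature map, which an ROI pooling layer then brings to a fixed size of K*K*D. The pooled region feature and q are fed together into the NSN, which fuses them with convolutional layers and produces attention maps.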
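The ROI pooling step that produces the fixed K*K*D feature can be sketched as Fast R-CNN-style RoI max pooling on one channel, applied per channel to get D maps of size K×K. This assumes integer, inclusive ROI coordinates already expressed in feature-map cells; the bin rounding (floor for bin start, ceil for bin end) is an illustrative choice, and the paper's exact quantization is not stated in these notes.

```python
from typing import List, Tuple

Map = List[List[float]]  # single-channel feature map, indexed fmap[y][x]


def ceil_div(a: int, b: int) -> int:
    return -(-a // b)


def roi_max_pool(fmap: Map, roi: Tuple[int, int, int, int], K: int) -> Map:
    """Max-pool the cells inside roi = (x1, y1, x2, y2) (inclusive, in
    feature-map coordinates) into a fixed K x K grid."""
    x1, y1, x2, y2 = roi
    h, w = y2 - y1 + 1, x2 - x1 + 1
    out: Map = []
    for i in range(K):
        ys, ye = y1 + (i * h) // K, y1 + ceil_div((i + 1) * h, K)
        row = []
        for j in range(K):
            xs, xe = x1 + (j * w) // K, x1 + ceil_div((j + 1) * w, K)
            # every bin is non-empty because ceil((j+1)w/K) > floor(jw/K)
            row.append(max(fmap[y][x] for y in range(ys, ye) for x in range(xs, xe)))
        out.append(row)
    return out


def roi_pool_channels(fmaps: List[Map], roi, K: int) -> List[Map]:
    """Apply the pooling per channel: D maps in, D x K x K out."""
    return [roi_max_pool(ch, roi, K) for ch in fmaps]
```

Whatever the size of the subregion, the output is always K×K per channel, which is what lets the NSN compare regions of different sizes against the same query map q.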

How is the attention score map computed?

In the score formula, m and n are the height and width of the subregion, respectively.
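Assuming the score of candidate subregion c is simply the mean of its m×n attention map a^c (consistent with the NSN picking the subregion with the highest average score), the selection rule can be written as below; the paper's exact normalization of a^c may differ.

```latex
s_c = \frac{1}{mn} \sum_{i=1}^{m} \sum_{j=1}^{n} a^{c}_{ij},
\qquad
c^{*} = \arg\max_{c \in \{1, \dots, C\}} s_c
```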

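The NSN selects the subregion with the highest average score as the parent region for the next search step, and repeats the steps above until the search path reaches the final proposal.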
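The recursive coarse-to-fine loop can be sketched as follows. Here `split` and `attend` are hypothetical stand-ins for the subregion clustering of 2.1 and for the NSN's attention output; the stopping rules (one proposal left, no progress, or a step limit) are assumptions made for the sketch.

```python
from typing import Callable, List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
Map = List[List[float]]


def average_score(att: Map) -> float:
    """Mean value of an m x n attention map."""
    m, n = len(att), len(att[0])
    return sum(sum(row) for row in att) / (m * n)


def recursive_search(proposals: Sequence[Box],
                     split: Callable[[List[Box]], List[List[Box]]],
                     attend: Callable[[List[Box]], Map],
                     max_steps: int = 10) -> Box:
    """Coarse-to-fine search: at each step keep the child group whose attention
    map has the highest average score, until one proposal is left."""
    current = list(proposals)
    for _ in range(max_steps):
        if len(current) == 1:
            break
        groups = [g for g in split(current) if g]
        best = max(groups, key=lambda g: average_score(attend(g)))
        if len(best) == len(current):  # the split made no progress
            break
        current = best
    # if several proposals remain, fall back to the best-scoring single box
    return max(current, key=lambda b: average_score(attend([b])))
```

Each step discards every proposal outside the chosen subregion, so irrelevant persons are removed progressively instead of all being ranked against the query at once.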

2.3 Identification Net

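It takes the output of Res50-Part2 as input and consists of a global average pooling layer and a 256-dimensional fully connected layer. The cosine similarity between the query person's feature and the feature of the final person search result is then computed for evaluation.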
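The identification head described above can be sketched as global average pooling, a fully connected projection, and cosine similarity. The weights are hypothetical placeholders (a real model would learn a 256 × D matrix), and no activation or normalization after the FC layer is included because the notes do not mention one.

```python
import math
from typing import List, Sequence

Map = List[List[float]]


def global_avg_pool(fmap: Sequence[Map]) -> List[float]:
    """D x H x W feature map -> D-dimensional vector (spatial mean per channel)."""
    return [sum(sum(row) for row in ch) / (len(ch) * len(ch[0])) for ch in fmap]


def fully_connected(x: Sequence[float], weight: Sequence[Sequence[float]],
                    bias: Sequence[float]) -> List[float]:
    """y = W x + b; weight has one row per output unit (256 in the notes)."""
    return [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(weight, bias)]


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    na, nb = math.sqrt(sum(x * x for x in a)), math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na > 0 and nb > 0 else 0.0


def identification_score(query_fmap, result_fmap, weight, bias) -> float:
    """Embed both Res50-Part2 outputs with GAP + FC and compare them."""
    q = fully_connected(global_avg_pool(query_fmap), weight, bias)
    r = fully_connected(global_avg_pool(result_fmap), weight, bias)
    return cosine(q, r)
```

Both the query and the search result go through the same pooling and projection, so the cosine score compares them in one shared embedding space.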