2018 ECCV - Beyond Part Models: Person Retrieval with Refined Part Pooling

Motivation

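Re-ID methods based on part-level features currently perform well, but they rely on extra cues (pose estimation, etc.). Because of the domain gap, the models that produce the parts are not very accurate and tend to introduce noise (the compared regions do not contain consistent content), which hurts the final re-ID performance. ==> Are extra cues really necessary? How can parts be located more precisely?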

Contribution

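Proposes the Part-based Convolutional Baseline (PCB), which learns part-level features by partitioning the feature map uniformly (no extra cues needed). On Market1501 it reaches rank-1 92.3% and mAP 77.4%, surpassing most SOTA methods.
To improve content consistency within the partitioned regions, the paper proposes an adaptive pooling method that refines the uniform partition. With it, PCB reaches rank-1 93.8% and mAP 81.6% on Market1501, a new SOTA.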

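1. Introduction

Definition of re-ID
Deeply-learned part-level features are the current SOTA approach
The basic prerequisite for learning discriminative part features: accurate localization of the parts
- Using extra cues (pose estimation): the bias between datasets degrades localization accuracy, and pose detection introduces noise
- Attention mechanisms

Motivation + contributions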

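2. Related Work

Hand-crafted part features for person retrieval
Deeply-learned part features
Two main advantages of deep-learning approaches:
- Strongly discriminative
- Can parse the human body and obtain body-part features, but suffer from the gap between datasets
Deeply-learned part with attention mechanism
PAR is similar to this work: both use a part classifier to conduct a soft partition
Differences: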

| | PAR | This paper |
|---|---|---|
| Motivation | directly learn aligned parts | refine pre-partitioned parts |
| Working mechanism | unsupervised manner | semi-supervised |
| Training process | / | induced training |
| Performance | | better |

3. PCB: A Strong Convolutional Baseline

3.1. Structure of PCB

Backbone network.
An image classification network with only the convolutional layers kept; the paper mainly uses ResNet50
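From backbone to PCB
The backbone is modified so that only the layers before the GAP layer are kept; its output is the 3D tensor $\mathbf{T}$
column vector: the vector of activations along the channel axis at one spatial location
$\mathbf{T}$ is split into $p$ horizontal stripes, and average pooling summarizes the column vectors in each stripe into a single part-level vector $\mathbf{g}_i\ (i = 1, 2, \ldots, p)$
A 1x1 convolution reduces the dimension of each $\mathbf{g}_i$ to give a 256-dim $\mathbf{h}_i$, and each $\mathbf{h}_i$ is fed into a classifier that predicts the identity
Training: minimize the sum of the cross-entropy losses of the $p$ classifiers
Testing: concatenate the $\mathbf{g}_i$ or the $\mathbf{h}_i$ to form the final descriptor $\mathcal{G}$ or $\mathcal{H}$, where $\mathcal{G} = [\mathbf{g}_1, \mathbf{g}_2, \ldots, \mathbf{g}_p]$ and $\mathcal{H} = [\mathbf{h}_1, \mathbf{h}_2, \ldots, \mathbf{h}_p]$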
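The uniform partition and the test-time concatenation can be sketched in plain Python. This is a minimal illustration, not the authors' implementation: the function names `uniform_part_pooling` and `concatenate`, the nested-list tensor layout `[row][col][channel]`, and the toy tensor are all assumptions made for the example, and the 1x1 convolution and classifiers are left out.

```python
from typing import List

Tensor3D = List[List[List[float]]]  # assumed layout: [row][col][channel]


def uniform_part_pooling(T: Tensor3D, p: int) -> List[List[float]]:
    """Split T into p horizontal stripes and average-pool each stripe into one vector g_i."""
    height = len(T)
    if height % p != 0:
        raise ValueError("this sketch assumes the height of T is divisible by p")
    rows_per_part = height // p
    channels = len(T[0][0])
    parts = []
    for i in range(p):
        stripe = T[i * rows_per_part:(i + 1) * rows_per_part]
        columns = [f for row in stripe for f in row]  # every column vector in the stripe
        g = [sum(f[c] for f in columns) / len(columns) for c in range(channels)]
        parts.append(g)
    return parts


def concatenate(parts: List[List[float]]) -> List[float]:
    """Build the test-time descriptor G = [g_1, ..., g_p]."""
    return [x for g in parts for x in g]


# Toy tensor: 4 rows, 2 columns, 3 channels; every entry equals its row index.
T = [[[float(r)] * 3 for _ in range(2)] for r in range(4)]
g = uniform_part_pooling(T, p=2)
G = concatenate(g)
```

With the toy tensor, the first stripe averages rows 0 and 1 and the second averages rows 2 and 3, so `G` has `p` times the channel count entries.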

3.2. Important Parameters

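input image size $[\mathbf{H}, \mathbf{W}]$: 384 x 128 in this paper, a height-to-width ratio of 3:1
spatial size of $\mathbf{T}$: determined by the down-sampling rate; a lower down-sampling rate improves accuracy, so the paper removes the last spatial down-sampling operation in the backbone, and $\mathbf{T}$ ends up 24 x 8
$\mathbf{T}$ is split uniformly into 6 horizontal stripes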
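As a quick check of these numbers, assume the usual ResNet50 total stride of 32, which halves to 16 once the last down-sampling operation is removed (the stride values themselves are not stated in these notes):

$$\frac{384}{16} \times \frac{128}{16} = 24 \times 8, \qquad \frac{24}{6} = 4$$

Under that assumption, each of the 6 stripes covers 4 rows of 8 column vectors.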

3.3. Potential Alternative Structures

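Variant 1. During training, average the $\mathbf{h}_i$ to obtain $\hat{\mathbf{h}}$ and use a single classifier; at test time $\mathbf{g}$ or $\mathbf{h}$ are still concatenated
Variant 2. All of PCB's classifiers share the parameters of the fully-connected layer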
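The difference between PCB's training objective and Variant 1 can be made concrete with a small sketch. It assumes each classifier is a plain linear layer followed by softmax cross-entropy; the helper names (`cross_entropy`, `pcb_loss`, `variant1_loss`) and the toy numbers are hypothetical and only serve the illustration.

```python
import math
from typing import List


def cross_entropy(logits: List[float], label: int) -> float:
    """Softmax cross-entropy for one sample, computed in a numerically stable way."""
    m = max(logits)
    log_sum = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_sum - logits[label]


def linear(W: List[List[float]], h: List[float]) -> List[float]:
    """One logit per identity: W has one row of weights per identity class."""
    return [sum(w * x for w, x in zip(row, h)) for row in W]


def pcb_loss(part_logits: List[List[float]], label: int) -> float:
    """PCB: one classifier per part, with the p cross-entropy losses summed."""
    return sum(cross_entropy(logits, label) for logits in part_logits)


def variant1_loss(hs: List[List[float]], W: List[List[float]], label: int) -> float:
    """Variant 1: average the h_i into h_hat and apply a single classifier."""
    h_hat = [sum(vals) / len(hs) for vals in zip(*hs)]
    return cross_entropy(linear(W, h_hat), label)


# Toy case: p = 2 parts, 2 identities, ground-truth identity 0.
loss_pcb = pcb_loss([[2.0, 0.0], [0.0, 0.0]], label=0)
loss_v1 = variant1_loss([[1.0, 0.0], [0.0, 1.0]], W=[[1.0, 0.0], [0.0, 1.0]], label=0)
```

Variant 2 would keep `pcb_loss` unchanged but reuse a single weight matrix for all `p` classifiers.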

4. Refined Part Pooling

4.1. Within-Part Inconsistency

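Definition of within-part consistency: column vectors $f$ in the same part should be similar to each other and dissimilar to those in other parts; within-part inconsistency is the violation of this
After PCB training converges, the similarity between each $f$ and each $\mathbf{g}_i$ is compared by cosine distance; a small distance means $f$ is close to the corresponding part. The visualization (figure not reproduced here) shows:
- Most column vectors in the same horizontal stripe cluster together
- There are many outliers, i.e. misaligned column vectors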
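The outlier check described above can be sketched as follows. This is an illustrative reading of the procedure, assuming the part vectors $\mathbf{g}_i$ are already available and the tensor is a nested list indexed `[row][col][channel]`; `find_outliers` and the toy data are hypothetical names and values.

```python
import math
from typing import List, Tuple

Vector = List[float]


def cosine_similarity(a: Vector, b: Vector) -> float:
    """Cosine similarity; assumes neither vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def closest_part(f: Vector, part_vectors: List[Vector]) -> int:
    """Index of the part vector g_i that f is most similar to."""
    sims = [cosine_similarity(f, g) for g in part_vectors]
    return max(range(len(sims)), key=lambda i: sims[i])


def find_outliers(T: List[List[Vector]], part_vectors: List[Vector]) -> List[Tuple[int, int, int, int]]:
    """Return (row, col, own_stripe, closest_part) for columns closer to another part."""
    rows_per_part = len(T) // len(part_vectors)
    outliers = []
    for r, row in enumerate(T):
        stripe = r // rows_per_part
        for c, f in enumerate(row):
            nearest = closest_part(f, part_vectors)
            if nearest != stripe:
                outliers.append((r, c, stripe, nearest))
    return outliers


# Toy data: 2 stripes of one row each; the column at (0, 1) looks like part 1.
part_vectors = [[1.0, 0.0], [0.0, 1.0]]
T = [[[1.0, 0.1], [0.2, 1.0]],
     [[0.0, 1.0], [0.1, 1.0]]]
outliers = find_outliers(T, part_vectors)
```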

4.2. Relocating Outliers

Classify every column vector $f$ in the already-learned $\mathbf{T}$ with a linear layer followed by softmax:
$P(P_i \mid f) = \mathrm{softmax}(W_i^T f) = \frac{\exp(W_i^T f)}{\sum_{j=1}^{p} \exp(W_j^T f)}$
Given every column vector $f$ in $\mathbf{T}$ and the probability that $f$ belongs to part $P_i$, each $P_i\ (i = 1, 2, \ldots, p)$ is sampled from all column vectors $f$ with $P(P_i \mid f)$ as the sampling weight:
$P_i = \{P(P_i \mid f) \times f, \forall f \in F\}$
where $F$ is the set of all column vectors in $\mathbf{T}$. In this way, refined part pooling replaces the original "hard", uniform partition with a "soft", adaptive one and relocates the outliers
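A minimal sketch of the soft assignment follows. It computes $P(P_i \mid f)$ with a softmax over linear scores and then pools each part as a probability-weighted average of all column vectors. Dividing by the sum of the weights is an assumption of this sketch (the notes only state that the probabilities act as sampling weights), and the function names and toy weights are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: Vector) -> Vector:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def part_probabilities(f: Vector, W: List[Vector]) -> Vector:
    """P(P_i | f) for i = 1..p, where W[i] plays the role of the weight vector W_i."""
    return softmax([sum(w * x for w, x in zip(W_i, f)) for W_i in W])


def refined_part_pooling(columns: List[Vector], W: List[Vector]) -> List[Vector]:
    """Pool each part as a weighted average over ALL column vectors (normalisation assumed)."""
    probs = [part_probabilities(f, W) for f in columns]
    channels = len(columns[0])
    parts = []
    for i in range(len(W)):
        weight_sum = sum(pr[i] for pr in probs)
        parts.append([
            sum(pr[i] * f[ch] for pr, f in zip(probs, columns)) / weight_sum
            for ch in range(channels)
        ])
    return parts


columns = [[1.0, 0.0], [0.0, 1.0]]
# An untrained (all-zero) classifier assigns every column equally to every part.
uniform_parts = refined_part_pooling(columns, W=[[0.0, 0.0], [0.0, 0.0]])
# A confident classifier sends each column almost entirely to one part.
sharp_parts = refined_part_pooling(columns, W=[[5.0, -5.0], [-5.0, 5.0]])
```

The all-zero case shows why the part classifier needs guidance: without it every part collapses to the same global average.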

4.3. Induced Training for Part Classifier

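The weights $W$ of the part classifier have no supervision (part labels) to train on, so the paper adopts an induced training procedure: first train a standard PCB with the uniform partition to convergence, then replace the average pooling with the part classifier and train only $W$ with all other layers fixed, and finally fine-tune the whole network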

4.4. Discussions on Refined Part Pooling

An experiment on whether induced training is needed shows that induced training helps performance

5. Experiments

5.1. Datasets and Settings

Market 1501
DukeMTMC-reID
CUHK03

5.2. Implementation details

Implementation of IDE for comparison.
The paper implements an optimized version of IDE for comparison
Training.
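Data augmentation: horizontal flipping and normalization
batch size 64
60 epochs
base learning rate 0.1, decayed to 0.01 after 40 epochs
backbone pretrained on ImageNet; the learning rate of the pre-trained layers is one tenth of the base learning rate
RPP is trained for 10 more epochs with a learning rate of 0.01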
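The schedule above can be written as a small helper. This is a sketch under two assumptions that the notes do not settle: epochs are counted from 0, and the pre-trained layers keep their one-tenth ratio after the decay. The function name `pcb_learning_rate` is hypothetical.

```python
def pcb_learning_rate(epoch: int, pretrained_layer: bool, base_lr: float = 0.1) -> float:
    """Learning rate for one of the 60 PCB epochs (epoch is 0-indexed, an assumption).

    The base rate drops by 10x after 40 epochs; pre-trained backbone layers
    are assumed to use one tenth of the current base rate throughout.
    """
    if not 0 <= epoch < 60:
        raise ValueError("PCB is trained for 60 epochs in this setup")
    lr = base_lr if epoch < 40 else base_lr * 0.1
    return lr * 0.1 if pretrained_layer else lr


RPP_EXTRA_EPOCHS = 10      # RPP is trained for 10 more epochs
RPP_LEARNING_RATE = 0.01   # at a fixed learning rate of 0.01

schedule = [pcb_learning_rate(e, pretrained_layer=False) for e in range(60)]
```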

5.3. Performance evaluation

PCB is a strong baseline.
PCB improves substantially over IDE while being just a typical classification network, which makes it a good baseline
Refined part pooling (RPP) improves PCB especially in mAP
RPP improves mAP more than rank accuracy, which suggests that RPP helps more with finding the challenging image pairs
The benefit of using $p$ losses.
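The comparison with Variant 1, which uses a single loss, shows how important a separate loss per part is for learning part features
The benefit of NOT sharing parameters among identity classifiers.
The comparison with Variant 2 shows that sharing parameters hurts the learning of part features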
Comparison with state of the art.

5.4. Parameters Analysis

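The size of images and tensor $\mathbf{T}$.
Input sizes from 192 x 64 to 576 x 192 in steps of 96 x 32 are tested, each with the original and the halved down-sampling rate (figure not reproduced here)
Both larger images and a smaller down-sampling rate improve accuracy; a 384 x 128 input with the halved down-sampling rate reaches the same accuracy as a 576 x 192 input with the original rate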
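To see what this sweep means for the spatial size of $\mathbf{T}$, the sketch below enumerates the tested input sizes and the resulting tensor sizes. The stride values 32 (original) and 16 (halved) are an assumption based on the standard ResNet50 layout and are not given in these notes; `tensor_size` is a hypothetical helper.

```python
from typing import List, Tuple


def tensor_size(height: int, width: int, stride: int) -> Tuple[int, int]:
    """Spatial size of T for a given input size and total down-sampling stride."""
    return height // stride, width // stride


# Input sizes from 192 x 64 to 576 x 192 in steps of 96 x 32.
input_sizes: List[Tuple[int, int]] = [(192 + 96 * k, 64 + 32 * k) for k in range(5)]

for h, w in input_sizes:
    original = tensor_size(h, w, stride=32)  # assumed original ResNet50 stride
    halved = tensor_size(h, w, stride=16)    # assumed stride after halving
    print(f"{h} x {w}: T = {original} at stride 32, {halved} at stride 16")
```

Under these assumed strides, the 384 x 128 input with the halved rate gives a 24 x 8 tensor, matching the size quoted in Section 3.2.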
The number of parts $p$.
Experiments are run with different values of $p$

5.5. Induction and Attention Mechanism

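Ablation experiments on induced training:
1) PCB always outperforms PAR
2) With the attention mechanism that RPP brings, PCB learns to focus on different parts
3) The induction procedure is important for learning RPP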

6. Conclusion

Summarizes the paper's contributions
