Paper reading notes: Parsing R-CNN for Instance-Level Human Analysis

Disclaimer: This is an original article by the blogger. If you reproduce it, please include a link to the original post: https://blog.csdn.net/m0_37263345/article/details/90550582

First, the paper

Parsing R-CNN for Instance-Level Human Analysis

https://arxiv.org/abs/1811.12596

 

Second, notes on the paper

1. Problems with previous work

Instance-level human analysis still has the following problems:

(1) The mask branch predicts a class-agnostic mask, but instance-level analysis requires more detailed features.

(2) Instance-level analysis needs the geometric and semantic relationships among the various parts of the body, which the mask branch does not provide.

 

2. Contributions:

(1) Proposals separation sampling (PSS): how feature maps are extracted from the feature pyramid

The bbox branch and the authors' own parsing branch extract their feature maps separately. The bbox branch keeps the original FPN assignment strategy (P2-P5), while the parsing branch uses only the P2 feature map (the largest one), because the authors analyzed the data and found that people who occupy only a small fraction of the image are very common in the COCO data.
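
The difference between the two sampling strategies can be sketched as below. This is only an illustration: it assumes the usual FPN assignment rule k = floor(k0 + log2(sqrt(w * h) / 224)) with k0 = 4, and the function names `fpn_level` and `parsing_level` are made up for these notes, not taken from the paper's code.

```python
import math

def fpn_level(box_w, box_h, k0=4, canonical=224, k_min=2, k_max=5):
    """Scale-based FPN assignment for the bbox branch, clamped to P2-P5."""
    scale = max(math.sqrt(box_w * box_h), 1e-6)  # guard against empty boxes
    k = math.floor(k0 + math.log2(scale / canonical))
    return max(k_min, min(k_max, k))

def parsing_level(box_w, box_h):
    """Parsing branch under separate sampling: always pool from P2."""
    return 2

for w, h in [(40, 90), (224, 224), (500, 700)]:
    print((w, h), "bbox -> P%d" % fpn_level(w, h),
          "parsing -> P%d" % parsing_level(w, h))
```

Large people are sent to coarse levels (P4, P5) by the bbox rule, while the parsing branch still reads them from the finest map, so part details are not lost.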

(2) Increasing the RoI resolution:

Previous work uses RoI pooling at 7 * 7 or 14 * 14. That may be enough for tasks such as object detection or object segmentation, which do not need much detail, but for tasks that depend on detail, such as human part segmentation or keypoint localization, the original resolution loses a lot of information. The RoI resolution of the parsing branch is therefore raised to 32 * 32, which increases the computational cost and memory overhead. The authors reduce the batch size to deal with this.
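
A quick back-of-the-envelope calculation shows why the batch size has to shrink. The channel width of 256 and the baseline of 512 RoIs are assumptions chosen for illustration, not numbers from these notes.

```python
def roi_feature_elements(num_rois, channels, resolution):
    """Number of activations produced by RoI pooling for a batch of RoIs."""
    return num_rois * channels * resolution * resolution

def rois_within_budget(budget_elements, channels, resolution):
    """How many RoIs of a given resolution fit into an activation budget."""
    return budget_elements // (channels * resolution * resolution)

CHANNELS = 256  # assumed feature width

for res in (7, 14, 32):
    per_roi = roi_feature_elements(1, CHANNELS, res)
    ratio = per_roi / roi_feature_elements(1, CHANNELS, 14)
    print(f"{res}x{res}: {per_roi} activations per RoI ({ratio:.2f}x of 14x14)")

# Budget of a hypothetical batch of 512 RoIs at 14 * 14.
budget = roi_feature_elements(512, CHANNELS, 14)
print("32x32 RoIs in the same budget:", rois_within_budget(budget, CHANNELS, 32))
```

One 32 * 32 RoI costs roughly five times as much as a 14 * 14 one, so at a fixed budget far fewer RoIs can be processed per batch.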

(3) A geometric and context encoding module

A shallow FCN head gives poor results for this branch, because (a) the scales of different body parts differ a lot; (b) the body parts are geometrically related to each other, so a non-local (Gaussian version) structure may be beneficial here; (c) a 32 * 32 RoI needs deeper convolutions to digest it.
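
To make point (b) concrete, here is a minimal sketch of the Gaussian non-local operation in plain Python. It assumes the simplest form, where the pairwise weight is exp(x_i . x_j) normalized over j (a softmax) and the value embedding g is the identity; a real non-local block would also use learned embeddings and a residual connection, which are left out here.

```python
import math

def non_local_gaussian(features):
    """features: list of N position vectors, each a list of C floats.
    Returns y with y[i] = sum_j softmax_j(x_i . x_j) * x_j."""
    out = []
    for xi in features:
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]
        total = sum(weights)
        weights = [w / total for w in weights]
        yi = [sum(w * xj[c] for w, xj in zip(weights, features))
              for c in range(len(xi))]
        out.append(yi)
    return out

# Two positions with orthogonal features: each output mixes in the other one.
print(non_local_gaussian([[1.0, 0.0], [0.0, 1.0]]))
```

Every output position is a weighted sum over all positions in the RoI, which is how a hand can "see" the arm or torso it is attached to regardless of distance.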

ASPP (atrous spatial pyramid pooling) is very effective in the field of semantic segmentation.
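
The idea behind ASPP can be sketched in one dimension: run a convolution at several dilation (atrous) rates in parallel and stack the results, so each position sees context at several scales. This is a simplified illustration; the real module works on 2-D feature maps, uses separate learned weights per branch and fuses the branches with a further convolution. The rates (1, 2, 4) are arbitrary choices for the demo.

```python
def atrous_conv1d(signal, kernel, rate):
    """1-D dilated convolution with zero padding; output has the input length."""
    half = len(kernel) // 2
    out = []
    for i in range(len(signal)):
        acc = 0.0
        for k, w in enumerate(kernel):
            j = i + (k - half) * rate  # taps are spaced `rate` samples apart
            if 0 <= j < len(signal):
                acc += w * signal[j]
        out.append(acc)
    return out

def aspp1d(signal, kernel, rates=(1, 2, 4)):
    """Apply the kernel at several rates and stack the branches per position."""
    branches = [atrous_conv1d(signal, kernel, r) for r in rates]
    return [list(vals) for vals in zip(*branches)]

impulse = [0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0]
for rate in (1, 2, 4):
    print(rate, atrous_conv1d(impulse, [1.0, 1.0, 1.0], rate))
```

With the same 3-tap kernel, a larger rate spreads the response of the impulse further out, which is the wider receptive field without extra parameters.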

Based on the above, the authors propose the Geometric and Context Encoding (GCE) module, which combines non-local and ASPP, with BN added afterwards.

(4) Decoupling the parsing branch

The authors split their own branch into three smaller parts: (a) semantic space transformation, (b) GCE, and (c) converting the semantic features to the specific task. These three parts correspond to before GCE, the GCE module, and after GCE.
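
The three-part structure can be written down as a simple composition. The class `ParsingBranch` and the lambda stand-ins below are placeholders invented for these notes; they only show the order of the stages, not what each stage actually computes.

```python
from dataclasses import dataclass
from typing import Callable, List

Feature = List[float]

@dataclass
class ParsingBranch:
    before_gce: Callable[[Feature], Feature]  # (a) semantic space transformation
    gce: Callable[[Feature], Feature]         # (b) geometric and context encoding
    after_gce: Callable[[Feature], Feature]   # (c) conversion to the specific task

    def __call__(self, roi_feature: Feature) -> Feature:
        # The stages run strictly in order: before GCE, GCE, after GCE.
        return self.after_gce(self.gce(self.before_gce(roi_feature)))

# Toy stand-ins for the three stages.
branch = ParsingBranch(
    before_gce=lambda f: [2.0 * v for v in f],
    gce=lambda f: [v + 1.0 for v in f],
    after_gce=lambda f: [max(f)],
)
print(branch([0.5, 1.5]))  # [4.0]
```

Keeping the stages separate makes it easy to ablate one of them, for example by swapping a stage for the identity function.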

 

3. Experiments

 

Experiments on two human parsing datasets (CIHP, MHP v2.0):

The effectiveness of each contribution is demonstrated separately. All of the measures mentioned above bring some improvement, but pre-training the model on the COCO keypoint dataset and using the weights of the pose estimation branch to initialize the authors' own branch brings a large improvement. Training for more rounds also brings some improvement.
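
Initializing one branch from another branch's weights usually means copying every parameter whose name and shape match and leaving the rest at their fresh initialization. A minimal sketch, assuming parameters are stored as a dict of name -> (shape, values); the parameter names and shapes are hypothetical.

```python
def init_from_pretrained(target, source):
    """Copy entries of `source` into `target` when name and shape both match.
    Returns the lists of copied and skipped parameter names."""
    copied, skipped = [], []
    for name, (shape, _) in list(target.items()):
        if name in source and source[name][0] == shape:
            target[name] = (shape, list(source[name][1]))
            copied.append(name)
        else:
            skipped.append(name)  # keeps its fresh initialization
    return copied, skipped

# Hypothetical pose-branch weights (pre-trained) and parsing-branch weights (new).
pose_branch = {
    "conv1.weight": ((2, 2), [0.1, 0.2, 0.3, 0.4]),
    "predictor.weight": ((3,), [1.0, 2.0, 3.0]),
}
parsing_branch = {
    "conv1.weight": ((2, 2), [0.0, 0.0, 0.0, 0.0]),
    "predictor.weight": ((5,), [0.0] * 5),  # different number of outputs
}
print(init_from_pretrained(parsing_branch, pose_branch))
```

The shared convolution layers are reused, while the final predictor, whose output size differs between the two tasks, is skipped.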

 

Experiments on DensePose-COCO:

The evaluation metric is unchanged, but the loss function is replaced with the pixel-level softmax from FCN.
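
For reference, a pixel-level softmax loss in the FCN sense is just a softmax cross-entropy computed independently at every pixel and then averaged. The sketch below is a plain-Python illustration of that general idea; how exactly the authors apply it to the DensePose outputs is not covered in these notes.

```python
import math

def pixel_softmax_loss(logits, labels):
    """logits[y][x] is a list of per-class scores, labels[y][x] a class index.
    Returns the softmax cross-entropy averaged over all pixels."""
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for scores, label in zip(row_logits, row_labels):
            m = max(scores)  # log-sum-exp with the max subtracted for stability
            log_sum = m + math.log(sum(math.exp(s - m) for s in scores))
            total += log_sum - scores[label]
            count += 1
    return total / count

# A 1 * 2 "image" with 3 classes and uniform scores: the loss equals log(3).
uniform = [[[0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]]
print(pixel_softmax_loss(uniform, [[0, 2]]))
```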

The method took first place.

 

 

4. Open questions

(1) ASPP

(2) The pixel-level softmax (see FCN) used in place of the original DensePose loss
