I. Paper
Parsing R-CNN for Instance-Level Human Analysis
https://arxiv.org/abs/1811.12596
II. Paper notes
1. Analysis of problems in previous work
Instance-level human analysis still has open problems:
(1) The mask branch predicts a class-agnostic mask, but instance-level human analysis needs more detailed features.
(2) Instance-level human analysis needs the geometric and semantic relations between the parts of the body, which the mask branch does not provide.
2. Innovations:
(1) Proposals separation sampling (PSS), which changes how features are extracted from the feature pyramid:
The bbox branch and the authors' own parsing branch extract their feature maps separately. The bbox branch uses the original FPN assignment strategy (P2-P5), while the parsing branch uses only the P2 feature map (the largest one), because the authors analyzed the data and found that small persons make up only a small proportion of the COCO data.
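The level assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: `fpn_level` uses the standard FPN heuristic (canonical size 224, canonical level 4), and `branch_levels` is a hypothetical helper name introduced here to show that the second branch always reads from P2.

```python
import math

def fpn_level(w, h, k0=4, canonical=224, k_min=2, k_max=5):
    """Standard FPN heuristic: map a w x h box (in pixels) to a pyramid level."""
    k = math.floor(k0 + math.log2(math.sqrt(w * h) / canonical))
    return max(k_min, min(k_max, k))  # clamp to the available levels P2..P5

def branch_levels(w, h):
    """Hypothetical helper: (level used by the bbox branch, level used by the parsing branch)."""
    return fpn_level(w, h), 2  # the parsing branch always pools from P2

for size in (32, 112, 224, 448, 896):
    print(size, branch_levels(size, size))
```

Under these assumptions a large box is pooled from a coarse level (P4 or P5) by the bbox branch, while the parsing branch still pools the same box from the finest level, P2.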
(2) Increased ROI resolution:
Previous work uses ROIpooling at 7 * 7 or 14 * 14. That is adequate for tasks such as object detection or object segmentation, which do not need much detail, but for tasks that depend on detail, such as human part segmentation or keypoint localization, this resolution loses a lot of information. The authors' own branch uses an ROI resolution of 32 * 32, which increases the computational cost and memory overhead. They reduce the batch size to deal with this.
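A rough calculation shows why the larger ROI forces a smaller batch. The channel count (256) and float32 storage are assumptions made for this sketch, as is the budget of 512 ROIs; none of these numbers are taken from the paper.

```python
def roi_feature_bytes(side, channels=256, bytes_per_value=4):
    """Memory for one pooled ROI feature of shape side x side x channels."""
    return side * side * channels * bytes_per_value

for side in (7, 14, 32):
    print(side, roi_feature_bytes(side))

# Hypothetical budget: the memory taken by 512 ROIs pooled at 14 x 14.
budget = 512 * roi_feature_bytes(14)
rois_at_32 = budget // roi_feature_bytes(32)
print(rois_at_32)
```

A 32 * 32 ROI holds 1024 cells against 196 for 14 * 14, a bit more than five times as many, so under these assumptions the same memory holds only about a fifth as many ROIs.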
(3) A geometric and context encoding module
Using a shallow FCN head for this branch gives comparatively poor results, because (a) different parts of the body differ greatly in scale, (b) the parts are geometrically related to each other, so a non-local (Gaussian version) structure may be beneficial here, and (c) a 32 * 32 ROI needs deeper convolutions to digest.
ASPP is very effective in the field of semantic segmentation.
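ASPP gets its multi-scale view from parallel dilated (atrous) convolutions. The sketch below computes how far a single dilated kernel reaches; the rates 6, 12 and 18 are the ones commonly used in DeepLab-style ASPP and are an assumption here, not a setting confirmed by these notes.

```python
def effective_kernel(kernel, dilation):
    """Span, in pixels, covered by one dilated convolution kernel."""
    return kernel + (kernel - 1) * (dilation - 1)

# Assumed rates (DeepLab-style); rate 1 is an ordinary 3 x 3 convolution.
for rate in (1, 6, 12, 18):
    print(rate, effective_kernel(3, rate))
```

With these assumed rates a single 3 * 3 kernel spans 3, 13, 25 and 37 pixels, so the parallel branches see both small and large body parts on a 32 * 32 ROI without stacking many layers.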
Based on the above, the authors propose the Geometric and Context Encoding (GCE) module, which combines a non-local block with ASPP; BN was added to it later.
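To make the non-local part concrete, here is a minimal pure-Python sketch of the Gaussian non-local operation: every position is updated with a softmax-weighted sum over all positions. It is a simplification under stated assumptions: the learned projections of a real non-local block are omitted, the residual connection is kept, and the function name `non_local` is chosen for this example only.

```python
import math

def softmax(row):
    """Numerically stable softmax over a list of floats."""
    m = max(row)
    exps = [math.exp(v - m) for v in row]
    total = sum(exps)
    return [e / total for e in exps]

def non_local(features):
    """features: N positions, each a list of C floats.
    Returns the same shape: x_i + sum_j softmax_j(x_i . x_j) * x_j."""
    out = []
    for xi in features:
        # Pairwise dot products between position i and every position j.
        scores = [sum(a * b for a, b in zip(xi, xj)) for xj in features]
        weights = softmax(scores)
        # Weighted sum of all positions, channel by channel.
        agg = [sum(w * xj[c] for w, xj in zip(weights, features))
               for c in range(len(xi))]
        out.append([a + b for a, b in zip(xi, agg)])  # residual connection
    return out

# A flattened 2 x 2 ROI with 2 channels (toy values).
roi = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]]
print(non_local(roi))
```

Because every output position mixes information from all other positions, distant but related body parts can influence each other in a single step, which is the property the notes attribute to the non-local structure.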
(4) Parsing branch decoupling
The authors split their own branch into three smaller parts: (a) semantic space transformation, (b) GCE, and (c) conversion of the semantic features to the specific task. In terms of position, the three parts are before the GCE module, the GCE module itself, and after the GCE module.
3. Experiments
Experiments on two human parsing datasets (CIHP, MHP v2.0):
The experiments demonstrate the effectiveness of each innovation separately. Each of the measures mentioned above brings some improvement, but the largest gain comes from pre-training the model on the COCO keypoint dataset and using the weights of the pose estimation branch to initialize the custom branch. Increasing the number of training iterations also brings some improvement.
Experiments on DensePose-COCO:
The evaluation metric is unchanged, but the loss function is replaced with the pixel-level softmax used in FCN.
The method took first place.
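For reference, a pixel-level softmax loss of the kind used in FCN can be sketched as below: a softmax cross-entropy is computed independently at every pixel and then averaged. The function name and the nested-list layout are assumptions for this example; the notes do not say which DensePose outputs the loss is applied to, so this shows only the generic form.

```python
import math

def pixel_softmax_cross_entropy(logits, labels):
    """logits: H x W x K nested lists; labels: H x W ints in [0, K).
    Returns the mean per-pixel softmax cross-entropy."""
    total, count = 0.0, 0
    for row_logits, row_labels in zip(logits, labels):
        for z, y in zip(row_logits, row_labels):
            m = max(z)  # subtract the max for numerical stability
            log_sum = m + math.log(sum(math.exp(v - m) for v in z))
            total += log_sum - z[y]  # -log softmax(z)[y]
            count += 1
    return total / count

# Toy 1 x 2 map with 3 classes.
logits = [[[2.0, 0.5, 0.1], [0.2, 0.2, 3.0]]]
labels = [[0, 2]]
print(pixel_softmax_cross_entropy(logits, labels))
```

With uniform logits over K classes the loss equals log K at every pixel, which is a quick sanity check for an implementation.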
4. Open questions
1. ASPP
2. The pixel-level softmax (see FCN) used in place of the original DensePose loss