Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

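2D human pose estimation, the main component of the body pose estimation part of OpenPose. **This is only a personal memo.** It records the notes I wrote in my notebook while reading the paper. I will analyze the code later (mainly the data-processing part). Most of these reading notes come from http://blog.csdn.net/yengjie2200/article/details/68064095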

Code: https://github.com/CMU-Perceptual-Computing-Lab/caffe_rtpose
Openpose: https://github.com/CMU-Perceptual-Computing-Lab/openpose
Reference blog: http://blog.csdn.net/yengjie2200/article/details/68064095

Loss function:
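The equation image did not survive here. As far as I can reconstruct it from the paper, each stage t has one loss for the confidence maps and one for the PAFs. S_j^t and L_c^t are the stage-t predictions for part j and limb c, starred symbols are the ground truth, p is a pixel location, and W is the mask described just below:

```latex
\begin{aligned}
f_{\mathbf{S}}^{t} &= \sum_{j=1}^{J} \sum_{\mathbf{p}} \mathbf{W}(\mathbf{p}) \cdot
  \left\| \mathbf{S}_{j}^{t}(\mathbf{p}) - \mathbf{S}_{j}^{*}(\mathbf{p}) \right\|_{2}^{2} \\
f_{\mathbf{L}}^{t} &= \sum_{c=1}^{C} \sum_{\mathbf{p}} \mathbf{W}(\mathbf{p}) \cdot
  \left\| \mathbf{L}_{c}^{t}(\mathbf{p}) - \mathbf{L}_{c}^{*}(\mathbf{p}) \right\|_{2}^{2}
\end{aligned}
```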

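Here the loss is weighted spatially because some datasets do not label every person; the mask they provide marks regions that may contain unlabeled people. W is a binary mask, with W = 0 at unlabeled locations.
The final objective is the sum of the losses over all stages: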
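In symbols, the overall objective is f = sum over t of (f_S^t + f_L^t), where f_S^t is the confidence-map loss and f_L^t the PAF loss of stage t. A minimal pure-Python sketch of the masked loss follows. The function names `masked_l2` and `total_loss` and the nested-list layout are my own assumptions for illustration, not taken from the released code.

```python
def masked_l2(pred, target, mask):
    """Sum of W(p) * (pred(p) - target(p))^2 over all channels and pixels.

    pred, target: list of channels, each a 2D list indexed [row][col].
    mask: one 2D list shared by all channels; 0 marks unlabeled regions.
    A PAF is stored as two channels (x and y), which gives the same value
    as the squared L2 norm of the 2D vector difference.
    """
    total = 0.0
    for pred_ch, target_ch in zip(pred, target):
        for y, row in enumerate(pred_ch):
            for x, value in enumerate(row):
                diff = value - target_ch[y][x]
                total += mask[y][x] * diff * diff
    return total


def total_loss(stage_outputs, gt_maps, gt_pafs, mask):
    """f = sum over stages t of (f_S^t + f_L^t).

    stage_outputs: list of (predicted_maps, predicted_pafs), one per stage.
    """
    return sum(
        masked_l2(pred_maps, gt_maps, mask) + masked_l2(pred_pafs, gt_pafs, mask)
        for pred_maps, pred_pafs in stage_outputs
    )
```

Because the mask multiplies every squared difference, a prediction inside an unlabeled region is never penalized, which is the point of the spatial weighting.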

Confidence Maps for Part Detection

Generate one distribution per person, then take the maximum.

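One confidence map is computed for each body part j, so there are as many confidence maps as there are parts (joints). Every pixel in the image has a confidence value, and together these values form the confidence map. The value at each pixel depends on its distance to the ground-truth position: the closer it is, the higher the confidence. This is modeled with a Gaussian whose peak lies at the ground-truth position. With k people in the image, each person has their own confidence map for the part; when the k maps are combined into a single map, each pixel takes the maximum of the per-person confidences. The max is used instead of the average so that precision is unaffected even when several peaks lie close together.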
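The max-versus-average point is easy to check numerically. Below is a small sketch for one part type; the function name `confidence_map`, the default `sigma`, and the exact form exp(-d^2 / sigma^2) of the Gaussian bump are assumptions made for illustration.

```python
import math


def confidence_map(height, width, keypoints, sigma=1.0):
    """Ground-truth confidence map for ONE part type.

    keypoints: list of (x, y) positions of this part, one per person.
    Each person contributes a Gaussian-shaped peak; the per-pixel maximum
    over people keeps nearby peaks distinct.
    """
    conf = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            for kx, ky in keypoints:
                d2 = (x - kx) ** 2 + (y - ky) ** 2
                conf[y][x] = max(conf[y][x], math.exp(-d2 / sigma ** 2))
    return conf


# Two people whose same joint lies two pixels apart on a single row.
row = confidence_map(1, 5, [(1, 0), (3, 0)], sigma=1.0)[0]
```

With the maximum, both peaks in `row` keep the value 1.0 at x = 1 and x = 3. Averaging the two per-person maps instead would roughly halve both peaks while leaving the valley between them unchanged, which blurs the peaks together.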

At test time, non-maximum suppression is applied to the predicted confidence maps to obtain the body part candidates.
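A simple way to picture this step is a local-maximum search on each map. The sketch below keeps a pixel only if it exceeds a threshold and all four direct neighbours. The name `find_peaks`, the 4-neighbourhood, and the threshold value are assumptions, not necessarily what the released code does.

```python
def find_peaks(conf, threshold=0.1):
    """Return (x, y, score) for pixels above threshold that beat all 4 neighbours.

    conf: 2D list indexed [row][col] for one body part.
    The comparison is strict, so a perfectly flat plateau yields no peak.
    """
    height, width = len(conf), len(conf[0])
    peaks = []
    for y in range(height):
        for x in range(width):
            value = conf[y][x]
            if value <= threshold:
                continue
            neighbours = [
                conf[ny][nx]
                for nx, ny in ((x - 1, y), (x + 1, y), (x, y - 1), (x, y + 1))
                if 0 <= nx < width and 0 <= ny < height
            ]
            if all(value > n for n in neighbours):
                peaks.append((x, y, value))
    return peaks
```

Each returned peak is one detection candidate for that part; with several people in the image, one map usually yields several candidates.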
Part Affinity Fields for Part Association
The part affinity is a 2D vector field for each limb. For each pixel in the area belonging to a particular limb, a 2D vector encodes the direction that points from one part of the limb to the other. Each type of limb has a corresponding affinity field joining its two associated body parts.
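To make the vector field concrete, here is a sketch that builds the field for one limb of one person: pixels close to the segment get the unit vector pointing from the first part to the second, all others get zero. The name `limb_paf` and the `limb_width` parameter (how far from the limb axis a pixel may lie) are assumptions for illustration, and the two endpoints are assumed to be distinct.

```python
import math


def limb_paf(height, width, start, end, limb_width=1.0):
    """PAF for one limb of one person.

    start, end: (x, y) positions of the two parts joined by the limb.
    Returns a 2D list indexed [row][col] of (vx, vy) tuples: the unit
    vector start -> end on the limb, (0.0, 0.0) elsewhere.
    """
    (x1, y1), (x2, y2) = start, end
    length = math.hypot(x2 - x1, y2 - y1)
    vx, vy = (x2 - x1) / length, (y2 - y1) / length
    paf = [[(0.0, 0.0)] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            dx, dy = x - x1, y - y1
            along = dx * vx + dy * vy        # projection onto the limb direction
            across = abs(dx * vy - dy * vx)  # distance from the limb axis
            if 0 <= along <= length and across <= limb_width:
                paf[y][x] = (vx, vy)
    return paf
```

For a horizontal limb from (0, 2) to (4, 2) on a 5x5 grid, the rows within one pixel of y = 2 hold (1.0, 0.0) and the remaining rows hold (0.0, 0.0).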

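At test time, the confidence score is computed as follows:
Measure the alignment between the predicted PAF (a vector) and the direction of the candidate limb (whether the directions agree, computed with a dot product).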
We measure association between candidate part detections by computing the line integral over the corresponding PAF, along the line segment connecting the candidate part locations. In other words, we measure the alignment of the predicted PAF with the candidate limb that would be formed by connecting the detected body parts. Specifically, for two candidate part locations d_j1 and d_j2, we sample the predicted part affinity field, L_c, along the line segment to measure the confidence in their association:
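The equation that followed this sentence was an image and is missing. As far as I can reconstruct it from the paper, it is the line integral below, where p(u) interpolates between the two candidate positions:

```latex
E = \int_{u=0}^{u=1} \mathbf{L}_c\big(\mathbf{p}(u)\big) \cdot
    \frac{\mathbf{d}_{j_2} - \mathbf{d}_{j_1}}
         {\left\| \mathbf{d}_{j_2} - \mathbf{d}_{j_1} \right\|_2} \, du,
\qquad
\mathbf{p}(u) = (1 - u)\,\mathbf{d}_{j_1} + u\,\mathbf{d}_{j_2}
```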

In practice, we approximate the integral by sampling and summing uniformly-spaced values of u.
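The sampled approximation can be sketched in a few lines. The function name `association_score`, the default of 10 samples, nearest-pixel lookup, and dividing by the sample count are assumptions made for this illustration; the PAF is stored as a grid of (vx, vy) tuples.

```python
import math


def association_score(paf, d1, d2, num_samples=10):
    """Approximate the line integral of the PAF along the segment d1 -> d2.

    paf: 2D list indexed [row][col] of (vx, vy) vectors for one limb type.
    d1, d2: (x, y) candidate part locations; num_samples must be >= 2.
    """
    dx, dy = d2[0] - d1[0], d2[1] - d1[1]
    norm = math.hypot(dx, dy)
    if norm == 0:
        return 0.0
    ux, uy = dx / norm, dy / norm  # unit direction of the candidate limb
    total = 0.0
    for i in range(num_samples):
        u = i / (num_samples - 1)
        # p(u) = (1 - u) * d1 + u * d2, rounded to the nearest pixel
        px = int(round((1 - u) * d1[0] + u * d2[0]))
        py = int(round((1 - u) * d1[1] + u * d2[1]))
        vx, vy = paf[py][px]
        total += vx * ux + vy * uy  # dot product measures alignment
    return total / num_samples
```

A candidate limb that runs along the field scores close to 1, one that runs against it scores close to -1, and one that crosses it at a right angle scores close to 0.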
MultiPerson Parsing using PAFs
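Suppose non-maximum suppression on the confidence maps yields several body parts, each with multiple detection candidates (the image contains several people, so there are multiple candidates per part). Finding the optimal connections between each pair of body parts then becomes a maximum weight bipartite graph matching problem.
Cast as a graph problem, it reads as follows: the nodes of the graph are the body part detection candidates, the edges are all possible connections between body parts, and the weight on each edge is the part affinity aggregate computed with Eq. 7. A matching in a bipartite graph is a subset of the edges chosen in such a way that no two edges share a node.
The goal is to find the set of edges with the maximum total weight.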
Mathematical formulation:
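The equation image is missing here. As far as I can reconstruct it from the paper, for one limb type c joining part types j1 and j2, with candidate sets D_j1 and D_j2, association scores E_mn, and binary variables z^{mn} that say whether candidates m and n are connected, the problem reads as follows (presumably Eqs. 12 to 14 of the paper, which these notes cite further down):

```latex
\begin{aligned}
\max_{\mathcal{Z}_c} E_c &= \max_{\mathcal{Z}_c}
  \sum_{m \in \mathcal{D}_{j_1}} \sum_{n \in \mathcal{D}_{j_2}}
  E_{mn} \cdot z_{j_1 j_2}^{mn} \\
\text{s.t.} \quad & \forall m \in \mathcal{D}_{j_1}, \;
  \sum_{n \in \mathcal{D}_{j_2}} z_{j_1 j_2}^{mn} \le 1 \\
& \forall n \in \mathcal{D}_{j_2}, \;
  \sum_{m \in \mathcal{D}_{j_1}} z_{j_1 j_2}^{mn} \le 1
\end{aligned}
```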

The paper uses the Hungarian algorithm to solve this.
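To see what the matching computes, here is a tiny sketch that finds the maximum-weight matching by exhaustive search. It is NOT the Hungarian algorithm, only a brute-force stand-in that gives the same answer on small inputs; the name `best_matching` and the rule that non-positive scores stay unmatched are my own assumptions. For real candidate counts, a polynomial-time solver such as SciPy's `scipy.optimize.linear_sum_assignment` with `maximize=True` handles the same assignment problem.

```python
from itertools import permutations


def best_matching(scores):
    """Exhaustively find the maximum-weight matching for a small score matrix.

    scores[m][n] is the association score E_mn between candidate m of the
    first part and candidate n of the second part. Pairs with a
    non-positive score are left unmatched.
    Returns (list of (m, n) pairs, total weight).
    """
    rows, cols = len(scores), len(scores[0])
    swapped = rows > cols
    if swapped:  # make the first side the smaller one before permuting
        scores = [list(col) for col in zip(*scores)]
        rows, cols = cols, rows
    best_total, best_pairs = 0.0, []
    for perm in permutations(range(cols), rows):
        pairs = [(m, n) for m, n in enumerate(perm) if scores[m][n] > 0]
        total = sum(scores[m][n] for m, n in pairs)
        if total > best_total:
            best_total, best_pairs = total, pairs
    if swapped:
        best_pairs = sorted((n, m) for m, n in best_pairs)
    return best_pairs, best_total
```

For the scores `[[0.9, 0.8], [0.8, 0.1]]`, greedily taking the best edge first (0.9) forces the weak 0.1 edge for a total of 1.0, while the optimal matching pairs (0, 1) and (1, 0) for a total of 1.6. This is why a global matching is used rather than a per-candidate argmax.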
Finding the full body pose of multiple persons then becomes the problem of finding a maximum weight cliques partition in a K-partite graph. This problem is NP Hard [32] and many relaxations exist.

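The paper adds two relaxations to the optimization:
1. Choose a minimal number of edges to form a tree skeleton of the human pose, rather than using the complete graph.
2. Decompose the cliques partition problem into a set of bipartite matching subproblems, and determine the matching between adjacent tree nodes independently.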
With these two relaxations, the optimization is decomposed simply as:
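The equation image is missing here. As far as I can reconstruct it from the paper, the decomposition says that the global optimum is obtained by solving each limb type c independently and summing the results:

```latex
\max_{\mathcal{Z}} E = \sum_{c=1}^{C} \max_{\mathcal{Z}_c} E_c
```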

So, using Eqs. (12)-(14), we can obtain the correct limb connection candidates for each limb type in turn. Grouping the limbs that share the same part candidates then gives the full-body pose.
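The grouping step can be sketched as follows. This is a simplified illustration under my own assumptions: the name `assemble_people`, the input layout, and the requirement that limbs are visited in tree order are all hypothetical, and the sketch does not merge two partial skeletons the way a full implementation would need to.

```python
def assemble_people(limb_connections):
    """Group matched limbs into people through their shared part candidates.

    limb_connections: list of (part_a, part_b, pairs), ordered along the
    skeleton tree so that part_a of each limb has been seen before;
    pairs holds the (index_a, index_b) candidate indices chosen by the
    bipartite matching for that limb type.
    Returns a list of dicts mapping part name -> candidate index.
    """
    people = []
    for part_a, part_b, pairs in limb_connections:
        for idx_a, idx_b in pairs:
            for person in people:
                if person.get(part_a) == idx_a:
                    person[part_b] = idx_b  # extend an existing skeleton
                    break
            else:
                # no skeleton owns this candidate yet: start a new person
                people.append({part_a: idx_a, part_b: idx_b})
    return people
```

For example, if the neck-to-shoulder matching returns pairs (0, 1) and (1, 0) and the shoulder-to-elbow matching returns (1, 0) and (0, 1), the two limbs that share shoulder candidate 1 end up in the same person, and likewise for shoulder candidate 0.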
Results:

Detection of people at small scales is worse than with other algorithms. Table 3 shows results from top teams in the challenge. It is noteworthy that our method has lower accuracy than the top-down methods on people of smaller scales (AP^M). The reason is that our method has to deal with a much larger scale range spanned by all people in the image in one shot. In contrast, top-down methods can rescale the patch of each detected area to a larger size and thus suffer less degradation at smaller scales.

Failure cases:

Runtime analysis:

The original frame size is 1080x1920, which we resize to 368x654 during testing to fit in GPU memory. The runtime analysis is performed on a laptop with one NVIDIA GeForce GTX-1080 GPU.
Our method has achieved the speed of 8.8 fps for a video with 19 people.
Original link: http://blog.csdn.net/u011956147/article/details/79291040
