Human Pose Estimation, 2014-2017

I. Common Datasets

Common datasets for pose estimation and keypoint detection

1. PoseTrack: https://posetrack.net/

More than 500 video sequences
More than 20K frames
More than 150K body pose annotations
3 challenges

2. LSP: http://sam.johnson.io/research/lsp.html

Samples: 2K
Joints: 14
Full body, single person

3. FLIC: https://bensapp.github.io/flic-dataset.html

Samples: 20K
Joints: 9
Upper body, single person

4. MPII: http://human-pose.mpi-inf.mpg.de/

Samples: 25K
Joints: 16
Full body, single-person and multi-person; 40K people, 410 human activities

5. MSCOCO: http://cocodataset.org/#download

Samples: >= 300K
Joints: 17
Full body, multi-person; keypoints annotated on 100K people

6. AI Challenger: https://challenger.ai/competition/keypoint/subject

Samples: 210K training, 30K validation, 30K testing
Joints: 14
Full body, multi-person; 380K people
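
For the MSCOCO entry above, it may help to see how keypoints are laid out in the annotation files. In the COCO format, each person's `keypoints` field is a flat list of (x, y, v) triplets, where v is 0 for unlabeled, 1 for labeled but not visible, and 2 for labeled and visible. The following is a minimal sketch in plain Python; the helper names `parse_keypoints` and `count_labeled` and the sample values are made up for illustration.

```python
from typing import List, Tuple


def parse_keypoints(flat: List[float]) -> List[Tuple[float, float, int]]:
    """Split a flat [x1, y1, v1, x2, y2, v2, ...] list into (x, y, v) triplets."""
    if len(flat) % 3 != 0:
        raise ValueError("keypoint list length must be a multiple of 3")
    return [(flat[i], flat[i + 1], int(flat[i + 2])) for i in range(0, len(flat), 3)]


def count_labeled(flat: List[float]) -> int:
    """Count keypoints whose visibility flag is non-zero (i.e. annotated)."""
    return sum(1 for _, _, v in parse_keypoints(flat) if v > 0)


# Made-up example with three keypoints; a real annotation carries one triplet per joint.
example = [120.0, 80.0, 2, 0.0, 0.0, 0, 131.5, 95.0, 1]
triplets = parse_keypoints(example)
labeled = count_labeled(example)
```

The same triplet layout works for any joint count, so the helper can be reused for datasets with 14 or 16 joints once their annotations are converted to this form.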

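II. Mainstream Methods

The main difficulties in 2D pose estimation are occlusion, cluttered backgrounds, lighting, complex real-world poses, varying person scale, and unconstrained camera viewpoints.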

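Single-person pose estimation

Traditional methods: based on Pictorial Structures and DPM (Deformable Part Models).

Deep learning methods: either regress joint coordinates directly (DeepPose) or regress heatmaps from which the coordinates are read off (CPM, Hourglass).

Currently, the mainstream single-person algorithms are architectural variants of Hourglass.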
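
To make the heatmap route concrete: the network outputs one 2D score map per joint, and the joint location is taken from the peak of that map. Below is a minimal plain-Python sketch of this decoding step. The function name `decode_heatmap` and the `stride` parameter (the assumed ratio between input-image and heatmap resolution) are illustrative choices, not taken from any of the papers listed here.

```python
def decode_heatmap(heatmap, stride=4):
    """Return (x, y, score) of the heatmap peak, with x and y in input-image pixels.

    `heatmap` is a list of rows; `stride` is an assumed downsampling factor
    between the input image and the heatmap.
    """
    best_score = float("-inf")
    best_x = best_y = 0
    for y, row in enumerate(heatmap):
        for x, score in enumerate(row):
            if score > best_score:
                best_score, best_x, best_y = score, x, y
    return best_x * stride, best_y * stride, best_score


# Toy 3x4 heatmap for a single joint, with its peak at column 1, row 1.
heatmap = [
    [0.0, 0.1, 0.0, 0.0],
    [0.1, 0.9, 0.2, 0.0],
    [0.0, 0.2, 0.1, 0.0],
]
peak = decode_heatmap(heatmap, stride=4)
```

Direct coordinate regression skips this step and predicts (x, y) values outright; the heatmap form keeps spatial structure in the output, at the cost of a resolution limit set by the stride.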

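Multi-person pose estimation

CNN-based multi-person pose estimation on 2D images generally follows one of two approaches, top-down or bottom-up:

(1) Top-down approaches, i.e. the two-step framework: first run a person detector to obtain bounding boxes, then detect the body keypoints inside each box and connect them into a skeleton. The drawback is a heavy dependence on the detector: missed detections, false detections, and poorly overlapping boxes (low IoU) all degrade the result. Examples include RMPE and Mask R-CNN.

(2) Bottom-up approaches, i.e. the part-based framework: first detect every body keypoint in the whole image, then group the detected parts into individual skeletons. The drawback is that parts belonging to different people can be grouped into one person. Representative methods are DeepCut and PAFs (implemented in OpenPose).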
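
Since the top-down route depends so heavily on box quality, here is a small sketch of the IoU (intersection over union) measure used to judge how well a detected box matches a reference box. It is plain Python with boxes assumed to be in (x1, y1, x2, y2) corner format; the function name and the example boxes are illustrative.

```python
def iou(box_a, box_b):
    """Intersection over union of two axis-aligned (x1, y1, x2, y2) boxes."""
    ax1, ay1, ax2, ay2 = box_a
    bx1, by1, bx2, by2 = box_b
    # Width and height of the overlap region, clamped at zero for disjoint boxes.
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


# A detection shifted sideways by half the box width relative to the reference box.
ground_truth = (0.0, 0.0, 10.0, 20.0)
shifted = (5.0, 0.0, 15.0, 20.0)
overlap = iou(ground_truth, shifted)
```

A detection shifted by half a box width already drops to an IoU of one third, which gives a feel for how quickly a loosely localized box can cut off limbs before the keypoint stage even runs.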

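Tricks

Use multi-scale, multi-resolution network architectures.
Build the network from residual blocks.
Enlarge the receptive field (large kernels, dilated convolution, Spatial Transformer Networks, the hourglass module).
Preprocessing matters (center the person in the input image, normalize the person to a common scale as far as possible, and augment with flips and rotations).
Post-processing matters just as much.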
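
One detail of the flip augmentation mentioned above is easy to get wrong: after mirroring the image, the x coordinates must be mirrored and the left/right joint labels swapped, otherwise a "left wrist" label ends up on the right wrist. A minimal plain-Python sketch follows, assuming pixel coordinates and a hypothetical three-joint layout; the function name and `swap_pairs` argument are illustrative.

```python
def flip_keypoints(keypoints, image_width, swap_pairs):
    """Mirror (x, y) keypoints horizontally and swap left/right joint indices."""
    # Mirror x around the vertical center line of the image (pixel indices 0..width-1).
    flipped = [(image_width - 1 - x, y) for x, y in keypoints]
    # Exchange each left/right pair so labels stay anatomically correct.
    for left, right in swap_pairs:
        flipped[left], flipped[right] = flipped[right], flipped[left]
    return flipped


# Hypothetical 3-joint layout: 0 = head, 1 = left wrist, 2 = right wrist.
keypoints = [(50, 10), (20, 60), (80, 62)]
flipped = flip_keypoints(keypoints, image_width=100, swap_pairs=[(1, 2)])
```

Applying the same flip twice should return the original keypoints, which makes a handy sanity check for an augmentation pipeline.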

III. Single-Person Pose Estimation

2014----Articulated Pose Estimation by a Graphical Model with Image Dependent Pairwise Relations

2014----DeepPose: Human Pose Estimation via Deep Neural Networks

2014----Joint Training of a Convolutional Network and a Graphical Model for Human Pose Estimation

2014----Learning Human Pose Estimation Features with Convolutional Networks

2014----MoDeep: A Deep Learning Framework Using Motion Features for Human Pose Estimation

2015----Efficient Object Localization Using Convolutional Networks

2015----Human Pose Estimation with Iterative Error Feedback

2015----Pose-based CNN Features for Action Recognition

2016----Advancing Hand Gesture Recognition with High Resolution Electrical Impedance Tomography

2016----Chained Predictions Using Convolutional Neural Networks

2016----CPM----Convolutional Pose Machines

2016----CVPR-2016----End-to-End Learning of Deformable Mixture of Parts and Deep Convolutional Neural Networks for Human Pose Estimation

2016----Deep Learning of Local RGB-D Patches for 3D Object Detection and 6D Pose Estimation

2016----PAFs----Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

2016----Stacked hourglass----Stacked Hourglass Networks for Human Pose Estimation

2016----Structured Feature Learning for Pose Estimation

2017----Adversarial PoseNet: A Structure-aware Convolutional Network for Human Pose Estimation

2017----CVPR2017 oral----Realtime Multi-Person 2D Pose Estimation using Part Affinity Fields

2017----Learning Feature Pyramids for Human Pose Estimation

2017----Multi-Context Attention for Human Pose Estimation

2017----Self Adversarial Training for Human Pose Estimation

IV. Multi-Person Pose Estimation

2016----Associative Embedding: End-to-End Learning for Joint Detection and Grouping

2016----DeepCut----Joint Subset Partition and Labeling for Multi Person Pose Estimation

2016----DeepCut----Joint Subset Partition and Labeling for Multi Person Pose Estimation (poster)

2016----DeeperCut----DeeperCut: A Deeper, Stronger, and Faster Multi-Person Pose Estimation Model

2017----G-RMI----Towards Accurate Multi-person Pose Estimation in the Wild

2017----RMPE: Regional Multi-Person Pose Estimation

This paper comes from Prof. Cewu Lu's group at Shanghai Jiao Tong University and follows the top-down approach.

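The motivation is to address two problems: localization errors and redundant detections in the person bounding boxes. The paper introduces the Spatial Transformer Network proposed by Google, which gives conventional convolutions the ability to crop, translate, scale, and rotate their input.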
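
Cropping, translation, scaling, and rotation are all special cases of a 2D affine transform, which a Spatial Transformer Network predicts as a 2x3 matrix and uses to map each output-grid coordinate to a sampling location in the input. The following is a rough plain-Python sketch of that coordinate mapping only (no learning and no bilinear sampling). `make_theta` and `apply_theta` are hypothetical helper names, and coordinates are assumed to be normalized to [-1, 1]; this is not the RMPE implementation.

```python
import math


def make_theta(scale, angle_rad, tx, ty):
    """Build a 2x3 affine matrix combining isotropic scale, rotation, and translation."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [
        [scale * c, -scale * s, tx],
        [scale * s, scale * c, ty],
    ]


def apply_theta(theta, x, y):
    """Map an output-grid coordinate (x, y) to the input coordinate to sample from."""
    (a, b, tx), (c, d, ty) = theta
    return a * x + b * y + tx, c * x + d * y + ty


# Crop-like transform: the output grid covers a half-size window centered at (0.25, 0.0).
theta = make_theta(scale=0.5, angle_rad=0.0, tx=0.25, ty=0.0)
corner = apply_theta(theta, 1.0, 1.0)  # where the output's corner samples from

# Pure rotation by 90 degrees, no scaling or shift.
rotated = apply_theta(make_theta(1.0, math.pi / 2, 0.0, 0.0), 1.0, 0.0)
```

With scale below 1 the output grid samples from a sub-window of the input, which is the sense in which the transform acts as a learned crop around the person.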

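One experiment in the paper, "Upper Bound of Our Framework", feeds ground-truth person bounding boxes directly into the pipeline and reaches 84.2 mAP on the validation set. This shows that good person bounding boxes alone are not enough: the second-stage single-person pose estimator also needs to improve.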

Idea: MSRA's deformable convolutional networks could be applied here; a new paper along these lines seems likely.

2017----COCO2017 Keypoints winner----Cascaded Pyramid Network for Multi-Person Pose Estimation

2017----PyraNet----Learning Feature Pyramids for Human Pose Estimation