IPR for 2D key point detection: Integral Human Pose Regression


Paper link: Integral Human Pose Regression
Time: 2018.09 ECCV'2018
Author Team: Xiao Sun, Bin Xiao, Fangyin Wei, Shuang Liang, and Yichen Wei
Classification: Computer Vision – Human Key Point Detection – 2D top-down

Table of contents:

1. IPR background
2. IPR pose estimation
3. IPR network architecture diagram
4. References

1. These are mainly study notes. If anything here infringes on your rights, please send me a private message and I will correct it.
2. My knowledge is limited. Thank you for pointing out any shortcomings.


1. IPR background

  For the human pose estimation task, deep learning methods fall into two main families:

  1. Regression-based methods directly predict the position coordinates of each key point.
  2. Heatmap-based methods predict a heatmap for each key point, giving a score to every position.

  The paper Numerical Coordinate Regression with Convolutional Neural Networks likewise divides methods into two categories: heatmap-based and regression-based. IPR combines the two: it regresses key point coordinates as in the regression methods, and it decodes them from a heatmap as in the heatmap methods.

  In IPR the network outputs joint coordinates rather than only a heatmap, and the paper shows the value of supervising the joint coordinates directly. On the heatmap, the original "take the maximum" operation is replaced with "take the expected value": the heatmap is normalized with softmax, and the coordinates are then integrated with the resulting probabilities as weights.


2. IPR pose estimation

  The model consists of a deep convolutional backbone network and a shallow head network. The backbone extracts convolutional features from the input image, and the head estimates the target output (heatmap or joints) from those features. The network architecture used for the experiments in the paper:
[Image: network architecture used in the paper's experiments]

  1. Decoding method
      Because the deep network downsamples its input, the heatmap has a lower resolution than the input image, which inevitably introduces quantization error. Using higher-resolution images and heatmaps improves accuracy but costs more computation and memory. Regression methods learn end to end and produce continuous output, which avoids these problems.
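      The quantization error described above can be seen in a small NumPy sketch. The stride of 4, the 64x64 heatmap size, the Gaussian target, and the helper names `gaussian_heatmap` and `argmax_decode` are assumptions made for illustration, not details taken from the paper.

```python
import numpy as np

def gaussian_heatmap(height, width, center_xy, sigma=1.0):
    """Render a Gaussian bump centred at center_xy (in heatmap coordinates)."""
    xs = np.arange(width)[None, :]
    ys = np.arange(height)[:, None]
    cx, cy = center_xy
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2 * sigma ** 2))

def argmax_decode(heatmap, stride):
    """Take the highest-scoring bin and map it back to input-image pixels."""
    row, col = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return np.array([col * stride, row * stride], dtype=float)

stride = 4                          # assumed downsampling factor
true_xy = np.array([101.0, 57.0])   # joint location in input-image pixels
heatmap = gaussian_heatmap(64, 64, true_xy / stride)

decoded_xy = argmax_decode(heatmap, stride)
quantization_error = np.abs(decoded_xy - true_xy)
print(decoded_xy, quantization_error)
```

      Here the true joint falls between heatmap bins, so the argmax decode lands on (100, 56) and is off by one pixel in each axis. With this kind of decoding the error can only be reduced by shrinking the stride, which is the cost the paragraph above refers to.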
      IPR connects and unifies the heatmap representation and joint regression by changing the "take the maximum" operation into "take the expected value". This is Soft-Argmax decoding: first normalize the heatmap into a probability map with softmax, then take the expectation of the coordinates under that probability map to obtain the predicted coordinates. The 2D coordinates are given by the following formula:
    $$\mathbf{J}_k=\int_{\mathbf{p}\in\Omega}\mathbf{p}\cdot\tilde{\mathbf{H}}_k(\mathbf{p})$$
    $$\tilde{\mathbf{H}}_k(\mathbf{p})=\frac{e^{\mathbf{H}_k(\mathbf{p})}}{\int_{\mathbf{q}\in\Omega}e^{\mathbf{H}_k(\mathbf{q})}}$$
      Here $\Omega$ is the domain, $\mathbf{p}$ ranges over all coordinates in the domain, and $\tilde{\mathbf{H}}_k(\mathbf{p})$ is the probability weight, obtained by softmax normalization of the heatmap $\mathbf{H}_k$.
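      On a discrete heatmap the integrals become sums over bins. The following is a minimal NumPy sketch of Soft-Argmax decoding for one joint, assuming a 2D array of raw scores; the function name `soft_argmax_2d` and the toy scores are illustrative and not taken from the paper's code.

```python
import numpy as np

def soft_argmax_2d(heatmap):
    """Expected (x, y) position under the softmax of a 2D score map."""
    height, width = heatmap.shape
    # Softmax over all bins; subtracting the max keeps exp() from overflowing.
    exp_scores = np.exp(heatmap - heatmap.max())
    probs = exp_scores / exp_scores.sum()
    # Marginalise over rows/columns, then take the expectation per axis.
    x = (probs.sum(axis=0) * np.arange(width)).sum()
    y = (probs.sum(axis=1) * np.arange(height)).sum()
    return np.array([x, y])

# Toy score map: two equally strong neighbouring peaks in row 1.
scores = np.zeros((4, 6))
scores[1, 2] = 20.0
scores[1, 3] = 20.0

xy = soft_argmax_2d(scores)
hard_col = np.unravel_index(np.argmax(scores), scores.shape)[1]
print(xy, hard_col)
```

      With two equal peaks at columns 2 and 3, the hard argmax has to pick column 2, while the expectation lands halfway between them at x = 2.5. This sub-bin, continuous output is what removes the quantization step.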


  2. Loss function
      To supervise the joint coordinates, the paper experiments with both the L1 and the L2 distance between the predicted joints and the ground-truth joints as the loss function. L1 loss consistently works better than L2 loss, so all experiments in the paper use L1 loss.
    $$L_{re}=\|J_{gt}-\hat{J}_{re}\|_1=\left(|\hat{J}_x-J_x|+|\hat{J}_y-J_y|\right)$$
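      As a small illustration of this loss, here is a NumPy sketch for a set of 2D joints. The per-joint term follows the formula above; averaging over joints and the function name `l1_joint_loss` are assumptions for this example, since the notes do not say how the paper reduces over joints.

```python
import numpy as np

def l1_joint_loss(pred_joints, gt_joints):
    """L1 distance |dx| + |dy| per joint, averaged over joints (assumed reduction)."""
    pred = np.asarray(pred_joints, dtype=float)
    gt = np.asarray(gt_joints, dtype=float)
    per_joint = np.abs(pred - gt).sum(axis=-1)  # |dx| + |dy| for each joint
    return per_joint.mean()

pred = [[10.0, 20.0], [30.0, 40.0]]   # predicted (x, y) for two joints
gt = [[11.0, 18.0], [30.0, 44.0]]     # ground-truth (x, y)
loss = l1_joint_loss(pred, gt)
print(loss)  # per-joint errors are 3 and 4, so the mean is 3.5
```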

  3. Results Evaluation
      On the MPII dataset, the paper compares direct regression, heatmap-based methods, and integral regression, all with a ResNet-50 backbone. All integral regression methods (I1, I2, I3) are significantly better than their heatmap-based counterparts (H1, H2, H3). Combining key point heatmap supervision with key point coordinate regression gives the best results.
    [Image: MPII comparison of direct regression, heatmap-based, and integral regression methods with ResNet-50]
      Experiments on different backbone networks (ResNet and Hourglass) show that the coordinate regression approach is better. ResNet-18 with coordinate regression can reach the accuracy of ResNet-101 with heatmap regression, so coordinate regression is the better choice when a small network has to be used.
    [Image: comparison across backbone networks]
      Experimental comparison of multi-stage networks with and without coordinate regression:
    [Image: comparison of multi-stage networks with and without coordinate regression]
      Comparison with other state-of-the-art methods of the time on the COCO dataset:
    [Image: comparison with state-of-the-art methods on COCO]

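  4. Summary

      Differentiable, which allows end-to-end training; fast and non-parametric (low computational overhead).
      Easy to combine with any heatmap-based method; the underlying heatmap representation makes it easy to train.
      Produces continuous output, so there is no quantization problem.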
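
      The differentiability mentioned above can be checked directly. The following short derivation is my own working rather than a formula from the paper; it assumes the discrete form of the expectation, $\mathbf{J}_k=\sum_{\mathbf{q}\in\Omega}\mathbf{q}\,\tilde{\mathbf{H}}_k(\mathbf{q})$, and uses the standard softmax derivative.

```latex
\frac{\partial \tilde{\mathbf{H}}_k(\mathbf{q})}{\partial \mathbf{H}_k(\mathbf{p})}
  = \tilde{\mathbf{H}}_k(\mathbf{q})\left(\delta_{\mathbf{p}\mathbf{q}} - \tilde{\mathbf{H}}_k(\mathbf{p})\right)

\frac{\partial \mathbf{J}_k}{\partial \mathbf{H}_k(\mathbf{p})}
  = \sum_{\mathbf{q}\in\Omega} \mathbf{q}\,\tilde{\mathbf{H}}_k(\mathbf{q})
    \left(\delta_{\mathbf{p}\mathbf{q}} - \tilde{\mathbf{H}}_k(\mathbf{p})\right)
  = \tilde{\mathbf{H}}_k(\mathbf{p})\left(\mathbf{p} - \mathbf{J}_k\right)
```

      Every heatmap bin therefore receives a gradient proportional to its probability and to its offset from the current prediction, whereas a hard argmax passes no useful gradient back to the heatmap.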

3. IPR network architecture diagram

crowdpose.onnx.png


4. References

Reference 1
Reference 2
Reference 3


Origin blog.csdn.net/qq_54793880/article/details/131116685