【Study Notes】Integral Human Pose Regression


[Learning materials] This article summarizes all aspects of the Integral Pose Regression method - Zhihu (a thorough summary and a must-read)

1. Comparison of two basic methods

1. Decoding method

The difference between soft-argmax and argmax:

Because of the downsampling steps in deep neural networks, the heatmap has a lower resolution than the input image, so decoding with the "maximum" operation introduces unavoidable quantization errors. Integral regression therefore changes the "maximum" operation to "take the expected value": each joint is estimated as an integral over all positions in the heatmap, weighted by their probabilities (normalized from the likelihoods). We call this approach integral regression.
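To make the two decoding rules concrete, here is a minimal NumPy sketch on a toy heatmap. The function names `argmax_decode` and `soft_argmax_decode` and the toy values are assumptions made for illustration; they are not taken from the paper's code.

```python
import numpy as np

def argmax_decode(heatmap):
    """Return (x, y) of the largest response, in whole heatmap pixels."""
    y, x = np.unravel_index(np.argmax(heatmap), heatmap.shape)
    return float(x), float(y)

def soft_argmax_decode(heatmap):
    """Return the expected (x, y) under the softmax-normalized heatmap."""
    h, w = heatmap.shape
    logits = heatmap.ravel() - heatmap.max()   # shift for numerical stability
    prob = np.exp(logits)
    prob = (prob / prob.sum()).reshape(h, w)   # softmax over all pixels
    x = float(np.sum(prob.sum(axis=0) * np.arange(w)))  # E[x] from the x marginal
    y = float(np.sum(prob.sum(axis=1) * np.arange(h)))  # E[y] from the y marginal
    return x, y

# Toy heatmap: the true joint lies between two neighbouring pixels,
# so both of them respond equally strongly.
heatmap = np.full((5, 5), -20.0)
heatmap[2, 1] = 10.0
heatmap[2, 2] = 10.0

print("argmax     :", argmax_decode(heatmap))       # snaps to a single pixel
print("soft-argmax:", soft_argmax_decode(heatmap))  # lands between the two pixels
```

Argmax can only return integer pixel positions, which is where the quantization error comes from, while the expected value can fall between pixels.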

2. Supervision method

The detection-based method manually renders a Gaussian heatmap as the target and supervises the network output pixel by pixel. IPR (Integral Pose Regression) directly supervises the coordinate values.

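The two kinds of supervision can be sketched as follows. This is only an illustration: the helper names, the `sigma` value, and the use of MSE for the heatmap loss and L1 for the coordinate loss are assumptions, not settings confirmed by these notes.

```python
import numpy as np

def render_gaussian_heatmap(cx, cy, height, width, sigma=2.0):
    """Render the hand-made Gaussian target used by detection-based methods."""
    ys, xs = np.mgrid[0:height, 0:width]
    return np.exp(-((xs - cx) ** 2 + (ys - cy) ** 2) / (2.0 * sigma ** 2))

def heatmap_mse_loss(pred_heatmap, target_heatmap):
    """Detection-style supervision: compare the output pixel by pixel."""
    return float(np.mean((pred_heatmap - target_heatmap) ** 2))

def coordinate_l1_loss(pred_xy, target_xy):
    """IPR-style supervision: compare only the decoded coordinates."""
    diff = np.asarray(pred_xy, dtype=float) - np.asarray(target_xy, dtype=float)
    return float(np.sum(np.abs(diff)))

target = render_gaussian_heatmap(cx=10.0, cy=6.0, height=16, width=16)
pred = np.zeros_like(target)                       # placeholder network output
print("heatmap loss   :", heatmap_mse_loss(pred, target))
print("coordinate loss:", coordinate_l1_loss((10.5, 6.0), (10.0, 6.0)))
```

The heatmap loss needs a full target image per joint, while the coordinate loss needs only the annotated position.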
3. Performance

Conclusion: integral regression works better on `difficult samples`.

The detection-based method is heavily influenced by texture information, so under severe occlusion its response area is easily lost or shifted, whereas the regression-based method better retains the relative positions of the keypoints.

2. IPR method

2.1 Locality

The heatmap response is concentrated in a local area, and the response elsewhere is almost 0. The location with the largest response corresponds to the target point. The heatmap can be read as a probability distribution centered on the true position: the farther a location is from it, the lower the probability that an annotator would mark it. I call this property "locality" here.

2.1.1 Why is the IPR method more localized?

When the coordinates are supervised with an L1 loss, we are implicitly learning a Laplace distribution.

The true distribution of keypoints on the COCO dataset actually lies between the Laplace and Gaussian distributions: its edges are sharper than a Gaussian's and smoother than a Laplace's.
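The link between the loss and the implied distribution can be written out as a short derivation. The symbols here ($x$ for the annotated coordinate, $\mu$ for the prediction, $b$ and $\sigma$ for fixed scale parameters) are notation introduced for this sketch, not notation from the original article.

```latex
\begin{align*}
\text{Laplace:}\quad
  p(x \mid \mu, b) &= \frac{1}{2b}\exp\!\left(-\frac{|x-\mu|}{b}\right),
  & -\log p(x \mid \mu, b) &= \frac{|x-\mu|}{b} + \log(2b), \\
\text{Gaussian:}\quad
  p(x \mid \mu, \sigma) &= \frac{1}{\sigma\sqrt{2\pi}}\exp\!\left(-\frac{(x-\mu)^2}{2\sigma^2}\right),
  & -\log p(x \mid \mu, \sigma) &= \frac{(x-\mu)^2}{2\sigma^2} + \log\!\left(\sigma\sqrt{2\pi}\right).
\end{align*}
```

With the scale held fixed, minimizing the negative log-likelihood over $\mu$ is the same as minimizing $|x-\mu|$ (L1 loss) in the Laplace case and $(x-\mu)^2$ (L2 loss) in the Gaussian case. Choosing a loss therefore amounts to choosing a simple, predefined error distribution.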

2.2 Shape constraints

2.2.1 Why do you need to constrain the shape

Soft-Argmax applies Softmax normalization to the output feature map, takes the expectation as the coordinate value, and is supervised directly through that coordinate value. As long as the expected value is correct, the loss goes down no matter what the distribution looks like. The predicted heatmap may therefore be "multi-peak" or "flat", and its maximum response point may be offset from the true position.
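A small 1D sketch of this ambiguity: three very different distributions share the same expected value and therefore get the same coordinate loss. The arrays stand for already-normalized Softmax outputs, and all names and values are made up for illustration.

```python
import numpy as np

def expected_position(prob):
    """Expected index under a 1D probability distribution."""
    return float(np.sum(np.arange(prob.size) * prob))

n = 11
target = 5.0

single_peak = np.zeros(n)
single_peak[5] = 1.0                 # all mass on the true position

two_peaks = np.zeros(n)
two_peaks[1] = 0.5                   # two peaks far from the true position,
two_peaks[9] = 0.5                   # placed symmetrically around it

flat = np.full(n, 1.0 / n)           # no localization at all

shapes = {"single peak": single_peak, "two peaks": two_peaks, "flat": flat}
for name, prob in shapes.items():
    loss = abs(expected_position(prob) - target)
    print(f"{name:12s} argmax={int(np.argmax(prob))} coordinate loss={loss:.3g}")
```

All three shapes reach a coordinate loss of zero (up to floating-point rounding), although only the first one has its maximum response at the target. This is the gap a shape constraint is meant to close.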

2.2.2 Solving the shape problem

Apart from performance degradation in a few very extreme cases, all other cases show a performance improvement, which indicates that adding shape constraints is effective.

2.3 Supervision method and gradient difference

The difference in gradient form makes training IPR much more difficult than training the detection-based method.
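The two gradient forms can be compared in a simplified 1D setting. This is a sketch under assumptions: a squared-error heatmap loss for the detection-based method, an L1 coordinate loss for IPR, and notation introduced here ($H_p$ for the predicted heatmap, $G_p$ for the rendered Gaussian target, $z_p$ for the logits, $s_p$ for the Softmax output, $J$ for the estimated coordinate and $J^{*}$ for the ground truth).

```latex
\begin{align*}
\text{Detection:}\quad
  L_{\mathrm{det}} &= \sum_{p} \left(H_p - G_p\right)^2,
  & \frac{\partial L_{\mathrm{det}}}{\partial H_p} &= 2\left(H_p - G_p\right), \\
\text{IPR:}\quad
  s_p &= \frac{e^{z_p}}{\sum_{q} e^{z_q}}, \quad
  J = \sum_{p} p\, s_p, \quad
  L_{\mathrm{ipr}} = \left|J - J^{*}\right|,
  & \frac{\partial L_{\mathrm{ipr}}}{\partial z_p} &= \operatorname{sign}\!\left(J - J^{*}\right) s_p \left(p - J\right).
\end{align*}
```

In the detection case each pixel receives its own explicit target, and its gradient depends only on that pixel's error. In the IPR case the gradient at a pixel is scaled by its current probability $s_p$ and by its distance $p - J$ from the current estimate, so it changes with the whole heatmap as training proceeds.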

2.4 Summary

The performance disadvantage of the Integral Pose Regression method comes mainly from four aspects:

  1. Bias introduced by the properties of Softmax
  2. A mismatch between the real distribution of the data and the simple distribution predefined by humans
  3. Unclear learning goals due to the lack of constraints on the properties of the probability distribution
  4. An unstable gradient form that leads to inefficient learning
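The first point can be illustrated with a small 1D sketch. Softmax gives every pixel a non-zero probability, so the background mass pulls the expected value away from the peak. The response shape, its scale, and the function name `soft_argmax_1d` are assumptions chosen for this illustration.

```python
import numpy as np

def soft_argmax_1d(logits):
    """Expected index under the softmax of a 1D response."""
    logits = logits - logits.max()       # shift for numerical stability
    prob = np.exp(logits)
    prob /= prob.sum()
    return float(np.sum(np.arange(logits.size) * prob))

positions = np.arange(64)
true_x = 8.0

# Gaussian-shaped response that peaks exactly at the true position,
# which sits close to the left border of the heatmap.
logits = 5.0 * np.exp(-((positions - true_x) ** 2) / (2.0 * 2.0 ** 2))

estimate = soft_argmax_1d(logits)
print("argmax     :", int(np.argmax(logits)))
print("soft-argmax:", round(estimate, 2))
```

The argmax sits on the true position, but the soft-argmax estimate is shifted toward the center of the heatmap, because most of the near-zero background pixels lie on that side.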

Origin blog.csdn.net/weixin_50862344/article/details/130222835