Pose Estimation: Stacked Hourglass Networks for Human Pose Estimation

0. Preface

1. What problem does it solve

  • At the time, convolutional neural networks had not yet been thoroughly studied for pose estimation, and new approaches to the problem were still being explored.
  • The ultimate goal of pose estimation is to determine which pixel of the original image each keypoint falls on, so the network has to downsample first and then upsample.
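
To make the "which pixel" goal concrete, here is a minimal sketch of reading a keypoint location off a predicted heatmap and mapping it back to the original image. The function name `decode_keypoint` and the cell-center mapping are assumptions made for illustration, not the paper's exact post-processing.

```python
def decode_keypoint(heatmap, image_size):
    """Return the (x, y) pixel in the original image for the heatmap peak.

    heatmap: 2D list of scores, indexed as heatmap[row][col].
    image_size: (width, height) of the original image.
    """
    hm_h, hm_w = len(heatmap), len(heatmap[0])
    # Find the highest-scoring cell in the low-resolution heatmap.
    best_row, best_col = max(
        ((r, c) for r in range(hm_h) for c in range(hm_w)),
        key=lambda rc: heatmap[rc[0]][rc[1]],
    )
    # Map the cell center back to original-image coordinates (assumed mapping).
    img_w, img_h = image_size
    x = (best_col + 0.5) * img_w / hm_w
    y = (best_row + 0.5) * img_h / hm_h
    return x, y


# Toy 4x4 heatmap with a single peak at row 1, column 2.
heatmap = [[0.0] * 4 for _ in range(4)]
heatmap[1][2] = 0.9
print(decode_keypoint(heatmap, (256, 256)))  # (160.0, 96.0)
```

Because the heatmap is coarser than the input image, the recovered position is only as precise as one heatmap cell, which is one reason the network needs to upsample again after downsampling.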

2. What method was used

  • The hourglass design is motivated by the need to capture information at every scale.

  • The overall structure of the stacked hourglass network is shown in the figure below

    • That is, the feature map size repeatedly shrinks and grows, once per hourglass.
    • [Figure: overall structure of the stacked hourglass network]
  • Each hourglass has the following structure

    • [Figure: structure of a single hourglass]
  • The paper also includes the following figure

    • The left panel shows the structure of each box inside an hourglass.
    • The right panel illustrates intermediate supervision.
      • The network is composed of multiple hourglasses, and each hourglass outputs its own prediction (the keypoint heatmaps, the blue box in the figure below), on which a loss is computed.
    • [Figure: left, structure of each box in an hourglass; right, intermediate supervision]
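
The shrink-then-grow pattern can be sketched without any deep learning library. The toy code below works on plain 2D lists: it pools down, recurses, upsamples, and adds the result to a skip branch kept at the original scale. The `block` function is an identity placeholder standing in for the learned modules, and all function names are hypothetical. It is a rough illustration of the data flow, not the paper's implementation.

```python
def max_pool2(x):
    """2x2 max pooling with stride 2 on a 2D list (even height and width)."""
    return [
        [max(x[r][c], x[r][c + 1], x[r + 1][c], x[r + 1][c + 1])
         for c in range(0, len(x[0]), 2)]
        for r in range(0, len(x), 2)
    ]


def upsample2(x):
    """Nearest-neighbor upsampling by a factor of 2."""
    out = []
    for row in x:
        wide = [v for v in row for _ in range(2)]
        out.append(wide)
        out.append(list(wide))
    return out


def block(x):
    """Placeholder for a learned block (identity here, an assumption)."""
    return [list(row) for row in x]


def hourglass(x, depth, trace):
    """Recursive hourglass: a skip branch at this scale plus a pooled branch."""
    trace.append((len(x), len(x[0])))            # size on the way down
    skip = block(x)                              # branch kept at this scale
    low = block(max_pool2(x))                    # branch at half resolution
    if depth > 1:
        low = hourglass(low, depth - 1, trace)   # nested, smaller hourglass
    else:
        trace.append((len(low), len(low[0])))    # lowest resolution reached
        low = block(low)
    up = upsample2(low)
    # Merge the two branches by element-wise addition.
    out = [[s + u for s, u in zip(srow, urow)] for srow, urow in zip(skip, up)]
    trace.append((len(out), len(out[0])))        # size on the way up
    return out


# Stack two hourglasses on a toy 8x8 feature map and record the sizes.
x = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
trace = []
for _ in range(2):
    x = hourglass(x, depth=2, trace=trace)
heights = [h for h, _ in trace]
print(heights)  # [8, 4, 2, 4, 8, 8, 4, 2, 4, 8]
```

The printed heights show the shrinking, increasing, shrinking, increasing sequence described above, and the output of each hourglass has the same size as its input, which is what allows hourglasses to be stacked end to end.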

3. How effective is it

  • At the time, it achieved state-of-the-art results on both FLIC and MPII. The result figures are not reproduced here, since they are several years old; see the paper if you need them.
  • Some training details, for the record
    • Single-person pose estimation
    • Input image size is 256x256
    • Rotation is used for data augmentation
    • The loss function is MSE
    • Ground truth is a 2D Gaussian heatmap centered on each keypoint
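
The last two training details, combined with intermediate supervision, can be sketched as follows. This is only an illustration: the function names, the `sigma` value, and the choice to sum the per-hourglass losses with equal weight are assumptions rather than details taken from these notes.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Ground-truth heatmap: an unnormalized 2D Gaussian centered at (cx, cy)."""
    return [
        [math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2 * sigma ** 2))
         for x in range(width)]
        for y in range(height)
    ]


def mse(pred, target):
    """Mean squared error between two heatmaps of the same size."""
    n = len(target) * len(target[0])
    total = sum(
        (p - t) ** 2
        for prow, trow in zip(pred, target)
        for p, t in zip(prow, trow)
    )
    return total / n


def intermediate_supervision_loss(stack_outputs, target):
    """Sum the MSE of every hourglass's heatmap against the same target.

    Equal weighting of the stacks is an assumption of this sketch.
    """
    return sum(mse(pred, target) for pred in stack_outputs)


# One keypoint at column 3, row 4 of a toy 8x8 heatmap.
target = gaussian_heatmap(8, 8, cx=3, cy=4, sigma=1.0)
zeros = [[0.0] * 8 for _ in range(8)]

# Pretend the first hourglass predicts nothing and the second is perfect.
loss = intermediate_supervision_loss([zeros, target], target)
print(loss)
```

In this toy setup the perfect second prediction contributes zero, so the total loss comes entirely from the first hourglass. That is the point of intermediate supervision: every hourglass gets its own training signal.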

4. What are the problems & what can be learned

  • It is one of the commonly used backbones in pose estimation today. CenterNet, for example, also uses an hourglass as its backbone.
  • However, the network still seems relatively complex to me; I would guess the model is fairly large and its speed is only average.

Origin blog.csdn.net/irving512/article/details/112003268