0. Preface
- Relevant information:
- Basic information about the paper
- Field: pose estimation
- Author affiliation: University of Michigan
- Venue: ECCV 2016
- One-sentence summary: proposes an hourglass-style backbone.
1. What problem does it solve
- At the time, convolutional neural networks had not yet been studied thoroughly for pose estimation, and new architectures for the task were still being explored.
- The ultimate goal of pose estimation is to determine which pixel of the original image each keypoint corresponds to, so the network has to downsample first and then upsample back.
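To make the "which pixel is each keypoint" goal concrete, here is a minimal sketch of reading a keypoint location off a predicted heatmap. It assumes the common convention of taking the heatmap's peak and scaling it back to the image size; these notes do not describe the decoding step, and `decode_keypoint` is a hypothetical helper name.

```python
def decode_keypoint(heatmap, image_width, image_height):
    """Return the (x, y) pixel in the original image for the heatmap's peak.

    `heatmap` is a list of rows (row index = y, column index = x).
    Taking the argmax is an assumed decoding rule, used for illustration only.
    """
    heat_h = len(heatmap)
    heat_w = len(heatmap[0])
    # Locate the highest-scoring cell.
    best_y, best_x = max(
        ((y, x) for y in range(heat_h) for x in range(heat_w)),
        key=lambda pos: heatmap[pos[0]][pos[1]],
    )
    # Scale heatmap coordinates up to the original image resolution.
    scale_x = image_width / heat_w
    scale_y = image_height / heat_h
    return best_x * scale_x, best_y * scale_y


# A toy 4x4 heatmap whose peak sits at row 2, column 1.
toy_heatmap = [
    [0.0, 0.0, 0.0, 0.0],
    [0.0, 0.1, 0.0, 0.0],
    [0.0, 0.9, 0.1, 0.0],
    [0.0, 0.0, 0.0, 0.0],
]
keypoint = decode_keypoint(toy_heatmap, image_width=16, image_height=16)
```

Because the heatmap is coarser than the image, the decoded location is only as precise as the scale factor allows, which is one reason the upsampling half of the network matters.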
2. What method was used
- The hourglass design is motivated by the need to capture information at every scale.
- The overall structure of Hourglass is shown in the figure below.
- That is, the feature map repeatedly shrinks and grows: down, up, down, up.
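The shrink-and-grow pattern can be traced with a few lines of code. This is only a sketch of the resolutions involved, not of the network itself; it assumes each step down halves the feature map and each step up doubles it, and the values 64, 4, and 2 are illustrative rather than taken from these notes.

```python
def hourglass_resolutions(size, depth):
    """Return the feature-map side lengths visited by one hourglass.

    Assumes every downsampling step halves the side length and every
    upsampling step doubles it (an assumption made for illustration).
    """
    down = [size // (2 ** i) for i in range(depth + 1)]  # shrinking half
    up = down[-2::-1]                                    # growing half
    return down + up


def stacked_resolutions(size, depth, num_stacks):
    """Concatenate the traces of several hourglasses placed end to end."""
    trace = []
    for _ in range(num_stacks):
        trace.extend(hourglass_resolutions(size, depth))
    return trace


single = hourglass_resolutions(64, 4)
stacked = stacked_resolutions(64, 4, 2)
print(stacked)
```

Each hourglass starts and ends at the same resolution, which is what lets several of them be chained one after another.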
- The structure of each individual hourglass is as follows.
- The paper includes a figure, reproduced here.
- The left picture shows the structure of each box inside an hourglass.
- The right picture introduces Intermediate Supervision.
- The term "intermediate supervision" is awkward to translate, so it is kept as-is here.
- It means that the network is composed of multiple hourglasses, and every hourglass outputs its own prediction (the keypoint heatmap, shown as the blue box in the figure below) on which a loss is computed.
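A minimal sketch of how the per-hourglass losses might be combined. It assumes every hourglass output is compared against the same target heatmap with MSE and that the per-stack losses are simply summed with equal weight; `mse` and `intermediate_supervision_loss` are hypothetical helper names, and heatmaps are plain lists of rows to keep the example dependency-free.

```python
def mse(pred, target):
    """Mean squared error between two equally sized 2D heatmaps."""
    total = 0.0
    count = 0
    for pred_row, target_row in zip(pred, target):
        for p, t in zip(pred_row, target_row):
            total += (p - t) ** 2
            count += 1
    return total / count


def intermediate_supervision_loss(stack_outputs, target):
    """Sum the MSE of every stack's heatmap against the same target.

    Equal weighting of the stacks is an assumption made for this sketch.
    """
    return sum(mse(pred, target) for pred in stack_outputs)


target = [[0.0, 1.0], [0.0, 0.0]]
first_stack = [[0.0, 0.5], [0.0, 0.0]]   # early, rougher prediction
second_stack = [[0.0, 1.0], [0.0, 0.0]]  # later, refined prediction
loss = intermediate_supervision_loss([first_stack, second_stack], target)
```

Because every stack contributes to the loss, the earlier hourglasses receive a training signal directly instead of only through the final output.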
3. How effective is it
- At the time, it achieved state-of-the-art results on both FLIC and MPII. Reposting the result figures is not very useful now, since the paper is several years old; see the paper itself if you need the numbers.
- Some training details, for the record:
- Single-person pose estimation
- The input image size is 256x256
- Rotation is used for data augmentation
- The loss function is MSE
- The ground truth (GT) is a 2D Gaussian centered on each keypoint
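The Gaussian ground truth can be sketched as follows. This is an illustration under assumptions, not the authors' code: the 64x64 heatmap size, the keypoint position, and `sigma=1.0` are example values, the peak is left unnormalised at 1.0, and `gaussian_heatmap` is a hypothetical helper name.

```python
import math


def gaussian_heatmap(width, height, cx, cy, sigma=1.0):
    """Build a height x width heatmap with a 2D Gaussian peak at (cx, cy).

    The peak value is 1.0 (no normalisation), an assumption for this sketch.
    Rows are indexed by y, columns by x.
    """
    return [
        [
            math.exp(-((x - cx) ** 2 + (y - cy) ** 2) / (2.0 * sigma ** 2))
            for x in range(width)
        ]
        for y in range(height)
    ]


# Target heatmap for one keypoint located at x=20, y=30.
heatmap = gaussian_heatmap(64, 64, cx=20, cy=30)
```

Using a soft Gaussian blob instead of a single hot pixel gives the MSE loss a smoother target: predictions that land one pixel off are penalised less than predictions that are far away.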
4. What are the problems & what can be learned
- Hourglass is probably one of the commonly used backbones in pose estimation today. In fact, CenterNet also uses Hourglass as its backbone.
- However, the network still feels fairly complex to me; I would guess that it is relatively large and that its runtime speed is only average.