OpenPose pose estimation in one article

1. What is pose estimation

insert image description here
The following are the keypoints defined by the COCO dataset; different datasets define different keypoint sets.
insert image description here
insert image description here
examples:
insert image description here
insert image description here
insert image description here

insert image description here
insert image description here
insert image description here

insert image description here

2. Two approaches to pose estimation

2.1 Top-down method

insert image description here
After obtaining the bounding boxes in the first step, run a regression task on each single-person box: crop out the single-person image, feed it into a network that predicts 17 keypoints, and obtain the positions of the head, shoulders, and so on. Note that for these positions to be accurate, they must be relative positions.
What is a relative position? It is a keypoint's position (say, the head's) expressed relative to the width and height of the cropped image: taking the top-left corner of the crop as the origin, the head is located by its offsets w and h from that origin, measured as fractions of the crop's width and height.
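As a minimal sketch of this idea (the function and variable names are my own, not from the original), converting an absolute keypoint coordinate into a relative one inside a person crop:

```python
def to_relative(x, y, box):
    """Convert an absolute keypoint (x, y) into coordinates relative
    to a person crop. `box` is (left, top, width, height); the crop's
    top-left corner is the origin, and the result is expressed as a
    fraction of the crop's width and height."""
    left, top, w, h = box
    return (x - left) / w, (y - top) / h

# A head detected at pixel (150, 80) inside a crop whose top-left
# corner is (100, 50) and which is 100 px wide and 200 px tall:
rx, ry = to_relative(150, 80, (100, 50, 100, 200))
# rx = 0.5 (halfway across the crop), ry = 0.15
```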

The second step is to connect the keypoints; the connection rules (which point links to which) are configured in advance.

Advantages of this method:
1. Keypoint connection cannot go wrong (each crop contains a single person), so it is accurate.
Problems of this method:
insert image description here
This method is suitable when:
1. Speed is not critical
2. High accuracy is required
It struggles with multi-person and real-time scenarios. Example:
insert image description here
OpenPose's improvement is to strip out the object-detection stage: detect the keypoints of the whole image directly, and then connect them. But how to connect them?
insert image description here

2.2 The OpenPose method

Keypoint detection:
insert image description here
The network outputs 18 feature maps (one per keypoint); the output above is the heatmap of the right shoulder. When the labels are defined, they need to be Gaussian: pixels close to the true keypoint get a high probability, and pixels far away get a low one.
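A minimal sketch of such a Gaussian label map (the helper name and the sigma value are my own assumptions):

```python
import numpy as np

def gaussian_heatmap(h, w, cx, cy, sigma=2.0):
    """Build one keypoint label map: a 2-D Gaussian centred on the
    true keypoint (cx, cy). Pixels near the keypoint get values close
    to 1, distant pixels values close to 0."""
    ys, xs = np.mgrid[0:h, 0:w]
    d2 = (xs - cx) ** 2 + (ys - cy) ** 2
    return np.exp(-d2 / (2.0 * sigma ** 2))

hm = gaussian_heatmap(64, 64, cx=20, cy=30)
# the peak sits exactly on the keypoint, and values decay with distance
assert hm[30, 20] == 1.0
assert hm[30, 25] < hm[30, 21]
```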

Connection: PAF (Part Affinity Fields)
Purpose: to find the most suitable way to connect keypoints (the connection direction).
insert image description here
To introduce the vector concept: the 18 keypoints give 19 limb connections. Two points (x1, y1) and (x2, y2) define a vector with two components, one along x and one along y, so the number of feature maps for this prediction is 19 × 2 = 38; the 38 feature maps encode the 19 limb directions.
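To make the channel arithmetic concrete, a tiny sketch (the 18/19 counts are from the text; the 46×46 output resolution is my own illustration):

```python
import numpy as np

NUM_KEYPOINTS = 18   # one heatmap per keypoint
NUM_LIMBS = 19       # connection ways between the 18 keypoints

# Every limb direction is a 2-D vector, so it needs two PAF channels:
# one for the x component and one for the y component.
paf_channels = NUM_LIMBS * 2

# At a 46x46 output resolution the two prediction heads would emit:
heatmaps = np.zeros((NUM_KEYPOINTS, 46, 46))   # 18 feature maps
pafs = np.zeros((paf_channels, 46, 46))        # 38 feature maps
```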

How do we connect points using these directions?
insert image description here
The connection scores must be learned by the network, so learning requires making labels:
insert image description here
In label making we need vector information. A vector has both magnitude and direction; here only the direction is needed, so unit vectors are the most suitable choice.
insert image description here
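A sketch of how such a unit-vector label is computed from two keypoints of one limb (the helper name is my own):

```python
import math

def unit_vector(x1, y1, x2, y2):
    """Direction of the limb from keypoint (x1, y1) to (x2, y2),
    normalised to length 1 so the label keeps only the direction
    and discards the magnitude."""
    dx, dy = x2 - x1, y2 - y1
    norm = math.hypot(dx, dy)
    return dx / norm, dy / norm

ux, uy = unit_vector(0, 0, 3, 4)
# (0.6, 0.8): whatever the limb's length, the label has length 1
```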
Example 1:
insert image description here
insert image description here
insert image description here
The direction in the label is now clear. Next, let's look at the direction in the prediction.
insert image description here
Introduce the idea of integration: integration approximates an area, for example by summing an ever larger number of ever thinner rectangles between a curve and the x-axis. So how do we connect? Along each candidate limb, sample points on the line segment between the two keypoints and accumulate the predicted PAF vectors there; this line integral gives a score (weight) for that connection.

Once the scores are known, how do we match? A point such as the neck could, in principle, be connected to the left shoulder, right shoulder, left leg, or right leg: one point matching many points makes the general matching problem very hard (case b in the figure below). The scores above can only say which direction is good and which is bad. The common solution is bipartite matching, solved directly with the Hungarian algorithm. In pose matching it is stipulated that the neck only connects to the right shoulder, the right shoulder only connects to the right elbow, and the right elbow only connects to the right hand (case c in the figure below), so the matching for each pair of keypoint types is bipartite (case d in the figure below), and matching two point sets at a time is easy. If you still don't follow, keep reading; the third figure below should make it clear!
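The line-integral score can be sketched as follows: sample points along the candidate limb, look up the predicted PAF vector at each sample, and average its dot product with the limb's unit direction (a simplified version of the integral, with my own names; the toy PAF below is fabricated for illustration):

```python
import numpy as np

def limb_score(paf_x, paf_y, p1, p2, n_samples=10):
    """Approximate the PAF line integral between candidate keypoints
    p1 and p2.

    paf_x, paf_y : 2-D arrays, the predicted x/y PAF maps for this limb
    p1, p2       : (x, y) candidate keypoint coordinates
    """
    p1, p2 = np.asarray(p1, float), np.asarray(p2, float)
    v = p2 - p1
    norm = np.linalg.norm(v)
    if norm == 0:
        return 0.0
    u = v / norm                          # unit direction of the limb
    score = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        x, y = (p1 + t * v).astype(int)   # sample point on the segment
        score += paf_x[y, x] * u[0] + paf_y[y, x] * u[1]
    return score / n_samples              # average dot product

# Toy PAF pointing uniformly along +x: a horizontal limb scores 1,
# a perpendicular (vertical) limb scores 0.
px, py = np.ones((5, 5)), np.zeros((5, 5))
assert limb_score(px, py, (0, 2), (4, 2)) == 1.0
assert limb_score(px, py, (2, 0), (2, 4)) == 0.0
```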
insert image description here

insert image description here

insert image description here
insert image description here




insert image description here
insert image description here

3. Framework

insert image description here
Input the image (case a in the figure above) and run two branches (b, c):
One branch estimates the actual positions of the keypoints (case b in the figure above), producing 18 feature maps.
The other branch estimates the vectors between keypoints (case c in the figure above), producing 38 feature maps.
insert image description here
Then apply the bipartite matching described above: the green dot on the left is the left shoulder; the green line connects it to the left blue dot (left elbow), and the red line connects it to the right blue dot (another left-elbow candidate). Bipartite matching picks the correct pairing.
(As in case d above), 19 connection maps are obtained.
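Because each connection step involves only two keypoint types (e.g. all left-shoulder candidates vs. all left-elbow candidates), the assignment can be solved with the Hungarian algorithm. As a dependency-free sketch, brute force gives the same optimal assignment on tiny inputs (the score matrix is made up for illustration):

```python
from itertools import permutations

def best_matching(scores):
    """Exhaustive bipartite matching on a small square score matrix:
    pick the one-to-one assignment with the highest total score.
    (For larger problems the Hungarian algorithm finds the same
    assignment in polynomial time; brute force just illustrates it.)"""
    n = len(scores)
    best, best_perm = float("-inf"), None
    for perm in permutations(range(n)):
        total = sum(scores[i][perm[i]] for i in range(n))
        if total > best:
            best, best_perm = total, perm
    return list(enumerate(best_perm))

# scores[i][j]: PAF line-integral score between left-shoulder
# candidate i and left-elbow candidate j (two people in the image).
scores = [[0.9, 0.1],
          [0.2, 0.8]]
# shoulder 0 pairs with elbow 0, shoulder 1 with elbow 1
assert best_matching(scores) == [(0, 0), (1, 1)]
```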
insert image description here
Finally, the skeleton result is assembled from the 19 connection maps (case e in the figure above).

4. Network structure

4.1 CPM (Convolutional Pose Machines, the predecessor)

CPM only handles the localization of keypoints.
insert image description here
It introduces the idea of cascading (multi-stage refinement):
insert image description here
insert image description here
The figure above shows that the larger the receptive field (the x-axis), i.e. the more pixels the network sees, the higher the accuracy. Take the figure below as an example.
insert image description here
insert image description here
The figure above shows that even when the receptive field is small, adding a loss function at that stage improves the accuracy of the small-field predictions, which in turn lays the foundation for the later, larger receptive fields (a stage-by-stage learning process, i.e. intermediate supervision).
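The stage-wise supervision described above can be sketched as attaching a loss to every stage's output rather than only the last one (a framework-agnostic toy with scalar "heatmaps"; the names and numbers are my own illustration):

```python
def total_loss(stage_outputs, target, loss_fn):
    """Intermediate supervision: every stage's prediction is compared
    with the ground truth, and the per-stage losses are summed, so
    early (small receptive field) stages get a direct training signal."""
    return sum(loss_fn(out, target) for out in stage_outputs)

# Toy example with scalar predictions and squared error:
mse = lambda a, b: (a - b) ** 2
stages = [0.2, 0.6, 0.9]      # predictions refine stage by stage
loss = total_loss(stages, 1.0, mse)
# 0.8^2 + 0.4^2 + 0.1^2 = 0.81: every stage contributes to the loss
```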

4.2 OpenPose

insert image description here
insert image description here
insert image description here

5. Action recognition

5.1 Building the supervision txt

  • Define behavior labels such as standing, walking, running, jumping, sitting, squatting, kicking, punching, and waving; collect relevant videos with a camera for each type of behavior, and split each video into multiple frames.
  • Use OpenPose to extract the pose features as the basic recognition features of the complete action, and write this information into a txt file.

5.2 Feature Integration

  • Match the extracted feature information with the corresponding images and behavior labels one by one, and integrate them into a txt file. The integrated txt information is then split into an input csv file (image and skeleton feature points) and an output-label csv file (behavior label).

  • The input features can be keypoint features, line features connecting different skeleton points, or surface features formed by combining different lines.

  • These features will be learned by a classification algorithm.
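One way to flatten the extracted keypoints into a training row could look like this (the file names and column layout are my own assumptions, not the original pipeline's):

```python
import csv, io

def make_row(image_name, keypoints, label):
    """Flatten one frame's (x, y) keypoints into a single csv row:
    image name, x1, y1, ..., xN, yN, behavior label."""
    row = [image_name]
    for x, y in keypoints:
        row += [x, y]
    row.append(label)
    return row

# Two fake keypoints for brevity; a real row would carry all 18.
buf = io.StringIO()
csv.writer(buf).writerow(
    make_row("frame_001.jpg", [(0.5, 0.2), (0.4, 0.3)], "standing"))
# frame_001.jpg,0.5,0.2,0.4,0.3,standing
```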

5.3 Classification with machine learning algorithms

insert image description here
insert image description here

5.4 Classification with deep learning algorithms

Use Keras to build an RNN model, and add a secondary check to avoid misjudging sitting down as falling down. The secondary check mainly compares the height-to-width ratio of the human body to decide whether the action is a fall.
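The secondary check can be sketched as a simple aspect-ratio test on the bounding box of the detected keypoints (the threshold value and the toy keypoints are illustrative guesses, not values from the original):

```python
def looks_like_fall(keypoints, ratio_threshold=1.0):
    """Secondary fall check: compute the bounding box of the detected
    keypoints and compare height to width. A standing or sitting
    person is taller than wide; a fallen person is wider than tall."""
    xs = [x for x, _ in keypoints]
    ys = [y for _, y in keypoints]
    width = max(xs) - min(xs)
    height = max(ys) - min(ys)
    if width == 0:
        return False
    return height / width < ratio_threshold

standing = [(0.5, 0.1), (0.45, 0.5), (0.55, 0.9)]   # tall, narrow
fallen   = [(0.1, 0.5), (0.5, 0.45), (0.9, 0.55)]   # wide, flat
assert not looks_like_fall(standing)
assert looks_like_fall(fallen)
```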


Origin blog.csdn.net/weixin_43676010/article/details/128179151