Using machine learning to detect and estimate 33 2D human body pose landmarks

In previous articles, we walked through the code for 468-point face landmark detection and 21-point hand landmark detection. In this issue, we detect and estimate human body pose.

Human pose estimation from video plays a crucial role in applications such as quantifying physical exercise, sign language recognition, and full-body gesture control. In augmented reality, it also enables overlaying digital content and information on top of the physical world.

MediaPipe Pose is an ML solution for high-fidelity human pose tracking. It leverages BlazePose research (which also powers the ML Kit Pose Detection API) to infer 33 2D landmarks of the full body (or 25 upper-body landmarks) from RGB video frames. While current state-of-the-art approaches rely primarily on powerful desktop environments for inference, MediaPipe Pose achieves real-time performance on most modern mobile phones, desktops, and even the web.
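For reference, the 33 landmarks are indexed 0-32 and mirror MediaPipe's `mp.solutions.pose.PoseLandmark` enum. A minimal listing of the names (kept as a plain Python list here so it runs without MediaPipe installed):

```python
# The 33 pose landmarks reported by MediaPipe Pose, indexed 0-32.
# Names mirror the mp.solutions.pose.PoseLandmark enum.
POSE_LANDMARKS = [
    "NOSE", "LEFT_EYE_INNER", "LEFT_EYE", "LEFT_EYE_OUTER",
    "RIGHT_EYE_INNER", "RIGHT_EYE", "RIGHT_EYE_OUTER",
    "LEFT_EAR", "RIGHT_EAR", "MOUTH_LEFT", "MOUTH_RIGHT",
    "LEFT_SHOULDER", "RIGHT_SHOULDER", "LEFT_ELBOW", "RIGHT_ELBOW",
    "LEFT_WRIST", "RIGHT_WRIST", "LEFT_PINKY", "RIGHT_PINKY",
    "LEFT_INDEX", "RIGHT_INDEX", "LEFT_THUMB", "RIGHT_THUMB",
    "LEFT_HIP", "RIGHT_HIP", "LEFT_KNEE", "RIGHT_KNEE",
    "LEFT_ANKLE", "RIGHT_ANKLE", "LEFT_HEEL", "RIGHT_HEEL",
    "LEFT_FOOT_INDEX", "RIGHT_FOOT_INDEX",
]

assert len(POSE_LANDMARKS) == 33
```

The upper-body variant keeps the first 25 of these landmarks (everything above the knees).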

ML pipeline

The solution utilizes a two-step detector-tracker ML pipeline: the pipeline first localizes a person/pose region of interest (ROI) within the frame using a detector, and the tracker then takes the ROI-cropped frame as input and predicts the pose landmarks inside it. Note that for the video use case, the detector is invoked only when needed: for the very first frame, and whenever the tracker can no longer identify the body pose in the previous frame. For all other frames, the pipeline derives the ROI from the previous frame's pose landmarks.
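The detector/tracker gating described above can be sketched as plain control flow. This is a simplified simulation, not MediaPipe's internal code; `run_detector`, `run_tracker`, and the landmark-to-ROI step are hypothetical stand-ins for the real models:

```python
def roi_from_landmarks(landmarks):
    """Stand-in for the real ROI derivation: bounding box of the landmarks."""
    xs = [x for x, _ in landmarks]
    ys = [y for _, y in landmarks]
    return (min(xs), min(ys), max(xs), max(ys))


def track_poses(frames, run_detector, run_tracker):
    """Simplified detector-tracker pipeline: run the (expensive) detector only
    on the first frame or after the tracker loses the pose; otherwise derive
    the ROI from the previous frame's landmarks."""
    roi = None
    results = []
    for frame in frames:
        if roi is None:
            roi = run_detector(frame)        # full-frame person detection
        landmarks = run_tracker(frame, roi)  # predict landmarks inside the ROI
        if landmarks is None:
            roi = None                       # tracking lost: re-detect next frame
            results.append(None)
        else:
            roi = roi_from_landmarks(landmarks)  # reuse landmarks as next ROI
            results.append(landmarks)
    return results
```

This structure is why the pipeline is cheap on video: the detector runs rarely, and most frames only pay for the tracker.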

Person/pose detection model (BlazePose detector)

The detector is inspired by MediaPipe's lightweight BlazeFace model, used here as a proxy for a person detector. It explicitly predicts two additional virtual keypoints that firmly describe the human body center, rotation, and scale as a circle. Inspired by Leonardo's Vitruvian Man, it predicts the midpoint of the person's hips, the radius of a circle circumscribing the whole person, and the incline angle of the line connecting the shoulder and hip midpoints.

Origin blog.csdn.net/weixin_44782294/article/details/129906591