MediaPipe: Human Pose Landmark Detection (Pose Module) Study Notes (Complete)

1.1 Solution API parameters

  • static_image_mode: Defaults to False, treating the input images as a video stream. The solution tries to detect the most prominent person in the first image and, once detection succeeds, localizes the pose landmarks. In subsequent images it simply tracks those landmarks without running detection again until it loses track of the person, which reduces computation and latency. If True, person detection runs on every input image, which is well suited to a batch of static, possibly unrelated images.
  • model_complexity: Defaults to 1. Complexity of the pose landmark model: 0, 1, or 2. Landmark accuracy and inference latency generally increase with model complexity.
  • smooth_landmarks: Defaults to True, filtering pose landmarks across input images to reduce jitter. Ignored if static_image_mode is also set to True.
  • upper_body_only: Defaults to False; whether to detect only upper-body landmarks. The full-body model outputs 33 landmarks, the upper-body variant 25.
  • enable_segmentation: Defaults to False. If set to True, the solution generates a segmentation mask in addition to the pose landmarks.
  • smooth_segmentation: Defaults to True, filtering segmentation masks across input images to reduce jitter. Ignored if enable_segmentation is False or static_image_mode is True.
  • min_tracking_confidence: Defaults to 0.5. Minimum confidence value (between 0 and 1) from the landmark tracking model for the pose landmarks to be considered successfully tracked; otherwise person detection is invoked automatically on the next input image. Setting it higher improves robustness at the cost of higher latency. If static_image_mode is True, person detection runs on every image frame.
  • min_detection_confidence: Defaults to 0.5. Minimum confidence value (between 0 and 1) from the person detection model; if the confidence is higher than this threshold, the detection is considered successful.
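To illustrate the static_image_mode parameter, here is a minimal sketch of the batch use case described above (the file names are hypothetical):

import cv2
import mediapipe as mp

mpPose = mp.solutions.pose  # create an alias for the pose module

# static_image_mode=True re-runs person detection on every image,
# which suits a batch of static, unrelated photos
with mpPose.Pose(static_image_mode=True,
                 model_complexity=1,
                 min_detection_confidence=0.5) as pose:
    for path in ["person1.jpg", "person2.jpg"]:  # hypothetical file names
        img = cv2.imread(path)
        # MediaPipe expects RGB input; OpenCV loads images as BGR
        results = pose.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
        print(path, "pose detected:", results.pose_landmarks is not None)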

1.2 Draw key points and connections

1.2.1 API
  • mediapipe.solutions.drawing_utils.draw_landmarks()

mediapipe.solutions.drawing_utils is a module. You can create an alias for it first and then call its functions through the alias.

mp_drawing = mp.solutions.drawing_utils
mp_drawing.draw_landmarks()
1.2.2 Function parameters
  • image: the original image to draw on
  • landmark_list: the detected landmark coordinates (results.pose_landmarks)
  • connections: which landmark pairs to connect with lines (mpPose.POSE_CONNECTIONS); if this parameter is not passed in, no connections are drawn
  • landmark_drawing_spec: the color, thickness, etc. of the landmark points
  • connection_drawing_spec: the color, thickness, etc. of the connection lines
results = pose.process(img)
mp_drawing.draw_landmarks(img, results.pose_landmarks, mpPose.POSE_CONNECTIONS)
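Below is a minimal end-to-end drawing sketch. The input file name is hypothetical; DrawingSpec is the helper class in mediapipe.solutions.drawing_utils used here to set the color and thickness parameters from the table above:

import cv2
import mediapipe as mp

mp_drawing = mp.solutions.drawing_utils
mpPose = mp.solutions.pose
pose = mpPose.Pose(static_image_mode=True)

img = cv2.imread("person.jpg")  # hypothetical input image
results = pose.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))  # MediaPipe expects RGB

if results.pose_landmarks:
    mp_drawing.draw_landmarks(
        img,                      # image to draw on (modified in place)
        results.pose_landmarks,   # detected landmark coordinates
        mpPose.POSE_CONNECTIONS,  # which landmarks to connect
        landmark_drawing_spec=mp_drawing.DrawingSpec(color=(0, 0, 255), thickness=2, circle_radius=3),
        connection_drawing_spec=mp_drawing.DrawingSpec(color=(0, 255, 0), thickness=2))

cv2.imshow("pose", img)
cv2.waitKey(0)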

1.3 The pose landmark tracking module

  • mp.solutions.pose
mpPose = mp.solutions.pose  # pose estimation solution; create an alias
# instantiate the Pose class from the mpPose module
pose = mpPose.Pose(static_image_mode=False, # static image mode; False keeps tracking while confidence stays high, True re-runs detection on every frame
                   #upper_body_only=False,  # whether to detect only the upper body
                   smooth_landmarks=True,  # smoothing, usually True
                   min_detection_confidence=0.5, # detection confidence
                   min_tracking_confidence=0.5)  # tracking confidence

1.4 View the coordinates of 33 joint points

1.4.1 Names of the 33 joint points

(Figure: diagram of the 33 pose landmarks)

class PoseLandmark(enum.IntEnum):
  """The 33 pose landmarks."""
  NOSE = 0
  LEFT_EYE_INNER = 1
  LEFT_EYE = 2
  LEFT_EYE_OUTER = 3
  RIGHT_EYE_INNER = 4
  RIGHT_EYE = 5
  RIGHT_EYE_OUTER = 6
  LEFT_EAR = 7
  RIGHT_EAR = 8
  MOUTH_LEFT = 9
  MOUTH_RIGHT = 10
  LEFT_SHOULDER = 11
  RIGHT_SHOULDER = 12
  LEFT_ELBOW = 13
  RIGHT_ELBOW = 14
  LEFT_WRIST = 15
  RIGHT_WRIST = 16
  LEFT_PINKY = 17
  RIGHT_PINKY = 18
  LEFT_INDEX = 19
  RIGHT_INDEX = 20
  LEFT_THUMB = 21
  RIGHT_THUMB = 22
  LEFT_HIP = 23
  RIGHT_HIP = 24
  LEFT_KNEE = 25
  RIGHT_KNEE = 26
  LEFT_ANKLE = 27
  RIGHT_ANKLE = 28
  LEFT_HEEL = 29
  RIGHT_HEEL = 30
  LEFT_FOOT_INDEX = 31
  RIGHT_FOOT_INDEX = 32
1.4.2 View the coordinates of a specific joint point
results = pose.process(img)  # pass the (RGB) image to the pose estimation model

# index keeps the landmark's serial number; lm holds its values
for index, lm in enumerate(results.pose_landmarks.landmark):
    # print(lm)
    """
    x: 0.42567315697669983
    y: 4.285938739776611
    z: 0.28193268179893494
    visibility: 0.001105456380173564
    """

    # save the width, height and channel count of each frame
    h, w, c = img.shape

    # the returned x/y/z/visibility values are normalized ratios in [0, 1]
    # convert to pixel coordinates (cx, cy): multiply the ratios by the actual image width and height; pixel coordinates must be integers
    cx, cy = int(lm.x * w), int(lm.y * h)

    # print the coordinate info
    print(index, cx, cy)
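The enum from 1.4.1 also lets you index a single landmark directly instead of looping over all 33. A small sketch, assuming the results and img from above:

# read one specific landmark by its enum name
nose = results.pose_landmarks.landmark[mpPose.PoseLandmark.NOSE]
h, w, c = img.shape
print("nose:", int(nose.x * w), int(nose.y * h), nose.visibility)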

1.4.3 Convert normalized x/y coordinates into pixel coordinates

  • The coordinate values returned directly are normalized ratios (and therefore very small), so they are converted into pixel units using the image size.
# save the width, height and channel count of each frame
h, w, c = img.shape

# the returned x/y/z/visibility values are normalized ratios in [0, 1]
# convert to pixel coordinates (cx, cy): multiply the ratios by the actual image width and height; pixel coordinates must be integers
cx, cy = int(lm.x * w), int(lm.y * h)
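To check the conversion visually, the pixel coordinate can be marked on the frame with OpenCV (a small optional sketch, assuming the img, cx, cy from above):

import cv2
# draw a filled circle of radius 5 at the converted pixel coordinate
cv2.circle(img, (cx, cy), 5, (0, 0, 255), cv2.FILLED)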

1.5 View FPS

  • FPS: the number of frames processed per second
1.5.1 Compute FPS
# initialize pTime to 0 before the loop
# on each iteration:
cTime = time.time()        # time when the current frame finished processing
fps = 1 / (cTime - pTime)  # this is the FPS
pTime = cTime              # reset the start time
1.5.2 Display FPS on the image
# draw the fps on the frame: convert to an integer first, then to a string;
# remaining arguments: text position, font, font scale, color, thickness
cv2.putText(img, str(int(fps)), (70,50), cv2.FONT_HERSHEY_PLAIN, 3, (255,0,0), 3)
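Putting sections 1.3 to 1.5 together, a minimal runnable sketch that estimates the pose, draws the landmarks, and overlays the FPS (camera index 0 and the Esc quit key are assumptions):

import time
import cv2
import mediapipe as mp

mpPose = mp.solutions.pose
mpDraw = mp.solutions.drawing_utils
pose = mpPose.Pose(min_detection_confidence=0.5, min_tracking_confidence=0.5)

cap = cv2.VideoCapture(0)
pTime = 0  # initialize the previous-frame timestamp before the loop
while cap.isOpened():
    success, img = cap.read()
    if not success:
        break

    # MediaPipe expects RGB; OpenCV delivers BGR frames
    results = pose.process(cv2.cvtColor(img, cv2.COLOR_BGR2RGB))
    if results.pose_landmarks:
        mpDraw.draw_landmarks(img, results.pose_landmarks, mpPose.POSE_CONNECTIONS)

    # compute and overlay the FPS
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, str(int(fps)), (70,50), cv2.FONT_HERSHEY_PLAIN, 3, (255,0,0), 3)

    cv2.imshow("pose", img)
    if cv2.waitKey(1) & 0xFF == 27:  # press Esc to quit
        break
cap.release()
cv2.destroyAllWindows()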

Source: blog.csdn.net/weixin_63676550/article/details/128855431