Implementing gesture tracking with MediaPipe

Environment: Python 3.8, PyCharm 2020
Hardware: Logitech C505e

MediaPipe Hands

Official website information: https://google.github.io/mediapipe/solutions/hands.html
MediaPipe is an open-source machine learning framework from Google that ships ready-made solutions for tasks such as face detection and hand tracking, and provides packages for several languages, including Python and JavaScript.
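If the dependencies are not installed yet, they can typically be pulled from PyPI (standard package names, assuming a plain pip setup):

pip install mediapipe opencv-python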

MediaPipe Hands is a high-fidelity hand and finger tracking solution. It uses machine learning (ML) to infer 21 3D hand landmarks from a single frame, and we can use it to extract the coordinates of those key points. The key points are numbered 0-20, starting at the wrist (0) and running out to each fingertip (thumb tip 4, index 8, middle 12, ring 16, pinky 20).
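Since the numbering diagram from the official site is not reproduced here, the same numbering can also be listed programmatically from MediaPipe's HandLandmark enum:

import mediapipe as mp

# Print the index and name of each of the 21 hand landmarks,
# e.g. 0 WRIST, 4 THUMB_TIP, 8 INDEX_FINGER_TIP, 20 PINKY_TIP
for lm in mp.solutions.hands.HandLandmark:
    print(lm.value, lm.name)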

Simple gesture tracking example:

According to the documentation on the official website, we can quickly implement a simple gesture tracking example:

import cv2
import mediapipe as mp
import time

cap = cv2.VideoCapture(0, cv2.CAP_DSHOW)

mpHands = mp.solutions.hands
hands = mpHands.Hands()
mpDraw = mp.solutions.drawing_utils
# FPS bookkeeping
pTime = 0
cTime = 0

while True:
    success, img = cap.read()
    if not success:  # skip frames the camera failed to deliver
        continue
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # convert to RGB: MediaPipe expects RGB input
    results = hands.process(imgRGB)

    if results.multi_hand_landmarks:
        for handLms in results.multi_hand_landmarks:
            for id, lm in enumerate(handLms.landmark):
                print(id, lm)  # raw normalized landmark data
                # get the pixel coordinates of each finger joint
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                cv2.putText(img, str(id), (cx + 10, cy + 10), cv2.FONT_HERSHEY_PLAIN,
                            1, (0, 0, 255), 2)
            mpDraw.draw_landmarks(img, handLms, mpHands.HAND_CONNECTIONS)

    # compute and display the on-screen frame rate
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)

    cv2.imshow("image", img)
    if cv2.waitKey(2) & 0xFF == 27:  # Esc quits
        break

cap.release()
cv2.destroyAllWindows()

The actual effect is shown below:
[figure: live webcam view with the 21 landmark indices and the FPS counter drawn on the hand]

The frame rate shown here is capped by the camera's own frame rate.
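As a quick sanity check, OpenCV can report the camera's nominal frame rate (some drivers return 0 when the property is unsupported):

# query the camera's advertised FPS
print(cap.get(cv2.CAP_PROP_FPS))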

Partial API analysis

Initial configuration of Hands

We complete the initial configuration of Hands with hands = mpHands.Hands(). From the constructor below, we can see the initialization parameters available to us:

def __init__(self,
            static_image_mode=False,
            max_num_hands=2,
            min_detection_confidence=0.5,
            min_tracking_confidence=0.5):
  • static_image_mode is the static-image mode; it defaults to False, which is intended for video: detection runs once and tracking takes over on subsequent frames.
  • max_num_hands is the maximum number of hands to detect.
  • The last two are the confidence thresholds for detection and tracking, respectively. When the tracking confidence falls below min_tracking_confidence, detection is run again on the next frame. Larger values make the results more reliable, at the cost of re-running detection more often. A configuration example is sketched below.
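For instance, a stricter single-hand configuration for video might look like this (the parameter values are illustrative, not from the original post):

# demand higher confidence before accepting a detection or trusting tracking
hands = mpHands.Hands(static_image_mode=False,
                      max_num_hands=1,
                      min_detection_confidence=0.7,
                      min_tracking_confidence=0.7)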

Collection of gesture information

results = hands.process(imgRGB) runs the model on the image. Note that the input here must be in RGB format, while OpenCV captures frames in BGR, hence the conversion beforehand.
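A minimal call sequence looks like this (a sketch; img is assumed to be a BGR frame read from OpenCV):

imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # BGR -> RGB
results = hands.process(imgRGB)
if results.multi_hand_landmarks is None:  # no hand in this frame
    print("no hand detected")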

If a hand is detected, a list is returned in which each hand carries the x, y, z values of its 21 landmarks:

  • x and y are normalized to the range [0, 1]; multiplying by the image width and height gives the concrete pixel coordinates.
  • z is the depth; the smaller the value, the closer the landmark is to the camera.

results.multi_hand_landmarks is a list of all detected hands. By iterating over this list, we can read the landmark information for each hand. In the following example, we recover the pixel coordinates of every landmark:

if results.multi_hand_landmarks:
    for handLms in results.multi_hand_landmarks:
        for id, lm in enumerate(handLms.landmark):
            print(id, lm)  # raw normalized landmark data
            # get the pixel coordinates of each finger joint
            h, w, c = img.shape
            cx, cy = int(lm.x * w), int(lm.y * h)
            cv2.putText(img, str(id), (cx + 10, cy + 10), cv2.FONT_HERSHEY_PLAIN,
                        1, (0, 0, 255), 2)

Extraction of specific landmarks

With the above introduction, we can easily get the coordinates of a specific landmark, for example landmark 4 (the thumb tip):
print(handLms.landmark[4])
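Building on this, here is a small sketch of how such landmark coordinates can be combined; the pinch_distance helper and the pinch-detection idea are an illustration of mine, not part of the original post:

import math

def pinch_distance(handLms, img_shape):
    # pixel distance between thumb tip (4) and index fingertip (8),
    # a common building block for detecting a pinch gesture
    h, w = img_shape[:2]
    thumb, index = handLms.landmark[4], handLms.landmark[8]
    return math.hypot((thumb.x - index.x) * w, (thumb.y - index.y) * h)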

Related Links

https://www.youtube.com/watch?v=9iEPzbG-xLE
https://google.github.io/mediapipe/solutions/hands.html

I will also share some applications built on MediaPipe, such as gesture recognition:
1. Gesture recognition code link

Original post: blog.csdn.net/qq_43550173/article/details/116174714