1. Development environment

1) PyCharm

2) Python 3.9

3) opencv-python

4) mediapipe 0.8.3
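
To confirm that the interpreter and libraries match this list, you can query the installed distribution versions. This is a minimal sketch that uses the standard-library `importlib.metadata` module. `opencv-python` and `mediapipe` are the PyPI distribution names, and the helper name `report_environment` is hypothetical.

```python
import sys
from importlib.metadata import PackageNotFoundError, version

# PyPI distribution names of the libraries used in this tutorial
REQUIRED = ["opencv-python", "mediapipe"]


def report_environment(packages=REQUIRED):
    """Return the Python version and the installed version of each package.

    A package that is not installed is reported as None.
    """
    info = {"python": "%d.%d.%d" % sys.version_info[:3]}
    for name in packages:
        try:
            info[name] = version(name)
        except PackageNotFoundError:
            info[name] = None
    return info


if __name__ == "__main__":
    for name, ver in report_environment().items():
        print(f"{name}: {ver or 'not installed'}")
```

Run it with the project interpreter selected in PyCharm. For the setup above, it should report Python 3.9 and mediapipe 0.8.3.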

2. Overview of Gesture Recognition

The ability to perceive the shape and motion of the hand is an important part of improving the user experience across a variety of technology domains and platforms. For example, it can form the basis for sign language understanding and gesture control, and it can enable digital content and information to be overlaid on the physical world in augmented reality.
MediaPipe Hands is a high-fidelity hand and finger tracking solution. It uses machine learning (ML) to infer 21 3D landmarks of a hand from a single frame. Current state-of-the-art approaches rely mainly on powerful desktop environments for inference. MediaPipe Hands, by contrast, achieves real-time performance on a mobile phone and even scales to multiple hands.
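
The 21 landmarks always come back in a fixed order. The list below mirrors the member names of MediaPipe's `mp.solutions.hands.HandLandmark` enum, so it can be used without importing MediaPipe. Treat it as a convenience copy and prefer the enum itself in real code. The helper `landmark_index` is hypothetical.

```python
# Landmark names in index order (0..20), mirroring mp.solutions.hands.HandLandmark
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

# Indices of the five fingertips (thumb, index, middle, ring, pinky)
FINGERTIPS = [i for i, name in enumerate(HAND_LANDMARKS) if name.endswith("_TIP")]


def landmark_index(name):
    """Return the numeric id of a landmark name, e.g. 'INDEX_FINGER_TIP' -> 8."""
    return HAND_LANDMARKS.index(name)
```

These ids are the same `id` values that the source code in section 5 stores in each `[id, cx, cy]` entry.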

3. Hand Landmark Model

After palm detection over the entire image, the hand landmark model precisely localizes 21 3D hand-knuckle coordinates inside the detected hand region by regression, that is, by predicting the coordinates directly.
To obtain ground-truth training data, about 30K real-world images were manually annotated with 21 3D coordinates, as shown below. The Z value is taken from the image depth map where one exists for the corresponding coordinate. To better cover the possible hand poses and to provide additional supervision on hand geometry, the MediaPipe team also rendered high-quality synthetic hand models over various backgrounds and mapped them to the corresponding 3D coordinates.
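
According to the MediaPipe Hands documentation, the predicted x and y are normalized to [0, 1] by the image width and height. z is a relative depth with the wrist as its origin. Converting a landmark to pixels is therefore just a scale by the image size, which the source code in section 5 does with `int(lm.x * w), int(lm.y * h)`. In the sketch below, plain floats stand in for MediaPipe's landmark object. The clamping step is an addition that is not in the tutorial's code: when a hand is partly outside the frame, its landmarks can fall outside [0, 1].

```python
def to_pixel(x_norm, y_norm, width, height, clamp=True):
    """Convert a normalized landmark (x, y nominally in [0, 1]) to integer pixels.

    Mirrors int(lm.x * w), int(lm.y * h) from the tutorial code; `clamp`
    (an addition here) keeps the result inside the image.
    """
    cx, cy = int(x_norm * width), int(y_norm * height)
    if clamp:
        cx = min(max(cx, 0), width - 1)
        cy = min(max(cy, 0), height - 1)
    return cx, cy


# Example: a landmark at the centre of a 640x480 frame
print(to_pixel(0.5, 0.5, 640, 480))  # (320, 240)
```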

4. Installing the tools and dependent libraries

4.1 Python download and installation

Download the installer from the official website (almost all current computers support the 64-bit version):

After the download finishes, double-click the installer to run it. First check "Add Python 3.9 to PATH", then click "Install Now".

4.2 PyCharm installation

Download PyCharm from the official website and install it with the default settings.

5. Multi-gesture recognition source code

import cv2
import mediapipe as mp
import time

class handDetector():
    def __init__(self, mode=False, maxHands=2, complexity=1, detectionCon=0.5, trackCon=0.5):
        # Initialize the class parameters
        self.mode = mode
        self.maxHands = maxHands
        self.complexity = complexity
        self.detectionCon = detectionCon
        self.trackCon = trackCon

        # Initialize the MediaPipe hand tracking module
        self.mpHands = mp.solutions.hands
        # mediapipe version: 0.8.11
        # self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.complexity,
        #                                 self.detectionCon, self.trackCon)
        # mediapipe version: 0.8.3
        self.hands = self.mpHands.Hands(self.mode, self.maxHands,
                                        self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils

    # Detect the hands and draw their landmarks and connections
    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS)
        return img

    # Get the pixel position of every hand landmark and draw a circle on each joint
    def findPostion(self, img, handNo=0, draw=True):
        lmList = []
        if self.results.multi_hand_landmarks:
            for myHand in self.results.multi_hand_landmarks:
                for id, lm in enumerate(myHand.landmark):
                    h, w, c = img.shape
                    cx, cy = int(lm.x * w), int(lm.y * h)
                    lmList.append([id, cx, cy])
                    if draw:
                        cv2.circle(img, (cx, cy), 12, (255, 0, 255), cv2.FILLED)
        return lmList


def main():
    pTime = 0
    cTime = 0

    # Open camera 0
    cap = cv2.VideoCapture(0)

    # Create the hand detector object
    detector = handDetector()

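    while True:
        # Read a frame from the camera
        success, img = cap.read()
        if not success:
            # Stop if no frame could be read (camera missing or disconnected)
            break

        # Detect the hands and draw their landmarks
        img = detector.findHands(img)

        # Draw circles on the hand joints and collect their pixel positions
        lmList = detector.findPostion(img)

        # Compute the real-time frame rate (guard against a zero time step)
        cTime = time.time()
        fps = 1 / (cTime - pTime) if cTime > pTime else 0
        pTime = cTime

        # Show the frame rate on the image
        cv2.putText(img, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 2,
                    (255, 0, 255), 2)

        # Show the video frame; press q to quit
        cv2.imshow("Image", img)
        if cv2.waitKey(1) & 0xFF == ord('q'):
            break

    # Release the camera and close the window
    cap.release()
    cv2.destroyAllWindows()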


if __name__ == "__main__":
    main()
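
findPostion appends the landmarks of every detected hand to one flat list, and its handNo argument is not used. With two hands, lmList therefore holds 42 entries whose ids run from 0 to 20 twice. The hypothetical helper below regroups the list per hand by starting a new group whenever the id returns to 0 (the wrist).

```python
def split_by_hand(lm_list):
    """Split a flat [[id, cx, cy], ...] list into one list per hand.

    A new hand starts whenever the landmark id goes back to 0 (the wrist),
    matching the order in which findPostion appends landmarks.
    """
    hands = []
    for entry in lm_list:
        if entry[0] == 0 or not hands:
            hands.append([])
        hands[-1].append(entry)
    return hands


# Example with fake data for two hands (21 landmarks each)
fake = [[i, i, i] for i in range(21)] + [[i, 100 + i, 100 + i] for i in range(21)]
groups = split_by_hand(fake)
print(len(groups), len(groups[0]), groups[1][0])  # 2 21 [0, 100, 100]
```

For example, `split_by_hand(lmList)[0][8]` would be the index fingertip of the first detected hand. Keep in mind that the order in which hands are reported can differ between MediaPipe versions.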

5.1 Notes on using the source code

This program was written for mediapipe 0.8.3 and also works with version 0.8.11.
If you have mediapipe 0.8.11 installed, uncomment the 0.8.11 code in handDetector.__init__ and comment out the 0.8.3 code.
Version 0.8.11 takes one more parameter than 0.8.3 (the model complexity, passed as the third positional argument). Because the arguments are passed by position, using the 0.8.3 call on 0.8.11 would pass detectionCon as the model complexity. The numbering order of the detected palms also differs between the two versions.
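
Instead of commenting lines in and out, you can pass the constructor arguments by keyword and add the complexity argument only when the installed version accepts it. This sketch assumes, as stated above, that 0.8.11 accepts `model_complexity` and 0.8.3 does not. The exact release that introduced the parameter is not given here, so the `(0, 8, 11)` threshold is an assumption; lower it if you know your version supports the parameter. The keyword names are those of `mp.solutions.hands.Hands`. The helpers `parse_version` and `hands_kwargs` are hypothetical.

```python
import re

# Assumed first version whose Hands() accepts model_complexity (see note above)
COMPLEXITY_SINCE = (0, 8, 11)


def parse_version(text):
    """Turn a version string such as '0.8.11' into a tuple like (0, 8, 11)."""
    return tuple(int(n) for n in re.findall(r"\d+", text)[:3])


def hands_kwargs(mp_version, mode=False, max_hands=2, complexity=1,
                 detection_con=0.5, track_con=0.5):
    """Build keyword arguments for mp.solutions.hands.Hands for a given version."""
    kwargs = {
        "static_image_mode": mode,
        "max_num_hands": max_hands,
        "min_detection_confidence": detection_con,
        "min_tracking_confidence": track_con,
    }
    # Only newer versions take the model complexity parameter
    if parse_version(mp_version) >= COMPLEXITY_SINCE:
        kwargs["model_complexity"] = complexity
    return kwargs
```

Inside handDetector.__init__, this could replace both variants. For example, obtain the version with `importlib.metadata.version("mediapipe")` and call `self.mpHands.Hands(**hands_kwargs(that_version, self.mode, self.maxHands, self.complexity, self.detectionCon, self.trackCon))`.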
