Gesture recognition using MediaPipe and OpenCV in Python

Gesture Recognition

Gesture recognition converts human gestures into a form that computers can understand, enabling a more natural, fast, and intuitive way of interacting with machines. This article introduces a gesture recognition technique based on MediaPipe and OpenCV that performs real-time recognition and analysis of hand gestures.

MediaPipe

MediaPipe is an open-source machine learning framework for building computer vision and machine learning applications; it ships with many pre-trained models and tools. In this article we use MediaPipe's Hands solution, which performs real-time hand tracking and key point detection. OpenCV is an open-source computer vision library used here for image capture, processing, and display.

In this article, we will use MediaPipe and OpenCV to implement gesture recognition and apply it to a practical scenario.
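Both libraries can be installed from PyPI via pip install mediapipe opencv-python (assuming the standard package names).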

First, we import the MediaPipe and OpenCV libraries and create an instance of the Hands model:

import cv2
import mediapipe as mp

mpHands = mp.solutions.hands
hands = mpHands.Hands(static_image_mode=False,   # video stream, so track across frames
                      max_num_hands=2,           # detect at most two hands
                      min_detection_confidence=0.5,
                      min_tracking_confidence=0.5)
mpDraw = mp.solutions.drawing_utils              # helper for drawing landmarks

Then, we open the camera and start processing frames:

cap = cv2.VideoCapture(0)
while True:
    ok, img = cap.read()
    if not ok:
        break
    # MediaPipe expects RGB input, while OpenCV captures frames in BGR
    imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
    results = hands.process(imgRGB)
    # print(results.multi_hand_landmarks)
    # ...

In the loop, we read a frame from the camera, convert it from BGR to RGB (the format MediaPipe expects), and pass it to the Hands model for processing. Next, we analyze the results and visualize them.
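The snippet above omits display and cleanup. A minimal sketch of how the loop might end (the window name and the 'q' quit key are our choices, not part of the original code):

    # at the end of each loop iteration, show the annotated frame
    # and exit when 'q' is pressed
    cv2.imshow("Gesture", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

# after the loop, release the camera and close all windows
cap.release()
cv2.destroyAllWindows()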

For each detected hand, the Hands model returns a landmark list containing the positions of 21 key points. We can track and analyze the hand by traversing these landmarks and drawing them on the image. For example, the following code plots the position of each key point:

if results.multi_hand_landmarks:
    for handLms in results.multi_hand_landmarks:
        for id, lm in enumerate(handLms.landmark):
            h, w, c = img.shape
            # landmark coordinates are normalized to [0, 1]; convert to pixels
            cx, cy = int(lm.x * w), int(lm.y * h)
            cv2.circle(img, (cx, cy), int(w / 50), (200, 0, 200), cv2.FILLED)

In this code, we first check that at least one hand was detected (multi_hand_landmarks is None otherwise), then loop through each detected hand and each of its key points, converting the normalized coordinates to pixel positions. Finally, we use the cv2.circle function to draw a filled circle at each key point.
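Alternatively, the mpDraw utility created during setup can draw both the key points and the skeleton connecting them in a single call:

if results.multi_hand_landmarks:
    for handLms in results.multi_hand_landmarks:
        # draw the 21 key points plus the connections between them
        mpDraw.draw_landmarks(img, handLms, mpHands.HAND_CONNECTIONS)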

In addition to hand tracking and analysis, we can use the Hands model for gesture recognition. For example, we can classify a gesture by checking which fingers are extended. In this article we use a simple heuristic that compares the y-coordinate of each fingertip with that of the joint two landmarks below it. The implementation is as follows:

# Landmark indices 4, 8, 12, 16 and 20 are the five fingertips.
# Image y-coordinates grow downward, so a fingertip that sits above
# the joint two landmarks below it counts as a raised finger.
finger_count = 0
for id in [4, 8, 12, 16, 20]:
    if handLms.landmark[id].y < handLms.landmark[id - 2].y:
        finger_count += 1

In this code, we iterate over the five fingertip key points and compare each fingertip's y-coordinate with that of the joint two landmarks below it. Because image y-coordinates increase downward, a fingertip above that joint means the finger is raised; we count the raised fingers and infer the gesture type from that count. Note that this vertical check is unreliable for the thumb, which extends sideways rather than upward.
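A common workaround (our own sketch, not part of the original article) is to compare the thumb horizontally instead of vertically:

def count_raised_fingers(handLms):
    # hypothetical helper: fingertips 8, 12, 16, 20 are compared
    # vertically, the thumb tip (index 4) horizontally
    count = 0
    for id in [8, 12, 16, 20]:
        if handLms.landmark[id].y < handLms.landmark[id - 2].y:
            count += 1
    # assumes a right hand seen by the camera; a robust version would
    # use results.multi_handedness to pick the comparison direction
    if handLms.landmark[4].x < handLms.landmark[3].x:
        count += 1
    return count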

Finally, we can overlay the gesture type on the frame, achieving real-time recognition and analysis. For example:

gesture_dict = {
    1: "Fist",
    2: "One",
    3: "Two",
    4: "Three",
    5: "Four",
    6: "Five"
}
# prev_fingers keeps the raised-finger count of the last five frames;
# it should be initialized once, before the capture loop
prev_fingers = [0, 0, 0, 0, 0]
for handLms in results.multi_hand_landmarks:
    finger_count = 0
    h, w, c = img.shape
    for id in [4, 8, 12, 16, 20]:
        if handLms.landmark[id].y < handLms.landmark[id - 2].y:
            finger_count += 1
            # highlight each raised fingertip in green
            cx, cy = int(handLms.landmark[id].x * w), int(handLms.landmark[id].y * h)
            cv2.circle(img, (cx, cy), int(w / 50), (0, 255, 0), cv2.FILLED)
    # slide the window: drop the oldest count, append the newest
    prev_fingers.pop(0)
    prev_fingers.append(finger_count)
    # commit to a gesture only when the count has been stable
    # over the last four frames
    if prev_fingers[1:] == [5, 5, 5, 5]:
        gesture_type = 6
    elif prev_fingers[1:] == [1, 1, 1, 1]:
        gesture_type = 2
    elif prev_fingers[1:] == [2, 2, 2, 2]:
        gesture_type = 3
    elif prev_fingers[1:] == [3, 3, 3, 3]:
        gesture_type = 4
    elif prev_fingers[1:] == [4, 4, 4, 4]:
        gesture_type = 5
    else:
        gesture_type = 1
    cv2.putText(img, gesture_dict[gesture_type], (50, 50),
                cv2.FONT_HERSHEY_SIMPLEX, 1, (0, 0, 255), 2)

In this code, we define a dictionary of six gesture types. We keep the raised-finger counts of the most recent frames in a fixed-length list used as a sliding window, and only commit to a gesture when the count has been stable over the last four frames, which smooths out single-frame detection noise. Finally, we draw the gesture name on the frame and highlight each raised fingertip in green.
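As an aside, the pop(0)/append pair is a hand-rolled sliding window; collections.deque with maxlen expresses the same idea more idiomatically (a stylistic alternative, not the article's code):

from collections import deque

prev_fingers = deque([0] * 5, maxlen=5)
# appending to a full deque drops the oldest entry automatically
prev_fingers.append(finger_count)
if list(prev_fingers)[1:] == [5] * 4:
    gesture_type = 6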

Summary

This article introduced a gesture recognition technique based on MediaPipe and OpenCV that performs real-time recognition and analysis of hand gestures. The technique has practical value in many fields, such as smart homes, game control, and human-computer interaction. As computer vision and artificial intelligence continue to advance, gesture recognition will become more capable, bringing more convenience and innovation to people's lives and work.
