OpenCV hands-on project: gesture recognition - gesture-controlled keyboard

Gesture recognition is a human-computer interaction technology that lets you operate and control computers, smartphones, smart TVs, and other devices by recognizing human gestures.

1. OpenCV hands-on project: hand tracking (locating hand keypoints)

2. OpenCV hands-on project: gesture tracking that returns position information (an encapsulated, callable module)

3. OpenCV hands-on project: gesture recognition - gesture-controlled mouse

4. OpenCV hands-on project: gesture recognition - gesture-controlled keyboard

To be continued...

This column records the author's learning journey and will continue to be updated. You are welcome to subscribe and learn along.

This project uses Google's open-source framework MediaPipe, which provides many ready-made models for us to use, such as face detection, body detection, and hand detection.

The code uses OpenCV, the HandTrackingModule module (built on mediapipe), the pynput keyboard-control module, and the cvzone module.
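If these are not installed yet, they can usually be pulled in from PyPI (assuming the standard PyPI package names; exact versions can matter, as noted later):

pip install opencv-python mediapipe cvzone pynput numpy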

Part One: The HandTrackingModule module

The previous article in this series covers encapsulating the hand-detection module in detail, so only a brief introduction is given here; the new parts are easy to pick up. Save the code below as HandTrackingModule.py so the main script can import it.

import cv2
import mediapipe as mp
import math


class HandDetector:
    """
    Finds Hands using the mediapipe library. Exports the landmarks
    in pixel format. Adds extra functionalities like finding how
    many fingers are up or the distance between two fingers. Also
    provides bounding box info of the hand found.
    """

    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, minTrackCon=0.5):
        """
        :param mode: In static mode, detection is done on each image: slower
        :param maxHands: Maximum number of hands to detect
        :param detectionCon: Minimum Detection Confidence Threshold
        :param minTrackCon: Minimum Tracking Confidence Threshold
        """
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.minTrackCon = minTrackCon

        self.mpHands = mp.solutions.hands
        # keyword arguments keep this working on newer mediapipe versions,
        # which insert model_complexity as the third positional parameter
        self.hands = self.mpHands.Hands(static_image_mode=self.mode,
                                        max_num_hands=self.maxHands,
                                        min_detection_confidence=self.detectionCon,
                                        min_tracking_confidence=self.minTrackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]
        self.fingers = []
        self.lmList = []

    def findHands(self, img, draw=True):
        """
        Finds hands in a BGR image.
        :param img: Image to find the hands in.
        :param draw: Flag to draw the output on the image.
        :return: Image with or without drawings
        """
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):
        """
        Finds landmarks of a single hand and puts them in a list
        in pixel format. Also finds the bounding box around the hand.
        :param img: main image to find hand in
        :param handNo: hand id if more than one hand detected
        :param draw: Flag to draw the output on the image.
        :return: list of landmarks in pixel format; bounding box
        """

        xList = []
        yList = []
        bbox = []
        bboxInfo = []
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                h, w, c = img.shape
                px, py = int(lm.x * w), int(lm.y * h)
                xList.append(px)
                yList.append(py)
                self.lmList.append([px, py])
                if draw:
                    cv2.circle(img, (px, py), 5, (255, 0, 255), cv2.FILLED)
            xmin, xmax = min(xList), max(xList)
            ymin, ymax = min(yList), max(yList)
            boxW, boxH = xmax - xmin, ymax - ymin
            bbox = xmin, ymin, boxW, boxH
            cx, cy = bbox[0] + (bbox[2] // 2), \
                     bbox[1] + (bbox[3] // 2)
            bboxInfo = {"id": handNo, "bbox": bbox, "center": (cx, cy)}  # handNo, not the leaked loop variable id

            if draw:
                cv2.rectangle(img, (bbox[0] - 20, bbox[1] - 20),
                              (bbox[0] + bbox[2] + 20, bbox[1] + bbox[3] + 20),
                              (0, 255, 0), 2)

        return self.lmList, bboxInfo

    def fingersUp(self):
        """
        Finds how many fingers are open and returns in a list.
        Considers left and right hands separately
        :return: List of which fingers are up
        """
        fingers = []  # initialized up front so an empty list is returned when no hand is visible
        if self.results.multi_hand_landmarks:
            myHandType = self.handType()
            # Thumb
            if myHandType == "Right":
                if self.lmList[self.tipIds[0]][0] > self.lmList[self.tipIds[0] - 1][0]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            else:
                if self.lmList[self.tipIds[0]][0] < self.lmList[self.tipIds[0] - 1][0]:
                    fingers.append(1)
                else:
                    fingers.append(0)

            # 4 Fingers
            for id in range(1, 5):
                if self.lmList[self.tipIds[id]][1] < self.lmList[self.tipIds[id] - 2][1]:
                    fingers.append(1)
                else:
                    fingers.append(0)
        return fingers

    def findDistance(self, p1, p2, img, draw=True):
        """
        Find the distance between two landmarks based on their
        index numbers.
        :param p1: Point1 - Index of Landmark 1.
        :param p2: Point2 - Index of Landmark 2.
        :param img: Image to draw on.
        :param draw: Flag to draw the output on the image.
        :return: Distance between the points
                 Image with output drawn
                 Line information
        """

        if self.results.multi_hand_landmarks:
            x1, y1 = self.lmList[p1][0], self.lmList[p1][1]
            x2, y2 = self.lmList[p2][0], self.lmList[p2][1]
            cx, cy = (x1 + x2) // 2, (y1 + y2) // 2

            if draw:
                cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
                cv2.circle(img, (x2, y2), 15, (255, 0, 255), cv2.FILLED)
                cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
                cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)

            length = math.hypot(x2 - x1, y2 - y1)
            return length, img, [x1, y1, x2, y2, cx, cy]

    def handType(self):
        """
        Checks if the hand is left or right
        :return: "Right" or "Left"
        """
        if self.results.multi_hand_landmarks:
            if self.lmList[17][0] < self.lmList[5][0]:
                return "Right"
            else:
                return "Left"


def main():
    cap = cv2.VideoCapture(0)
    detector = HandDetector(detectionCon=0.8, maxHands=1)
    while True:
        # Get image frame
        success, img = cap.read()
        # Find the hand and its landmarks
        img = detector.findHands(img)
        lmList, bboxInfo = detector.findPosition(img)
        print(detector.handType())

        # Display
        cv2.imshow("Image", img)
        cv2.waitKey(1)


if __name__ == "__main__":
    main()

  1. Imported libraries: the necessary libraries are imported, including OpenCV (cv2) for image processing and display, MediaPipe (mediapipe) for hand detection and tracking, and the math library (math).

  2. HandDetector class: this is the main gesture-detector class, providing several methods for detecting hands and analyzing gestures.

    • __init__ method: initializes the detector's parameters, such as the detection mode, the maximum number of hands, and the confidence thresholds for detection and tracking.

    • findHands method: finds hands in a given image, optionally drawing the detections on it.

    • findPosition method: finds the keypoint locations (landmarks) of a single hand, stores them in a list in pixel format, and computes the hand's bounding-box information at the same time.

    • fingersUp method: determines how many fingers are raised and returns the result as a list.

    • findDistance method: computes the distance between two specified keypoints and draws the result on the image (the last two methods are illustrated in the sketch after this list).

    • handType method: determines whether the detected hand is a left or right hand.
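As a quick illustration of fingersUp and findDistance, here is a minimal sketch built on the main() above (not part of the original tutorial):

import cv2
from HandTrackingModule import HandDetector  # the module defined above

cap = cv2.VideoCapture(0)
detector = HandDetector(detectionCon=0.8, maxHands=1)
while True:
    success, img = cap.read()
    img = detector.findHands(img)
    lmList, bboxInfo = detector.findPosition(img)
    if lmList:
        fingers = detector.fingersUp()  # e.g. [0, 1, 1, 0, 0] means index and middle fingers up
        length, img, lineInfo = detector.findDistance(8, 12, img)  # index tip to middle tip, in pixels
        print(fingers, int(length))
    cv2.imshow("Image", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit
        break
cap.release()
cv2.destroyAllWindows()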

I won't go into further detail here.

This functionality ships in a dedicated package called cvzone, but, possibly due to a version problem, something was missing and it would not run for me, so I had to write the module by hand.

Below is the code of the main program.

import cv2
from HandTrackingModule import HandDetector  # the hand-rolled module from Part One (cvzone's own HandTrackingModule would not run here)
from time import sleep
import numpy as np
import cvzone
from pynput.keyboard import Controller

cap = cv2.VideoCapture(0)
cap.set(3, 1280)
cap.set(4, 720)

detector = HandDetector(detectionCon=0.5)
keys = [["Q", "W", "E", "R", "T", "Y", "U", "I", "O", "P"],
        ["A", "S", "D", "F", "G", "H", "J", "K", "L", ";"],
        ["Z", "X", "C", "V", "B", "N", "M", ",", ".", "/"]]
finalText = ""

keyboard = Controller()


def drawAll(img, buttonList):
    for button in buttonList:
        x, y = button.pos
        w, h = button.size
        cvzone.cornerRect(img, (button.pos[0], button.pos[1], button.size[0], button.size[1]),
                          20, rt=0)
        cv2.rectangle(img, button.pos, (x + w, y + h), (255, 0, 255), cv2.FILLED)
        cv2.putText(img, button.text, (x + 20, y + 65),
                    cv2.FONT_HERSHEY_PLAIN, 4, (255, 255, 255), 4)
    return img


#
# def drawAll(img, buttonList):
#     imgNew = np.zeros_like(img, np.uint8)
#     for button in buttonList:
#         x, y = button.pos
#         cvzone.cornerRect(imgNew, (button.pos[0], button.pos[1], button.size[0], button.size[1]),
#                           20, rt=0)
#         cv2.rectangle(imgNew, button.pos, (x + button.size[0], y + button.size[1]),
#                       (255, 0, 255), cv2.FILLED)
#         cv2.putText(imgNew, button.text, (x + 40, y + 60),
#                     cv2.FONT_HERSHEY_PLAIN, 2, (255, 255, 255), 3)
#
#     out = img.copy()
#     alpha = 0.5
#     mask = imgNew.astype(bool)
#     print(mask.shape)
#     out[mask] = cv2.addWeighted(img, alpha, imgNew, 1 - alpha, 0)[mask]
#     return out


class Button():
    def __init__(self, pos, text, size=[85, 85]):
        self.pos = pos
        self.size = size
        self.text = text


buttonList = []
for i in range(len(keys)):
    for j, key in enumerate(keys[i]):
        buttonList.append(Button([100 * j + 50, 100 * i + 50], key))

while True:
    success, img = cap.read()
    img = detector.findHands(img)
    lmList, bboxInfo = detector.findPosition(img)
    img = drawAll(img, buttonList)

    if lmList:
        for button in buttonList:
            x, y = button.pos
            w, h = button.size

            if x < lmList[8][0] < x + w and y < lmList[8][1] < y + h:
                cv2.rectangle(img, (x - 5, y - 5), (x + w + 5, y + h + 5), (175, 0, 175), cv2.FILLED)
                cv2.putText(img, button.text, (x + 20, y + 65),
                            cv2.FONT_HERSHEY_PLAIN, 4, (255, 255, 255), 4)
                l, _, _ = detector.findDistance(8, 12, img, draw=False)
                print(l)

                # when clicked: index and middle fingertips pinched together
                if l < 30:
                    keyboard.press(button.text)
                    cv2.rectangle(img, button.pos, (x + w, y + h), (0, 255, 0), cv2.FILLED)
                    cv2.putText(img, button.text, (x + 20, y + 65),
                                cv2.FONT_HERSHEY_PLAIN, 4, (255, 255, 255), 4)
                    finalText += button.text
                    sleep(0.15)

    cv2.rectangle(img, (50, 350), (700, 450), (175, 0, 175), cv2.FILLED)
    cv2.putText(img, finalText, (60, 430),
                cv2.FONT_HERSHEY_PLAIN, 5, (255, 255, 255), 5)

    cv2.imshow("Image", img)
    cv2.waitKey(1)
  1. Import libraries: the required libraries are imported, including OpenCV (cv2) for image processing and display, the HandDetector class for hand detection, cvzone for drawing the button appearance, numpy for array processing, Controller from pynput.keyboard for simulating key presses, and sleep from time for delays.

  2. Set camera parameters: the camera resolution is set to 1280x720 through OpenCV (the sketch after this list shows the named-constant equivalents).

  3. Create a HandDetector instance: a gesture-detector instance is created with the detection confidence threshold set to 0.5.

  4. Create the button list: a list describing the virtual-keyboard buttons is built, with the key layout defined by the nested keys lists.

  5. Create the Button class: the class used to create virtual buttons; each button stores a position, text, and size.

  6. Main loop: enter an infinite loop that processes image frames captured from the camera in real time.

    • Read image frame: capture an image frame from the camera.

    • Hand detection: use the gesture detector to find hands and their keypoints in the image.

    • Draw buttons: call the drawAll function to draw the virtual keyboard on the image.

    • Iterate through the button list: each button is checked to see whether the index fingertip is over it.

      • If the fingertip is within the button's bounds, draw a highlight effect.

      • Compute the distance between the index fingertip (landmark 8) and the middle fingertip (landmark 12); if it is below a threshold, treat it as a click, simulate the key press, and append the character to the typed text.

    • Draw typed text: draw the accumulated text on the image.

    • Display image: show the processed frame through OpenCV.

    • Wait for key events: wait 1 millisecond so the image window stays responsive (the loop as written never exits; the sketch after this list adds a clean quit).

  7. Run the main program: the loop above performs real-time capture and gesture recognition.
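Two small notes on the script above, shown as a hedged sketch rather than a change to the original code: cap.set(3, 1280) and cap.set(4, 720) use OpenCV's numeric property IDs, for which named constants exist, and the loop as written never terminates, so a key check and cleanup can be added:

cap = cv2.VideoCapture(0)
cap.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)   # equivalent to cap.set(3, 1280)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)   # equivalent to cap.set(4, 720)

while True:
    success, img = cap.read()
    if not success:
        continue  # skip frames the camera failed to deliver
    # ... hand detection, button drawing, and click handling as above ...
    cv2.imshow("Image", img)
    if cv2.waitKey(1) & 0xFF == ord('q'):  # press q to quit cleanly
        break

cap.release()
cv2.destroyAllWindows()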

If you run into any problems, leave a message in the comment area, and we can all learn from each other!
