OpenCV hands-on project: gesture recognition - controlling the mouse with gestures

Gesture Recognition Series Article Directory

Gesture recognition is a human-computer interaction technology that enables the operation and control of computers, smartphones, smart TVs, and other devices by recognizing human gestures.

1. OpenCV hand tracking (locating the hand's keypoints)

2. OpenCV hands-on project: hand tracking that returns position information (encapsulated for reuse)

3. Gesture recognition: gesture-based volume control (OpenCV)

4. OpenCV hands-on project: gesture recognition - controlling the mouse with gestures

to be continued

This column records the author's learning journey and will be updated continuously. You are welcome to subscribe and learn along.


This project uses Google's open-source framework mediapipe, which offers many ready-made models, such as face detection, body detection, and hand detection.


 

The code uses OpenCV, the HandTrackingModule module (built on mediapipe), and the mouse-control library autopy. They can typically be installed with pip install opencv-python mediapipe autopy.


One, the HandTrackingModule module

The previous article in this series contains a full tutorial on packaging the hand-detection module; here is a brief recap.

import cv2
import mediapipe as mp
import time

class handDetector():
    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, trackCon=0.5):
        """
        初始化手势检测器对象。

        Args:
            mode (bool): 是否检测多只手。默认为False,只检测单只手。
            maxHands (int): 最多检测的手的数量。默认为2。
            detectionCon (float): 手势检测的置信度阈值。默认为0.5。
            trackCon (float): 手势跟踪的置信度阈值。默认为0.5。
        """
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon

        # 创建 Mediapipe Hands 模块和绘制工具对象
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands,
                                       self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]

    def findHands(self, img, draw=True):
        """
        检测手势并在图像上绘制关键点和连接线。

        Args:
            img (numpy.ndarray): 输入图像。
            draw (bool): 是否在图像上绘制标记。默认为True。

        Returns:
            numpy.ndarray: 绘制了关键点和连接线的图像。
        """
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS)
        return img

 

This is the first small module: it creates the hand-detector object, runs hand detection, and draws keypoints and connections on the image. Here is a detailed explanation:

  1. handDetector class: defines the hand-detector object, with the following initialization parameters and methods.

    • __init__(self, mode=False, maxHands=2, detectionCon=0.5, trackCon=0.5): initializer; creates the detector object and sets the related parameters.

      • mode: static-image mode; the default False treats the input as a video stream and tracks hands between frames.
      • maxHands: maximum number of hands to detect; the default is 2.
      • detectionCon: confidence threshold for hand detection; the default is 0.5.
      • trackCon: confidence threshold for hand tracking; the default is 0.5.
    • findHands(self, img, draw=True): detects hands and draws keypoints and connections on the image.

      • img: input image (numpy array).
      • draw: whether to draw annotations on the image; the default is True.
    • mpHands: the Mediapipe Hands solution.

    • hands: the hand model used for detection.

    • mpDraw: the Mediapipe drawing utilities.

    • tipIds: list of landmark IDs for the fingertips.

  2. findHands method: receives an input image, detects hands, and draws keypoints and connections on it.

    • img: input image (numpy array).
    • draw: whether to draw annotations on the image; the default is True.
  3. imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB): converts the image from BGR to RGB, the format Mediapipe expects.

  4. self.results = self.hands.process(imgRGB): runs the Mediapipe hand model on the image and stores the detection results.

  5. if self.results.multi_hand_landmarks:: checks whether at least one hand was detected.

  6. for handLms in self.results.multi_hand_landmarks:: iterates over each detected hand.

  7. self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS): draws the hand landmarks and their connections on the image.

  8. Returns the image with keypoints and connections drawn. A short usage sketch follows.
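A minimal usage sketch of this module (assuming the class is saved as HandTrackingModule.py and a webcam is available; the file and variable names here are illustrative, not part of the original article):

import cv2
from HandTrackingModule import handDetector

detector = handDetector(maxHands=1)
cap = cv2.VideoCapture(0)  # default webcam; change the index if needed

success, frame = cap.read()
if success:
    annotated = detector.findHands(frame)  # detects the hand and draws the landmarks
    cv2.imshow("Hands", annotated)
    cv2.waitKey(0)
cap.release()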

    def findPosition(self, img, handNo=0, draw=True):
        """
        获取手指关键点位置和包围框。

        Args:
            img (numpy.ndarray): 输入图像。
            handNo (int): 指定要分析的手的索引。默认为0,即第一只手。
            draw (bool): 是否在图像上绘制标记。默认为True。

        Returns:
            list: 手指关键点列表。
            tuple: 包围框坐标 (xmin, ymin, xmax, ymax)。
        """
        xList = []
        yList = []
        bbox = []
        self.lmList = []

        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                # Convert the normalized landmark to pixel coordinates
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                xList.append(cx)
                yList.append(cy)
                self.lmList.append([id, cx, cy])
                if draw:
                    # Draw the landmark on the image
                    cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)

            xmin, xmax = min(xList), max(xList)
            ymin, ymax = min(yList), max(yList)
            bbox = xmin, ymin, xmax, ymax

            if draw:
                # Draw the bounding box around the hand
                cv2.rectangle(img, (xmin - 20, ymin - 20), (xmax + 20, ymax + 20),
                              (0, 255, 0), 2)

        return self.lmList, bbox

 

This is the second small module, which obtains the hand's landmark positions and bounding box. Here is a detailed explanation:

  1. findPosition method: draws the hand landmarks on the image and returns the list of landmark coordinates together with the hand's bounding-box coordinates.

    • img: input image (numpy array).
    • handNo: index of the hand to analyze; defaults to 0, i.e. the first hand.
    • draw: whether to draw annotations on the image; the default is True.
  2. xList and yList: store the x and y pixel coordinates of the landmarks.

  3. bbox: the bounding-box coordinates, used to locate the hand.

  4. self.lmList: the list of landmarks, each in the format [id, x, y].

  5. myHand = self.results.multi_hand_landmarks[handNo]: gets the landmarks of the hand at the specified index.

  6. for id, lm in enumerate(myHand.landmark):: iterates over the hand's landmarks.

  7. h, w, c = img.shape: gets the height, width, and number of channels of the image.

  8. cx, cy = int(lm.x * w), int(lm.y * h): converts the normalized landmark coordinates to pixel coordinates.

  9. xList.append(cx) and yList.append(cy): add the coordinates to the lists.

  10. self.lmList.append([id, cx, cy]): adds the landmark entry to the landmark list.

  11. if draw:: only annotates the image when drawing is enabled.

  12. cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED): draws the landmark on the image.

  13. xmin, xmax = min(xList), max(xList) and ymin, ymax = min(yList), max(yList): compute the bounding-box coordinates.

  14. bbox = xmin, ymin, xmax, ymax: packs the bounding-box coordinates into a tuple.

  15. cv2.rectangle(img, (xmin - 20, ymin - 20), (xmax + 20, ymax + 20), (0, 255, 0), 2): draws the bounding box, with a 20-pixel margin, on the image.

  16. Returns the landmark list and the bounding-box coordinates. A short sketch of consuming these values follows.
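A minimal sketch of reading lmList and bbox (continuing the sketch above; in the MediaPipe hand model, landmark 8 is the tip of the index finger):

img = detector.findHands(frame)
lmList, bbox = detector.findPosition(img)

if len(lmList) != 0:
    # each entry is [id, x, y] in pixel coordinates
    _, tip_x, tip_y = lmList[8]
    print("index fingertip:", (tip_x, tip_y), "bbox:", bbox)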

    def fingersUp(self):
        """
        判断手指是否伸展。

        Returns:
            list: 包含每个手指的状态,1表示伸展,0表示弯曲。
        """
        fingers = []

        # Thumb: compare the tip's x coordinate with the joint next to it
        # (an x-axis test, so it assumes a fixed hand orientation)
        if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)

        # Other four fingers: the tip lying above the middle joint means extended
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)

        return fingers

    def findDistance(self, p1, p2, img, draw=True, r=15, t=3):
        """
        计算两个关键点之间的距离。

        Args:
            p1 (int): 第一个关键点的索引。
            p2 (int): 第二个关键点的索引。
            img (numpy.ndarray): 输入图像。
            draw (bool): 是否在图像上绘制标记。默认为True。
            r (int): 圆的半径,用于标记关键点。默认为15。
            t (int): 绘制线条的粗细。默认为3。

        Returns:
            float: 两个关键点之间的距离。
            numpy.ndarray: 绘制了距离标记的图像。
            list: 包含关键点坐标的列表 [x1, y1, x2, y2, cx, cy]。
        """
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2

        if draw:
            # Draw the connecting line, both landmarks, and the midpoint
            cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), t)
            cv2.circle(img, (x1, y1), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (cx, cy), r, (0, 0, 255), cv2.FILLED)
        
        # Euclidean distance between the two landmarks
        length = math.hypot(x2 - x1, y2 - y1)

        return length, img, [x1, y1, x2, y2, cx, cy]

 

This is the third small module, which determines whether each finger is extended and computes the distance between two landmarks. Here is a detailed explanation:

  1. fingersUp method: determines whether each finger is extended and returns a list of finger states.

    • Return value: a list with one entry per finger; 1 means extended, 0 means bent.
  2. findDistance method: computes the distance between two landmarks and draws annotations on the image.

    • p1 and p2: the indices of the two landmarks.
    • img: input image (numpy array).
    • draw: whether to draw annotations on the image; the default is True.
    • r: radius of the circles marking the landmarks; the default is 15.
    • t: thickness of the drawn line; the default is 3.
  3. x1, y1 = self.lmList[p1][1:] and x2, y2 = self.lmList[p2][1:]: get the coordinates of the two landmarks.

  4. cx, cy = (x1 + x2) // 2, (y1 + y2) // 2: compute the midpoint between the two landmarks.

  5. if draw:: only annotates the image when drawing is enabled.

  6. cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), t): draws a line connecting the two landmarks.

  7. cv2.circle(img, (x1, y1), r, (255, 0, 255), cv2.FILLED) and cv2.circle(img, (x2, y2), r, (255, 0, 255), cv2.FILLED): draw solid circles at the two landmarks to mark them.

  8. cv2.circle(img, (cx, cy), r, (0, 0, 255), cv2.FILLED): draws a solid circle at the midpoint of the two landmarks.

  9. length = math.hypot(x2 - x1, y2 - y1): computes the Euclidean distance between the two landmarks.

  10. Return value: the computed distance, the annotated image, and the list [x1, y1, x2, y2, cx, cy] of landmark coordinates. A combined usage sketch follows.
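A minimal sketch combining the two methods, in the same spirit as the main code below (the 40-pixel threshold is illustrative):

lmList, bbox = detector.findPosition(img)
if len(lmList) != 0:
    fingers = detector.fingersUp()           # e.g. [0, 1, 1, 0, 0]
    if fingers[1] == 1 and fingers[2] == 1:  # index and middle fingers up
        length, img, lineInfo = detector.findDistance(8, 12, img)
        if length < 40:                      # fingertips pinched together
            print("pinch detected at", (lineInfo[4], lineInfo[5]))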

 

Full code

"""
Hand Tracking Module

"""

import cv2
import mediapipe as mp
import time
import math
import numpy as np

class handDetector():
    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, trackCon=0.5):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon

        self.mpHands = mp.solutions.hands
        # Keyword arguments avoid misalignment with newer mediapipe versions,
        # which insert a model_complexity parameter
        self.hands = self.mpHands.Hands(static_image_mode=self.mode,
                                        max_num_hands=self.maxHands,
                                        min_detection_confidence=self.detectionCon,
                                        min_tracking_confidence=self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        # print(results.multi_hand_landmarks)

        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                    self.mpHands.HAND_CONNECTIONS)

        return img

    def findPosition(self, img, handNo=0, draw=True):
        xList = []
        yList = []
        bbox = []
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                # print(id, lm)
                h, w, c = img.shape
                cx, cy = int(lm.x * w), int(lm.y * h)
                xList.append(cx)
                yList.append(cy)
                # print(id, cx, cy)
                self.lmList.append([id, cx, cy])
                if draw:
                    cv2.circle(img, (cx, cy), 5, (255, 0, 255), cv2.FILLED)

            # Compute the bounding box only when a hand was found
            # (min()/max() raise ValueError on empty lists)
            xmin, xmax = min(xList), max(xList)
            ymin, ymax = min(yList), max(yList)
            bbox = xmin, ymin, xmax, ymax

            if draw:
                cv2.rectangle(img, (xmin - 20, ymin - 20), (xmax + 20, ymax + 20),
                              (0, 255, 0), 2)

        return self.lmList, bbox

    def fingersUp(self):
        fingers = []
        # Thumb
        if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)

        # Fingers
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)

            # totalFingers = fingers.count(1)

        return fingers

    def findDistance(self, p1, p2, img, draw=True,r=15, t=3):
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2

        if draw:
            cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), t)
            cv2.circle(img, (x1, y1), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (cx, cy), r, (0, 0, 255), cv2.FILLED)

        # Compute the distance unconditionally; it previously sat inside the
        # draw branch, which raised NameError whenever draw=False
        length = math.hypot(x2 - x1, y2 - y1)

        return length, img, [x1, y1, x2, y2, cx, cy]

def main():
    pTime = 0
    cTime = 0
    cap = cv2.VideoCapture(0)  # default camera; change the index if you have more than one
    detector = handDetector()
    while True:
        success, img = cap.read()
        img = detector.findHands(img)
        lmList, bbox = detector.findPosition(img)
        if len(lmList) != 0:
            print(lmList[4])

        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime

        cv2.putText(img, str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3,
        (255, 0, 255), 3)

        cv2.imshow("Image", img)
        cv2.waitKey(1)

if __name__ == "__main__":
    main()

 

Two, the main code

import cv2
import numpy as np
import HandTrackingModule as htm
import time
import autopy

##########################
wCam, hCam = 640, 480
frameR = 100  # Frame Reduction
smoothening = 7
#########################

pTime = 0
plocX, plocY = 0, 0
clocX, clocY = 0, 0

cap = cv2.VideoCapture(0)
cap.set(3, wCam)
cap.set(4, hCam)
detector = htm.handDetector(maxHands=1)
wScr, hScr = autopy.screen.size()
# print(wScr, hScr)

while True:
    # 1. Find hand Landmarks
    success, img = cap.read()
    img = detector.findHands(img)
    lmList, bbox = detector.findPosition(img)
    # 2. Get the tip of the index and middle fingers
    if len(lmList) != 0:
        x1, y1 = lmList[8][1:]
        x2, y2 = lmList[12][1:]
        # print(x1, y1, x2, y2)

        # 3. Check which fingers are up (steps 3-10 stay inside the
        #    len(lmList) != 0 guard so an empty frame cannot crash)
        fingers = detector.fingersUp()
        # print(fingers)
        cv2.rectangle(img, (frameR, frameR), (wCam - frameR, hCam - frameR),
                      (255, 0, 255), 2)
        # 4. Only Index Finger : Moving Mode
        if fingers[1] == 1 and fingers[2] == 0:
            # 5. Convert Coordinates
            x3 = np.interp(x1, (frameR, wCam - frameR), (0, wScr))
            y3 = np.interp(y1, (frameR, hCam - frameR), (0, hScr))
            # 6. Smoothen Values
            clocX = plocX + (x3 - plocX) / smoothening
            clocY = plocY + (y3 - plocY) / smoothening

            # 7. Move Mouse (x is mirrored so cursor motion matches the hand)
            autopy.mouse.move(wScr - clocX, clocY)
            cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
            plocX, plocY = clocX, clocY

        # 8. Both Index and middle fingers are up : Clicking Mode
        if fingers[1] == 1 and fingers[2] == 1:
            # 9. Find distance between fingers
            length, img, lineInfo = detector.findDistance(8, 12, img)
            print(length)
            # 10. Click mouse if distance short
            if length < 40:
                cv2.circle(img, (lineInfo[4], lineInfo[5]),
                           15, (0, 255, 0), cv2.FILLED)
                autopy.mouse.click()

    # 11. Frame Rate
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, str(int(fps)), (20, 50), cv2.FONT_HERSHEY_PLAIN, 3,
                (255, 0, 0), 3)
    # 12. Display
    cv2.imshow("Image", img)
    cv2.waitKey(1)

 

The following is an explanation of the main functions and steps of the code:

  1. Import the required libraries and modules (cv2, numpy, HandTrackingModule, time, autopy) and define the configuration parameters.

  2. Create a camera object cap and set the capture width and height to wCam and hCam.

  3. Create a handDetector object detector for detecting the hand. Only one hand is used here, so set maxHands=1.

  4. Get the width and height of the screen for the later coordinate conversion.

  5. Enter an infinite loop that processes video frames continuously.

  6. In each iteration, grab a frame from the camera and call the detector.findHands method to detect the hand. Then call the detector.findPosition method to get the hand's landmark coordinates and bounding box.

  7. From the detected landmarks, determine the state of each finger (raised or not).

  8. Draw a rectangle marking the active area within which the hand moves the mouse.

  9. When only the index finger is raised, enter moving mode: the fingertip coordinates are mapped to screen coordinates, smoothed, and the mouse is moved with the autopy.mouse.move method.

  10. Draw a solid circle at the index fingertip and record the current position as the previous one for the next frame.

  11. When the index and middle fingers are raised together, enter clicking mode: compute the distance between the two fingertips, and if it falls below the threshold, perform a mouse click.

  12. Calculate and display the frame rate.

  13. Display the processed image in a window; cv2.waitKey(1) refreshes it every frame.
The frames captured by the camera are used to detect the hand, and mouse operations are simulated from the state of the fingers: in moving mode, the position of the index finger drives the cursor; in clicking mode, the distance between the index and middle fingertips triggers a click. In this way the mouse can be controlled entirely by gestures. The coordinate mapping and smoothing from step 9 are illustrated below.
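A standalone sketch of the coordinate conversion and smoothing in step 9 (the numbers are illustrative; frameR shrinks the active camera area so the fingertip can reach the screen edges without leaving the frame):

import numpy as np

wCam, hCam = 640, 480        # camera resolution
wScr, hScr = 1920.0, 1080.0  # screen size, as autopy.screen.size() might return
frameR = 100                 # margin of the active region, in pixels
smoothening = 7

# Map a fingertip at camera pixel (500, 300) to screen coordinates:
# the active region [frameR, wCam - frameR] is stretched over [0, wScr].
x1, y1 = 500, 300
x3 = np.interp(x1, (frameR, wCam - frameR), (0, wScr))  # about 1745.5
y3 = np.interp(y1, (frameR, hCam - frameR), (0, hScr))  # about 771.4

# Exponential smoothing: each frame the cursor moves 1/smoothening of the
# way toward the target, damping landmark jitter at the cost of slight lag.
plocX, plocY = 1600.0, 700.0                # cursor position from last frame
clocX = plocX + (x3 - plocX) / smoothening  # about 1620.8
clocY = plocY + (y3 - plocY) / smoothening  # about 710.2
print(clocX, clocY)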

 

Note that in the original version of this code, steps 3-10 ran even when no hand was in view, so you had to put your index finger in front of the camera before starting the script or an IndexError was raised. With the len(lmList) != 0 guard shown above, the loop simply idles until a hand appears.

If you encounter other errors, leave a message in the comments and we can work through them together.

 

Origin: blog.csdn.net/weixin_45303602/article/details/132239635