Gesture-controlled computer mouse based on MediaPipe and OpenCV

My last article covered how MediaPipe performs hand detection. With that in place, we can do some cool things: in this article, I will explain how to control the computer mouse with hand gestures.

Before we start, let me introduce pyautogui, a library that can control the computer mouse. Here is a brief overview of the functions used in this article, so that the final source code is easier to follow.

The functions (and one attribute) of pyautogui used here:
pyautogui.size(): returns the screen resolution as (width, height).
pyautogui.click(x, y, button='left'/'right'): performs a left or right click at (x, y) on the screen.
pyautogui.doubleClick(x, y): double-clicks the left button at (x, y) on the screen.
pyautogui.moveTo(x, y, duration=0): moves the mouse to the given (x, y); the optional duration sets how long the move takes.
pyautogui.FAILSAFE = True: this attribute is True by default, which means that moving the mouse into the upper-left corner of the screen makes pyautogui raise an exception and abort; its purpose is to give you a way out if the program keeps controlling the mouse and cannot be stopped otherwise.
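
A minimal usage sketch of these calls (the coordinates are arbitrary):

import pyautogui

pyautogui.FAILSAFE = True                                  # keep the corner abort enabled
width, height = pyautogui.size()                           # screen resolution
pyautogui.moveTo(width // 2, height // 2, duration=0.5)    # glide to the screen centre
pyautogui.click(width // 2, height // 2, button='left')    # left click at the centre
pyautogui.doubleClick(width // 2, height // 2)             # double left click at the centre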

Since the use of MediaPipe was introduced in the previous article, it will not be repeated here. If you don't know what MediaPipe is, click here to read my last article.

Approach

First, use OpenCV to open the camera and read the image from it, flip the image (the camera image is mirrored compared to reality), and convert it to RGB. Then run MediaPipe hand detection and store the coordinates of the 21 hand keypoints in a list. Using this list we decide:
        when the index finger is bent to less than 160 degrees, it is treated as a left click;
        when the middle finger is bent to less than 160 degrees, it is treated as a right click;
        when the distance between the index fingertip and the middle fingertip is less than 40, it is treated as a double left click
        (if you don't know how to compute the bend angle of a finger, please read my previous article; a small sketch also follows the figure below).
Then I use the midpoint of hand keypoints 0 and 9 as the mouse position, which is roughly the green point I marked in the picture below.
[Figure: the detected hand landmarks; the green point marks the midpoint of keypoints 0 and 9 that is used as the mouse position]
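
As a small, self-contained sketch of the two calculations described above, assuming a hypothetical list finger that holds the 21 keypoints as [x, y] pairs in MediaPipe's landmark order:

import math

def joint_angle(finger, id):
    # bend angle at landmark id+1 in the triangle formed by landmarks id, id+1 and id+2 (law of cosines)
    a = math.hypot(finger[id][0] - finger[id+1][0], finger[id][1] - finger[id+1][1])
    b = math.hypot(finger[id+1][0] - finger[id+2][0], finger[id+1][1] - finger[id+2][1])
    c = math.hypot(finger[id][0] - finger[id+2][0], finger[id][1] - finger[id+2][1])
    return math.degrees(math.acos((a**2 + b**2 - c**2) / (2 * a * b)))

def cursor_point(finger):
    # midpoint of landmark 0 (wrist) and landmark 9 (base of the middle finger)
    return (finger[0][0] + finger[9][0]) // 2, (finger[0][1] + finger[9][1]) // 2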

Difficulties during implementation

There are two difficulties here. First, my initial approach was to get the screen resolution (width, height), get the image size (x, y) of each frame, and compute the ratios (ratio_x = width / x, ratio_y = height / y); multiplying the coordinates of the green point above by these ratios then gives the mouse position on the screen. This looks fine, but in practice there is a big problem: you will find that the mouse never reaches the edge area of the screen (you can test it yourself).
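
For reference, a minimal sketch of this ratio-based mapping (the frame and screen sizes below are placeholder values):

cap_x, cap_y = 640, 480            # camera frame size (placeholder)
screen_x, screen_y = 1920, 1080    # screen resolution (placeholder)
ratio_x, ratio_y = screen_x / cap_x, screen_y / cap_y

x, y = 320, 240                    # tracked point in frame coordinates
mouse_x, mouse_y = x * ratio_x, y * ratio_y
print(mouse_x, mouse_y)            # 960.0 540.0, the screen centre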
Second, if the finger hovers over one spot, the mouse on the screen keeps shaking. Your finger is never completely still, and that subtle movement, once multiplied by the ratio, is magnified, so the on-screen mouse jitters even though the finger appears to be hovering.

Solutions:

First: I came across numpy.interp() online, a one-dimensional linear interpolation function (its parameters are not explained here; they are covered in the final source code). The figure below shows its effect: the x and y axes represent the image size and the screen size. Although both the interpolation and the ratio method I used can be seen as the linear equation of a line, the interpolation function works better in practice, which still confuses me.
[Figure: effect of the interpolation function; the x axis is the image coordinate and the y axis is the screen coordinate]
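
A minimal sketch of how this mapping is used later in the source (the frame size, screen size and offsets below are placeholder values). One plausible reason it behaves better than the plain ratio is that it maps the smaller inner rectangle defined by the offsets onto the whole screen and clamps values outside that range, so the cursor can reach the screen edges even though the hand never reaches the frame edges:

import numpy as np

cap_x, cap_y = 640, 480            # camera frame size (placeholder)
screen_x, screen_y = 1920, 1080    # screen resolution (placeholder)
offset_x, offset_y = 60, 150       # margins of the active rectangle inside the frame

x, y = 580, 400                    # tracked point in frame coordinates
# map [offset_x, cap_x - offset_x] onto [0, screen_x]; values outside the input range are clamped
mouse_x = np.interp(x, (offset_x, cap_x - offset_x), (0, screen_x))
mouse_y = np.interp(y, (offset_y, cap_y), (0, screen_y))
print(mouse_x, mouse_y)            # the x value here lands exactly on the right screen edge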
Second: for the jitter while the finger hovers, we can first record the previous mouse coordinates on the screen (pre_x, pre_y). When the next mouse coordinates (mouse_x, mouse_y) arrive, compute x = (mouse_x - pre_x) / smooth and y = (mouse_y - pre_y) / smooth. Here (mouse_x - pre_x) and (mouse_y - pre_y) are the distance between the two consecutive mouse positions, and dividing by smooth shortens that distance. Then use (pre_x + x, pre_y + y) as the current mouse position and update pre_x = pre_x + x, pre_y = pre_y + y. This reduces the distance between consecutive mouse positions, which keeps the cursor from shaking while still updating its position.
Note the order of the subtractions in (mouse_x - pre_x) and (mouse_y - pre_y), and that smooth must be greater than 1 (a value less than 1 would enlarge the distance instead). The larger smooth is, the slower and steadier the mouse moves; the smaller it is, the faster the mouse moves and the more it shakes.
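
A minimal standalone sketch of this smoothing step (smooth and the coordinates are placeholder values; the numbers in the comments are the resulting outputs):

smooth = 5
pre_x, pre_y = 0.0, 0.0

def smooth_move(mouse_x, mouse_y):
    # move only a fraction of the way toward the new target point
    global pre_x, pre_y
    pre_x = pre_x + (mouse_x - pre_x) / smooth
    pre_y = pre_y + (mouse_y - pre_y) / smooth
    return pre_x, pre_y

print(smooth_move(100, 200))   # (20.0, 40.0)
print(smooth_move(100, 200))   # (36.0, 72.0), creeping toward the target instead of jumping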

Full code:

import pyautogui
import time
import cv2
import numpy as np
import math
import mediapipe as mp

# Get the coordinates of all 21 hand keypoints, store them in a list and return it
def finger_coordinate(hand):
    finger = []
    for handlms in hand.multi_hand_landmarks:
        for lm in handlms.landmark:
            x, y = int(lm.x * cap_x), int(lm.y * cap_y)
            finger.append([x, y])
        #mpdraw.draw_landmarks(frame, handlms, mphand.HAND_CONNECTIONS)
    return finger

# Check how bent the index and middle fingers are and the distance between the index and middle fingertips,
# storing the results in the judge_finger list:
# judge_finger[0] = 1 if the index finger is bent, judge_finger[1] = 1 if the middle finger is bent,
# judge_finger[2] = 1 if the two fingertips are less than 40 apart.
# Finally return judge_finger.
def check_finger(finger):
    judge_finger = [0,0,0]
    # landmark 5 is the base of the index finger and landmark 9 the base of the middle finger;
    # for each, measure the bend angle at the joint formed by landmarks id, id+1 and id+2
    for i,id in enumerate([5,9]):
        # a, b, c are the side lengths of the triangle spanned by the three landmarks
        a = round(math.hypot(finger[id][0]-finger[id+1][0],finger[id][1]-finger[id+1][1]),2)
        b = round(math.hypot(finger[id+1][0]-finger[id+2][0],finger[id+1][1]-finger[id+2][1]),2)
        c = round(math.hypot(finger[id][0]-finger[id+2][0],finger[id][1]-finger[id+2][1]),2)
        try:
            # law of cosines; *57 converts radians to degrees (roughly 180/pi)
            angle = math.acos((a**2+b**2-c**2)/(2*a*b))*57
        except ValueError:
            angle = 180
        except ZeroDivisionError:
            angle = 0
        if angle < 160:
            judge_finger[i] = 1

    # distance between the index fingertip (8) and the middle fingertip (12)
    dist = math.hypot(finger[8][0]-finger[12][0],finger[8][1]-finger[12][1])
    cv2.circle(frame, tuple(finger[8]), 25, (255, 0, 255), -1)
    cv2.circle(frame, tuple(finger[12]), 25, (255, 0, 255), -1)
    if dist < 40:
        judge_finger[2] = 1
    return judge_finger


########################
#Variable settings
offset_x = 60    # horizontal margin of the active rectangle inside the camera frame
offset_y = 150   # top margin of the active rectangle

smooth = 5       # smoothing factor, must be greater than 1
pre_x = 0        # previous mouse position on the screen
pre_y = 0
########################

########################
#Initialization
cap = cv2.VideoCapture(0)

cap_x = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))    # camera frame width
cap_y = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))   # camera frame height
screen_x,screen_y = pyautogui.size()              # screen resolution

mphand = mp.solutions.hands
hands = mphand.Hands()
mpdraw = mp.solutions.drawing_utils
########################


while cap.isOpened():
    #grab a frame from the camera, mirror it, and convert it to RGB
    _,frame = cap.read()
    frame = cv2.flip(frame, 1)
    img = cv2.cvtColor(frame,cv2.COLOR_BGR2RGB)
    cv2.rectangle(frame, (offset_x, offset_y), (cap_x - offset_x, cap_y), (0, 255, 0), 2)   # draw the active rectangle that is mapped to the screen

    #run hand detection and get the detection results
    hand = hands.process(img)
    #check whether a hand was detected
    if hand.multi_hand_landmarks:
        #get the keypoint coordinates as a list
        finger = finger_coordinate(hand)

        #1. determine the mouse coordinates

        #use the midpoint of landmarks 0 and 9 as the mouse point
        x,y = (finger[0][0]+finger[9][0])//2,(finger[0][1]+finger[9][1])//2
        cv2.circle(frame, (x,y), 25, (255, 0, 255), -1)
        #map the image point to the screen mouse point by interpolation
        mouse_x = np.interp(x,(offset_x,cap_x-offset_x),(0,screen_x))
        mouse_y = np.interp(y,(offset_y,cap_y),(0,screen_y))
        #smooth the mouse coordinates
        mouse_x = pre_x + (mouse_x-pre_x) / smooth
        mouse_y = pre_y + (mouse_y-pre_y) / smooth
        pre_x = mouse_x
        pre_y = mouse_y
        #move the mouse on the screen to the resulting point
        pyautogui.moveTo(mouse_x, mouse_y, duration=0)

        #2. check the fingers to decide the click action

        #the bend angles of the index and middle fingers act as the left and right buttons (an angle below 160 degrees counts as a click); a fingertip distance below 40 counts as a left double click
        judge_finger = check_finger(finger)
        # only one action may be performed at a time, so exactly one entry of the returned list may be true; if more than one is true, skip to the next iteration
        count = judge_finger.count(1)
        if count == 1:
            index = judge_finger.index(1)
            if index == 0:
                pyautogui.click(mouse_x,mouse_y,button = 'left')
                cv2.putText(frame, 'click left', (10, 20), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 1)
            elif index == 1:
                pyautogui.click(mouse_x,mouse_y,button = 'right')
                cv2.putText(frame, 'click right', (10, 20), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 1)
            else:
                pyautogui.doubleClick(mouse_x,mouse_y)
                cv2.putText(frame, 'click double', (10, 20), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 1)
        else:
            cv2.putText(frame,'no or multiple operations',(10,20),cv2.FONT_HERSHEY_COMPLEX,1,(0,255,0),1)
    else:
        cv2.putText(frame, 'please put your hand', (10, 20), cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 1)

    #show the image and exit the program when the ESC key is pressed
    cv2.imshow('img',frame)
    ret = cv2.waitKey(1)
    if ret == 27:
        break

#release the camera
cap.release()
cv2.destroyAllWindows()

'''
English words used in the variable names:
capture, frame, coordinate, offset, smooth, previous
'''

Summary

Today's computer vision, speech recognition, and natural language processing are all inseparable from machine learning and deep learning. It just so happens that my college now offers a machine learning course, and its content and formulas are really daunting, so next I plan to record my learning process along with my understanding and insights and share them with everyone. Creation is not easy; I hope everyone can support it.
