Python artificial intelligence: control the mouse with mid-air gestures and "free your hands"

Hello everyone, I am lqj_, a blogger on CSDN.

This is my personal blog homepage:

lqj_'s blog on CSDN (WeChat mini programs, front end, Python): https://blog.csdn.net/lbcyllqj?spm=1011.2415.3001.5343

You are also welcome to follow me on Bilibili: Xiao Miao Develop

This article is filed in my Python column and my Python artificial intelligence vision column.

Table of contents

Foreword

About this project

Gesture recognition and palm detection

Code and explanation

Import the required libraries

Create the hand detector class, which also prints the left-hand and right-hand labels:

Main function:

Use OpenCV to display the image

Set the camera index [built-in or external camera]

Set the width and height of the camera frame

Detect the hand keypoint coordinates

Determine whether the index and middle fingers are extended

Condition: if only the index finger is extended, enter move-only mode

Coordinate conversion: map the index finger's position in the window to the mouse position on the desktop

Condition: if the index and middle fingers are both extended and the distance between them is short enough [within the set threshold], trigger a mouse click

Use OpenCV to display the program's image

Release resources

Complete code

Video demo

Foreword

With the release of ChatGPT by OpenAI, a company in the United States, artificial intelligence has been reawakened. Against this backdrop, the theme of "intelligence + everything" has emerged, and the global adoption of intelligent technology has become an inevitable trend. Artificial intelligence is an irreplaceable product of our times. As a college student, I am willing to contribute to this development!

About this project

I noticed that when we students present at school, or when a teacher lectures, we constantly click the mouse to switch PPT slides and use other functions of the computer. I find this very inconvenient, so I developed this gesture-control script to address the problem. It should run on any computer with a camera, and it uses the camera to implement its main functions.

Gesture recognition and palm detection

Current research on gesture recognition falls into two main directions: gesture recognition based on wearable devices and gesture recognition based on vision methods. Wearable approaches have the user wear a glove fitted with a large number of sensors, then collect and analyze the sensor data. Although the accuracy of this method is relatively high, the high cost of the sensors makes it hard to apply in daily life. The sensor glove is also inconvenient for the user and interferes with further emotion analysis, so this method is mostly used in specialized professional instruments. This project focuses on vision-based gesture recognition. The MediaPipe framework is used as the example here so that readers can more easily reproduce the work and understand the field.

Vision-based gesture recognition is mainly divided into static gesture recognition and dynamic gesture recognition. As the names suggest, dynamic gesture recognition is generally harder than static gesture recognition, but a static gesture is a special state of a dynamic gesture. We can therefore process a continuous video by running static gesture recognition frame by frame, and then analyze the relationship between preceding and following frames to improve the gesture system.
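One simple way to relate neighbouring frames is to smooth the per-frame static predictions with a sliding-window majority vote. The sketch below only illustrates that idea under stated assumptions: the function name `smooth_labels`, the window size, and the gesture names are hypothetical and do not come from this project.

```python
from collections import Counter, deque


def smooth_labels(frame_labels, window=5):
    """Replace each per-frame label with the most common label of the last `window` frames."""
    recent = deque(maxlen=window)
    smoothed = []
    for label in frame_labels:
        recent.append(label)
        # most_common(1) returns [(label, count)] for the dominant label
        smoothed.append(Counter(recent).most_common(1)[0][0])
    return smoothed


if __name__ == "__main__":
    # A single misclassified frame ("fist") is absorbed by its neighbours
    noisy = ["point", "point", "fist", "point", "point", "pinch", "pinch", "pinch"]
    print(smooth_labels(noisy, window=3))
```

A larger window removes more flicker but makes the system react later to a real gesture change.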

MediaPipe uses SSD, a single-stage object detection algorithm, to train the palm model. It is optimized with three techniques: 1. NMS; 2. an encoder-decoder feature extractor; 3. focal loss. NMS suppresses the multiple overlapping boxes that the algorithm produces for a single object and keeps the detection box with the highest confidence. The encoder-decoder feature extractor gives the detector awareness of the larger scene context, even for small objects (similar to the RetinaNet approach). Focal loss comes from RetinaNet and mainly addresses the imbalance between positive and negative samples, a technique that can improve object detection scores in open environments. With these techniques, MediaPipe achieves an average precision of 95.7% in palm detection. Without 2 and 3, the baseline is only 86.22%, so they add 9.48 points, which indicates that the model can identify the palm accurately. As for why a palm detector is used instead of a hand detector, the main reason given by the authors is that training a hand detector is more complicated and the learnable features are less distinct, so they built a palm detector.
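To make the NMS step concrete, here is a minimal sketch of classic greedy non-maximum suppression in plain Python. It is not MediaPipe's actual implementation; the helper names `iou` and `nms`, the box format `(x1, y1, x2, y2)`, and the threshold of 0.5 are assumptions chosen for illustration.

```python
def iou(a, b):
    """Intersection over union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(boxes, scores, iou_threshold=0.5):
    """Return the indices of the boxes kept by greedy NMS, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap too much with a box that was already kept
        if all(iou(boxes[i], boxes[k]) <= iou_threshold for k in keep):
            keep.append(i)
    return keep


if __name__ == "__main__":
    # Two nearly identical boxes around one palm, plus one separate box
    boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 300, 300)]
    scores = [0.9, 0.8, 0.7]
    print(nms(boxes, scores))
```

In this example the second box overlaps the first almost completely, so only the higher-scoring one survives, while the distant third box is kept.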

 

Code and explanation

Import the required libraries

import cv2
import autopy
import numpy as np
import time
import math
import mediapipe as mp

Create the hand detector class, which also prints the left-hand and right-hand labels:

class handDetector():
    def __init__(self, mode=False, maxHands=2, model_complexity=1, detectionCon=0.8, trackCon=0.8):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon
        self.model_complexity = model_complexity

        self.mpHands = mp.solutions.hands
        # Keyword arguments keep the call correct even if the positional order differs between MediaPipe versions
        self.hands = self.mpHands.Hands(
            static_image_mode=self.mode,
            max_num_hands=self.maxHands,
            model_complexity=self.model_complexity,
            min_detection_confidence=self.detectionCon,
            min_tracking_confidence=self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]  # landmark ids of the five fingertips, thumb to little finger

    def findHands(self, img, draw=True):
        # MediaPipe expects RGB input, while OpenCV delivers BGR frames
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        print(self.results.multi_handedness)  # print the left/right hand labels from the detection result

        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, draw=True):
        self.lmList = []
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                for id, lm in enumerate(handLms.landmark):
                    # Convert normalized landmark coordinates to pixel coordinates
                    h, w, c = img.shape
                    cx, cy = int(lm.x * w), int(lm.y * h)
                    # print(id, cx, cy)
                    self.lmList.append([id, cx, cy])
                    if draw:
                        cv2.circle(img, (cx, cy), 12, (255, 0, 255), cv2.FILLED)
        return self.lmList

    def fingersUp(self):
        # Returns five 0/1 flags, thumb to little finger; needs a non-empty lmList from findPosition
        fingers = []
        # Thumb: compare the x coordinate of the tip with the joint below it
        # (the result depends on which hand is shown and how it is oriented)
        if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)

        # Other fingers: the tip is above the joint two landmarks below it (smaller y is higher in the image)
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)

        # totalFingers = fingers.count(1)
        return fingers

    def findDistance(self, p1, p2, img, draw=True, r=15, t=3):
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2

        if draw:
            cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), t)
            cv2.circle(img, (x1, y1), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (cx, cy), r, (0, 0, 255), cv2.FILLED)

        # Compute the distance outside the draw branch so it is also defined when draw=False
        length = math.hypot(x2 - x1, y2 - y1)

        return length, img, [x1, y1, x2, y2, cx, cy]
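The finger test in fingersUp can be tried without a camera. The sketch below applies the same rule to a synthetic hand; `fingers_up` is a standalone copy of the rule, and `make_hand` with its pixel values is a hypothetical helper invented only for this example.

```python
TIP_IDS = [4, 8, 12, 16, 20]  # fingertip landmark ids, thumb to little finger


def fingers_up(lm_list):
    """lm_list holds 21 entries of [id, x, y] in pixel coordinates."""
    fingers = []
    # Thumb: tip x greater than the x of the joint below it
    fingers.append(1 if lm_list[TIP_IDS[0]][1] > lm_list[TIP_IDS[0] - 1][1] else 0)
    # Other fingers: tip above the joint two landmarks below it (smaller y is higher)
    for tip in TIP_IDS[1:]:
        fingers.append(1 if lm_list[tip][2] < lm_list[tip - 2][2] else 0)
    return fingers


def make_hand(extended):
    """Build a synthetic 21-point hand; `extended` is a set of fingertip ids to raise."""
    hand = [[i, 100, 200] for i in range(21)]
    for tip in TIP_IDS[1:]:
        hand[tip - 2][2] = 150                          # middle joint of the finger
        hand[tip][2] = 100 if tip in extended else 180  # tip above or below that joint
    hand[3][1], hand[4][1] = 100, 90                    # thumb tucked in (tip x < joint x)
    return hand


if __name__ == "__main__":
    print(fingers_up(make_hand({8})))      # index finger only
    print(fingers_up(make_hand({8, 12})))  # index and middle fingers
```

The first case corresponds to the move-only gesture and the second to the gesture that is checked for a click.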

Main function:

Detect gestures and draw skeleton information

def main():
    pTime = 0
    cTime = 0
    cap = cv2.VideoCapture(0)
    detector = handDetector()
    while True:
        success, img = cap.read()
        img = detector.findHands(img)

Get the list of coordinate points

        lmList = detector.findPosition(img)

Use OpenCV to display the image

        if len(lmList) != 0:
            print(lmList[4])  # landmark 4 is the thumb tip

        # Frames per second from the time between two consecutive frames
        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime

        cv2.putText(img, 'fps:' + str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
        cv2.imshow('Image', img)
        cv2.waitKey(1)

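Set the camera index [built-in or external camera]

##############################
wCam, hCam = 1000, 1000  # requested capture width and height in pixels
frameR = 100             # margin of the active region inside the frame
smoothening = 5          # larger values give smoother but slower cursor movement
##############################
cap = cv2.VideoCapture(0)  # 0 for a laptop's built-in camera; use 1 or another index for an external camera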

Set the width and height of the camera frame

cap.set(cv2.CAP_PROP_FRAME_WIDTH, wCam)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, hCam)
pTime = 0
plocX, plocY = 0, 0  # previous cursor location
clocX, clocY = 0, 0  # current cursor location

detector = handDetector()
wScr, hScr = autopy.screen.size()

Detect the hand keypoint coordinates

while True:
    success, img = cap.read()
    if not success:  # stop when no frame can be read from the camera
        break
    # 1. Detect the hand and get the finger keypoint coordinates
    img = detector.findHands(img)
    cv2.rectangle(img, (frameR, frameR), (wCam - frameR, hCam - frameR), (0, 255, 0), 2)
    lmList = detector.findPosition(img, draw=False)

Determine whether the index and middle fingers are extended

    if len(lmList) != 0:
        x1, y1 = lmList[8][1:]
        x2, y2 = lmList[12][1:]
        fingers = detector.fingersUp()

Condition: if only the index finger is extended, enter move-only mode

        if fingers[1] and not fingers[2]:

Coordinate conversion: map the index finger's position in the window to the mouse position on the desktop

            x3 = np.interp(x1, (frameR, wCam - frameR), (0, wScr))
            y3 = np.interp(y1, (frameR, hCam - frameR), (0, hScr))
            # Smooth the movement by stepping only 1/smoothening of the way to the target
            clocX = plocX + (x3 - plocX) / smoothening
            clocY = plocY + (y3 - plocY) / smoothening
            # x is mirrored because the camera image is not flipped
            autopy.mouse.move(wScr - clocX, clocY)
            cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
            plocX, plocY = clocX, clocY
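The mapping and smoothing arithmetic can be checked on its own. The sketch below reproduces it in plain Python; `interp_clamped` and `next_cursor` are hypothetical helpers, and the final clamp to `screen - 1` is an added assumption, in case the mouse library rejects points at or beyond the screen edge.

```python
def interp_clamped(v, lo, hi, out_lo, out_hi):
    """Linear interpolation that clamps v to [lo, hi], like np.interp with two points."""
    v = min(max(v, lo), hi)
    return out_lo + (v - lo) / (hi - lo) * (out_hi - out_lo)


def next_cursor(finger, prev, frame, margin, screen, smoothening=5):
    """finger, prev: (x, y); frame, screen: (width, height).

    Returns the smoothed location and the mirrored, clamped point to move the mouse to.
    """
    x3 = interp_clamped(finger[0], margin, frame[0] - margin, 0, screen[0])
    y3 = interp_clamped(finger[1], margin, frame[1] - margin, 0, screen[1])
    # Step only 1/smoothening of the way from the previous location to the target
    cx = prev[0] + (x3 - prev[0]) / smoothening
    cy = prev[1] + (y3 - prev[1]) / smoothening
    # Mirror x, then keep the point strictly inside the screen (assumption, see lead-in)
    move_x = min(max(screen[0] - cx, 0), screen[0] - 1)
    move_y = min(max(cy, 0), screen[1] - 1)
    return (cx, cy), (move_x, move_y)


if __name__ == "__main__":
    loc, target = next_cursor((500, 500), (0, 0), (1000, 1000), 100, (1920, 1080))
    print(loc, target)
```

For a 1000x1000 frame with a 100 pixel margin and a 1920x1080 screen, a fingertip at the frame centre maps to the screen centre (960, 540); starting from (0, 0) with smoothening 5, the first step reaches (192, 108), which is mirrored to x = 1728.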

Condition: if the index and middle fingers are both extended and the distance between them is short enough [within the set threshold], trigger a mouse click

        if fingers[1] and fingers[2]:
            length, img, pointInfo = detector.findDistance(8, 12, img)
            if length < 40:
                cv2.circle(img, (pointInfo[4], pointInfo[5]),
                           15, (0, 255, 0), cv2.FILLED)
                autopy.mouse.click()
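As written, the loop calls autopy.mouse.click() on every frame in which the two fingertips are closer than 40 pixels, so holding one pinch can produce many clicks. One possible refinement, sketched here as a suggestion and not as part of the original script, is to click once when the pinch starts and re-arm only after the fingers open again. The class name `PinchClicker` and the `open_dist` value are hypothetical.

```python
class PinchClicker:
    """Fire one click per pinch: trigger when the fingers close, re-arm when they open."""

    def __init__(self, close_dist=40, open_dist=60):
        self.close_dist = close_dist  # a pinch starts below this distance
        self.open_dist = open_dist    # a pinch ends above this distance
        self.pinched = False

    def update(self, length):
        """Return True only on the frame where a new pinch starts."""
        if not self.pinched and length < self.close_dist:
            self.pinched = True
            return True
        if self.pinched and length > self.open_dist:
            self.pinched = False
        return False


if __name__ == "__main__":
    clicker = PinchClicker()
    distances = [80, 35, 30, 38, 50, 70, 30]  # fingertip distance per frame
    print([clicker.update(d) for d in distances])
```

In the main loop, the click would then be issued only when clicker.update(length) returns True. Using two thresholds keeps small jitter around 40 pixels from triggering repeated clicks.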

Use OpenCV to display the program's image

    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, f'fps:{int(fps)}', (15, 25),
                cv2.FONT_HERSHEY_PLAIN, 2, (255, 0, 255), 2)
    cv2.imshow("I am Ai XiaoMiao", img)
    k = cv2.waitKey(1) & 0xFF
    if k == ord(' '):  # press the space bar to exit
        break

Release resources

# Release the camera
cap.release()
# Close the OpenCV windows and free their memory
cv2.destroyAllWindows()

Complete code

#coding=utf-8
import cv2
import autopy
import numpy as np
import time
import math
import mediapipe as mp
class handDetector():
    def __init__(self, mode=False, maxHands=2, model_complexity=1, detectionCon=0.8, trackCon=0.8):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon
        self.model_complexity = model_complexity

        self.mpHands = mp.solutions.hands
        # Keyword arguments keep the call correct even if the positional order differs between MediaPipe versions
        self.hands = self.mpHands.Hands(
            static_image_mode=self.mode,
            max_num_hands=self.maxHands,
            model_complexity=self.model_complexity,
            min_detection_confidence=self.detectionCon,
            min_tracking_confidence=self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]  # landmark ids of the five fingertips, thumb to little finger

    def findHands(self, img, draw=True):
        # MediaPipe expects RGB input, while OpenCV delivers BGR frames
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        print(self.results.multi_handedness)  # print the left/right hand labels from the detection result

        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, draw=True):
        self.lmList = []
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                for id, lm in enumerate(handLms.landmark):
                    # Convert normalized landmark coordinates to pixel coordinates
                    h, w, c = img.shape
                    cx, cy = int(lm.x * w), int(lm.y * h)
                    # print(id, cx, cy)
                    self.lmList.append([id, cx, cy])
                    if draw:
                        cv2.circle(img, (cx, cy), 12, (255, 0, 255), cv2.FILLED)
        return self.lmList

    def fingersUp(self):
        # Returns five 0/1 flags, thumb to little finger; needs a non-empty lmList from findPosition
        fingers = []
        # Thumb: compare the x coordinate of the tip with the joint below it
        # (the result depends on which hand is shown and how it is oriented)
        if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)

        # Other fingers: the tip is above the joint two landmarks below it (smaller y is higher in the image)
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)

        # totalFingers = fingers.count(1)
        return fingers

    def findDistance(self, p1, p2, img, draw=True, r=15, t=3):
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2

        if draw:
            cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), t)
            cv2.circle(img, (x1, y1), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (cx, cy), r, (0, 0, 255), cv2.FILLED)

        # Compute the distance outside the draw branch so it is also defined when draw=False
        length = math.hypot(x2 - x1, y2 - y1)

        return length, img, [x1, y1, x2, y2, cx, cy]


# Standalone demo that detects the hand and draws the skeleton (not called by the script below)
def main():
    pTime = 0
    cTime = 0
    cap = cv2.VideoCapture(0)
    detector = handDetector()
    while True:
        success, img = cap.read()
        img = detector.findHands(img)        # detect the gesture and draw the skeleton

        lmList = detector.findPosition(img)  # get the list of coordinate points

        # k = cv2.waitKey(1) & 0xFF  # check for a key press

        if len(lmList) != 0:
            print(lmList[4])  # landmark 4 is the thumb tip

        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime

        cv2.putText(img, 'fps:' + str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
        cv2.imshow('Image', img)
        cv2.waitKey(1)


##############################
wCam, hCam = 1000, 1000  # requested capture width and height in pixels
frameR = 100             # margin of the active region inside the frame
smoothening = 5          # larger values give smoother but slower cursor movement
##############################
cap = cv2.VideoCapture(0)  # 0 for a laptop's built-in camera; use 1 or another index for an external camera
# Set the width and height of the camera frame
cap.set(cv2.CAP_PROP_FRAME_WIDTH, wCam)
cap.set(cv2.CAP_PROP_FRAME_HEIGHT, hCam)
pTime = 0
plocX, plocY = 0, 0  # previous cursor location
clocX, clocY = 0, 0  # current cursor location

detector = handDetector()
wScr, hScr = autopy.screen.size()
# print(wScr, hScr)

while True:
    success, img = cap.read()
    if not success:  # stop when no frame can be read from the camera
        break
    # 1. Detect the hand and get the finger keypoint coordinates
    img = detector.findHands(img)
    cv2.rectangle(img, (frameR, frameR), (wCam - frameR, hCam - frameR), (0, 255, 0), 2)
    lmList = detector.findPosition(img, draw=False)

    # 2. Determine whether the index and middle fingers are extended
    if len(lmList) != 0:
        x1, y1 = lmList[8][1:]
        x2, y2 = lmList[12][1:]
        fingers = detector.fingersUp()

        # 3. If only the index finger is extended, enter move mode
        if fingers[1] and not fingers[2]:
            # 4. Coordinate conversion: map the index finger's window position to the desktop mouse position
            x3 = np.interp(x1, (frameR, wCam - frameR), (0, wScr))
            y3 = np.interp(y1, (frameR, hCam - frameR), (0, hScr))

            # Smooth the movement by stepping only 1/smoothening of the way to the target
            clocX = plocX + (x3 - plocX) / smoothening
            clocY = plocY + (y3 - plocY) / smoothening

            # x is mirrored because the camera image is not flipped
            autopy.mouse.move(wScr - clocX, clocY)
            cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
            plocX, plocY = clocX, clocY

        # 5. If the index and middle fingers are both extended, measure their distance; a short distance means a click
        if fingers[1] and fingers[2]:
            length, img, pointInfo = detector.findDistance(8, 12, img)
            if length < 40:
                cv2.circle(img, (pointInfo[4], pointInfo[5]),
                           15, (0, 255, 0), cv2.FILLED)
                autopy.mouse.click()

    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, f'fps:{int(fps)}', (15, 25),
                cv2.FONT_HERSHEY_PLAIN, 2, (255, 0, 255), 2)
    cv2.imshow("I am Ai XiaoMiao", img)
    k = cv2.waitKey(1) & 0xFF
    if k == ord(' '):  # press the space bar to exit
        break
# Release the camera
cap.release()
# Close the OpenCV windows and free their memory
cv2.destroyAllWindows()

Video demo


Origin blog.csdn.net/lbcyllqj/article/details/130461266