Hello everyone, I am lqj_, a blogger on CSDN.
My blog homepage (WeChat Mini Programs, front-end, and Python topics): https://blog.csdn.net/lbcyllqj?spm=1011.2415.3001.5343 — you are also welcome to follow me on Bilibili: Xiao Miao Develop.
This article, "Python artificial intelligence: control the mouse with mid-air gestures and free your hands", is part of my Python column and my Python AI vision column.
Table of contents
Foreword
Motivation
Gesture recognition and palm detection
Code and explanation
Import the corresponding libraries
Create a hand-detector class that reports left/right-hand labels
Main function: detect gestures and draw the skeleton
Set the camera index (built-in or external camera)
Set the width and height of the camera frame
Detect hand key-point coordinates
Determine whether the index and middle fingers are extended
If only the index finger is extended, enter move-only mode
Display the program's output with OpenCV
Release resources
Complete code
Foreword
With the release of ChatGPT by the American company OpenAI, interest in artificial intelligence has been reawakened. Against this backdrop, "intelligence + everything" has become a defining theme, and global intelligence an inevitable trend. Artificial intelligence is an irreplaceable product of the development of our times, and as a college student I am willing to contribute to it!
Motivation
I noticed that when someone lectures at school, whether a student or a teacher, they constantly have to click the mouse to switch PowerPoint slides and perform other computer operations. I found this inconvenient, so I developed this gesture-control script to solve the problem. It runs on any computer with a camera: the program opens the camera and uses hand gestures to drive the main functions.
Gesture Recognition Palm Detection
Current research on gesture recognition falls into two main directions: gesture recognition based on wearable devices and gesture recognition based on vision. Wearable approaches obtain large amounts of sensor data by fitting the hand with a glove carrying many sensors and then analyzing that data. Although the accuracy of this approach is relatively high, the high cost of the sensors makes it hard to apply in daily life; moreover, the sensor glove is inconvenient for the user and hinders further analysis, so this method is mostly confined to specialized professional instruments. This project focuses on vision-based gesture recognition, using the MediaPipe framework as the example so that readers can more easily reproduce the results and understand the field.
Vision-based gesture recognition is further divided into static and dynamic gesture recognition. Intuitively, dynamic gesture recognition is harder than static gesture recognition, but a static gesture is simply a special state of a dynamic one: we can run static gesture recognition frame by frame over a continuous video and then analyze the relationships between consecutive frames to improve the gesture system.
When training its palm model, MediaPipe uses the single-stage object detection algorithm SSD, optimized with three techniques at once: 1. NMS (non-maximum suppression); 2. an encoder-decoder feature extractor; 3. focal loss. NMS suppresses the multiple overlapping boxes the detector produces for a single object, keeping the detection box with the highest confidence. The encoder-decoder feature extractor provides larger-scene context awareness, even for small objects (similar to RetinaNet). Focal loss, borrowed from RetinaNet, addresses the imbalance between positive and negative samples; it is a technique that reliably improves detection in open environments. Using these techniques, MediaPipe achieves an average precision of 95.7% in palm detection, whereas the baseline without techniques 2 and 3 reaches only 86.22%; the 9.48-point gain shows the model can identify palms accurately. As for why a palm detector is used instead of a whole-hand detector, the authors argue that training a hand detector is more complicated and the learnable features are less distinctive, so they built a palm detector instead.
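As a rough illustration of the NMS step described above, here is a minimal sketch of greedy non-maximum suppression in plain Python. The box format, the IoU helper, and the 0.5 threshold are my own assumptions for illustration, not MediaPipe's actual implementation:

```python
def iou(a, b):
    # Boxes are (x1, y1, x2, y2); returns intersection-over-union.
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def nms(boxes, scores, iou_thresh=0.5):
    # Greedy NMS: repeatedly keep the highest-scoring box and
    # discard any remaining box that overlaps it too much.
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) < iou_thresh]
    return keep
```

For example, two heavily overlapping palm candidates collapse into the single higher-confidence detection, while a distant box survives.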
Code and explanation
Import the corresponding libraries
import cv2
import autopy
import numpy as np
import time
import math
import mediapipe as mp
Create a hand-detector class that reports left/right-hand labels:
class handDetector():
    def __init__(self, mode=False, maxHands=2, model_complexity=1, detectionCon=0.8, trackCon=0.8):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon
        self.model_complexity = model_complexity
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.model_complexity,
                                        self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        print(self.results.multi_handedness)  # print the left/right-hand labels from the detection result
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, draw=True):
        self.lmList = []
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                for id, lm in enumerate(handLms.landmark):
                    h, w, c = img.shape
                    cx, cy = int(lm.x * w), int(lm.y * h)
                    # print(id, cx, cy)
                    self.lmList.append([id, cx, cy])
                    if draw:
                        cv2.circle(img, (cx, cy), 12, (255, 0, 255), cv2.FILLED)
        return self.lmList

    def fingersUp(self):
        fingers = []
        # Thumb: compare the tip's x-coordinate with the adjacent joint's
        if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)
        # Other four fingers: a finger is "up" when its tip is above its PIP joint
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)
        # totalFingers = fingers.count(1)
        return fingers

    def findDistance(self, p1, p2, img, draw=True, r=15, t=3):
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
        if draw:
            cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), t)
            cv2.circle(img, (x1, y1), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (cx, cy), r, (0, 0, 255), cv2.FILLED)
        length = math.hypot(x2 - x1, y2 - y1)
        return length, img, [x1, y1, x2, y2, cx, cy]
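To make the fingersUp logic concrete, here is a small standalone sketch that applies the same tip-vs-joint comparisons to a hand-written landmark list. The coordinates are fabricated purely for illustration (pixel y grows downward, so a smaller y means "higher" in the image); this is not MediaPipe output:

```python
TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, pinky tips

def fingers_up(lm_list):
    # lm_list[i] = [id, x, y] in pixel coordinates, 21 entries, as built by findPosition().
    fingers = []
    # Thumb: compare the tip's x with the adjacent joint's x.
    fingers.append(1 if lm_list[TIP_IDS[0]][1] > lm_list[TIP_IDS[0] - 1][1] else 0)
    # Other fingers: tip counts as "up" when its y is smaller than the PIP joint's y.
    for i in range(1, 5):
        fingers.append(1 if lm_list[TIP_IDS[i]][2] < lm_list[TIP_IDS[i] - 2][2] else 0)
    return fingers

# Fabricated landmarks: index finger raised (tip y=100 above its PIP at y=200),
# every other finger curled, thumb tucked in.
lm = [[i, 0, 300] for i in range(21)]
lm[8] = [8, 0, 100]   # index fingertip
lm[6] = [6, 0, 200]   # index PIP joint
```

With this input the function reports only the index finger as extended, which is exactly the state the move-only mode below looks for.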
Main function: detect gestures and draw the skeleton
def main():
    pTime = 0
    cTime = 0
    cap = cv2.VideoCapture(0)
    detector = handDetector()
    while True:
        success, img = cap.read()
        img = detector.findHands(img)
        # Get the list of key-point coordinates
        lmList = detector.findPosition(img)
        if len(lmList) != 0:
            print(lmList[4])
        # Display the image with OpenCV
        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime
        cv2.putText(img, 'fps:' + str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
        cv2.imshow('Image', img)
        cv2.waitKey(1)
Set the camera index (0 for the built-in camera, 1 or higher for an external one)
##############################
wCam, hCam = 1000, 1000
frameR = 100
smoothening = 5
##############################
cap = cv2.VideoCapture(0)
Set the width and height of the camera frame
cap.set(3, wCam)
cap.set(4, hCam)
pTime = 0
plocX, plocY = 0, 0
clocX, clocY = 0, 0
detector = handDetector()
wScr, hScr = autopy.screen.size()
Detect hand key point coordinates
while True:
    success, img = cap.read()
    # 1. Detect the hand and get the key-point coordinates of the fingers
    img = detector.findHands(img)
    cv2.rectangle(img, (frameR, frameR), (wCam - frameR, hCam - frameR), (0, 255, 0), 2)
    lmList = detector.findPosition(img, draw=False)
Determine whether the index and middle fingers are extended
    if len(lmList) != 0:
        x1, y1 = lmList[8][1:]
        x2, y2 = lmList[12][1:]
        fingers = detector.fingersUp()
If only the index finger is extended, enter move-only mode
        if fingers[1] and not fingers[2]:
            # Coordinate conversion: map the index finger's window coordinates
            # to the mouse's desktop coordinates
            x3 = np.interp(x1, (frameR, wCam - frameR), (0, wScr))
            y3 = np.interp(y1, (frameR, hCam - frameR), (0, hScr))
            clocX = plocX + (x3 - plocX) / smoothening
            clocY = plocY + (y3 - plocY) / smoothening
            autopy.mouse.move(wScr - clocX, clocY)
            cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
            plocX, plocY = clocX, clocY
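The mapping and smoothing used here can be sketched in isolation. This is a minimal re-derivation in plain Python (the helper names and sample numbers are my own, not part of the original script): the active region [frameR, wCam - frameR] is mapped linearly onto [0, wScr], and each new target is blended with the previous position so the cursor does not jitter. The x-coordinate is additionally mirrored (wScr - x) in the script because the camera image is a mirror of the user.

```python
def map_range(v, src_lo, src_hi, dst_lo, dst_hi):
    # Linear interpolation with clamping, like np.interp over a two-point range.
    v = max(src_lo, min(src_hi, v))
    return dst_lo + (v - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)

def smooth_step(prev, target, smoothening=5):
    # Move a fraction of the way toward the target each frame (simple low-pass filter).
    return prev + (target - prev) / smoothening

# Example: a 1000x1000 camera frame with a 100 px margin, and a 1920x1080 screen.
wCam, hCam, frameR = 1000, 1000, 100
wScr, hScr = 1920, 1080
x3 = map_range(500, frameR, wCam - frameR, 0, wScr)  # finger at the frame centre
y3 = map_range(100, frameR, hCam - frameR, 0, hScr)  # finger at the top margin
```

A finger at the centre of the active region thus lands at the centre of the screen, and a smoothening of 5 means the cursor covers 1/5 of the remaining distance per frame.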
If both the index and middle fingers are extended and the distance between them is short enough (within the set threshold), trigger a mouse-click event
        if fingers[1] and fingers[2]:
            length, img, pointInfo = detector.findDistance(8, 12, img)
            if length < 40:
                cv2.circle(img, (pointInfo[4], pointInfo[5]),
                           15, (0, 255, 0), cv2.FILLED)
                autopy.mouse.click()
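The click condition above reduces to a Euclidean-distance check between the index fingertip (landmark 8) and the middle fingertip (landmark 12). A minimal sketch, with made-up pixel coordinates and the same 40 px threshold:

```python
import math

CLICK_THRESHOLD = 40  # pixels, matching the script above

def is_click(p_index, p_middle, threshold=CLICK_THRESHOLD):
    # True when the two fingertips are close enough to count as a pinch/click.
    return math.hypot(p_middle[0] - p_index[0], p_middle[1] - p_index[1]) < threshold
```

Fingertips 25 px apart register as a click; fingertips 100 px apart do not.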
Display the program's output with OpenCV
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, f'fps:{int(fps)}', (15, 25),
                cv2.FONT_HERSHEY_PLAIN, 2, (255, 0, 255), 2)
    cv2.imshow("I am Ai XiaoMiao", img)
    k = cv2.waitKey(1) & 0xFF
    if k == ord(' '):  # press space to quit
        break
Release resources
# Release the camera
cap.release()
# Destroy the windows and free memory
cv2.destroyAllWindows()
Complete code
# coding=utf-8
import cv2
import autopy
import numpy as np
import time
import math
import mediapipe as mp


class handDetector():
    def __init__(self, mode=False, maxHands=2, model_complexity=1, detectionCon=0.8, trackCon=0.8):
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.trackCon = trackCon
        self.model_complexity = model_complexity
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(self.mode, self.maxHands, self.model_complexity,
                                        self.detectionCon, self.trackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]

    def findHands(self, img, draw=True):
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        print(self.results.multi_handedness)  # print the left/right-hand labels from the detection result
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms, self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, draw=True):
        self.lmList = []
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                for id, lm in enumerate(handLms.landmark):
                    h, w, c = img.shape
                    cx, cy = int(lm.x * w), int(lm.y * h)
                    # print(id, cx, cy)
                    self.lmList.append([id, cx, cy])
                    if draw:
                        cv2.circle(img, (cx, cy), 12, (255, 0, 255), cv2.FILLED)
        return self.lmList

    def fingersUp(self):
        fingers = []
        # Thumb: compare the tip's x-coordinate with the adjacent joint's
        if self.lmList[self.tipIds[0]][1] > self.lmList[self.tipIds[0] - 1][1]:
            fingers.append(1)
        else:
            fingers.append(0)
        # Other four fingers: a finger is "up" when its tip is above its PIP joint
        for id in range(1, 5):
            if self.lmList[self.tipIds[id]][2] < self.lmList[self.tipIds[id] - 2][2]:
                fingers.append(1)
            else:
                fingers.append(0)
        # totalFingers = fingers.count(1)
        return fingers

    def findDistance(self, p1, p2, img, draw=True, r=15, t=3):
        x1, y1 = self.lmList[p1][1:]
        x2, y2 = self.lmList[p2][1:]
        cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
        if draw:
            cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), t)
            cv2.circle(img, (x1, y1), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (x2, y2), r, (255, 0, 255), cv2.FILLED)
            cv2.circle(img, (cx, cy), r, (0, 0, 255), cv2.FILLED)
        length = math.hypot(x2 - x1, y2 - y1)
        return length, img, [x1, y1, x2, y2, cx, cy]


def main():
    pTime = 0
    cTime = 0
    cap = cv2.VideoCapture(0)
    detector = handDetector()
    while True:
        success, img = cap.read()
        img = detector.findHands(img)  # detect the hand and draw the skeleton
        lmList = detector.findPosition(img)  # get the list of key-point coordinates
        # k = cv2.waitKey(1) & 0xFF  # check for a key press
        if len(lmList) != 0:
            print(lmList[4])
        cTime = time.time()
        fps = 1 / (cTime - pTime)
        pTime = cTime
        cv2.putText(img, 'fps:' + str(int(fps)), (10, 70), cv2.FONT_HERSHEY_PLAIN, 3, (255, 0, 255), 3)
        cv2.imshow('Image', img)
        cv2.waitKey(1)

# Note: main() above is a standalone detection demo and is not called;
# the gesture-mouse script below is what actually runs.

##############################
wCam, hCam = 1000, 1000
frameR = 100
smoothening = 5
##############################
cap = cv2.VideoCapture(0)  # 0 for the built-in laptop camera; use 1 or another index for an external camera
# Set the width and height of the camera frame
cap.set(3, wCam)
cap.set(4, hCam)
pTime = 0
plocX, plocY = 0, 0
clocX, clocY = 0, 0
detector = handDetector()
wScr, hScr = autopy.screen.size()
# print(wScr, hScr)
while True:
    success, img = cap.read()
    # 1. Detect the hand and get the key-point coordinates of the fingers
    img = detector.findHands(img)
    cv2.rectangle(img, (frameR, frameR), (wCam - frameR, hCam - frameR), (0, 255, 0), 2)
    lmList = detector.findPosition(img, draw=False)
    # k = cv2.waitKey() & 0xFF  # check for a key press
    # 2. Determine whether the index and middle fingers are extended
    if len(lmList) != 0:
        x1, y1 = lmList[8][1:]
        x2, y2 = lmList[12][1:]
        fingers = detector.fingersUp()
        # 3. If only the index finger is extended, enter move-only mode
        if fingers[1] and not fingers[2]:
            # 4. Coordinate conversion: map the index finger's window coordinates
            #    to the mouse's desktop coordinates
            x3 = np.interp(x1, (frameR, wCam - frameR), (0, wScr))
            y3 = np.interp(y1, (frameR, hCam - frameR), (0, hScr))
            # Smooth the motion so the cursor does not jitter
            clocX = plocX + (x3 - plocX) / smoothening
            clocY = plocY + (y3 - plocY) / smoothening
            autopy.mouse.move(wScr - clocX, clocY)
            cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
            plocX, plocY = clocX, clocY
        # 5. If the index and middle fingers are both extended, measure the distance
        #    between them; a short enough distance triggers a mouse click
        if fingers[1] and fingers[2]:
            length, img, pointInfo = detector.findDistance(8, 12, img)
            if length < 40:
                cv2.circle(img, (pointInfo[4], pointInfo[5]),
                           15, (0, 255, 0), cv2.FILLED)
                autopy.mouse.click()
    cTime = time.time()
    fps = 1 / (cTime - pTime)
    pTime = cTime
    cv2.putText(img, f'fps:{int(fps)}', (15, 25),
                cv2.FONT_HERSHEY_PLAIN, 2, (255, 0, 255), 2)
    cv2.imshow("I am Ai XiaoMiao", img)
    k = cv2.waitKey(1) & 0xFF
    if k == ord(' '):  # press space to quit
        break
# Release the camera
cap.release()
# Destroy the windows and free memory
cv2.destroyAllWindows()