Gesture recognition is a human-computer interaction technology that lets users operate and control computers, smartphones, smart TVs, and other devices by recognizing human gestures.
1. OpenCV in practice: hand tracking (locating the key points of the hand)
3. OpenCV in practice: gesture recognition - a gesture-controlled mouse
4. OpenCV in practice: gesture recognition - a gesture-controlled keyboard
To be continued.
This column records the author's learning journey and will keep being updated. You are welcome to subscribe and learn along.
This project uses Google's open-source framework MediaPipe, which provides many ready-made models, such as face detection, body detection, and hand detection.
The code uses OpenCV, the MediaPipe module, the HandTrackingModule written below, the keyboard-control module pynput, and the cvzone module.
1. The HandTrackingModule module
The previous article contains a tutorial on encapsulating the hand-detection module, so only a brief introduction is given here; the new parts are easy to pick up.
import cv2
import mediapipe as mp
import math


class HandDetector:
    """
    Finds hands using the mediapipe library. Exports the landmarks
    in pixel format. Adds extra functionalities like finding how
    many fingers are up or the distance between two fingers. Also
    provides bounding box info of the hand found.
    """

    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, minTrackCon=0.5):
        """
        :param mode: In static mode, detection is done on each image: slower
        :param maxHands: Maximum number of hands to detect
        :param detectionCon: Minimum Detection Confidence Threshold
        :param minTrackCon: Minimum Tracking Confidence Threshold
        """
        self.mode = mode
        self.maxHands = maxHands
        self.detectionCon = detectionCon
        self.minTrackCon = minTrackCon
        self.mpHands = mp.solutions.hands
        # Keyword arguments avoid breakage on newer mediapipe releases,
        # where the positional parameter order of Hands() changed.
        self.hands = self.mpHands.Hands(static_image_mode=self.mode,
                                        max_num_hands=self.maxHands,
                                        min_detection_confidence=self.detectionCon,
                                        min_tracking_confidence=self.minTrackCon)
        self.mpDraw = mp.solutions.drawing_utils
        self.tipIds = [4, 8, 12, 16, 20]  # landmark ids of the five fingertips
        self.fingers = []
        self.lmList = []

    def findHands(self, img, draw=True):
        """
        Finds hands in a BGR image.
        :param img: Image to find the hands in.
        :param draw: Flag to draw the output on the image.
        :return: Image with or without drawings
        """
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)
        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):
        """
        Finds landmarks of a single hand and puts them in a list
        in pixel format. Also finds the bounding box around the hand.
        :param img: main image to find hand in
        :param handNo: hand id if more than one hand detected
        :param draw: Flag to draw the output on the image.
        :return: list of landmarks in pixel format; bounding box
        """
        xList = []
        yList = []
        bbox = []
        bboxInfo = []
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            for id, lm in enumerate(myHand.landmark):
                h, w, c = img.shape
                px, py = int(lm.x * w), int(lm.y * h)
                xList.append(px)
                yList.append(py)
                self.lmList.append([px, py])
                if draw:
                    cv2.circle(img, (px, py), 5, (255, 0, 255), cv2.FILLED)
            xmin, xmax = min(xList), max(xList)
            ymin, ymax = min(yList), max(yList)
            boxW, boxH = xmax - xmin, ymax - ymin
            bbox = xmin, ymin, boxW, boxH
            cx, cy = bbox[0] + (bbox[2] // 2), \
                     bbox[1] + (bbox[3] // 2)
            bboxInfo = {"id": handNo, "bbox": bbox, "center": (cx, cy)}
            if draw:
                cv2.rectangle(img, (bbox[0] - 20, bbox[1] - 20),
                              (bbox[0] + bbox[2] + 20, bbox[1] + bbox[3] + 20),
                              (0, 255, 0), 2)
        return self.lmList, bboxInfo

    def fingersUp(self):
        """
        Finds how many fingers are open and returns in a list.
        Considers left and right hands separately
        :return: List of which fingers are up
        """
        if self.results.multi_hand_landmarks:
            myHandType = self.handType()
            fingers = []
            # Thumb: compare tip (4) and the joint below it (3) along x
            if myHandType == "Right":
                if self.lmList[self.tipIds[0]][0] > self.lmList[self.tipIds[0] - 1][0]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            else:
                if self.lmList[self.tipIds[0]][0] < self.lmList[self.tipIds[0] - 1][0]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            # 4 fingers: tip above the PIP joint (tip id - 2) means "up"
            for id in range(1, 5):
                if self.lmList[self.tipIds[id]][1] < self.lmList[self.tipIds[id] - 2][1]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            return fingers

    def findDistance(self, p1, p2, img, draw=True):
        """
        Find the distance between two landmarks based on their
        index numbers.
        :param p1: Point1 - Index of Landmark 1.
        :param p2: Point2 - Index of Landmark 2.
        :param img: Image to draw on.
        :param draw: Flag to draw the output on the image.
        :return: Distance between the points
                 Image with output drawn
                 Line information
        """
        if self.results.multi_hand_landmarks:
            x1, y1 = self.lmList[p1][0], self.lmList[p1][1]
            x2, y2 = self.lmList[p2][0], self.lmList[p2][1]
            cx, cy = (x1 + x2) // 2, (y1 + y2) // 2
            if draw:
                cv2.circle(img, (x1, y1), 15, (255, 0, 255), cv2.FILLED)
                cv2.circle(img, (x2, y2), 15, (255, 0, 255), cv2.FILLED)
                cv2.line(img, (x1, y1), (x2, y2), (255, 0, 255), 3)
                cv2.circle(img, (cx, cy), 15, (255, 0, 255), cv2.FILLED)
            length = math.hypot(x2 - x1, y2 - y1)
            return length, img, [x1, y1, x2, y2, cx, cy]

    def handType(self):
        """
        Checks if the hand is left or right
        :return: "Right" or "Left"
        """
        if self.results.multi_hand_landmarks:
            if self.lmList[17][0] < self.lmList[5][0]:
                return "Right"
            else:
                return "Left"


def main():
    cap = cv2.VideoCapture(0)
    detector = HandDetector(detectionCon=0.8, maxHands=1)
    while True:
        # Get image frame
        success, img = cap.read()
        # Find the hand and its landmarks
        img = detector.findHands(img)
        lmList, bboxInfo = detector.findPosition(img)
        print(detector.handType())
        # Display
        cv2.imshow("Image", img)
        cv2.waitKey(1)


if __name__ == "__main__":
    main()
- Imported libraries: OpenCV (`cv2`) for image processing and display, MediaPipe (`mediapipe`) for hand detection and tracking, and the standard `math` library.
- `HandDetector` class: the main gesture-detector class, providing several methods for hand detection and gesture analysis.
- `__init__` method: initializes the detector's parameters, such as the detection mode, the maximum number of hands, and the confidence thresholds for detection and tracking.
- `findHands` method: finds hands in a given image, optionally drawing the detections on it.
- `findPosition` method: finds the key-point locations (landmarks) of a single hand, stores them in a list in pixel format, and also computes the hand's bounding box.
- `fingersUp` method: determines which fingers are raised and returns the result as a list.
- `findDistance` method: computes the distance between two specified key points and optionally draws the result on the image.
- `handType` method: determines whether the detected hand is a left or right hand.
I won't go into further detail here. This functionality also ships in the cvzone package, but (perhaps due to a version mismatch) something was missing and it would not run, so I reimplemented the module by hand.
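As a quick illustration of the `fingersUp` heuristic described above (each fingertip landmark is compared against the joint two indices below it along y, while the thumb is compared along x), here is a standalone sketch that works on a bare `lmList` of `[x, y]` pixel pairs. The function name and the synthetic coordinates are mine, for illustration only, and are not part of the original module:

```python
# Landmark indices follow the MediaPipe Hands layout:
# 4, 8, 12, 16, 20 are the five fingertips.
TIP_IDS = [4, 8, 12, 16, 20]

def fingers_up(lm_list, hand_type="Right"):
    """Which fingers are raised, given a 21-point [x, y] landmark list."""
    fingers = []
    # Thumb: compare the x of the tip (4) with the joint below it (3);
    # the comparison flips between left and right hands.
    if hand_type == "Right":
        fingers.append(1 if lm_list[TIP_IDS[0]][0] > lm_list[TIP_IDS[0] - 1][0] else 0)
    else:
        fingers.append(1 if lm_list[TIP_IDS[0]][0] < lm_list[TIP_IDS[0] - 1][0] else 0)
    # Other four fingers: the tip lying above the PIP joint (tip index - 2)
    # means "up" (image y grows downward).
    for i in range(1, 5):
        fingers.append(1 if lm_list[TIP_IDS[i]][1] < lm_list[TIP_IDS[i] - 2][1] else 0)
    return fingers
```

Feeding it synthetic coordinates where only the thumb and index finger are extended yields `[1, 1, 0, 0, 0]`, which matches what `fingersUp` returns on a real frame.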
Below is the code of the main function
import cv2
from time import sleep
import numpy as np
import cvzone
from pynput.keyboard import Controller
# cvzone also ships a HandTrackingModule, but it did not run here
# (possibly a version mismatch), so the hand-written module above is used.
from HandTrackingModule import *

cap = cv2.VideoCapture(0)
cap.set(3, 1280)  # frame width
cap.set(4, 720)   # frame height
detector = HandDetector(detectionCon=0.5)
keys = [["Q", "W", "E", "R", "T", "Y", "U", "I", "O", "P"],
        ["A", "S", "D", "F", "G", "H", "J", "K", "L", ";"],
        ["Z", "X", "C", "V", "B", "N", "M", ",", ".", "/"]]
finalText = ""
keyboard = Controller()


def drawAll(img, buttonList):
    for button in buttonList:
        x, y = button.pos
        w, h = button.size
        cvzone.cornerRect(img, (button.pos[0], button.pos[1], button.size[0], button.size[1]),
                          20, rt=0)
        cv2.rectangle(img, button.pos, (x + w, y + h), (255, 0, 255), cv2.FILLED)
        cv2.putText(img, button.text, (x + 20, y + 65),
                    cv2.FONT_HERSHEY_PLAIN, 4, (255, 255, 255), 4)
    return img

# Alternative drawAll with a semi-transparent keyboard overlay:
# def drawAll(img, buttonList):
#     imgNew = np.zeros_like(img, np.uint8)
#     for button in buttonList:
#         x, y = button.pos
#         cvzone.cornerRect(imgNew, (button.pos[0], button.pos[1], button.size[0], button.size[1]),
#                           20, rt=0)
#         cv2.rectangle(imgNew, button.pos, (x + button.size[0], y + button.size[1]),
#                       (255, 0, 255), cv2.FILLED)
#         cv2.putText(imgNew, button.text, (x + 40, y + 60),
#                     cv2.FONT_HERSHEY_PLAIN, 2, (255, 255, 255), 3)
#     out = img.copy()
#     alpha = 0.5
#     mask = imgNew.astype(bool)
#     out[mask] = cv2.addWeighted(img, alpha, imgNew, 1 - alpha, 0)[mask]
#     return out


class Button():
    def __init__(self, pos, text, size=[85, 85]):
        self.pos = pos
        self.size = size
        self.text = text


buttonList = []
for i in range(len(keys)):
    for j, key in enumerate(keys[i]):
        buttonList.append(Button([100 * j + 50, 100 * i + 50], key))

while True:
    success, img = cap.read()
    img = detector.findHands(img)
    lmList, bboxInfo = detector.findPosition(img)
    img = drawAll(img, buttonList)
    if lmList:
        for button in buttonList:
            x, y = button.pos
            w, h = button.size
            # Index fingertip (landmark 8) hovering over this button
            if x < lmList[8][0] < x + w and y < lmList[8][1] < y + h:
                cv2.rectangle(img, (x - 5, y - 5), (x + w + 5, y + h + 5), (175, 0, 175), cv2.FILLED)
                cv2.putText(img, button.text, (x + 20, y + 65),
                            cv2.FONT_HERSHEY_PLAIN, 4, (255, 255, 255), 4)
                # Distance between index (8) and middle (12) fingertips
                l, _, _ = detector.findDistance(8, 12, img, draw=False)
                print(l)
                # A pinch counts as a click
                if l < 30:
                    keyboard.press(button.text)
                    cv2.rectangle(img, button.pos, (x + w, y + h), (0, 255, 0), cv2.FILLED)
                    cv2.putText(img, button.text, (x + 20, y + 65),
                                cv2.FONT_HERSHEY_PLAIN, 4, (255, 255, 255), 4)
                    finalText += button.text
                    sleep(0.15)
    # Text box showing what has been typed so far
    cv2.rectangle(img, (50, 350), (700, 450), (175, 0, 175), cv2.FILLED)
    cv2.putText(img, finalText, (60, 430),
                cv2.FONT_HERSHEY_PLAIN, 5, (255, 255, 255), 5)
    cv2.imshow("Image", img)
    cv2.waitKey(1)
- Import libraries: OpenCV (`cv2`) for image processing and display, the `HandDetector` class for hand detection, `cvzone` for drawing the button appearance, `numpy` for array processing, `Controller` from `pynput.keyboard` for simulating keyboard input, and `sleep` from `time` for the key-repeat delay.
- Set camera parameters: set the camera resolution to 1280x720 through OpenCV.
- Create a `HandDetector` instance with a detection confidence threshold of 0.5.
- Create the button list: build a list of virtual-keyboard buttons, with the key layout defined by the nested `keys` lists.
- Create the `Button` class: each button stores a position, a text label, and a size.
- Main loop: an infinite loop that processes image frames captured from the camera in real time.
  - Read an image frame from the camera.
  - Hand detection: use the gesture detector to find the hand and its key points in the frame.
  - Draw the buttons: call `drawAll` to render the virtual keyboard on the image.
  - Iterate through the button list, checking whether the index fingertip is hovering over each button; if it is, draw a highlight effect.
  - Compute the distance between the index and middle fingertips; if it is below a threshold, simulate a key press and record the input.
  - Draw the typed text on the image.
  - Display the processed image through OpenCV.
  - Wait 1 millisecond so the image window stays responsive.
- Run the main program: the loop handles real-time camera capture and gesture recognition.
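The hover-and-click logic in the main loop reduces to two small geometric checks: a point-in-rectangle test for the index fingertip and a pinch-distance threshold between the index and middle fingertips. A minimal sketch of those checks in isolation (the function names are mine, for illustration, and not part of the original code):

```python
import math

def inside_button(point, pos, size):
    """True when a fingertip (px, py) falls inside a button's rectangle."""
    px, py = point
    x, y = pos
    w, h = size
    return x < px < x + w and y < py < y + h

def is_click(p1, p2, threshold=30):
    """A 'click' is a pinch: the index tip (8) and middle tip (12)
    closer than the threshold in pixels, mirroring the l < 30 test."""
    return math.hypot(p2[0] - p1[0], p2[1] - p1[1]) < threshold
```

For a default 85x85 button at (50, 50), a fingertip at (60, 60) hovers over it, and two fingertips about 14 pixels apart register as a click while 50 pixels apart do not.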
If you run into any problems, leave a message in the comments section, and we can all learn from each other!