Gesture recognition based on OpenCV and MediaPipe

The previous article introduced gesture recognition based on OpenCV alone. If you ran that code, you will have noticed that its hand-contour detection is not very reliable. While searching online for a better solution I came across the MediaPipe library, so I rewrote the gesture recognition code using both OpenCV and MediaPipe. The results are much better, so I am writing this article to record the approach.

1. Introduction to mediapipe

MediaPipe is an open-source, cross-platform project from Google that provides ready-made machine learning solutions. In practice it is a toolbox of computer vision models, including face detection, face landmarks, hand/gesture recognition, selfie segmentation, pose estimation, and more.

Since this article focuses on gesture recognition, I will briefly explain the library's hand detection module, so that the final gesture recognition source code is easier to follow. (If you want to learn about the other modules, see the official MediaPipe documentation.)

First, initialize the hand detection module
mphand = mp.solutions.hands
hands = mphand.Hands()
The parameters of mphand.Hands() in detail:

static_image_mode=False: if False, treats the input as a video stream and tracks hands across frames, which reduces latency; if True, runs detection on every image, which suits a batch of unrelated static images. Default: False.
max_num_hands=2: the maximum number of hands to detect. Default: 2.
model_complexity=1: complexity of the hand landmark model, 0 or 1. Landmark accuracy and inference latency generally increase with model complexity. Default: 1.
min_detection_confidence=0.5: the minimum confidence value from the hand detection model for a detection to be considered successful. Default: 0.5.
min_tracking_confidence=0.5: the minimum confidence value from the landmark tracking model for a hand to be considered successfully tracked; below it, hand detection is re-run on the next input image. A higher value improves robustness at the cost of latency. Ignored if static_image_mode is True, where detection simply runs on every image. Default: 0.5.

If you don't fully understand these parameters, it doesn't matter; the defaults work well. The two worth knowing are max_num_hands and min_detection_confidence.
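As a minimal sketch, here is the same initialization with every parameter spelled out (the values shown are simply the defaults made explicit):

import mediapipe as mp

mphand = mp.solutions.hands
hands = mphand.Hands(
    static_image_mode=False,       # video stream: detect once, then track
    max_num_hands=2,               # detect at most two hands
    model_complexity=1,            # 0 or 1; 1 is more accurate but slower
    min_detection_confidence=0.5,  # below this, no hand is reported
    min_tracking_confidence=0.5,   # below this, detection re-runs next frame
)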

Next comes the detection call itself
hand = hands.process(img)

The parameter img is the image to be detected. Since OpenCV reads images in BGR order, img must be converted to RGB before being passed to process().
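A minimal sketch of that step, assuming frame is a BGR frame just read from cv2.VideoCapture:

import cv2

img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)  # BGR (OpenCV) to RGB (MediaPipe)
hand = hands.process(img)                     # run the hand detection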

Finally, read the detection results
img_height, img_width, _ = img.shape
finger = []
for handlms in hand.multi_hand_landmarks:
    for lm in handlms.landmark:
        # The x and y returned by detection are normalized coordinates
        # (fractions of the image size), so multiply by the original
        # width and height to get real pixel coordinates
        x, y = int(lm.x * img_width), int(lm.y * img_height)
        finger.append([x, y])

If you print the finger list, you will see 21 coordinate pairs. These are MediaPipe's detected positions of the 21 key points of the human hand, in original-image pixels. The hand landmarks corresponding to these 21 indices are as follows:

(Figure: MediaPipe's 21 hand landmarks, numbered 0 to 20, running from the wrist at 0 to the fingertips at 4, 8, 12, 16, and 20.)
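If you don't have the diagram at hand, the library itself names every index; here is a quick sketch to print the mapping (mp.solutions.hands.HandLandmark is the enum of the 21 landmarks):

import mediapipe as mp

for landmark in mp.solutions.hands.HandLandmark:
    print(landmark.value, landmark.name)
# 0 WRIST, 4 THUMB_TIP, 8 INDEX_FINGER_TIP, ..., 20 PINKY_TIP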

2. Gesture recognition ideas

Now that you understand the basic usage of MediaPipe, let me explain the overall idea and the key technique.

The approach

First use OpenCV to open the camera and read frames from it. Flip each frame horizontally (the camera image is mirrored relative to reality) and convert it to RGB. Then run MediaPipe hand detection and store the landmark coordinates in a list. Finally, use the angles formed at the finger joints to decide whether each finger is bent.

Key points explained

The program as a whole is not difficult; the main question is how to detect whether a finger is bent.

I read in a book that you can subtract the coordinate of the finger root from that of the fingertip and use the sign of the result to judge bending. For example, in the figure above, subtract the y value of landmark 6 from the y value of landmark 8 (remember that in image coordinates y increases downward): if the result is positive the finger is bent, and if it is negative the finger is extended.
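A minimal sketch of that naive test, assuming finger is the 21-point list built earlier (finger[i] = [x, y] for landmark i):

# Naive test for the index finger: compare the tip (8) with the joint (6).
# In image coordinates y grows downward, so tip below joint means bent.
index_bent = (finger[8][1] - finger[6][1]) > 0
print("index finger bent" if index_bent else "index finger extended")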
But there are two problems with this method:
First, the thumb bends differently from the other four fingers (stretch out your hand and observe, and you will see it). For example, subtract the y value of landmark 1 from the y value of landmark 4: when the thumb is bent, it is not guaranteed for everyone that the y value of landmark 4 is greater than that of landmark 1, so this test cannot reliably judge the thumb.
For this problem, I once considered setting a threshold, but the threshold would need constant tuning, and everyone's hand is a different size; a threshold that suits you may not suit someone else.
Second, you may have noticed that all the examples above assume the fingertips point upward. What if the hand points down, left, or right? Writing separate conditions for every orientation would take a huge amount of code, so I don't recommend it.

Introducing angles solves both problems:
use trigonometry and judge each finger by the angle formed at its joint. For example, take landmarks 5, 6, and 7 as the vertices of a triangle, where
the side formed by landmarks 5 and 6 is a,
the side formed by landmarks 6 and 7 is b,
the side formed by landmarks 5 and 7 is c,
and then use the law of cosines to compute the angle at vertex 6:

cos(angle) = (a² + b² − c²) / (2ab)

Then set a threshold of 155°: if the angle is at least 155° the finger counts as extended, otherwise it counts as bent. Because this test depends only on the angle between the finger segments, it works for the thumb as well as the other four fingers, and it does not care which way the hand is pointing.
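As a small self-contained sketch of this calculation (the helper name joint_angle is my own, not from the original code):

import math

def joint_angle(p1, p2, p3):
    # Angle at vertex p2, in degrees; each point is an [x, y] pair.
    a = math.hypot(p1[0] - p2[0], p1[1] - p2[1])  # side p1-p2
    b = math.hypot(p2[0] - p3[0], p2[1] - p3[1])  # side p2-p3
    c = math.hypot(p1[0] - p3[0], p1[1] - p3[1])  # side p1-p3
    # Law of cosines; clamp to [-1, 1] so float rounding on collinear
    # points cannot push acos out of its domain
    value = max(-1.0, min(1.0, (a**2 + b**2 - c**2) / (2 * a * b)))
    return math.degrees(math.acos(value))

# For the index finger, with finger[i] = [x, y] from earlier:
# extended = joint_angle(finger[5], finger[6], finger[7]) >= 155

Note that a zero-length side would still raise ZeroDivisionError here; the complete code below catches that case.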

3. Complete code

Here is the complete code, with comments to help you read the program.

import cv2
import mediapipe as mp
import math

# Create the hand-detection objects
mphand = mp.solutions.hands
hands = mphand.Hands()
mpdraw = mp.solutions.drawing_utils

cap = cv2.VideoCapture(0)

while cap.isOpened():
    # Read a frame, mirror it, convert it to RGB, and get the image size
    ret, frame = cap.read()
    if not ret:
        break
    frame = cv2.flip(frame, 1)
    img = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)
    h, w, _ = frame.shape

    # Run the hand detection
    hand = hands.process(img)

    # Check whether a hand was detected
    if hand.multi_hand_landmarks:
        count = 0  # number of extended fingers
        for handlms in hand.multi_hand_landmarks:
            finger = []
            finger_point = []

            # Group the four landmarks of each finger into one sub-list;
            # print finger if you want to inspect the result in detail
            for id, lm in enumerate(handlms.landmark):
                x, y = int(lm.x * w), int(lm.y * h)
                if id == 0:
                    pass  # skip the wrist (landmark 0)
                elif id % 4 == 0:
                    finger_point.append([x, y])
                    finger.append(finger_point)
                    finger_point = []
                else:
                    finger_point.append([x, y])

            # For each finger, compute the three side lengths of the triangle
            # formed by its first three landmarks; the angles being judged
            # sit at landmarks 2, 6, 10, 14 and 18
            for id, point in enumerate(finger):
                a = math.hypot(point[0][0] - point[1][0], point[0][1] - point[1][1])
                b = math.hypot(point[1][0] - point[2][0], point[1][1] - point[2][1])
                c = math.hypot(point[0][0] - point[2][0], point[0][1] - point[2][1])

                # The divisor can be zero, and float error can push the value
                # outside [-1, 1] when the three points are collinear, so
                # catch both exceptions
                try:
                    value = (a**2 + b**2 - c**2) / (2 * a * b)
                    # acos returns radians; multiply by 57 (roughly 180/pi)
                    # to convert to degrees
                    angle = math.acos(value) * 57
                except ValueError:
                    angle = 180
                except ZeroDivisionError:
                    angle = 0
                print(angle)

                # An angle of at least 155 degrees counts as an extended finger
                if angle >= 155:
                    count += 1

            # Draw the landmark points and connections on the hand
            mpdraw.draw_landmarks(frame, handlms, mphand.HAND_CONNECTIONS)

        # Display the finger count on the image
        cv2.putText(frame, str(count), (int((1/9)*w), int((1/9)*h)),
                    cv2.FONT_HERSHEY_COMPLEX, 1, (0, 255, 0), 1)

    # Show the frame
    cv2.imshow('img', frame)

    # Press Esc to exit the loop
    c = cv2.waitKey(25)
    if c == 27:
        break

cap.release()
cv2.destroyAllWindows()

4. Ending

Using the MediaPipe library to detect hands lets us do many interesting things with gestures, such as controlling the computer mouse or drawing in the air. I am trying these myself, and once I finish I will share my ideas and the difficulties I ran into.
Finally, creating content is not easy, and I hope everyone will support and like this post. I look forward to sharing and learning with you and making progress together.

Original post: blog.csdn.net/m0_59151709/article/details/129120958