Article directory

update diary
foreword
Preparation
Recognize hand model
Identify video input method
gesture recognition method
full code
epilogue

update diary

2022.04.18: At readers' request, the hand recognition model code has been updated to match the mediapipe (mp) library, and it now runs normally.

foreword

The wave of artificial intelligence is sweeping the world. The concept has been discussed for decades, and the related technology has developed faster and faster in recent years. Terms such as machine learning, deep learning, and computer vision have gradually entered everyday life, and they all belong to the field of artificial intelligence.

Computer vision is a branch of artificial intelligence. It is an interdisciplinary subject that draws on computer science, mathematics, engineering, physics, biology, and psychology. Many scientists believe that computer vision paves the way for the development of artificial intelligence.

Put simply, computer vision gives the computer a pair of eyes to observe the world, and then uses the computer's computing power to process what it sees quickly in the service of people.

Today we will briefly introduce, in simple terms, a gesture recognition method in Python computer vision that recognizes number gestures (one, two, three, four, five) and the thumbs-up. If you like this article or find it helpful, don't forget to like and follow!

Preparation

This article uses two Python packages: OpenCV and mediapipe.

OpenCV is a widely used computer vision and image processing library.

MediaPipe is a framework for building multimedia machine learning applications, developed and open-sourced by Google. It provides the hand model used here.

Both can be installed with pip:

pip install opencv-python
pip install mediapipe

If you use Anaconda, it is recommended to install both packages from the Anaconda command line, inside a dedicated environment, so that the machine learning setup stays isolated.

Once OpenCV and mediapipe are installed, check the installation by running the following in Python:

import cv2
import mediapipe as mp

If both imports run without errors, the opencv-python and mediapipe packages are installed correctly, so let's get to today's topic!
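
As a complementary check, the sketch below reports which packages are missing without importing them, which avoids a traceback when one is absent. It uses only the standard library; the helper name `missing_modules` is an illustrative assumption and is not part of either package.

```python
import importlib.util


def missing_modules(names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # "cv2" is the import name of opencv-python; "mediapipe" matches its pip name.
    missing = missing_modules(["cv2", "mediapipe"])
    if missing:
        print("Missing modules:", ", ".join(missing))
    else:
        print("OpenCV and mediapipe are both available.")
```

If the list is not empty, install the missing packages with the pip commands above and run the check again.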

Recognize hand model

To recognize gestures, we first need to locate the hand in the incoming image. Here we use the mediapipe hand model to do that, wrap it in a reusable recognition module, and import that module later in the gesture recognition code. The module is saved as:

HandTrackingModule.py

# -*- coding:utf-8 -*-

"""

CODE >>> SINCE IN CAIXYPROMISE.
MOTTO >>> STRIVE FOR EXCELLENT.
CONSTANTLY STRIVING FOR SELF-IMPROVEMENT.

@ By: CaixyPromise
@ Date: 2021-10-17

"""

import cv2
import mediapipe as mp

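class HandDetector:
    """
    Finds hands using the mediapipe library and exports the landmarks in pixel format.
    Adds extra functionality such as finding which fingers are up, and provides
    bounding box information for the detected hand.
    """
    def __init__(self, mode=False, maxHands=2, detectionCon=0.5, minTrackCon=0.5):
        """
        :param mode: in static mode, detection is run on every image
        :param maxHands: maximum number of hands to detect
        :param detectionCon: minimum detection confidence
        :param minTrackCon: minimum tracking confidence
        """
        self.mode = mode
        self.maxHands = maxHands
        # Model complexity, 0 or 1 (the original code passed False, which equals 0)
        self.modelComplex = 0
        self.detectionCon = detectionCon
        self.minTrackCon = minTrackCon

        # Initialize the hand recognition model. Keyword arguments keep the
        # call independent of the positional parameter order.
        self.mpHands = mp.solutions.hands
        self.hands = self.mpHands.Hands(static_image_mode=self.mode,
                                        max_num_hands=self.maxHands,
                                        model_complexity=self.modelComplex,
                                        min_detection_confidence=self.detectionCon,
                                        min_tracking_confidence=self.minTrackCon)
        self.mpDraw = mp.solutions.drawing_utils  # drawing helper
        self.tipIds = [4, 8, 12, 16, 20]          # landmark ids of the fingertips
        self.fingers = []
        self.lmList = []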

    def findHands(self, img, draw=True):
        """
        Finds hands in a BGR image.
        :param img: image in which to look for hands
        :param draw: flag to draw the output on the image
        :return: the image, with or without drawings
        """
        # OpenCV delivers frames in BGR order, while mediapipe expects RGB
        imgRGB = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
        self.results = self.hands.process(imgRGB)

        if self.results.multi_hand_landmarks:
            for handLms in self.results.multi_hand_landmarks:
                if draw:
                    self.mpDraw.draw_landmarks(img, handLms,
                                               self.mpHands.HAND_CONNECTIONS)
        return img

    def findPosition(self, img, handNo=0, draw=True):
        """
        Finds the landmarks of a single hand and puts them in a list in pixel format.
        Also returns the bounding box around the hand.
        :param img: main image to search in
        :param handNo: index of the hand if several hands are detected
        :param draw: flag to draw the output on the image (draws the bounding box by default)
        :return: list of hand joint positions in pixels; bounding box info of the hand
        """

        xList = []
        yList = []
        bbox = []
        bboxInfo = []
        self.lmList = []
        if self.results.multi_hand_landmarks:
            myHand = self.results.multi_hand_landmarks[handNo]
            h, w, c = img.shape
            for id, lm in enumerate(myHand.landmark):
                # lm.x and lm.y are normalized to the image size; scale to pixels
                px, py = int(lm.x * w), int(lm.y * h)
                xList.append(px)
                yList.append(py)
                self.lmList.append([px, py])
                if draw:
                    cv2.circle(img, (px, py), 5, (255, 0, 255), cv2.FILLED)
            xmin, xmax = min(xList), max(xList)
            ymin, ymax = min(yList), max(yList)
            boxW, boxH = xmax - xmin, ymax - ymin
            bbox = xmin, ymin, boxW, boxH
            cx, cy = bbox[0] + (bbox[2] // 2), \
                     bbox[1] + (bbox[3] // 2)
            bboxInfo = {"id": id, "bbox": bbox, "center": (cx, cy)}

            if draw:
                # draw the bounding box with a 20-pixel margin around the hand
                cv2.rectangle(img, (bbox[0] - 20, bbox[1] - 20),
                              (bbox[0] + bbox[2] + 20, bbox[1] + bbox[3] + 20),
                              (0, 255, 0), 2)

        return self.lmList, bboxInfo

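    def fingersUp(self):
        """
        Finds which fingers are up. Left and right hands are handled separately.
        :return: list of five flags (thumb first), 1 = up and 0 = down;
                 an empty list if no hand was detected
        """
        fingers = []  # defined up front so the return below never fails
        if self.results.multi_hand_landmarks:
            myHandType = self.handType()
            # Thumb: compare the x position of the tip with the joint below it
            if myHandType == "Right":
                if self.lmList[self.tipIds[0]][0] > self.lmList[self.tipIds[0] - 1][0]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            else:
                if self.lmList[self.tipIds[0]][0] < self.lmList[self.tipIds[0] - 1][0]:
                    fingers.append(1)
                else:
                    fingers.append(0)
            # 4 Fingers: a finger is up when its tip is above the joint two ids below
            for id in range(1, 5):
                if self.lmList[self.tipIds[id]][1] < self.lmList[self.tipIds[id] - 2][1]:
                    fingers.append(1)
                else:
                    fingers.append(0)
        return fingers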

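    def handType(self):
        """
        Checks whether the detected hand is a left or a right hand by comparing
        the x positions of landmark 17 and landmark 5.
        :return: "Right" or "Left"
        """
        if self.results.multi_hand_landmarks:
            if self.lmList[17][0] < self.lmList[5][0]:
                return "Right"
            else:
                return "Left"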
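
The coordinate handling inside findPosition() can be tried without a camera. The landmark coordinates are normalized to the image width and height, so the sketch below scales a few made-up normalized points to pixels and derives the bounding box and its center with the same arithmetic. The function name `landmarks_to_bbox` and the sample points are illustrative assumptions.

```python
def landmarks_to_bbox(norm_points, width, height):
    """Scale normalized (x, y) points to pixels and compute their bounding box.

    Mirrors the arithmetic in findPosition(): returns the pixel list, the box
    as (xmin, ymin, box_width, box_height) and the box center.
    """
    pixels = [[int(x * width), int(y * height)] for x, y in norm_points]
    xs = [p[0] for p in pixels]
    ys = [p[1] for p in pixels]
    xmin, ymin = min(xs), min(ys)
    bbox = (xmin, ymin, max(xs) - xmin, max(ys) - ymin)
    center = (bbox[0] + bbox[2] // 2, bbox[1] + bbox[3] // 2)
    return pixels, bbox, center


# Three made-up normalized points on a 1280x720 frame.
points = [(0.25, 0.5), (0.5, 0.25), (0.75, 0.75)]
pixels, bbox, center = landmarks_to_bbox(points, 1280, 720)
print(pixels)  # [[320, 360], [640, 180], [960, 540]]
print(bbox)    # (320, 180, 640, 360)
print(center)  # (640, 360)
```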

Identify video input method

Now that the hand model can be located, we need to feed images to the program so that it can perform hand recognition and gesture recognition. Here we use OpenCV to open the computer's camera and read frames from it, and use the HandTrackingModule module we just wrote as the hand recognition module.

Main.py


# -*- coding:utf-8 -*-

"""

CODE >>> SINCE IN CAIXYPROMISE.
MOTTO >>> STRIVE FOR EXCELLENT.
CONSTANTLY STRIVING FOR SELF-IMPROVEMENT.

@ By: CaixyPromise
@ Date: 2021-10-17

"""
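import cv2
from HandTrackingModule import HandDetector

class Main:
    def __init__(self):
        # Open the default camera as a video stream. CAP_DSHOW is the Windows
        # DirectShow backend; drop the second argument on other systems.
        self.camera = cv2.VideoCapture(0, cv2.CAP_DSHOW)
        self.camera.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)  # set the resolution
        self.camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

    def Gesture_recognition(self):
        self.detector = HandDetector()  # create the detector once, outside the loop
        while True:
            success, img = self.camera.read()
            if not success:
                break  # no frame could be read from the camera
            img = self.detector.findHands(img)              # find your hand
            lmList, bbox = self.detector.findPosition(img)  # get the position of your hand

            cv2.imshow("camera", img)
            # exit the program when the window is closed with its close button
            if cv2.getWindowProperty('camera', cv2.WND_PROP_VISIBLE) < 1:
                break
            cv2.waitKey(1)
            # if cv2.waitKey(1) & 0xFF == ord("q"):
            #     break  # press q to quit
        self.camera.release()
        cv2.destroyAllWindows()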

Now, when we run the program, it opens your computer's default camera. When you show your hand, a box is drawn around it and the main joint points of the hand are marked.

Each joint point has a serial number. There are 21 joint points per hand, and the fingertips are points 4, 8, 12, 16 and 20.

The joints are numbered as follows:

[Figure: numbering of the 21 hand joint points]
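
For reference, the numbering can also be written out as text. The names below follow MediaPipe's hand landmark naming (the wrist first, then four points per finger from base to tip); the variable names `HAND_LANDMARKS` and `FINGERTIPS` are only illustrative.

```python
# Index -> name for the 21 hand landmarks, in MediaPipe's order.
HAND_LANDMARKS = [
    "WRIST",
    "THUMB_CMC", "THUMB_MCP", "THUMB_IP", "THUMB_TIP",
    "INDEX_FINGER_MCP", "INDEX_FINGER_PIP", "INDEX_FINGER_DIP", "INDEX_FINGER_TIP",
    "MIDDLE_FINGER_MCP", "MIDDLE_FINGER_PIP", "MIDDLE_FINGER_DIP", "MIDDLE_FINGER_TIP",
    "RING_FINGER_MCP", "RING_FINGER_PIP", "RING_FINGER_DIP", "RING_FINGER_TIP",
    "PINKY_MCP", "PINKY_PIP", "PINKY_DIP", "PINKY_TIP",
]

# Every landmark whose name ends in "_TIP" is a fingertip.
FINGERTIPS = [i for i, name in enumerate(HAND_LANDMARKS) if name.endswith("_TIP")]

print(len(HAND_LANDMARKS))  # 21
print(FINGERTIPS)           # [4, 8, 12, 16, 20]
```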

gesture recognition method

With the previous sections, we can now capture video and locate the hand, so let's write the gesture recognition method. Here we use the fingersUp() method of the recognition module.

Go back to the Main.py file we just wrote (the video input method). Once the hand has been found and drawn, findPosition() returns the position of your hand: lmList holds the joint positions (type: list) and bbox holds the bounding box information (a dict when a hand is found). Both are empty when nothing is recognized. So we only need to check the fingers when lmList contains data (is non-empty), which can be written as:


def Gesture_recognition(self):
    self.detector = HandDetector()  # create the detector once, outside the loop
    while True:
        success, img = self.camera.read()
        img = self.detector.findHands(img)
        lmList, bbox = self.detector.findPosition(img)

        if lmList:
            # one flag per finger, thumb first: 1 = up, 0 = down
            x1, x2, x3, x4, x5 = self.detector.fingersUp()

As described in the fingersUp() method above, it returns a list of length 5, ordered from the thumb to the little finger, in which a raised finger is marked 1 and a lowered finger 0.
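
To make the return value concrete, here is a small standalone sketch of the same comparison logic, run on a synthetic landmark list instead of a camera frame. The function `fingers_up`, the helper `make_hand` and the pixel values are illustrative assumptions; only the comparisons mirror fingersUp().

```python
TIP_IDS = [4, 8, 12, 16, 20]  # thumb, index, middle, ring, little finger


def fingers_up(lm_list, hand_type):
    """Return five 0/1 flags (thumb first) from 21 [x, y] pixel landmarks."""
    fingers = []
    # Thumb: compare the tip's x with the joint just below it; the direction
    # that counts as "up" depends on which hand it is.
    tip_x, joint_x = lm_list[TIP_IDS[0]][0], lm_list[TIP_IDS[0] - 1][0]
    if hand_type == "Right":
        fingers.append(1 if tip_x > joint_x else 0)
    else:
        fingers.append(1 if tip_x < joint_x else 0)
    # Other fingers: up when the tip is above the joint two ids below it
    # (image y grows downward, so "above" means a smaller y).
    for tip in TIP_IDS[1:]:
        fingers.append(1 if lm_list[tip][1] < lm_list[tip - 2][1] else 0)
    return fingers


def make_hand(raised):
    """Build a synthetic 21-point hand with the given fingers (0..4) raised."""
    lm = [[100, 200] for _ in range(21)]
    for finger, tip in enumerate(TIP_IDS):
        if finger in raised:
            if finger == 0:
                lm[tip][0] = 120  # thumb tip to the right of its joint
            else:
                lm[tip][1] = 150  # fingertip above its middle joint
    return lm


print(fingers_up(make_hand({1, 2}), "Right"))  # [0, 1, 1, 0, 0]
print(fingers_up(make_hand({0}), "Right"))     # [1, 0, 0, 0, 0]
```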

Our goal this time is to recognize the number gestures commonly used in daily life, plus the thumbs-up gesture. Based on how those gestures look, the recognition can be written as:

def Gesture_recognition(self):
    self.detector = HandDetector()  # create the detector once, outside the loop
    while True:
        success, img = self.camera.read()
        img = self.detector.findHands(img)
        lmList, bbox = self.detector.findPosition(img)

        if lmList:
            x1, x2, x3, x4, x5 = self.detector.fingersUp()
            if (x2 == 1 and x3 == 1) and (x4 == 0 and x5 == 0 and x1 == 0):
                pass  # TWO
            elif (x2 == 1 and x3 == 1 and x4 == 1) and (x1 == 0 and x5 == 0):
                pass  # THREE
            elif (x2 == 1 and x3 == 1 and x4 == 1 and x5 == 1) and (x1 == 0):
                pass  # FOUR
            elif x1 == 1 and x2 == 1 and x3 == 1 and x4 == 1 and x5 == 1:
                pass  # FIVE
            # the conditions are joined with "and"; a comma-separated tuple
            # would always be truthy and match any hand
            elif x2 == 1 and (x1 == 0 and x3 == 0 and x4 == 0 and x5 == 0):
                pass  # ONE
            elif x1 == 1 and (x2 == 0 and x3 == 0 and x4 == 0 and x5 == 0):
                pass  # NICE_GOOD (thumbs up)
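
The same mapping can also be expressed as a lookup table keyed by the five flags, which keeps each gesture on one line and makes unknown finger combinations explicit. This is an alternative sketch rather than the article's code; the names `GESTURES` and `classify` are assumptions, and the labels reuse the comments from the listing.

```python
# Keys are (thumb, index, middle, ring, little) flags as returned by fingersUp().
GESTURES = {
    (0, 1, 0, 0, 0): "ONE",
    (0, 1, 1, 0, 0): "TWO",
    (0, 1, 1, 1, 0): "THREE",
    (0, 1, 1, 1, 1): "FOUR",
    (1, 1, 1, 1, 1): "FIVE",
    (1, 0, 0, 0, 0): "NICE_GOOD",
}


def classify(fingers):
    """Return the gesture label for a list of five flags, or None if unknown."""
    return GESTURES.get(tuple(fingers))


print(classify([0, 1, 1, 0, 0]))  # TWO
print(classify([1, 0, 0, 0, 0]))  # NICE_GOOD
print(classify([0, 0, 0, 0, 1]))  # None
```

Adding a new gesture then only requires one more entry in the table.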

After the basic recognition, we need to display the result. Here we take the position of the hand's bounding box from bbox and use the putText method in OpenCV to draw the recognition result on the image.


def Gesture_recognition(self):
    self.detector = HandDetector()  # create the detector once, outside the loop
    while True:
        success, img = self.camera.read()
        if not success:
            break  # no frame could be read from the camera
        img = self.detector.findHands(img)
        lmList, bbox = self.detector.findPosition(img)

        if lmList:
            # top-left corner of the hand's bounding box, used as the text position
            x_1, y_1 = bbox["bbox"][0], bbox["bbox"][1]
            x1, x2, x3, x4, x5 = self.detector.fingersUp()

            if (x2 == 1 and x3 == 1) and (x4 == 0 and x5 == 0 and x1 == 0):
                cv2.putText(img, "2_TWO", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                            (0, 0, 255), 3)
            elif (x2 == 1 and x3 == 1 and x4 == 1) and (x1 == 0 and x5 == 0):
                cv2.putText(img, "3_THREE", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                            (0, 0, 255), 3)
            elif (x2 == 1 and x3 == 1 and x4 == 1 and x5 == 1) and (x1 == 0):
                cv2.putText(img, "4_FOUR", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                            (0, 0, 255), 3)
            elif x1 == 1 and x2 == 1 and x3 == 1 and x4 == 1 and x5 == 1:
                cv2.putText(img, "5_FIVE", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                            (0, 0, 255), 3)
            elif x2 == 1 and (x1 == 0 and x3 == 0 and x4 == 0 and x5 == 0):
                cv2.putText(img, "1_ONE", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                            (0, 0, 255), 3)
            elif x1 == 1 and (x2 == 0 and x3 == 0 and x4 == 0 and x5 == 0):
                cv2.putText(img, "GOOD!", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                            (0, 0, 255), 3)
        cv2.imshow("camera", img)
        # exit when the window is closed with its close button
        if cv2.getWindowProperty('camera', cv2.WND_PROP_VISIBLE) < 1:
            break
        cv2.waitKey(1)
    self.camera.release()
    cv2.destroyAllWindows()

Now that gesture recognition and result output are complete, we can check the result by running the complete code.

full code

The complete code is as follows


# -*- coding:utf-8 -*-

"""

CODE >>> SINCE IN CAIXYPROMISE.
STRIVE FOR EXCELLENT.
CONSTANTLY STRIVING FOR SELF-IMPROVEMENT.
@ by: caixy
@ date: 2021-10-1

"""

import cv2
from HandTrackingModule import HandDetector

class Main:
    def __init__(self):
        # CAP_DSHOW is the Windows DirectShow backend; drop it on other systems
        self.camera = cv2.VideoCapture(0, cv2.CAP_DSHOW)
        self.camera.set(cv2.CAP_PROP_FRAME_WIDTH, 1280)
        self.camera.set(cv2.CAP_PROP_FRAME_HEIGHT, 720)

    def Gesture_recognition(self):
        self.detector = HandDetector()  # create the detector once, outside the loop
        while True:
            success, img = self.camera.read()
            if not success:
                break  # no frame could be read from the camera
            img = self.detector.findHands(img)
            lmList, bbox = self.detector.findPosition(img)

            if lmList:
                # top-left corner of the hand's bounding box, used as the text position
                x_1, y_1 = bbox["bbox"][0], bbox["bbox"][1]
                x1, x2, x3, x4, x5 = self.detector.fingersUp()

                if (x2 == 1 and x3 == 1) and (x4 == 0 and x5 == 0 and x1 == 0):
                    cv2.putText(img, "2_TWO", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                                (0, 0, 255), 3)
                elif (x2 == 1 and x3 == 1 and x4 == 1) and (x1 == 0 and x5 == 0):
                    cv2.putText(img, "3_THREE", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                                (0, 0, 255), 3)
                elif (x2 == 1 and x3 == 1 and x4 == 1 and x5 == 1) and (x1 == 0):
                    cv2.putText(img, "4_FOUR", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                                (0, 0, 255), 3)
                elif x1 == 1 and x2 == 1 and x3 == 1 and x4 == 1 and x5 == 1:
                    cv2.putText(img, "5_FIVE", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                                (0, 0, 255), 3)
                elif x2 == 1 and (x1 == 0 and x3 == 0 and x4 == 0 and x5 == 0):
                    cv2.putText(img, "1_ONE", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                                (0, 0, 255), 3)
                elif x1 == 1 and (x2 == 0 and x3 == 0 and x4 == 0 and x5 == 0):
                    cv2.putText(img, "GOOD!", (x_1, y_1), cv2.FONT_HERSHEY_PLAIN, 3,
                                (0, 0, 255), 3)
            cv2.imshow("camera", img)
            # exit when the window is closed with its close button
            if cv2.getWindowProperty('camera', cv2.WND_PROP_VISIBLE) < 1:
                break
            cv2.waitKey(1)
            # if cv2.waitKey(1) & 0xFF == ord("q"):
            #     break  # press q to quit
        # free the camera and close the window on exit
        self.camera.release()
        cv2.destroyAllWindows()

if __name__ == '__main__':
    Solution = Main()
    Solution.Gesture_recognition()

The result is clear at a glance: the computer recognizes your gesture and displays the matching label. Go and try it!

epilogue

That concludes this computer vision article on gesture recognition. This is also my first post about artificial intelligence and computer vision, and more articles about artificial intelligence will follow. If you find any mistakes in this article or have any questions, please point them out in the comment area so that we can learn and improve together.

Writing takes effort. If you think this article is useful to you, don't forget to like and follow!

The next article will cover more advanced gesture recognition: the recognition of dynamic gestures. We will post it as soon as possible on the WeChat official account "01 Programming Cabin". Follow the official account so you don't miss it!