AI Project 5: Seal Action Recognition

If this is an original article, please credit the original source when reprinting it.

Thanks to Mr. Enpei for fully implementing the project and open-sourcing the code for everyone to learn from and discuss.

Mr. Enpei's open-source repository is below. If you are interested, you can reproduce the project yourself: GitHub - enpeizhao/CVprojects: computer vision projects | Fun AI projects related to computer vision (Python, C++)

1. Introduction

While browsing Enpei's GitHub I came across the small project "Naruto Seal Recognition". The author's open-source code requires a GPU. My computer has none, and although the environment installed fine, I never managed to reproduce the project, so I took a shortcut and used YOLOv5 to implement the same functionality.

YOLOv5 installation and training were covered in an earlier article. If you are not familiar with training YOLOv5, read that one first.

2. Training set

Preparing the material: there are seven gesture classes to train. I collected 30 images for each class and labelled all of them with labelImg.
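When labelImg is switched to its YOLO output format, each image gets a .txt file with one line per box: the class index, then the box centre and size, all normalised by the image width and height. The sketch below shows that conversion from a pixel-space box; the function name `to_yolo_line` and the example numbers are hypothetical, and the class index depends on the order of the classes defined in labelImg (here assumed to be 0 for ani_1).

```python
def to_yolo_line(class_id, xmin, ymin, xmax, ymax, img_w, img_h):
    """Convert a pixel-space box (corner coordinates) to a YOLO label line.

    YOLO labels store: class x_center y_center width height,
    with the four box values normalised to [0, 1] by the image size.
    """
    x_center = (xmin + xmax) / 2 / img_w
    y_center = (ymin + ymax) / 2 / img_h
    width = (xmax - xmin) / img_w
    height = (ymax - ymin) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical example: a hand box for class 0 (assumed ani_1) in a 640x480 image
print(to_yolo_line(0, 160, 120, 480, 360, 640, 480))
# → 0 0.500000 0.500000 0.500000 0.500000
```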


3. Training

Training was done on AutoDL cloud GPUs using an RTX 3090. With so few samples, training takes about 20 to 30 minutes.

One thing to note: the .pt weights trained directly with the image provided by AutoDL never produced any detections, so I switched back to the yolov5-5 release, and training then worked normally.

The results are acceptable, but to reproduce the project you will have to train the model yourself.
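YOLOv5 reads the dataset description from a small YAML file (training and validation image folders, number of classes, class names) that is passed to train.py with `--data`. The sketch below builds such a file for the seven seal classes. The folder paths, the file name seal.yaml and the helper name `make_data_yaml` are assumptions for illustration, and the class order must match the order used when labelling.

```python
# Class names in the order they were defined in labelImg (assumed: ani_1 .. ani_7)
CLASS_NAMES = [f"ani_{i}" for i in range(1, 8)]


def make_data_yaml(train_dir, val_dir, names):
    """Build the text of a YOLOv5 dataset YAML: train/val paths, nc and names."""
    quoted = ", ".join(f"'{n}'" for n in names)
    return (
        f"train: {train_dir}\n"
        f"val: {val_dir}\n"
        f"nc: {len(names)}\n"
        f"names: [{quoted}]\n"
    )


if __name__ == "__main__":
    # Hypothetical dataset layout; adjust to where the labelled images live,
    # then save the printed text as seal.yaml next to train.py
    print(make_data_yaml("../datasets/seal/images/train",
                         "../datasets/seal/images/val",
                         CLASS_NAMES))
```

Training can then be started from the yolov5 folder with something like `python train.py --data seal.yaml --weights yolov5s.pt --epochs 100`, where the starting weights and epoch count are example values, not the settings used in this project.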

4. Code

1. Generating the Chinese PNG caption images

The PNG images display the names of the recognized seals. OpenCV's cv2.putText cannot easily render Chinese text (its built-in Hershey fonts only cover ASCII), so the captions are prepared in advance as PNG images and overlaid directly onto the video frames.

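'''
Generate the Chinese PNG images used as on-screen captions
'''

import os

from PIL import Image, ImageDraw, ImageFont


def generate(name='你好', color_label='green'):
    # Output path ./png_label/<name>.png; create the folder if it is missing
    os.makedirs('./png_label', exist_ok=True)
    filename = './png_label/' + name + '.png'

    # Transparent 400x100 RGBA background
    # (at size 80 only about five Chinese characters fit in 400 px)
    bg = Image.new("RGBA", (400, 100), (0, 0, 0, 0))
    # Draw the text; MSYH.ttc (Microsoft YaHei) provides the Chinese glyphs
    d = ImageDraw.Draw(bg)
    font = ImageFont.truetype('./fonts/MSYH.ttc', 80)

    # Text colour (RGBA): green, or red for anything else
    if color_label == 'green':
        color = (0, 255, 0, 255)
    else:
        color = (255, 0, 0, 255)

    d.text((0, 0), name, font=font, fill=color)
    # Save
    bg.save(filename)

    print('ok: ' + name)


# Seal names (Snake, Ram, Monkey, Boar, Horse, Tiger) in green,
# and the technique name (Fire Style: Great Fireball Technique) in red
for seal in ['巳', '未', '申', '亥', '午', '寅']:
    generate(seal, 'green')
generate('火遁豪火球之术', 'red')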

After running, the PNG label images are saved in ./png_label/. The main program expects one image per seal name plus one for the technique name.
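The main program overlays these captions with cv2.addWeighted and a large gain, which relies on the PNG background having become black once the alpha channel is dropped. A different technique, shown here as an alternative rather than the author's method, is to keep the PNG's alpha channel (read with `cv2.IMREAD_UNCHANGED`) and blend per pixel. This sketch uses only NumPy; the function name `alpha_overlay` is hypothetical, the overlay is assumed to be a 4-channel BGRA array, and any part that falls outside the frame is clipped instead of raising a shape error.

```python
import numpy as np


def alpha_overlay(frame, overlay_bgra, left, top):
    """Blend a BGRA overlay onto a BGR frame in place using the overlay's alpha.

    Any part of the overlay outside the frame is clipped.
    """
    frame_h, frame_w = frame.shape[:2]
    ov_h, ov_w = overlay_bgra.shape[:2]
    # Visible region in frame coordinates
    x0, y0 = max(left, 0), max(top, 0)
    x1, y1 = min(left + ov_w, frame_w), min(top + ov_h, frame_h)
    if x0 >= x1 or y0 >= y1:
        return frame  # overlay lies entirely off-screen
    # Matching region in overlay coordinates
    ov = overlay_bgra[y0 - top:y1 - top, x0 - left:x1 - left]
    # Alpha in [0, 1], shaped (h, w, 1) so it broadcasts over the 3 colour channels
    alpha = ov[:, :, 3:4].astype(np.float32) / 255.0
    roi = frame[y0:y1, x0:x1].astype(np.float32)
    blended = alpha * ov[:, :, :3].astype(np.float32) + (1.0 - alpha) * roi
    frame[y0:y1, x0:x1] = blended.astype(np.uint8)
    return frame
```

With the generated captions, the overlay would be loaded with `cv2.imdecode(np.fromfile(path, dtype=np.uint8), cv2.IMREAD_UNCHANGED)` so that the alpha channel is kept. Transparent pixels then leave the video untouched, and anti-aliased text edges blend smoothly instead of being saturated.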

2. Complete code

import threading
import time

import cv2
import numpy as np
import torch
from playsound import playsound


class Ai_tello:
    def __init__(self):
        # ********************************* Drawing-related *********************************
        self.png_dict = {}
        # Load the PNG caption images
        self.getPngList()

        # Load the custom YOLOv5 model from the local ./yolov5 repository
        self.model = torch.hub.load('./yolov5', 'custom', './weights/pose.pt', source='local')
        # Confidence threshold
        self.model.conf = 0.5
        print('self.model.conf = 0.5')

        # Time at which the whole seal sequence was completed (None = not yet)
        self.take_off_time = None
        # Required order of the seal actions
        self.yolo_action_seq = ['ani_1', 'ani_2', 'ani_3', 'ani_4', 'ani_5', 'ani_6', 'ani_7']
        # State machine: 1 means that action has been completed (matched)
        self.yolo_action_status = [0, 0, 0, 0, 0, 0, 0]

    def getPngList(self):
        '''
        Read the PNG images and add them to png_dict
        '''
        # Snake, Ram, Monkey, Boar, Horse, Tiger, Fire Style: Great Fireball Technique
        palm_action = {'ani_1': '巳', 'ani_2': '未', 'ani_3': '申',
            'ani_4': '亥', 'ani_5': '午', 'ani_6': '寅', 'ani_7': '火遁豪火球之术'}

        for name in palm_action.values():
            filename = './png_label/' + name + '.png'
            png_img = self.readPngFile(filename, 0.9)
            self.png_dict[name] = png_img

        print('PNG text labels loaded')
        
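    def playVoice(self, fileName, mode):
        """
        Play an audio file (blocks until playback finishes)
        """
        playsound(fileName)

    def backPlay(self, fileName):
        """
        Play audio in a background thread so the video loop is not blocked
        """
        t = threading.Thread(target=self.playVoice, args=(fileName, 'voice'), daemon=True)
        t.start()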

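    def readPngFile(self, fileName, scale=0.5):
        '''
        Read a PNG image
        '''
        # np.fromfile + cv2.imdecode works around cv2.imread failing on Chinese paths
        png_img = cv2.imdecode(np.fromfile(fileName, dtype=np.uint8), cv2.IMREAD_UNCHANGED)
        # The decoded PNG is BGRA; drop the alpha channel to get 3-channel BGR
        # (transparent background pixels become black)
        png_img = cv2.cvtColor(png_img, cv2.COLOR_BGRA2BGR)
        png_img = cv2.resize(png_img, (0, 0), fx=scale, fy=scale)
        return png_img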

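    def addOverylay(self, frame, overlay, l, t):
        '''
        Overlay a PNG label on the frame
        '''
        # Keep l and t inside the frame
        l = max(l, 0)
        t = max(t, 0)
        # Clip the overlay so it does not run past the right or bottom edge
        frame_h, frame_w = frame.shape[:2]
        overlay = overlay[:max(frame_h - t, 0), :max(frame_w - l, 0)]
        overlay_h, overlay_w = overlay.shape[:2]
        if overlay_h == 0 or overlay_w == 0:
            return
        # Region covered by the overlay
        overlay_l, overlay_t = l, t
        overlay_r, overlay_b = (l + overlay_w), (overlay_t + overlay_h)
        # frame + 20 * overlay, saturated to 255: black background pixels leave the
        # frame unchanged, while text pixels push their colour channels to 255
        overlay_copy = cv2.addWeighted(
            frame[overlay_t:overlay_b, overlay_l:overlay_r], 1, overlay, 20, 0)

        frame[overlay_t:overlay_b, overlay_l:overlay_r] = overlay_copy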

    
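    def cameraProcess(self):
        '''
        Video stream processing: action recognition, drawing, etc.
        '''
        print('cameraProcess')
        cap = cv2.VideoCapture(0)
        # Actions: Snake, Ram, Monkey, Boar, Horse, Tiger, Fire Style: Great Fireball Technique
        palm_action = {'ani_1': '巳', 'ani_2': '未', 'ani_3': '申', 'ani_4': '亥', 'ani_5': '午', 'ani_6': '寅', 'ani_7': '火遁豪火球之术'}

        triger_time = time.time()

        while True:

            # Read a video frame
            ret, frame = cap.read()
            if not ret or frame is None:
                continue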

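            # Mirror the image horizontally
            frame = cv2.flip(frame, 1)

            # Convert to RGB, which the YOLOv5 hub model expects
            img_cvt = cv2.cvtColor(frame, cv2.COLOR_BGR2RGB)

            if self.take_off_time is not None:
                # Sequence complete: after 2 s, show the technique name
                # (l is the left edge of the last detected box)
                if time.time() - triger_time >= 2:
                    label_zh = palm_action['ani_7']
                    overlay = self.png_dict[label_zh]
                    self.addOverylay(frame, overlay, l, 200)
            else:
                # Object detection inference
                results = self.model(img_cvt)
                results_arr = results.pandas().xyxy[0].to_numpy()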

                # Parse the detection results; each row is
                # [xmin, ymin, xmax, ymax, confidence, class, name]
                for item in results_arr:

                    # Class ID
                    ret_label_id = item[-2]
                    # Class name
                    ret_label_text = item[-1]
                    # Confidence
                    ret_conf = item[-3]

                    # ani_1, ani_2 ... ani_7
                    # Seal actions; require a higher confidence here
                    if 'ani_' in ret_label_text and ret_conf >= 0.7:

                        l, t, r, b = item[:4].astype('int')
                        # Draw the bounding box
                        cv2.rectangle(frame, (l, t), (r, b), (0, 255, 20), 2)
                        # Draw the Chinese PNG caption for this action
                        label_zh = palm_action[ret_label_text]
                        print(ret_label_text)
                        # Get the image array for this caption
                        overlay = self.png_dict[label_zh]
                        # Overlay it above the box
                        self.addOverylay(frame, overlay, l, t - 100)
                        cv2.putText(frame, '{}%'.format(round(ret_conf * 100, 2)), (l + 80, t - 20), cv2.FONT_ITALIC, 1.5, (255, 0, 255), 2)

                        # Index of the first 0 in the state list (None once all are 1)
                        first_0_index = next((i for i, x in enumerate(self.yolo_action_status) if x == 0), None)
                        if first_0_index is None:
                            break
                        # Expected action name: 'ani_1' ... 'ani_7'
                        check_action_name = self.yolo_action_seq[first_0_index]

                        # Action matching
                        if ret_label_text == check_action_name:
                            # Mark this action as done
                            self.yolo_action_status[first_0_index] = 1
                            # Check whether the whole sequence is complete
                            if self.yolo_action_status == [1, 1, 1, 1, 1, 1, 1]:

                                self.take_off_time = time.time()
                                print('All actions matched')

                                self.backPlay('火遁豪火球.mp3')

                                # Start the timer
                                triger_time = time.time()

                            else:
                                print('Matched an action, current list: ' + str(self.yolo_action_status))
                        else:
                            print('Action not matched, current list: ' + str(self.yolo_action_status))
                
            cv2.imshow('demo', frame)
            if cv2.waitKey(10) & 0xFF == ord('q'):
                break

        cap.release()
        cv2.destroyAllWindows()


if __name__ == '__main__':

    # Instantiate
    ai_tello = Ai_tello()
    ai_tello.cameraProcess()
    

There are a few things to note about the code:

1. Loading the YOLOv5 model

Loading the YOLOv5 model may fail. The problem I ran into was a version mismatch: the model was trained with yolov5-5, so the local ./yolov5 repository passed to torch.hub.load must be the same version. Loading the weights with a different version raises an error.

2. Recognition process

Recognition uses a state machine: an array of seven elements, [0, 0, 0, 0, 0, 0, 0], is defined in advance, one element per action in the required order.

When the action corresponding to the first 0 is detected, that element is set to 1. When every element is 1, the sound is played, and two seconds later the technique name is shown on screen.
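The state-machine idea described above can be isolated from the camera and the model and tested on its own. The class below is a hypothetical sketch (the name `SealSequence` is not in the original code). Like the original loop, it only advances when the next expected seal is seen and simply ignores any other label, rather than resetting.

```python
class SealSequence:
    """Track progress through a fixed sequence of hand seals.

    status holds one 0/1 flag per seal; the first 0 marks the seal
    currently expected, and a matching detection flips it to 1.
    """

    def __init__(self, sequence):
        self.sequence = list(sequence)
        self.status = [0] * len(self.sequence)

    def _next_index(self):
        # Index of the first 0, or None once every seal is done
        return next((i for i, s in enumerate(self.status) if s == 0), None)

    def expected(self):
        """Name of the next seal to perform, or None when complete."""
        idx = self._next_index()
        return None if idx is None else self.sequence[idx]

    def update(self, label):
        """Feed one detected label; return True once the whole sequence is complete."""
        idx = self._next_index()
        if idx is not None and self.sequence[idx] == label:
            self.status[idx] = 1
        return all(self.status)


seq = SealSequence(['ani_1', 'ani_2', 'ani_3'])
for label in ['ani_2', 'ani_1', 'ani_1', 'ani_2', 'ani_3']:
    done = seq.update(label)
print(seq.status, done)  # → [1, 1, 1] True
```

Ignoring out-of-order labels suits per-frame detection, because the same seal stays visible for many consecutive frames. A stricter variant that resets on a wrong seal would need to skip repeats of the seal just matched.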

5. Demonstration

Video: "The Technique of Fireball in Seal Recognition" (bilibili)

If there is any infringement, or if you need the complete code, please contact the blogger promptly.


Origin blog.csdn.net/weixin_38807927/article/details/132790691