Android realizes hand detection and gesture recognition (can run in real time, including Android source code)

Table of contents

1. Introduction

2. Method of Gesture Recognition

(1) Gesture recognition method based on multi-target detection

(2) Recognition method based on hand detection + gesture classification

3. Description of Gesture Recognition Dataset

(1) HaGRID gesture recognition dataset

(2) Custom data set

4. Gesture recognition training based on YOLOv5

5. Android deployment of gesture recognition model

(1) Convert the Pytorch model to the ONNX model

(2) Convert ONNX model to TNN model

(3) Deploy the gesture recognition model on the Android side

(4) Solutions to some common errors

6. Gesture recognition test results

7. Project source code download


1. Introduction

This blog is the Android sequel to "YOLOv5-Based Gesture Recognition System (Including Gesture Recognition Dataset + Training Code)". It mainly describes how to port the Python-trained YOLOv5 gesture recognition model to the Android platform and build a simple gesture recognition Android Demo. The Demo supports 18 common gestures such as one, two, and OK, and you can also train custom gesture categories according to your business needs.

Considering that the original YOLOv5 model is computationally heavy, I developed a very lightweight gesture recognition model, yolov5s05, based on YOLOv5s. In terms of accuracy, the Android gesture recognition Demo still performs well, with an average precision of mAP_0.5=0.99421 and mAP_0.5:0.95=0.82706. The APP achieves real-time gesture recognition on an ordinary Android phone: about 30ms on the CPU (4 threads) and about 25ms on the GPU, which basically meets typical business performance requirements.

First, here is a look at the Android Demo in action:

[Android APP experience] https://download.csdn.net/download/guyuealian/86666991

[Android source code download]  Android realizes hand detection and gesture recognition

[Respect originality, please indicate the source for reprinting] https://blog.csdn.net/guyuealian/article/details/126994546


2. Method of Gesture Recognition

(1) Gesture recognition method based on multi-target detection

The gesture recognition method based on multi-target detection is done in one step: each gesture category is treated directly as a detection class, and the model is trained as a multi-class object detection task.

  1. The solution is a one-stage method with direct end-to-end training; the task is simple and inference is fast.
  2. Adding new categories or data requires manually drawing and labeling gesture boxes, which is costly.
  3. The number of samples across different gesture categories needs to be balanced.
  4. Easy to deploy.

(2) Recognition method based on hand detection + gesture classification

In this method, a general hand detection model is trained first (it does not distinguish gestures, it only detects hand boxes); the hand region is then cropped, and a gesture classifier is trained to classify the different gestures (a minimal code sketch follows the list below).

  1. The solution is a two-stage method, so the performance of the detection model and the classification model can be improved separately.
  2. The hand detection model does not distinguish gestures and only detects hand boxes, so its detection accuracy is high.
  3. The gesture classification model can be very lightweight.
  4. Gesture classification data is relatively easy to collect (you can record a video of a single gesture so that all images cropped after hand detection belong to the same class, which reduces the cost of manually drawing and labeling boxes).
  5. Because detection and recognition are done in two stages, the overall speed is relatively slow.
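
A minimal sketch of this two-stage pipeline is shown below; detect_hands and classify_gesture are hypothetical placeholders standing in for the trained detection and classification models, not functions from this project:

def recognize_gestures(image_bgr, detect_hands, classify_gesture):
    """Stage 1: detect hand boxes; Stage 2: crop each box and classify the gesture."""
    results = []
    for (xmin, ymin, xmax, ymax, det_score) in detect_hands(image_bgr):
        hand_crop = image_bgr[int(ymin):int(ymax), int(xmin):int(xmax)]  # crop the hand region
        gesture, cls_score = classify_gesture(hand_crop)                 # classify the cropped hand
        results.append({"box": (xmin, ymin, xmax, ymax),
                        "det_score": det_score,
                        "gesture": gesture,
                        "cls_score": cls_score})
    return results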

Since all images in the HaGRID gesture recognition dataset are already annotated with gesture categories and detection boxes, it is easier to use the "gesture recognition method based on multi-target detection", and that is the approach taken in this blog. Many multi-target detection methods can be used, such as Faster-RCNN, the YOLO series, and SSD; this blog uses YOLOv5 to train the multi-class gesture detector.

If your dataset only has a few detection boxes, but gesture classification images are relatively easy to collect, the "hand detection + gesture classification" method is recommended, since its labeling cost is relatively low. If you need this solution, you can contact me through my WeChat official account.


3. Description of Gesture Recognition Dataset

(1) HaGRID gesture recognition dataset

The original HaGRID dataset is very large: the images are all high-resolution (1920×1080, about 2 million pixels), and downloading the complete dataset requires at least 716GB of disk space. In addition, because the files are hosted on external (overseas) links, downloads may frequently fail.

Considering these problems, I streamlined the HaGRID dataset and reduced its image resolution. The entire dataset has been compressed to about 18GB, which is enough for gesture classification and detection tasks. To distinguish it from the original dataset, I call it the Light-HaGRID dataset, a relatively lightweight gesture recognition dataset.

  • Provides a gesture recognition dataset with 18 gesture categories, about 7,000 images per category, 123,731 images in total (120,000+)
  • Provides JSON annotation files for all images, i.e. the annotation format of the original HaGRID dataset
  • Provides XML annotation files for all images, i.e. the format converted to the VOC dataset
  • Provides crops of all gesture regions: the hand area of each labeled box is cropped and saved in the Classification folder
  • Can be used for gesture object detection model training
  • Can be used for gesture classification model training

 For " HaGRID Gesture Recognition Dataset Instructions and Downloads ", please refer to my other blog,

HaGRID Gesture Recognition Dataset Usage Instructions and Download_PKing666666's Blog-CSDN Blog

(2) Custom data set

If you need to add or remove gesture categories, or want to train on a custom dataset, follow these steps:

  1. Collect gesture images; no fewer than 200 images are recommended.
  2. Use an annotation tool such as Labelme to draw and label the gesture boxes. Labelme tool: GitHub - wkentaro/labelme: Image Polygonal Annotation with Python (polygon, rectangle, circle, line, point and image-level flag annotation).
  3. Convert the annotations to the VOC data format; reference tool: labelme2voc.py in the wkentaro/labelme GitHub repository.
  4. Generate the training set list train.txt and validation set list val.txt (see the sketch after these steps).
  5. Modify the train and val data paths in engine/configs/voc_local.yaml.
  6. Restart training.
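
For step 4, here is a minimal sketch of generating train.txt and val.txt; the directory layout, file extensions, and 9:1 split below are assumptions, so adapt them to the list format expected by your configuration:

import os
import random

def make_file_lists(image_dir, train_txt="train.txt", val_txt="val.txt", val_ratio=0.1, seed=0):
    # collect all image files and shuffle them reproducibly
    files = sorted(f for f in os.listdir(image_dir) if f.lower().endswith((".jpg", ".jpeg", ".png")))
    random.Random(seed).shuffle(files)
    n_val = max(1, int(len(files) * val_ratio))
    with open(val_txt, "w") as f:
        f.writelines(os.path.join(image_dir, name) + "\n" for name in files[:n_val])
    with open(train_txt, "w") as f:
        f.writelines(os.path.join(image_dir, name) + "\n" for name in files[n_val:])

# Example: make_file_lists("dataset/JPEGImages")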


4. Gesture recognition training based on YOLOv5

Considering that the CPU/GPU performance of mobile phones is relatively weak, directly deploying yolov5s runs very slowly, so the Android deployment only uses the yolov5s05 model. Compared with yolov5s, the network is scaled down by about half and the model input is reduced from the original 640×640 to 320×320. In terms of performance, yolov5s05 is more than 5 times faster than yolov5s, while mAP drops by about 5% (0.87605 → 0.82706); for mobile phones, this accuracy is still acceptable.

Official YOLOv5:   https://github.com/ultralytics/yolov5 

The following is a comparison of the parameter counts and computation of yolov5s05 and yolov5s:

Model      input-size  params(M)  GFLOPs  Gesture recognition mAP(0.5:0.95)
yolov5s    640×640     7.2        16.5    0.87605
yolov5s05  320×320     1.7        1.1     0.82706

The training process of yolov5s05 is exactly the same as that of yolov5s; only the configuration files differ. Due to length constraints, this blog will not repeat the details. For the full training process, please refer to "Gesture recognition system based on YOLOv5 (including gesture recognition dataset + training code)".

5. Android deployment of gesture recognition model

(1) Convert the Pytorch model to the ONNX model

After training the yolov5s05 or yolov5s model, you need to convert it to an ONNX model and use onnx-simplifier to simplify the network structure.

# Convert the yolov5s05 model
python export.py --weights "runs/yolov5s05_320/weights/best.pt" --img-size 320 320

# Convert the yolov5s model
python export.py --weights "runs/yolov5s_640/weights/best.pt" --img-size 640 640

GitHub: https://github.com/daquexian/onnx-simplifier
Install:  pip3 install onnx-simplifier 
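
The export script above already calls onnx-simplifier when simplify=True; if you want to simplify an exported ONNX file manually, a typical command looks like this (the output file name is just an example):

python3 -m onnxsim runs/yolov5s05_320/weights/best.onnx runs/yolov5s05_320/weights/best-sim.onnx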

(2) Convert ONNX model to TNN model

At present, there are many ways to deploy a CNN model: you can use deployment tools such as TNN, MNN, NCNN, or TensorRT. I use TNN for the Android deployment.

TNN conversion tool: the official convert2tnn tool under tools/convert2tnn in the TNN repository (https://github.com/Tencent/TNN); the online converter https://convertmodel.com also supports ONNX-to-TNN conversion.
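
A typical convert2tnn invocation looks roughly like the following (the input path is an example, and flag names may differ slightly between TNN versions, so check the convert2tnn documentation for your release); the conversion produces the *.tnnproto and *.tnnmodel files used by the Android Demo:

# Run inside TNN/tools/convert2tnn (or its Docker image)
python3 converter.py onnx2tnn best-sim.onnx -optimize -v=v1.0 -o ./tnn_models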

(3) Deploy the gesture recognition model on the Android side

The project implements the Android version of the gesture recognition Demo. The deployment framework is TNN, which supports multi-threaded CPU and GPU accelerated inference and runs in real time on ordinary phones. The core algorithms of the Android source code are implemented in C++, and the upper Java layer calls them through the JNI interface.

If you want to deploy your own trained model in this Android Demo, convert the trained Pytorch model to ONNX, then convert the ONNX model to a TNN model, and replace the TNN model files in the Demo with your own.

package com.cv.tnn.model;

import android.graphics.Bitmap;

public class Detector {

    static {
        System.loadLibrary("tnn_wrapper");
    }


    /***
     * Initialize the model
     * @param model: TNN *.tnnmodel file name (with extension)
     * @param root: root directory of the model files, placed under the assets folder
     * @param model_type: model type
     * @param num_thread: number of threads to use
     * @param useGPU: whether to run inference on the GPU (false = CPU)
     */
    public static native void init(String model, String root, int model_type, int num_thread, boolean useGPU);

    /***
     * Detect
     * @param bitmap: input image (Bitmap) in ARGB_8888 format
     * @param score_thresh: confidence threshold
     * @param iou_thresh: IOU threshold
     * @return
     */
    public static native FrameInfo[] detect(Bitmap bitmap, float score_thresh, float iou_thresh);
}

(4) Solutions to some common errors

  • Error during TNN inference: Permute param got wrong size

Official YOLOv5:   https://github.com/ultralytics/yolov5 

If you convert the TNN model directly from the official YOLOv5 code, the error Permute param got wrong size will appear when deploying with TNN. This is because TNN supports at most 4 dimensions in its computations, while YOLOv5 produces a 5-dimensional output; you therefore need to modify the models/yolo.py file.
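
The usual workaround is to add an export branch to the Detect head that returns the raw 4-D per-level feature maps and leaves the anchor decoding to the deployment code (the commented-out m.forward = m.forward_export line in the export script below hints at this pattern). The sketch below shows the common form of this modification; it is an assumption about the shape of the change, not necessarily identical to the code shipped with this project:

# Sketch of the export branch for Detect (models/yolo.py); self.m are the per-level output
# convolutions and self.nl is the number of detection levels.
def forward_export(self, x):
    # Return raw per-level maps of shape (N, na*(5+nc), H, W); the C++/TNN side then decodes
    # boxes with the anchors, so no 5-D view/permute ends up in the exported ONNX graph.
    return [self.m[i](x[i]) for i in range(self.nl)]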

In export.py, model.model[-1].export = True is set before exporting (see the Exports section of the script below):

"""Export a YOLOv5 *.pt model to TorchScript, ONNX, CoreML formats

Usage:
    $ python path/to/export.py --weights yolov5s.pt --img 640 --batch 1
"""

import argparse
import sys
import time
from pathlib import Path

import torch
import torch.nn as nn
from torch.utils.mobile_optimizer import optimize_for_mobile

FILE = Path(__file__).absolute()
sys.path.append(FILE.parents[0].as_posix())  # add yolov5/ to path

from models.common import Conv
from models.yolo import Detect
from models.experimental import attempt_load
from utils.activations import Hardswish, SiLU
from utils.general import colorstr, check_img_size, check_requirements, file_size, set_logging
from utils.torch_utils import select_device


def export_torchscript(model, img, file, optimize):
    # TorchScript model export
    prefix = colorstr('TorchScript:')
    try:
        print(f'\n{prefix} starting export with torch {torch.__version__}...')
        f = str(file.with_suffix('.torchscript.pt'))
        ts = torch.jit.trace(model, img, strict=False)
        (optimize_for_mobile(ts) if optimize else ts).save(f)
        print(f'{prefix} export success, saved as {f} ({file_size(f):.1f} MB)')
        return ts
    except Exception as e:
        print(f'{prefix} export failure: {e}')


def export_onnx(model, img, file, opset, train, dynamic, simplify):
    # ONNX model export
    prefix = colorstr('ONNX:')
    try:
        check_requirements(('onnx', 'onnx-simplifier'))
        import onnx

        print(f'\n{prefix} starting export with onnx {onnx.__version__}...')
        f = file.with_suffix('.onnx')
        torch.onnx.export(model, img, f, verbose=False, opset_version=opset,
                          training=torch.onnx.TrainingMode.TRAINING if train else torch.onnx.TrainingMode.EVAL,
                          do_constant_folding=not train,
                          input_names=['images'],
                          output_names=['output'],
                          dynamic_axes={'images': {0: 'batch', 2: 'height', 3: 'width'},  # shape(1,3,640,640)
                                        'output': {0: 'batch', 1: 'anchors'}  # shape(1,25200,85)
                                        } if dynamic else None)

        # Checks
        model_onnx = onnx.load(f)  # load onnx model
        onnx.checker.check_model(model_onnx)  # check onnx model
        # print(onnx.helper.printable_graph(model_onnx.graph))  # print

        # Simplify
        if simplify:
            try:
                import onnxsim

                print(f'{prefix} simplifying with onnx-simplifier {onnxsim.__version__}...')
                model_onnx, check = onnxsim.simplify(
                    model_onnx,
                    dynamic_input_shape=dynamic,
                    input_shapes={'images': list(img.shape)} if dynamic else None)
                assert check, 'assert check failed'
                onnx.save(model_onnx, f)
            except Exception as e:
                print(f'{prefix} simplifier failure: {e}')
        print(f'{prefix} export success, saved as {f} ({file_size(f):.1f} MB)')
        print(f"{prefix} run --dynamic ONNX model inference with detect.py: 'python detect.py --weights {f}'")
    except Exception as e:
        print(f'{prefix} export failure: {e}')


def export_coreml(model, img, file):
    # CoreML model export
    prefix = colorstr('CoreML:')
    try:
        import coremltools as ct

        print(f'\n{prefix} starting export with coremltools {ct.__version__}...')
        f = file.with_suffix('.mlmodel')
        model.train()  # CoreML exports should be placed in model.train() mode
        ts = torch.jit.trace(model, img, strict=False)  # TorchScript model
        model = ct.convert(ts, inputs=[ct.ImageType('image', shape=img.shape, scale=1 / 255.0, bias=[0, 0, 0])])
        model.save(f)
        print(f'{prefix} export success, saved as {f} ({file_size(f):.1f} MB)')
    except Exception as e:
        print(f'\n{prefix} export failure: {e}')


def run(weights='./yolov5s.pt',  # weights path
        img_size=(640, 640),  # image (height, width)
        batch_size=1,  # batch size
        device='cpu',  # cuda device, i.e. 0 or 0,1,2,3 or cpu
        include=('torchscript', 'onnx', 'coreml'),  # include formats
        half=False,  # FP16 half-precision export
        inplace=True,  # set YOLOv5 Detect() inplace=True
        train=False,  # model.train() mode
        optimize=False,  # TorchScript: optimize for mobile
        dynamic=False,  # ONNX: dynamic axes
        simplify=True,  # ONNX: simplify model
        opset=12,  # ONNX: opset version
        ):
    t = time.time()
    include = [x.lower() for x in include]
    img_size *= 2 if len(img_size) == 1 else 1  # expand
    file = Path(weights)

    # Load PyTorch model
    device = select_device(device)
    assert not (device.type == 'cpu' and half), '--half only compatible with GPU export, i.e. use --device 0'
    model = attempt_load(weights, map_location=device)  # load FP32 model
    names = model.names

    # Input
    gs = int(max(model.stride))  # grid size (max stride)
    img_size = [check_img_size(x, gs) for x in img_size]  # verify img_size are gs-multiples
    img = torch.zeros(batch_size, 3, *img_size).to(device)  # image size(1,3,320,192) iDetection

    # Update model
    if half:
        img, model = img.half(), model.half()  # to FP16
    model.train() if train else model.eval()  # training mode = no Detect() layer grid construction
    for k, m in model.named_modules():
        if isinstance(m, Conv):  # assign export-friendly activations
            if isinstance(m.act, nn.Hardswish):
                m.act = Hardswish()
            elif isinstance(m.act, nn.SiLU):
                m.act = SiLU()
        elif isinstance(m, Detect):
            m.inplace = inplace
            m.onnx_dynamic = dynamic
            # m.forward = m.forward_export  # assign forward (optional)

    # for _ in range(2):
    #     y = model(img)  # dry runs
    print(f"\n{colorstr('PyTorch:')} starting from {weights} ({file_size(weights):.1f} MB)")

    # Exports
    if 'torchscript' in include:
        model.model[-1].export = True  # TNN does not support 5 dimensions; change the output format
        export_torchscript(model, img, file, optimize)
    if 'onnx' in include:
        model.model[-1].export = True  # TNN does not support 5 dimensions; change the output format
        export_onnx(model, img, file, opset, train, dynamic, simplify=simplify)
    if 'coreml' in include:
        export_coreml(model, img, file)

    # Finish
    print(f'\nExport complete ({time.time() - t:.2f}s)'
          f"\nResults saved to {colorstr('bold', file.parent.resolve())}"
          f'\nVisualize with https://netron.app')


def parse_opt():
    """
    python export.py --weights "runs/yolov5s05_320/weights/best.pt" --img-size 320 320
    python export.py --weights "runs/yolov5s_640/weights/best.pt" --img-size 640 640
    """
    weights = "runs/yolov5s_640/weights/best.pt"  # 模型文件yolov5s_640
    input_size = [640, 640]
    # weights = "runs/yolov5s05_320/weights/best.pt"  # 模型文件yolov5s05_320
    # input_size = [320, 320]
    # default = ['torchscript', 'onnx', 'coreml']
    default = ['onnx']
    parser = argparse.ArgumentParser()
    parser.add_argument('--weights', type=str, default=weights, help='weights path')
    parser.add_argument('--img-size', nargs='+', type=int, default=input_size, help='image (height, width)')
    parser.add_argument('--batch-size', type=int, default=1, help='batch size')
    parser.add_argument('--device', default='cpu', help='cuda device, i.e. 0 or 0,1,2,3 or cpu')
    parser.add_argument('--include', nargs='+', default=default, help='include formats')
    # parser.add_argument('--half', action='store_true', help='FP16 half-precision export')
    # parser.add_argument('--inplace', action='store_true', help='set YOLOv5 Detect() inplace=True')
    # parser.add_argument('--train', action='store_true', help='model.train() mode')
    # parser.add_argument('--optimize', action='store_true', help='TorchScript: optimize for mobile')
    # parser.add_argument('--dynamic', action='store_true', help='ONNX: dynamic axes')
    # parser.add_argument('--simplify', action='store_true', help='ONNX: simplify model')
    parser.add_argument('--opset', type=int, default=12, help='ONNX: opset version')
    opt = parser.parse_args()
    return opt


def main(opt):
    set_logging()
    print(colorstr('export: ') + ', '.join(f'{k}={v}' for k, v in vars(opt).items()))
    run(**vars(opt))


if __name__ == "__main__":
    opt = parse_opt()
    main(opt)

  • TNN inference results are very poor, and the detection boxes are a mess

This problem is usually caused by incorrect model parameter settings. You need to modify the YOLOv5Param model parameters in the C++ inference code to match your own model.


struct YOLOv5Param {
    ModelType model_type;                  // model type: MODEL_TYPE_TNN, MODEL_TYPE_NCNN, etc.
    int input_width;                       // model input width, in pixels
    int input_height;                      // model input height, in pixels
    bool use_rgb;                          // whether the model input is RGB (the interface always takes BGR; when use_rgb=true, preprocessing converts BGR to RGB)
    bool padding;
    int num_landmarks;                     // number of keypoints
    NetNodes InputNodes;                   // input node names
    NetNodes OutputNodes;                  // output node names
    vector<YOLOAnchor> anchors;
    vector<string> class_names;            // class name list
};

input_width and input_height are the model input size, and vector<YOLOAnchor> anchors must match the anchors used in training. Note that the original anchors of the Python version of yolov5s are:

anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

For yolov5s05, because the input size was reduced from 640 to 320, the anchors also need to be adjusted accordingly, so I re-clustered the anchors on the gesture data before training (a simplified clustering sketch is shown after the anchor values below). The anchors obtained for the 320×320 input are:

anchors:
  - [ 12,19,  17,28, 22,34 ]
  - [ 25,47,  33,41, 34,59 ]
  - [ 49,54,  46,79, 70,92 ]
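
For reference, below is a simplified sketch of how anchors can be re-clustered with plain k-means on the ground-truth box sizes. YOLOv5 itself ships a more sophisticated IoU-based routine in utils/autoanchor.py, so the exact values will differ, and the input file name here is hypothetical:

import numpy as np

def kmeans_anchors(wh, k=9, iters=100, seed=0):
    # wh: (N, 2) array of ground-truth box widths/heights, already scaled to the model input size (e.g. 320x320)
    wh = np.asarray(wh, dtype=np.float64)
    rng = np.random.default_rng(seed)
    centers = wh[rng.choice(len(wh), size=k, replace=False)].copy()
    for _ in range(iters):
        # assign each box to the nearest anchor center (Euclidean distance in (w, h) space)
        labels = np.linalg.norm(wh[:, None, :] - centers[None, :, :], axis=2).argmin(axis=1)
        for i in range(k):
            if np.any(labels == i):
                centers[i] = wh[labels == i].mean(axis=0)
    return centers[np.argsort(centers.prod(axis=1))]  # sort anchors by area, small to large

# Example usage ("gesture_wh_320.txt" is a hypothetical file of "width height" pairs):
# anchors = kmeans_anchors(np.loadtxt("gesture_wh_320.txt")).round().astype(int)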

Therefore, the C++ YOLOv5Param model parameters for yolov5s and yolov5s05 are set as follows:


//YOLOv5s model parameters
static YOLOv5Param YOLOv5s_GESTURE_640 = {MODEL_TYPE_TNN,
                                          640,
                                          640,
                                          true,
                                          true,
                                          0,
                                          {{{"images", nullptr}}},   //InputNodes
                                          {{{"boxes", nullptr},      //OutputNodes
                                            {"scores", nullptr}}},
                                          {
                                              {"434", 32, {{116, 90}, {156, 198}, {373, 326}}},
                                              {"415", 16, {{30, 61}, {62, 45}, {59, 119}}},
                                              {"output", 8, {{10, 13}, {16, 30}, {33, 23}}},
                                          },
                                          GESTURE_NAME};

//YOLOv5s05 model parameters
static YOLOv5Param YOLOv5s05_GESTURE_ANCHOR_320 = {MODEL_TYPE_TNN,
                                                   320,
                                                   320,
                                                   true,
                                                   true,
                                                   0,
                                                   {{{"images", nullptr}}},   //InputNodes
                                                   {{{"boxes", nullptr},      //OutputNodes
                                                     {"scores", nullptr}}},
                                                   {
                                                       {"434", 32, {{49, 54}, {46, 79}, {70, 92}}},
                                                       {"415", 16, {{25, 47}, {33, 41}, {34, 59}}},
                                                       {"output", 8, {{12, 19}, {17, 28}, {22, 34}}},
                                                   },
                                                   GESTURE_NAME};

  • The APP crashes at runtime: dlopen failed: library "libomp.so" not found

Reference solution: see my blog post "Solve dlopen failed: library “libomp.so” not found" (PKing666666's blog, CSDN).


6. Gesture recognition test results

[Android APP experience] https://download.csdn.net/download/guyuealian/86666991

The APP achieves real-time gesture recognition on an ordinary Android phone: about 30ms on the CPU (4 threads) and about 25ms on the GPU, which basically meets typical business performance requirements.


7. Project source code download

[Android APP experience] https://download.csdn.net/download/guyuealian/86666991

  Download the complete Android gesture recognition project source code: Android implements hand detection and gesture recognition

  1. Provides the fast yolov5s05 gesture recognition model, which detects and recognizes in real time on an ordinary phone: about 30ms on the CPU (4 threads) and about 25ms on the GPU.
  2. Provides the high-accuracy yolov5s gesture recognition model: about 250ms on the CPU (4 threads) and about 100ms on the GPU.
  3. A complete set of Android gesture recognition project source code.
  4. The Android Demo supports image, video, and camera testing.
  5. All dependent libraries are already configured, and the project can be built and run directly. If the app crashes at runtime, refer to dlopen failed: library “libomp.so” not found for the solution.
