Hand key point detection 4: Android implements real-time hand key point detection (hand pose estimation), with source code

Table of contents

1. Introduction

2. Hand key point detection (hand pose estimation) methods

(1) Top-Down method

(2) Bottom-Up method

3. Hand key point detection model training

4. Android deployment of hand key point detection model

(1) Convert Pytorch model to ONNX model

(2) Convert ONNX model to TNN model

(3) Deploy the model on Android

(4) Android test results

(5) Crash when running the APP: dlopen failed: library "libomp.so" not found

5. Android project source code download

6. C++ implements hand key point detection


1. Introduction

This article is the Android development installment of the "Hand key point detection (hand pose estimation)" project series. It deploys the YOLOv5 hand detection model and the LiteHRNet and Mobilenet-v2 hand key point detection models to the Android platform, and builds a simple Android Demo that detects hand key points in real time.

The project walks you step by step through deploying the trained hand detection and hand key point detection models to the Android platform, including how to convert them to ONNX and TNN models and how to port them to Android to build a hand detection and hand key point detection Demo APP. The APP achieves real-time detection on ordinary Android phones: about 50ms on the CPU (4 threads) and about 30ms on the GPU, which basically meets the performance requirements of the business.

Respect originality; please indicate the source when reprinting: https://blog.csdn.net/guyuealian/article/details/133931698

Android hand key point detection (hand pose estimation) APP Demo experience: https://download.csdn.net/download/guyuealian/88418582

For more articles in the "Hand key point detection (hand pose estimation)" project series, please refer to:

2. Hand key point detection (hand pose estimation) methods

There are currently two mainstream approaches to hand key point detection (hand pose estimation): the Top-Down method and the Bottom-Up method.

(1) Top-Down method

This approach separates hand detection from hand key point estimation: it first performs hand detection on the image to locate each hand, then crops each hand region and estimates the key points of each hand. Methods of this type are usually slower, but their pose estimation accuracy is higher. Current mainstream models include CPN, Hourglass, CPM, Alpha Pose, HRNet, etc.
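
To make the detect-crop-estimate flow concrete, here is a minimal Python sketch of a Top-Down pipeline. It is only an illustration: hand_detector and keypoint_model are hypothetical objects, not the project's actual API.

# Minimal sketch of a Top-Down pipeline (hand_detector and keypoint_model are
# hypothetical helpers; the real project wraps this logic in its own classes).
import cv2

def detect_hand_keypoints(image, hand_detector, keypoint_model, input_size=192):
    """Detect hands first, then estimate key points on each cropped hand."""
    results = []
    # 1. Hand detection: assumed to return a list of boxes (x1, y1, x2, y2)
    for (x1, y1, x2, y2) in hand_detector.detect(image):
        # 2. Crop the hand region and resize it to the key point model's input size
        crop = image[int(y1):int(y2), int(x1):int(x2)]
        crop = cv2.resize(crop, (input_size, input_size))
        # 3. Estimate key points on the crop, then map them back to image coordinates
        scale_x = (x2 - x1) / float(input_size)
        scale_y = (y2 - y1) / float(input_size)
        keypoints = [(x1 + kx * scale_x, y1 + ky * scale_y, score)
                     for (kx, ky, score) in keypoint_model.predict(crop)]
        results.append({"box": (x1, y1, x2, y2), "keypoints": keypoints})
    return results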

(2) Bottom-Up method

This approach first estimates the key points of all hands in the image, and then groups them into hand instances with a Grouping method. Methods of this type are usually faster at inference time but less accurate. A typical example is OpenPose, the winner of the 2016 COCO human body key point detection challenge.

Generally speaking, Top-Down methods achieve higher accuracy, while Bottom-Up methods are faster; based on current research, the Top-Down approach has been studied more and its accuracy is higher than that of the Bottom-Up approach.

This project is improved based on the open source HRNet. Please refer to GitHub for the HRNet project.

HRNet: https://github.com/leoxiaobin/deep-high-resolution-net.pytorch


3. Hand key point detection model training

This project uses the YOLOv5 model for hand detection, and builds hand key point detection on an improved version of the open-source HRNet, providing a complete training and testing pipeline. To facilitate model engineering and Android deployment, the project also supports training and testing the lightweight LiteHRNet and Mobilenet models, and provides Python, C++, and Android versions. The lightweight Mobilenet-v2 model achieves real-time detection on ordinary Android phones: about 50ms on the CPU (4 threads) and about 30ms on the GPU, which basically meets the performance requirements of the business.

For the hand key point detection model training method, please refer to another blog post: Hand key point detection 3: Pytorch implements hand key point detection (hand pose estimation), including training code and dataset (CSDN Blog).

The following table shows the computation and parameter counts of HRNet and the lightweight models LiteHRNet and Mobilenet, together with their detection accuracy (AP). The high-accuracy HRNet-w32 reaches an AP of 0.8570, but its parameter count and computational cost are relatively large, making it unsuitable for mobile deployment. LiteHRNet18 and Mobilenet-v2 have relatively small parameter and computation counts and are suitable for mobile deployment. Although LiteHRNet18 has lower theoretical computation and parameter counts than Mobilenet-v2, Mobilenet-v2 was found to run faster in actual tests. The lightweight Mobilenet-v2 model achieves real-time detection on ordinary Android phones: about 50ms on the CPU (4 threads) and about 30ms on the GPU, which basically meets the performance requirements of the business.

Model          Input size   Params    FLOPs       AP
HRNet-w32      192×192      28.48M    5734.05M    0.8570
LiteHRNet18    192×192      1.10M     182.15M     0.8023
Mobilenet-v2   192×192      2.63M     529.25M     0.7574
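
The parameter and FLOP counts in the table can be reproduced with a profiling tool. The sketch below uses the thop library as an example; this is an assumption about tooling, not necessarily what the project used, and the torchvision mobilenet_v2 is only a stand-in for your own key point model. Note that thop reports multiply-accumulate operations (MACs), so the numbers may differ from the table by a constant factor depending on convention.

# Sketch: measure parameter count and MACs of a Pytorch model with thop.
# thop and the torchvision stand-in model are assumptions for illustration.
import torch
from thop import profile
from torchvision.models import mobilenet_v2

model = mobilenet_v2()               # replace with your own key point model
dummy = torch.randn(1, 3, 192, 192)  # the table uses a 192×192 input
macs, params = profile(model, inputs=(dummy,))
print(f"MACs: {macs / 1e6:.2f}M, Params: {params / 1e6:.2f}M")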

HRNet-w32 has too many parameters and calculations to be deployed on Android phones. The Android version of this project only supports the deployment of LiteHRNet and Mobilenet-v2 models; the C++ version can support the deployment of HRNet-w32, LiteHRNet and Mobilenet-v2 models. 


4. Android deployment of hand key point detection model

Currently, there are many deployment methods for CNN models. You can use deployment tools such as TNN, MNN, NCNN, and TensorRT. I use TNN for deployment on Android. The deployment process can be divided into four steps: Train the model -> Convert the model to an ONNX model -> Convert the ONNX model to a TNN model -> Deploy the TNN model on the Android side.

(1) Convert Pytorch model to ONNX model

After training the Pytorch model, we need to convert the model to an ONNX model for subsequent model deployment.

  • The original project provides a conversion script; you only need to change model_file and config_file to your own model paths.
  • convert_torch_to_onnx.py is the script that converts the Pytorch model to an ONNX model:
python libs/convert_tools/convert_torch_to_onnx.py
"""
This code is used to convert the pytorch model into an onnx format model.
"""
import os
import torch.onnx
from pose.inference import PoseEstimation
from basetrainer.utils.converter import pytorch2onnx


def load_model(config_file, model_file, device="cuda:0"):
    pose = PoseEstimation(config_file, model_file, device=device)
    model = pose.model
    config = pose.config
    return model, config


def convert2onnx(config_file, model_file, device="cuda:0", onnx_type="kp"):
    """
    :param model_file:
    :param input_size:
    :param device:
    :param onnx_type:
    :return:
    """
    model, config = load_model(config_file, model_file, device=device)
    model = model.to(device)
    model.eval()
    model_name = os.path.basename(model_file)[:-len(".pth")]
    onnx_file = os.path.join(os.path.dirname(model_file), model_name + ".onnx")
    # dummy_input = torch.randn(1, 3, 240, 320).to("cuda")
    input_size = tuple(config.MODEL.IMAGE_SIZE)  # w,h
    input_shape = (1, 3, input_size[1], input_size[0])
    pytorch2onnx.convert2onnx(model,
                              input_shape=input_shape,
                              input_names=['input'],
                              output_names=['output'],
                              onnx_file=onnx_file,
                              opset_version=11)


if __name__ == "__main__":
    model_file = "../../work_space/hand/mobilenet_v2_21_192_192_custom_coco_20230928_065444_0934/model/best_model_153_0.7574.pth"
    config_file = "../../work_space/hand/mobilenet_v2_21_192_192_custom_coco_20230928_065444_0934/mobilenetv2_hand_192_192.yaml"
    convert2onnx(config_file, model_file)
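
After exporting, it is worth sanity-checking the ONNX file with onnxruntime before converting it further. The snippet below is a minimal sketch; the file name is an assumption and should be replaced with the path produced by the script above, and onnxruntime and numpy must be installed.

# Minimal sanity check of the exported ONNX model with onnxruntime.
# The file name is an assumption; use the .onnx path produced by the script above.
import numpy as np
import onnxruntime as ort

onnx_file = "best_model_153_0.7574.onnx"
sess = ort.InferenceSession(onnx_file, providers=["CPUExecutionProvider"])
input_name = sess.get_inputs()[0].name

# Dummy input matching the training input size (N, C, H, W) = (1, 3, 192, 192)
x = np.random.rand(1, 3, 192, 192).astype(np.float32)
outputs = sess.run(None, {input_name: x})
print("output shape:", outputs[0].shape)  # key point heatmaps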

(2) Convert ONNX model to TNN model

After obtaining the ONNX model, it still needs to be converted into a TNN model (a pair of *.tnnproto and *.tnnmodel files) before it can be deployed with TNN on Android.

TNN conversion tool: https://github.com/Tencent/TNN

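TNN ships a convert2tnn tool (under tools/convert2tnn in the TNN repository) for converting ONNX models. The call below is only a sketch; the exact arguments may differ between TNN versions, so check the convert2tnn documentation in the repository.

# Sketch: invoke TNN's convert2tnn converter (paths and flags are assumptions).
# Run from the TNN repository's tools/convert2tnn directory, or adjust the paths.
import subprocess

onnx_model = "best_model_153_0.7574.onnx"   # exported in the previous step
subprocess.run([
    "python3", "converter.py", "onnx2tnn", onnx_model,
    "-optimize",             # optimize the graph for inference
    "-o", "./tnn_models",    # output directory for the .tnnproto/.tnnmodel pair
], check=True)
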
(3) Deploy the model on Android

The project implements an Android Demo for hand detection and hand key point detection. The deployment framework is TNN, which supports multi-threaded CPU and GPU accelerated inference and runs in real time on ordinary phones. The core algorithms of the Android project are all implemented in C++, and the upper Java layer calls them through the JNI interface.

If you want to deploy your own trained model in this Android Demo, convert the trained Pytorch model to ONNX, then convert it to a TNN model, and replace the Demo's TNN model files with your own.

HRNet-w32 has too many parameters and calculations to be deployed on Android phones. The Android version of this project only supports the deployment of LiteHRNet and Mobilenet-v2 models; the C++ version can support the deployment of HRNet-w32, LiteHRNet and Mobilenet-v2 models. 

  • The Java part of the Android project's JNI interface:
package com.cv.tnn.model;

import android.graphics.Bitmap;

public class Detector {

    static {
        System.loadLibrary("tnn_wrapper");
    }


    /***
     * Initialize the detection models
     * @param dets_model: hand detection model (without file extension)
     * @param pose_model: key point detection model (without file extension)
     * @param root: root directory of the model files, placed under the assets folder
     * @param model_type: model type
     * @param num_thread: number of threads to use
     * @param useGPU: whether to enable GPU acceleration
     */
    public static native void init(String dets_model, String pose_model, String root, int model_type, int num_thread, boolean useGPU);

    /***
     * Run detection and return the results
     * @param bitmap: input image (Bitmap) in ARGB_8888 format
     * @param score_thresh: confidence threshold
     * @param iou_thresh: IOU threshold for NMS
     * @param pose_thresh: key point score threshold
     * @return detection results, one FrameInfo per detected hand
     */
    public static native FrameInfo[] detect(Bitmap bitmap, float score_thresh, float iou_thresh, float pose_thresh);
}
  • The C++ part of the Android project's JNI interface:
#include <jni.h>
#include <string>
#include <fstream>
#include "src/yolov5.h"
#include "src/pose_detector.h"
#include "src/Types.h"
#include "debug.h"
#include "android_utils.h"
#include "opencv2/opencv.hpp"
#include "file_utils.h"

using namespace dl;
using namespace vision;

static YOLOv5 *detector = nullptr;
static PoseDetector *pose = nullptr;

JNIEXPORT jint JNI_OnLoad(JavaVM *vm, void *reserved) {
    return JNI_VERSION_1_6;
}

JNIEXPORT void JNI_OnUnload(JavaVM *vm, void *reserved) {

}


extern "C"
JNIEXPORT void JNICALL
Java_com_cv_tnn_model_Detector_init(JNIEnv *env,
                                    jclass clazz,
                                    jstring dets_model,
                                    jstring pose_model,
                                    jstring root,
                                    jint model_type,
                                    jint num_thread,
                                    jboolean use_gpu) {
    // Release any previously created models before re-initializing
    if (detector != nullptr) {
        delete detector;
        detector = nullptr;
    }
    if (pose != nullptr) {
        delete pose;
        pose = nullptr;
    }
    std::string parent = env->GetStringUTFChars(root, 0);
    std::string dets_model_ = env->GetStringUTFChars(dets_model, 0);
    std::string pose_model_ = env->GetStringUTFChars(pose_model, 0);
    string dets_model_file = path_joint(parent, dets_model_ + ".tnnmodel");
    string dets_proto_file = path_joint(parent, dets_model_ + ".tnnproto");
    string pose_model_file = path_joint(parent, pose_model_ + ".tnnmodel");
    string pose_proto_file = path_joint(parent, pose_model_ + ".tnnproto");
    DeviceType device = use_gpu ? GPU : CPU;
    LOGW("parent     : %s", parent.c_str());
    LOGW("useGPU     : %d", use_gpu);
    LOGW("device_type: %d", device);
    LOGW("model_type : %d", model_type);
    LOGW("num_thread : %d", num_thread);
    YOLOv5Param model_param = YOLOv5s05_320; // hand detection model parameters
    detector = new YOLOv5(dets_model_file,
                          dets_proto_file,
                          model_param,
                          num_thread,
                          device);

    PoseParam pose_param = POSE_MODEL_TYPE[model_type]; // key point model parameters selected by model_type
    pose = new PoseDetector(pose_model_file,
                            pose_proto_file,
                            pose_param,
                            num_thread,
                            device);
}

extern "C"
JNIEXPORT jobjectArray JNICALL
Java_com_cv_tnn_model_Detector_detect(JNIEnv *env, jclass clazz, jobject bitmap,
                                      jfloat score_thresh, jfloat iou_thresh, jfloat pose_thresh) {
    cv::Mat bgr;
    BitmapToMatrix(env, bitmap, bgr);
    int src_h = bgr.rows;
    int src_w = bgr.cols;
    // By default the detection region is the whole image (used if no detector is loaded)
    FrameInfo resultInfo;
    // Run hand detection
    if (detector != nullptr) {
        detector->detect(bgr, &resultInfo, score_thresh, iou_thresh);
    } else {
        ObjectInfo objectInfo;
        objectInfo.x1 = 0;
        objectInfo.y1 = 0;
        objectInfo.x2 = (float) src_w;
        objectInfo.y2 = (float) src_h;
        objectInfo.label = 0;
        resultInfo.info.push_back(objectInfo);
    }

    int nums = resultInfo.info.size();
    LOGW("object nums: %d\n", nums);
    if (nums > 0) {
        // Run key point detection on each detected hand region
        pose->detect(bgr, &resultInfo, pose_thresh);
        // Visualization code (disabled)
        //classifier->visualizeResult(bgr, &resultInfo);
    }
    //cv::cvtColor(bgr, bgr, cv::COLOR_BGR2RGB);
    //MatrixToBitmap(env, bgr, dst_bitmap);
    auto BoxInfo = env->FindClass("com/cv/tnn/model/FrameInfo");
    auto init_id = env->GetMethodID(BoxInfo, "<init>", "()V");
    auto box_id = env->GetMethodID(BoxInfo, "addBox", "(FFFFIF)V");
    auto ky_id = env->GetMethodID(BoxInfo, "addKeyPoint", "(FFF)V");
    jobjectArray ret = env->NewObjectArray(resultInfo.info.size(), BoxInfo, nullptr);
    for (int i = 0; i < nums; ++i) {
        auto info = resultInfo.info[i];
        env->PushLocalFrame(1);
        //jobject obj = env->AllocObject(BoxInfo);
        jobject obj = env->NewObject(BoxInfo, init_id);
        // set bbox
        //LOGW("rect:[%f,%f,%f,%f] label:%d,score:%f \n", info.rect.x,info.rect.y, info.rect.w, info.rect.h, 0, 1.0f);
        env->CallVoidMethod(obj, box_id, info.x1, info.y1, info.x2 - info.x1, info.y2 - info.y1,
                            info.label, info.score);
        // set keypoint
        for (const auto &kps : info.keypoints) {
            //LOGW("point:[%f,%f] score:%f \n", lm.point.x, lm.point.y, lm.score);
            env->CallVoidMethod(obj, ky_id, (float) kps.point.x, (float) kps.point.y,
                                (float) kps.score);
        }
        obj = env->PopLocalFrame(obj);
        env->SetObjectArrayElement(ret, i, obj);
    }
    return ret;
}

(4) Android test results 

The Android Demo achieves real-time detection on an ordinary phone's CPU/GPU: about 50ms on the CPU (4 threads) and about 30ms on the GPU, which basically meets the performance requirements of the business.

Android hand key point detection (hand pose estimation) APP Demo experience: https://download.csdn.net/download/guyuealian/88418582

(5) Crash when running the APP: dlopen failed: library "libomp.so" not found

Reference solution:
Solution to dlopen failed: library "libomp.so" not found (CSDN blog)

Android SDK and NDK related version information:

Android Studio 4.1.1


5. Android project source code download

Android project source code download address: Android implements real-time hand key point detection (hand pose estimation), with source code

The complete set of Android project source code includes:

  1. Android Demo source code supporting YOLOv5 hand detection
  2. Android Demo source code supporting hand key point detection with the lightweight LiteHRNet and Mobilenet-v2 models
  3. The Android Demo runs in real time on an ordinary phone's CPU/GPU: about 50ms on CPU and about 30ms on GPU
  4. The Android Demo supports image, video, and camera testing
  5. All dependent libraries are already configured, so the project can be built and run directly. If it crashes at runtime, please refer to the solution for dlopen failed: library "libomp.so" not found above.

6. C++ implements hand key point detection
