My open-source project: using OnnxRuntime to deploy RTMPose on the CPU for real-time 2D pose estimation

1 RTMPose

RTMPose paper address: https://arxiv.org/abs/2303.07399 .

RTMPose is a 2D pose estimation framework in the Top-Down paradigm. It modifies SimCC to be lighter and more effective, with characteristics well suited to industrial applications.

The highlight of RTMPose is its industrial-grade inference speed and accuracy, which the abstract of the paper also emphasizes. The abstract is worth reading carefully:

Recent studies on 2D pose estimation have achieved excellent performance on public benchmarks, yet its application in the industrial community still suffers from heavy model parameters and high latency. In order to bridge this gap, we empirically explore key factors in pose estimation including paradigm, model architecture, training strategy, and deployment, and present a high-performance real-time multi-person pose estimation framework, RTMPose, based on MMPose. Our RTMPose-m achieves 75.8% AP on COCO with 90+ FPS on an Intel i7-11700 CPU and 430+ FPS on an NVIDIA GTX 1660 Ti GPU, and RTMPose-l achieves 67.0% AP on COCO-WholeBody with 130+ FPS. To further evaluate RTMPose’s capability in critical real-time applications, we also report the performance after deploying on the mobile device. Our RTMPose-s achieves 72.2% AP on COCO with 70+ FPS on a Snapdragon 865 chip, outperforming existing open-source libraries. Code and models are released at this https URL.

According to the abstract, the RTMPose-m model achieves 75.8% AP on COCO and reaches 90+ FPS with ONNXRuntime on an Intel i7-11700 CPU and 430+ FPS with TensorRT on an NVIDIA GTX 1660 Ti GPU. RTMPose-s achieves 72.2% AP with 70+ FPS when deployed with ncnn on a Snapdragon 865 mobile chip.

It's so powerful, I must have it!

RTMPose has been integrated into MMPose, Github address: https://github.com/open-mmlab/mmpose/tree/dev-1.x/projects/rtmpose

When I took a closer look at RTMPose's README.md, I found that its model deployment tutorial relies heavily on MMDeploy, and I personally felt that the deep integration with MMDeploy could be off-putting (just a personal opinion, no offense intended). Since I have more experience with local and server-side model deployment, this article does not rely on MMDeploy. Instead, it uses the OnnxRuntime CPU C++ SDK to deploy the onnx models exported from RTMDetnano+RTMPose-m on a local CPU. No GPU is needed for real-time 2D pose estimation; even the old i5-7400 (4 cores) I tested on runs in real time, so give it a try!

I would also like to thank Jinglao, the author of RTMPose (see Jinglao's Zhihu homepage), for the lightning-fast merge of the PR for this example.

2 Use OnnxRuntime to deploy RTMDetnano+RTMPose on the CPU side

This section describes in detail how to deploy the RTMDetnano+RTMPose models on the CPU with OnnxRuntime. The tutorial implements a Top-Down 2D pose estimation example based on RTMDetnano+RTMPose: RTMDetnano detects people, the image region corresponding to the detection box is cropped and fed to RTMPose for pose estimation, and a simple C++ class performs real-time 2D pose estimation with frame-skipping detection. Let's get started.

The code example in this article has been open-sourced at https://github.com/HW140701/RTMPose-Deploy , and a precompiled package is provided. If you don't want to compile it yourself, you can download the precompiled package and run it directly; you will need the VC runtime on your Windows machine. If you find it useful, please give it a star, thank you.

The code in this article mainly shows the data pre- and post-processing for RTMDetnano+RTMPose; feel free to borrow from it.

The sample code in this article has been submitted to MMPose dev1.x: https://github.com/open-mmlab/mmpose/pull/2316 .

2.1 Download Onnx model and convert Onnx model

The onnx models exported with the default settings can be found in RTMPose's README.md; the download address is:

https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmpose-cpu.zip

After decompression, you will see the onnx models of RTMDetnano and RTMPose, each named end2end.onnx.

Note that the RTMPose model here predicts the 17 human-body keypoints of the COCO dataset. If you need other RTMPose onnx models, refer to RTMPose's README.md to export them yourself.

The object detector used here is RTMDetnano. Of course, you can also use other detectors, such as the various YOLO variants. Personally, I think the choice of detector has little impact on downstream pose estimation, as long as the detector itself is reasonably accurate.

2.2 Implementing a Top-Down 2D pose estimation example based on RTMDetnano+RTMPose

2.2.1 Target detection based on RTMDetnano

Since the RTMDetnano model provided at the link above has dynamic batch_size, image_height, and image_width dimensions, the width and height of the input image are not fixed in the implementation.

The image_mean and image_std values used for input-image normalization come from the pipeline.json files that accompany each model in the https://download.openmmlab.com/mmpose/v1/projects/rtmpose/rtmpose-cpu.zip archive.
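The normalization step itself is just a per-channel `(pixel - mean) / std`. Below is a minimal, dependency-free sketch of this step; the function name is mine, and the mean/std constants in the test are illustrative defaults — take the authoritative values from the pipeline.json next to each model.

```cpp
#include <cassert>
#include <cmath>
#include <vector>

// Normalize an interleaved BGR float image in place: (pixel - mean) / std.
// mean/std should be read from the model's pipeline.json; the values used in
// the example below are only illustrative.
void NormalizeImage(std::vector<float>& bgr_pixels,
                    const float mean[3], const float stddev[3]) {
    for (std::size_t i = 0; i + 2 < bgr_pixels.size(); i += 3) {
        for (int c = 0; c < 3; ++c)
            bgr_pixels[i + c] = (bgr_pixels[i + c] - mean[c]) / stddev[c];
    }
}
```

In the actual repository this loop runs while copying the cropped image into the flat float tensor that OnnxRuntime consumes, so no extra pass over the data is needed.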

For more details, please refer to the repository code.

After inference on an input image, the repository selects the detection box with class 0 (person) and the highest confidence as the region for subsequent pose estimation. In other words, this example currently only estimates the pose of the single person with the highest detection confidence in the image; extending it to multi-person pose estimation is straightforward and not a lot of work.
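The selection rule described above can be sketched in a few lines. This is my own minimal version, not the repository's exact code; the struct and function names, and the 0.5 score threshold, are assumptions for illustration.

```cpp
#include <cassert>
#include <vector>

// One detection produced by RTMDetnano after post-processing.
struct DetBox {
    float x1, y1, x2, y2;  // box corners in the original image
    float score;           // confidence
    int label;             // class id; 0 = person in COCO
};

// Return the index of the person box with the highest score, or -1 if no
// person box clears the threshold. Threshold 0.5 is an illustrative default.
int SelectBestPersonBox(const std::vector<DetBox>& boxes,
                        float score_thresh = 0.5f) {
    int best = -1;
    float best_score = score_thresh;
    for (int i = 0; i < static_cast<int>(boxes.size()); ++i) {
        if (boxes[i].label == 0 && boxes[i].score >= best_score) {
            best_score = boxes[i].score;
            best = i;
        }
    }
    return best;
}
```

Extending to multi-person estimation would amount to returning every person box above the threshold (optionally after NMS) and running RTMPose once per box.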

2.2.2 Pose estimation based on RTMPose

After detection completes, the detection-box region is cropped via an affine transformation, because RTMPose expects an input of 1x3x256x192. The cropped image is then preprocessed and fed to RTMPose, which outputs the coordinates of the 17 keypoints within the 256x192 crop; finally, the inverse affine transformation maps these coordinates back onto the original input image to obtain the correct positions.
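The forward/inverse mapping can be illustrated with a simplified scale-and-translate affine (the repository warps the actual pixels with a full affine transform, typically including padding to preserve aspect ratio; this sketch ignores rotation and padding, and all names are mine):

```cpp
#include <cassert>
#include <cmath>

// Axis-aligned affine: x' = a*x + tx, y' = b*y + ty.
struct Affine { float a, tx, b, ty; };

// Map the detection box onto the 192x256 model input plane.
Affine BoxToInputAffine(float x1, float y1, float x2, float y2,
                        float input_w, float input_h) {
    Affine t;
    t.a = input_w / (x2 - x1);
    t.b = input_h / (y2 - y1);
    t.tx = -x1 * t.a;
    t.ty = -y1 * t.b;
    return t;
}

// Invert the affine so keypoints found in the crop can be mapped back
// to the original image.
Affine Invert(const Affine& t) {
    Affine inv;
    inv.a = 1.0f / t.a;
    inv.b = 1.0f / t.b;
    inv.tx = -t.tx / t.a;
    inv.ty = -t.ty / t.b;
    return inv;
}

void Apply(const Affine& t, float x, float y, float& ox, float& oy) {
    ox = t.a * x + t.tx;
    oy = t.b * y + t.ty;
}
```

Each of the 17 keypoints decoded from the 192x256 crop is passed through the inverted transform to land at its position in the original image.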

For more details, please refer to the repository code.

2.2.3 Real-time video stream pose estimation: RTMPoseTracker

With the inference classes for RTMDetnano and RTMPose in place, we build a simple RTMPoseTracker for real-time video-stream pose estimation. By default, RTMPoseTracker runs object detection only once every 10 frames, which greatly reduces per-frame inference latency and makes real-time 2D pose estimation achievable.
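The frame-skipping logic boils down to caching the last detection box and refreshing it only every N frames, while pose inference runs on every frame. Here is a minimal sketch of that idea with the detector stubbed out as a callback; the class and member names are mine, not the repository's.

```cpp
#include <cassert>
#include <functional>
#include <utility>

struct Box { float x1, y1, x2, y2; };

// Sketch of the frame-skipping idea behind RTMPoseTracker: run the (slow)
// detector once every det_interval frames and reuse its box in between.
class FrameSkippingTracker {
public:
    FrameSkippingTracker(std::function<Box()> detect, int det_interval = 10)
        : detect_(std::move(detect)), det_interval_(det_interval) {}

    // Returns the box to crop and feed to the pose model for this frame.
    Box NextFrame() {
        if (frame_index_ % det_interval_ == 0) {
            last_box_ = detect_();   // refresh the box on every Nth frame
            ++detection_count_;
        }
        ++frame_index_;
        return last_box_;            // reuse the cached box otherwise
    }

    int DetectionCount() const { return detection_count_; }

private:
    std::function<Box()> detect_;
    int det_interval_;
    int frame_index_ = 0;
    int detection_count_ = 0;
    Box last_box_{};
};
```

The trade-off is that a fast-moving person can drift out of the stale box between detections; shrinking the interval (or expanding the cached box by a margin) mitigates this at the cost of latency.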

If you are interested, you can visit my personal website: https://www.stubbornhuang.com/


Originally published at blog.csdn.net/HW140701/article/details/130431418