Carrying the series forward: PAM (Pose Anything Model), a graph-based approach for category-agnostic pose estimation of arbitrary objects

Meta released SAM (Segment Anything Model) in the first half of this year, and it caused a sensation in the CV field. Many interesting applications have since been built on top of SAM. My previous posts in this series are listed below; if you are interested, you can read them first:
"Segment Anything Model (SAM) - rolled up, the big CV model that claims to split everything is here"

"Segment Anything Model (SAM) - Segment everything, image segmentation practice with predictive prompt input"

"SAM-FAST: Accelerating Generative AI with PyTorch: Segment Anything, Fast is based on the official PyTorch team to develop native SAM to speed up 8 times"

Last month, at the end of November, another team published a brand-new piece of research, which I will call PAM (Pose Anything Model). The official paper is here, as shown below:

Traditional 2D pose estimation models are limited by their category-specific design, making them applicable only to predefined object categories. This limitation becomes particularly problematic when dealing with novel objects, for which relevant training data is lacking.

To address this limitation, the authors introduce category-agnostic pose estimation (CAPE). CAPE aims to localize keypoints for arbitrary object categories with a single model, requiring only a minimal number of support images with annotated keypoints. The approach not only enables pose estimation under arbitrary keypoint definitions, but also significantly reduces the associated costs, paving the way for versatile and adaptable pose estimation applications.

The article proposes a new CAPE method that exploits the inherent geometric relationships between keypoints through a newly designed graph transformer decoder. By capturing and incorporating this structural information, the approach improves keypoint localization accuracy, marking a significant departure from traditional CAPE techniques that treat keypoints as isolated entities.
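To make the "graph transformer decoder" idea more concrete, here is a minimal PyTorch sketch of one common way to inject skeleton structure into attention: the adjacency matrix of the keypoint graph is turned into an additive attention bias so that connected keypoints attend to each other preferentially. This is my own illustration of the general technique, not the authors' implementation; the class name and tensor shapes are made up for the example.

import torch
import torch.nn as nn

class GraphMaskedSelfAttention(nn.Module):
    """Self-attention over keypoint tokens where the skeleton's adjacency
    matrix biases the attention scores, so connected keypoints exchange
    information preferentially. Illustration only, not the authors' code."""

    def __init__(self, dim: int, num_heads: int = 4):
        super().__init__()
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)

    def forward(self, kp_tokens: torch.Tensor, adjacency: torch.Tensor) -> torch.Tensor:
        # kp_tokens: (B, K, dim) keypoint queries; adjacency: (K, K) 0/1 skeleton links.
        # Non-edges get a large negative additive bias, edges get none.
        attn_bias = (1.0 - adjacency) * -1e4
        out, _ = self.attn(kp_tokens, kp_tokens, kp_tokens, attn_mask=attn_bias)
        return out

# Toy example: 5 keypoints connected as a chain, batch of 2 samples.
K, dim = 5, 64
adj = torch.eye(K)                      # each keypoint attends to itself
for i in range(K - 1):                  # plus its neighbours along the chain
    adj[i, i + 1] = adj[i + 1, i] = 1.0
layer = GraphMaskedSelfAttention(dim)
tokens = torch.randn(2, K, dim)
print(layer(tokens, adj).shape)         # torch.Size([2, 5, 64])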

The authors validate their method on the MP-100 benchmark, a comprehensive dataset of more than 20,000 images spanning over 100 categories. The method outperforms the previous state of the art by a large margin, with improvements of 2.16% and 1.82% in the 1-shot and 5-shot settings, respectively. Furthermore, its end-to-end training demonstrates better scalability and efficiency than previous CAPE methods.

If you want to dig deeper into the technical details, you can read the original paper.

The author's blog is here, as shown below:

Along with the paper, the author also open-sourced the project; the repository is here, as shown below:

At the time of writing, the repository has not attracted much attention yet.

If you want to get hands-on with it, the author provides a Docker image that can be used directly. The steps are as follows:

docker pull orhir/pose_anything
docker run --name pose_anything -v {DATA_DIR}:/workspace/PoseAnything/PoseAnything/data/mp100 -it orhir/pose_anything /bin/bash

The model was developed and trained with Python 3.8, PyTorch 2.0.1, and CUDA 12.1; developers can use these versions as a reference.
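Before installing the rest of the stack, it can be worth confirming that your local versions match the ones above. A small sanity-check script of my own (not part of the project):

import sys
import torch

# Versions referenced above: Python 3.8, PyTorch 2.0.1, CUDA 12.1.
print("python :", sys.version.split()[0])     # expect 3.8.x
print("torch  :", torch.__version__)          # expect 2.0.1
print("cuda   :", torch.version.cuda)         # expect 12.1
print("gpu ok :", torch.cuda.is_available())  # True if a usable GPU is visible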

The following two basic libraries need to be installed in advance:

mmcv-full=1.6.2
mmpose=0.29.0

After the installation is complete, execute:

python setup.py develop

If you want to use the pretrained Swin Transformer backbone from the paper, you can download the corresponding weight file from here.

The training execution command is as follows:

python train.py --config [path_to_config_file]  --work-dir [path_to_work_dir]
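Since the project depends on mmcv-full and mmpose, the [path_to_config_file] argument should be a standard mmcv-style Python config. If you want to inspect what a config resolves to before launching training, something like the following should work; the path is a placeholder for one of the config files shipped in the repository, and this assumes the usual mmcv conventions:

from mmcv import Config

# Placeholder path: substitute one of the config files from the repository.
cfg = Config.fromfile("configs/your_chosen_config.py")

# Print the fully resolved config and a couple of commonly used fields.
print(cfg.pretty_text[:800])
print("total_epochs:", cfg.get("total_epochs", None))
print("optimizer   :", cfg.get("optimizer", None))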

The author developed, trained, and evaluated the model on the MP-100 dataset. The released models are listed below; the weight files and config files for each split are linked in the project repository.
 

1-Shot Models

Setting   split 1   split 2   split 3   split 4   split 5
Tiny      91.06     88.24     86.09     86.17     85.78
Small     93.66     90.42     89.79     88.68     89.61

5-Shot Models

Setting   split 1   split 2   split 3   split 4   split 5
Tiny      94.18     91.46     90.50     90.18     89.47
Small     96.51     92.15     91.99     92.01     92.36
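For reference, the numbers in the two tables above are keypoint accuracy scores on the five MP-100 splits (a PCK-style metric, where a prediction counts as correct if it falls within a threshold proportional to the object size). Below is a minimal sketch of how such a metric is typically computed; this is my own toy illustration, not the project's evaluation code:

import numpy as np

def pck(pred, gt, bbox_size, thr=0.2):
    """Percentage of Correct Keypoints: a prediction is correct when it lies
    within thr * bbox_size of the ground-truth location."""
    dist = np.linalg.norm(pred - gt, axis=1)          # (K,) per-keypoint error
    return float(np.mean(dist <= thr * bbox_size))

# Toy example: 3 keypoints, object bounding box of size 100 px, threshold 20 px.
gt = np.array([[10.0, 10.0], [50.0, 40.0], [90.0, 80.0]])
pred = np.array([[12.0, 11.0], [75.0, 40.0], [91.0, 79.0]])
print(pck(pred, gt, bbox_size=100))  # 0.666..., second keypoint is 25 px off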

The evaluation execution command is as follows:

python test.py [path_to_config_file] [path_to_pretrained_ckpt]

I will set up the environment and give it a hands-on test when I find the time.
