Whole-body 3D keypoint estimation: starting from H3WB



Foreword

This work is among the first attempts at 3D whole-body pose detection. The dataset we use is H3WB, a 3D whole-body keypoint dataset built on Human3.6M.


1. H3WB

H3WB: Human3.6M 3D WholeBody Dataset and Benchmark
H3WB is a large-scale 3D whole-body pose estimation dataset. It extends Human3.6M with 133 keypoints per person, using the same skeleton layout as COCO-WholeBody.
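For reference, the 133 keypoints follow the COCO-WholeBody grouping. The sketch below assumes the standard COCO-WholeBody group sizes (body, feet, face, and the two hands); the group names are mine, not identifiers from H3WB:

```python
# Assumed keypoint grouping, following COCO-WholeBody's skeleton layout.
# Group sizes are the standard COCO-WholeBody counts.
KEYPOINT_GROUPS = {
    "body": 17,        # COCO body joints
    "feet": 6,         # 3 per foot
    "face": 68,        # facial landmarks
    "left_hand": 21,   # 21 joints per hand
    "right_hand": 21,
}

total = sum(KEYPOINT_GROUPS.values())
print(total)  # 133
```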

1. Download

The original images can be downloaded from the Human3.6M official website:
Link: Human3.6M

Note: you need to apply for an account and log in before you can download anything. Go to Download -> Training Data -> By subject and download all of the listed Videos.
Each subject's videos are fairly large, about 6-11 GB, so the download takes a while.

The H3WB official repository provides a script to extract frames from the Human3.6M videos and link each image to its corresponding label.

The script code is as follows (example):

import cv2
import os

def convert_mp4_to_image(inpath, outpath, each_x_frame=1):
    print("load "+inpath)
    vidcap = cv2.VideoCapture(inpath)
    success, image = vidcap.read()
    count = 0
    while success:
        if count % each_x_frame == 0:
            cv2.imwrite(outpath+str(count).zfill(4)+".jpg", image)  # save frame as JPEG file
        success, image = vidcap.read()
        if success:
            count += 1
            if count % 100 == 0:
                print('Finish frame: ', count)
                # time.sleep(1)
    print("Finish all ", count + 1, " images")  # frames are indexed from 0, so count+1 were saved


def convert_h36m_mp4_to_image(base_path, each_x_frame=1):
    subjects = ['S1', 'S5', 'S6', 'S7', 'S8']
    # subjects = ['S1', 'S5', 'S6', 'S7', 'S8', 'S9', 'S11']
    for subject in subjects:
        inpath_base = base_path+subject+"/Videos"
        outpath_base = base_path+subject+"/Images"
        if not os.path.exists(outpath_base):
            os.makedirs(outpath_base)
        videos = os.listdir(inpath_base)
        for video in videos:
            inpath = inpath_base + "/" + video
            outpath = outpath_base + "/" + video[:-4]
            if not os.path.exists(outpath):
                os.makedirs(outpath)
            outpath = outpath + "/frame_"
            convert_mp4_to_image(inpath, outpath, each_x_frame)

if __name__ == "__main__":
    path = "./"
    convert_h36m_mp4_to_image(path+'Human36m/')

The corresponding annotations can be downloaded here:
H3WB annotations
By default they are placed in the datasets/json/ folder.

2. Label format

Each json file follows the format below, but not every json contains all of these fields.
Json structure (example):

XXX.json --- sample id --- 'image_path'
                        |
                        -- 'bbox' --- 'x_min'
                        |          |- 'y_min'
                        |          |- 'x_max'
                        |          |- 'y_max'
                        |
                        |- 'keypoint_2d' --- joint id --- 'x'
                        |                              |- 'y'
                        |
                        |- 'keypoint_3d' --- joint id --- 'x'
                                                       |- 'y'
                                                       |- 'z'
                        

The authors also provide a script for parsing these json files; see json_loader for details.
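A minimal sketch of reading one sample under the structure above (the official json_loader is more complete; the function name and the numeric-string joint ids are my assumptions about the file layout):

```python
import json

import numpy as np


def load_sample(json_path, sample_id):
    """Read one sample from an H3WB-style annotation file.

    Assumes the nested layout shown above; which keys are present
    varies from file to file, so each one is optional here.
    """
    with open(json_path) as f:
        data = json.load(f)
    sample = data[sample_id]

    out = {"image_path": sample.get("image_path")}
    if "bbox" in sample:
        b = sample["bbox"]
        out["bbox"] = (b["x_min"], b["y_min"], b["x_max"], b["y_max"])

    # Joint dicts are keyed by joint id strings; sort them numerically
    # so row i of the array corresponds to joint i.
    for key, dims in (("keypoint_2d", ("x", "y")), ("keypoint_3d", ("x", "y", "z"))):
        if key in sample:
            joints = sample[key]
            out[key] = np.array(
                [[joints[j][d] for d in dims] for j in sorted(joints, key=int)]
            )
    return out
```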

3. Task analysis

The end-to-end task we are tackling is 3D whole-body pose estimation: given an RGB image, predict the full 3D pose of the person in it. The official website outlines the workflow:
1. Use RGBto3D_train.json for training and validation. It contains 80K image paths, bounding boxes and 2D keypoints.
2. It contains the same samples as 2Dto3D_train.json, so the 2D keypoints can also be accessed there if needed.
3. Use RGBto3D_test_img.json for leaderboard testing. It contains 20K image paths and bounding boxes. The image ids in this test set are shuffled.

Validation

The authors do not provide a validation set; instead they recommend 5-fold cross-validation and reporting the mean and standard deviation.
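The recommended 5-fold cross-validation can be sketched as follows (the authors do not fix a fold scheme, so the shuffled equal split and the seed here are my assumptions):

```python
import numpy as np


def five_fold_splits(sample_ids, seed=0):
    """Yield 5 (train_ids, val_ids) splits over the given sample ids.

    The shuffled equal-size folds are an assumed scheme; H3WB only
    recommends 5-fold cross-validation without prescribing the split.
    """
    rng = np.random.default_rng(seed)
    ids = np.array(sample_ids)
    rng.shuffle(ids)
    folds = np.array_split(ids, 5)
    for k in range(5):
        val = folds[k]
        train = np.concatenate([folds[i] for i in range(5) if i != k])
        yield train, val


# Report mean and standard deviation across the folds, e.g. of MPJPE:
# errors = [evaluate(train, val) for train, val in five_fold_splits(ids)]
# print(np.mean(errors), np.std(errors))
```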

Evaluation

Save the predicted 3D whole-body results as "XXto3D_pred.json" and send a download link to [email protected]
with the subject "Test set evaluation request". Example files are available here:
json_test_samples
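Writing the predictions back out in the same nested sample id -> joint id -> x/y/z layout as the annotation files might look like this (the exact submission schema should be checked against the json_test_samples files; this function and its argument names are illustrative):

```python
import json

import numpy as np


def save_predictions(pred_3d, out_path="RGBto3D_pred.json"):
    """Write 3D predictions in the annotation files' nested layout.

    pred_3d maps sample id -> (133, 3) array of 3D keypoints. The
    schema is assumed to mirror the 'keypoint_3d' structure above;
    verify against the official json_test_samples before submitting.
    """
    out = {}
    for sid, kp in pred_3d.items():
        out[str(sid)] = {
            "keypoint_3d": {
                str(j): {"x": float(x), "y": float(y), "z": float(z)}
                for j, (x, y, z) in enumerate(np.asarray(kp))
            }
        }
    with open(out_path, "w") as f:
        json.dump(out, f)
```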


Summary

That is an overview of H3WB as it currently stands. For more details, see: benchmark.md


Origin blog.csdn.net/wqthaha/article/details/131649273