The Application of Deep Learning in Human Action Recognition: Open Source Tools, Dataset Resources, and Trend Cloud GPU Computing Power

Human action recognition is a technology that uses computer vision and deep learning to monitor and analyze human posture and movement in real time. It aims to extract information about human postures, movements, and behaviors from images or videos, enabling deeper recognition and understanding of human activities.

The basic steps of human action recognition are:

  1. Data collection: Collecting image or video data containing human movements, which can be done through cameras, depth sensors, or other sensors.

  2. Preprocessing: Preprocess the collected data, including image denoising, color adjustment, etc., to ensure the quality of the input data.

  3. Feature extraction: Extract key features from the images or videos, such as the positions and poses of human joints.

  4. Model training: Train a deep learning model, such as a convolutional neural network (CNN) or a recurrent neural network (RNN), on the extracted features so that the model learns to recognize different human actions.

  5. Real-time detection: Deploy the trained model to run on live images or video streams and identify human postures and movements.

The above steps usually require the support of large-scale GPU computing.
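The pipeline above can be sketched as a minimal PyTorch example. `TinyActionNet`, its layer sizes, and the input dimensions are illustrative assumptions, not a production architecture; a real system would use a much deeper network and actual video frames instead of random tensors.

```python
import torch
import torch.nn as nn

class TinyActionNet(nn.Module):
    """Hypothetical minimal 3D-CNN: maps a video clip to action-class scores."""

    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv3d(3, 16, kernel_size=3, padding=1),  # spatio-temporal convolution
            nn.ReLU(),
            nn.AdaptiveAvgPool3d(1),  # global pooling over time and space
        )
        self.classifier = nn.Linear(16, num_classes)

    def forward(self, clip):
        # clip shape: (batch, channels, frames, height, width)
        x = self.features(clip).flatten(1)
        return self.classifier(x)

model = TinyActionNet(num_classes=10)
clip = torch.randn(2, 3, 8, 112, 112)  # 2 clips of 8 RGB frames at 112x112
scores = model(clip)
print(scores.shape)  # torch.Size([2, 10]) — one score vector per clip
```

Training such a model on real data is what typically demands the large-scale GPU computing mentioned above.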

To help researchers and developers study action recognition, detection, and classification in depth, this article introduces a series of related open source toolboxes, projects, and dataset resources. Used together, these resources provide comprehensive and powerful support for developing action-related technologies.

Toolboxes

MMSkeleton

MMSkeleton is an open source video analysis toolbox for skeleton-based action recognition, released by the Multimedia Laboratory (MMLab) of the Chinese University of Hong Kong. It offers leading performance on human skeleton recognition in video, ships with pre-trained models, and supports multiple datasets.

Open source address: https://github.com/open-mmlab/mmskeleton

MMAction2

MMAction2 is an open source, PyTorch-based toolbox for video understanding, also released by MMLab as the successor to MMAction. It currently supports four mainstream video understanding tasks: Action Recognition, Skeleton-based Action Recognition, Spatio-Temporal Action Detection, and Temporal Action Localization. MMAction2 supports 28 video understanding models and 22 video understanding datasets.

Open source address: https://github.com/open-mmlab/mmaction2

PYSKL

PYSKL is a PyTorch toolbox for action recognition on skeleton data, built on top of the open source project MMAction2. It supports a variety of skeleton-based action recognition algorithms, including both GCN-based and CNN-based methods.

Open source address: https://github.com/kennymckormick/pyskl
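The core operation behind the GCN-based methods PYSKL supports is a graph convolution over the skeleton. The sketch below is a toy, framework-free illustration of one such step; the joint count, edge list, and feature sizes are all assumptions for demonstration, not PYSKL's actual configuration.

```python
import torch

num_joints = 17            # e.g. a COCO-style skeleton (assumption)
edges = [(0, 1), (0, 2), (1, 3), (2, 4)]  # toy subset of bone connections

# Adjacency matrix with self-loops, row-normalized so each joint
# averages features over itself and its neighbors.
A = torch.eye(num_joints)
for i, j in edges:
    A[i, j] = A[j, i] = 1.0
A = A / A.sum(dim=1, keepdim=True)

X = torch.randn(num_joints, 3)   # one frame: (x, y, confidence) per joint
W = torch.randn(3, 64)           # learnable projection to a 64-dim feature

# One graph-convolution step: aggregate neighbors, then project.
H = A @ X @ W
print(H.shape)  # torch.Size([17, 64])
```

Methods such as ST-GCN stack many of these layers and add temporal convolutions across frames.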

Datasets

EPIC-KITCHENS-100

EPIC-KITCHENS-100 is a large-scale dataset of first-person (egocentric) kitchen activities and an expanded version of the EPIC-KITCHENS-55 dataset: total video duration grew from 55 hours to 100 hours, and it contains more than 90,000 action segments, 97 verb categories, and 300 noun categories.

Paper link: https://arxiv.org/abs/2006.13256

Download address: https://epic-kitchens.github.io/2023

COIN

COIN is a large-scale video analysis dataset jointly open-sourced by Tsinghua University and Meitu. It contains 11,827 instructional videos covering 180 tasks across 12 domains of daily life, and can be used for research on temporal action localization and on analyzing and understanding video behavior in complex scenes.

  • Paper link: https://arxiv.org/abs/1903.02874

  • Download address: https://coin-dataset.github.io/

HOLLYWOOD2

HOLLYWOOD2 is a human action video dataset containing 3,669 clipped videos with a total length of approximately 20.1 hours, covering 12 human action classes and 10 scene classes. The clips are drawn from 69 Hollywood movies.

Actions include: answering the phone, driving, eating, hugging, kissing, etc.

Scenes include: outdoors, in cars, kitchens, offices, shopping malls, hotels, etc.

Download address: https://www.di.ens.fr/~laptev/actions/hollywood2/

UCF Sports

UCF Sports is a sports-focused dataset collected from television broadcasts on BBC and ESPN. It contains 150 video clips at a resolution of 720x480.

The sports categories are: Diving, Golf Swing, Kicking, Lifting, Riding Horse, Running, Skateboarding, Swing-Bench, Swing-Side, and Walking.

Download address: https://www.crcv.ucf.edu/data/UCF_Sports_Action.php

UCF101

The UCF101 dataset is collected from YouTube and is an expanded version of the UCF50 dataset, growing from 50 action categories to 101, with 13,320 videos in total. The videos are all user-uploaded and exhibit camera motion, varied lighting conditions, partial occlusion, low-quality frames, and other real-world characteristics.

In addition, the actions in this dataset fall into 5 broad categories: human-object interaction, simple body movements, human-human interaction, playing musical instruments, and sports.

Download address: https://www.crcv.ucf.edu/data/UCF101.php
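When training on video datasets like UCF101, a common preprocessing step is sampling a fixed-length clip of frames from each variable-length video. The helper below is an illustrative sketch of one such strategy (random start, strided sampling); the function name and parameters are assumptions, not part of any dataset's official tooling.

```python
import numpy as np

def sample_clip_indices(num_frames, clip_len=16, stride=2):
    """Sample `clip_len` frame indices from a video of `num_frames` frames,
    taking every `stride`-th frame from a random start (illustrative helper)."""
    span = clip_len * stride
    if num_frames >= span:
        start = np.random.randint(0, num_frames - span + 1)
        return np.arange(start, start + span, stride)
    # Short videos: wrap around so we still return clip_len indices.
    return np.arange(0, span, stride) % num_frames

# Example: pick a 16-frame training clip from a 300-frame video.
idx = sample_clip_indices(300, clip_len=16, stride=2)
print(len(idx))  # 16
```

The selected indices are then used to decode only those frames, which keeps data loading fast during training.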


As a leading computing power service provider, Trend Cloud is committed to providing flexible, cost-controllable computing power along with scalable storage solutions. In addition, Trend Cloud offers users a rich variety of dataset resources, including large-scale action recognition datasets such as Kinetics-400 and UCF101.

For datasets exceeding 100 GB, such as Kinetics-400, downloading and training locally takes a great deal of time. On Trend Cloud, however, users can access them with one click, which greatly improves the user experience.

Overall, GPU computing power plays a crucial role in the development of AI technology. It not only promotes technological innovation, but also lays the foundation for the widespread application of AI technology in various fields. As GPU computing power continues to improve, we can expect to see more powerful and intelligent motion recognition technology, bringing a richer and more convenient experience to our lives.


Origin blog.csdn.net/m0_49711991/article/details/134929923