1. Classification of Video Temporal Action Recognition Algorithms
According to how the network operates, video temporal action recognition algorithms can be roughly divided into four categories:
- Methods using 2D convolution
- Methods using 3D convolution
- Two-stream methods
- Methods that introduce VLAD
1.1 Methods using 2D convolution
- Detailed explanation of the "TSM: Temporal Shift Module for Efficient Video Understanding" algorithm
- Detailed explanation of the "TEA: Temporal Excitation and Aggregation for Action Recognition" algorithm
- Detailed explanation of the "TDN: Temporal Difference Networks for Efficient Action Recognition" paper
- Detailed explanation of the "No frame left behind: Full Video Action Recognition" algorithm
1.2 Methods using 3D convolution
- Detailed explanation of the "Learning Spatiotemporal Features with 3D Convolutional Networks" (C3D) algorithm
- Detailed explanation of the "Rethinking Spatiotemporal Feature Learning: Speed-Accuracy Trade-offs in Video Classification" (S3D) algorithm
- Detailed explanation of the "ECO: Efficient Convolutional Network for Online Video Understanding" algorithm
- Detailed explanation of the "Learning Spatio-Temporal Representation with Pseudo-3D Residual Networks" (P3D) algorithm
- Detailed explanation of the "SlowFast Networks for Video Recognition" algorithm
- Detailed explanation of the "X3D: Expanding Architectures for Efficient Video Recognition" algorithm
1.3 Two-stream methods
- Detailed explanation of the "Two-Stream Convolutional Networks for Action Recognition in Videos" (Two-Stream) algorithm
- Detailed explanation of the "Temporal Segment Networks: Towards Good Practices for Deep Action Recognition" (TSN) algorithm
1.4 Methods that introduce VLAD
2. Introduction to common datasets
Sports-1M dataset:
* 1.1 million sports videos
* 487 video classes
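UCF101 dataset:
* 13,320 video clips
* About 9.5K training videos and 3.7K test videos
* Frame size of 320×240
* 101 classes in total, falling into five broad groups exemplified by applying makeup or brushing teeth, crawling, haircuts, playing musical instruments, and sports
* Each action class is performed by 25 people, each of whom performs 4 to 7 takes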
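ActivityNet dataset:
* A human action recognition dataset
* Version v1.3 contains 19,994 videos covering 200 classes
* 10,024 videos form the training set, 4,926 the validation set, and 5,044 the test set
* The test-set labels are not public, so the validation set is generally used as the test set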
HMDB51 dataset:
* 6,766 videos
* 51 action classes
* Content covers several broad groups: facial actions, body movements, and interactions with objects
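Kinetics-400 dataset:
* 240k training videos, 20k validation videos, and 35k test videos
* 400 human action classes
* Actions include drawing, laughing, hugging, weeding, and so on
* Each video is about 10 seconds long
* The videos are sourced from YouTube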
Kinetics-600 dataset:
* An extension of the Kinetics-400 dataset
* 600 human action classes
* 500k videos in total
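Charades dataset:
* 9,848 videos
* 157 classes of everyday indoor activities
* Multi-label: a single video can carry more than one label
* Each video is about 30 seconds long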
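The dataset figures above can be cross-checked with a little arithmetic. The following is a minimal sketch in Python that records the numbers quoted in this section and derives two sanity checks: the average number of videos per class, and whether the listed splits add up to the stated total. The `DatasetStats` class and its helper names are hypothetical, introduced only for this illustration. Two assumptions are made: the Kinetics-400 total is taken as the sum of its rounded split sizes (the text gives no separate total), and the UCF101 "4 to 7" figure is read as 4 to 7 clips per person.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetStats:
    """Headline numbers for one dataset, copied from the lists above."""
    name: str
    num_classes: int
    num_videos: int
    splits: dict = field(default_factory=dict)  # split name -> video count

    def videos_per_class(self) -> float:
        """Average number of videos per class."""
        return self.num_videos / self.num_classes

    def split_gap(self) -> int:
        """Stated total minus the sum of the listed splits (0 means they agree)."""
        return self.num_videos - sum(self.splits.values())


DATASETS = {
    d.name: d
    for d in [
        DatasetStats("Sports-1M", 487, 1_100_000),
        # Split sizes are quoted as rounded values (9.5K / 3.7K).
        DatasetStats("UCF101", 101, 13_320, {"train": 9_500, "test": 3_700}),
        DatasetStats("ActivityNet v1.3", 200, 19_994,
                     {"train": 10_024, "val": 4_926, "test": 5_044}),
        DatasetStats("HMDB51", 51, 6_766),
        # Assumption: total = sum of the rounded splits, not an official count.
        DatasetStats("Kinetics-400", 400, 295_000,
                     {"train": 240_000, "val": 20_000, "test": 35_000}),
        DatasetStats("Kinetics-600", 600, 500_000),
        DatasetStats("Charades", 157, 9_848),
    ]
}

# UCF101: 25 people per class, each assumed to contribute 4 to 7 clips,
# which bounds the number of clips per class.
UCF101_MIN_PER_CLASS = 25 * 4
UCF101_MAX_PER_CLASS = 25 * 7

for d in DATASETS.values():
    line = f"{d.name}: {d.videos_per_class():.1f} videos per class"
    if d.splits:
        line += f", split gap {d.split_gap()}"
    print(line)
```

Under these assumptions the ActivityNet v1.3 splits sum exactly to the stated 19,994 videos, while the UCF101 splits fall 120 short of 13,320 only because the split sizes are quoted as rounded values. The UCF101 per-class average also lands inside the 100 to 175 range implied by 25 people with 4 to 7 clips each.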
For introductions to other video tasks, see the article "Introduction to Mainstream Video Action Algorithm Tasks".