[Computer Science] [2016] A Deep Learning Model for 3D Human Pose Estimation from Monocular Video


This is a master's thesis from Vilnius Gediminas Technical University, Lithuania (author: Agnė Grinciūnaitė), 68 pages in total.

There exists a visual system that can easily recognize and track human body position, movements, and actions without any additional sensing. This system has a processor called the brain, and it becomes competent after only a few months of training. With a little more training it can also apply the acquired skills to more complicated tasks, such as understanding the interpersonal attitudes, intentions, and emotional states of the observed moving person. This system is called a human being, and it is so far the most inspirational piece of art for today's artificial intelligence creators.

The most impressive results on complex computer vision and machine learning tasks were recently achieved by applying various deep learning methods. It is remarkable how quickly deep neural networks became popular and broadly used, not only in the research community but also in the commercial world. The major impact was made by convolutional neural networks, which beat some computer vision challenges by quite a large margin and attracted everybody's attention. These networks are motivated by the known neurophysiology of the brain and the functional properties it requires for cognition.

The goal of this thesis is to explore the capability of convolutional neural networks to handle a task that is easy for human beings: perceiving another person's location in spacetime from the perspective of the viewer. A new approach is used that incorporates 3D convolutions to extract valuable features from motion data captured by a monocular video camera and regresses directly to joint positions in 3D camera coordinate space. The research shows that such a network can achieve state-of-the-art results on the selected dataset. The achieved results imply that an improved realization could possibly be used in real-world applications such as human-computer interaction, augmented and virtual reality, robotics, surveillance, smart homes, etc.
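The abstract mentions 3D convolutions over video followed by direct regression to 3D joint positions. The sketch below only illustrates that general idea and is not the thesis's actual architecture: it applies one naive 3D convolution over a tiny (frames, height, width) clip and feeds the flattened features to a single linear layer. The function names, the clip and kernel sizes, the random weights, and the joint count of 17 are all assumptions made for illustration.

```python
import random

def conv3d_valid(clip, kernel):
    """Naive 'valid' 3D cross-correlation over a (T, H, W) nested list."""
    T, H, W = len(clip), len(clip[0]), len(clip[0][0])
    kt, kh, kw = len(kernel), len(kernel[0]), len(kernel[0][0])
    out = []
    for t in range(T - kt + 1):
        plane = []
        for y in range(H - kh + 1):
            row = []
            for x in range(W - kw + 1):
                # Sum over a spatio-temporal window: time, height, width.
                s = 0.0
                for dt in range(kt):
                    for dy in range(kh):
                        for dx in range(kw):
                            s += clip[t + dt][y + dy][x + dx] * kernel[dt][dy][dx]
                row.append(s)
            plane.append(row)
        out.append(plane)
    return out

def relu(volume):
    """Element-wise ReLU on a (T, H, W) nested list."""
    return [[[max(v, 0.0) for v in row] for row in plane] for plane in volume]

def regress_joints(features, weights, bias, num_joints):
    """Flatten features and apply one linear layer producing (x, y, z) per joint."""
    flat = [v for plane in features for row in plane for v in row]
    coords = [sum(w * f for w, f in zip(w_row, flat)) + b
              for w_row, b in zip(weights, bias)]
    return [tuple(coords[3 * j:3 * j + 3]) for j in range(num_joints)]

random.seed(0)
NUM_JOINTS = 17  # hypothetical joint count, not taken from the thesis

# Toy grayscale clip: 4 frames of 6x6 pixels, and one 3x3x3 kernel.
clip = [[[random.random() for _ in range(6)] for _ in range(6)] for _ in range(4)]
kernel = [[[random.uniform(-1, 1) for _ in range(3)] for _ in range(3)] for _ in range(3)]

features = relu(conv3d_valid(clip, kernel))  # shape (2, 4, 4)
n_feat = 2 * 4 * 4
weights = [[random.uniform(-0.1, 0.1) for _ in range(n_feat)]
           for _ in range(NUM_JOINTS * 3)]
bias = [0.0] * (NUM_JOINTS * 3)

joints = regress_joints(features, weights, bias, NUM_JOINTS)
print(len(joints), "joints, each with", len(joints[0]), "coordinates")
```

With untrained random weights the predicted coordinates are meaningless. The point is only the data flow: a kernel that spans several frames mixes motion and appearance information, and the regression head outputs camera-space coordinates directly rather than per-joint heatmaps.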

1 Introduction
2 Theoretical Background
3 Related Work
4 Datasets
5 3D Convolutional Neural Network
6 Experiments and Results
7 Conclusions

Download the original English text:

http://page5.dfpan.com/fs/4l7cajc2c2411229163/
