Paper Reading: "Lip Reading Sentences in the Wild" - 代码天地

Category: Other | 2018-10-31 06:51:23

Paper: https://arxiv.org/abs/1611.05358
Original post: http://www.hankcs.com/nlp/cs224n-lip-reading.html

Lip Reading

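The video is processed into a sequence of images centered on the lips. With or without the accompanying audio, the model predicts what is being said.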


The data may come from live news broadcasts.


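In the animated demo from the original post, the lip reading, the speech recognition, and the karaoke-style alignment of the text are all produced automatically by the model.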

Architecture


The visual and audio modules can be used together or each on its own. The model outputs one character at a time.

Vision

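A sliding window is taken over the temporal sequence of lip frames. Each window is fed first to a CNN and then to an LSTM, which produces an output vector $s$.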
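
The sliding-window step can be sketched in a few lines of Python. This is only a minimal illustration: the function name `sliding_windows`, the window size of 5 frames, and the stride of 1 are assumptions made for the example, not values taken from this post. In a full model each window would go through the CNN, and the per-window features would then be passed to the LSTM.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def sliding_windows(frames: Sequence[T], size: int = 5, stride: int = 1) -> List[List[T]]:
    """Return every window of `size` consecutive frames, advancing `stride` frames per step."""
    if size <= 0 or stride <= 0:
        raise ValueError("size and stride must be positive")
    # Sequences shorter than `size` produce no windows at all.
    return [list(frames[i:i + size]) for i in range(0, len(frames) - size + 1, stride)]


# Frame indices stand in for the cropped lip images (assumed data).
frames = list(range(8))
windows = sliding_windows(frames, size=5, stride=1)
```

With 8 frames and a window of 5, this yields 4 overlapping windows, so neighbouring windows share most of their frames.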


Audio

Similarly, the audio is sliced into windows.


Attention and Spell

The output states of the two LSTMs above are fed into a third LSTM that is extended with two attention mechanisms, one per modality.
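
To make the two attention extensions more concrete, here is a toy sketch of a single decoder step that attends separately to the video and audio encoder states and concatenates the two context vectors. It uses plain dot-product scoring for brevity, which is a simplification chosen for this example; the scoring function actually used in the paper is not described in this post. The names `attend` and `dual_context`, and all the numbers, are hypothetical.

```python
import math
from typing import List, Sequence

Vector = Sequence[float]


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(query: Vector, states: Sequence[Vector]) -> List[float]:
    """Dot-product attention: average the states, weighted by similarity to the query."""
    scores = [sum(q * x for q, x in zip(query, state)) for state in states]
    weights = softmax(scores)
    dim = len(states[0])
    return [sum(w * state[d] for w, state in zip(weights, states)) for d in range(dim)]


def dual_context(query: Vector, video_states: Sequence[Vector],
                 audio_states: Sequence[Vector]) -> List[float]:
    """Attend to each modality independently and concatenate the two context vectors."""
    return attend(query, video_states) + attend(query, audio_states)


# Toy 2-dimensional encoder states, three time steps per modality (assumed data).
video_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
audio_states = [[0.5, 0.5], [2.0, 0.0], [0.0, 2.0]]
decoder_state = [1.0, 0.0]
context = dual_context(decoder_state, video_states, audio_states)
```

One simple way to run this sketch with a single modality would be to substitute a zero vector for the missing context; whether the paper does exactly this is not stated here.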


Curriculum Learning


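A seq2seq model is normally trained on complete sentences. Curriculum learning instead starts by feeding only a few words at a time and gradually increases the length. This speeds up convergence and reduces overfitting.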
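
A minimal sketch of such a length curriculum follows. The schedule (start at one word, add one word every two epochs, cap at 20) and the helper names `max_words_for_epoch` and `truncate` are assumptions for illustration only; the post does not give the actual schedule.

```python
def max_words_for_epoch(epoch: int, start: int = 1, growth_every: int = 2, cap: int = 20) -> int:
    """Words allowed per training example: start small, add one word every `growth_every` epochs."""
    return min(cap, start + epoch // growth_every)


def truncate(sentence: str, max_words: int) -> str:
    """Keep only the first `max_words` words of a transcript."""
    return " ".join(sentence.split()[:max_words])


# Made-up transcript, shown at epochs 0, 2, 4 and 6.
transcript = "we will be looking at the weather later"
schedule = [truncate(transcript, max_words_for_epoch(e)) for e in range(0, 8, 2)]
```

Here `schedule` grows from "we" to "we will be looking". A real pipeline would also have to cut the video and audio to the span that matches the shortened transcript, which this sketch leaves out.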

Scheduled Sampling


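A recurrent model is normally trained by feeding in the one-hot vector of the ground-truth token from the previous time step. Here the input is instead sampled from the model's own prediction at the previous time step, which brings training in line with the conditions at test time.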
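
The difference between the two regimes can be shown with a toy decoder loop. Everything here is a hypothetical stand-in: `toy_model` is a fixed lookup table rather than a trained network, and `sample_prob` is the assumed probability of feeding back the model's own previous prediction instead of the ground truth.

```python
import random
from typing import Callable, List, Sequence, Tuple


def decode_with_scheduled_sampling(
    targets: Sequence[str],
    predict_next: Callable[[str], str],
    sample_prob: float,
    rng: random.Random,
    start_token: str = "<sos>",
) -> Tuple[List[str], List[str]]:
    """Run the decoder over `targets`; return the inputs that were fed and the predictions."""
    prev = start_token
    fed, outputs = [], []
    for truth in targets:
        fed.append(prev)
        predicted = predict_next(prev)
        outputs.append(predicted)
        # Next input: the model's own prediction with probability `sample_prob`,
        # otherwise the ground-truth token (teacher forcing).
        prev = predicted if rng.random() < sample_prob else truth
    return fed, outputs


# Hypothetical stand-in for a trained decoder: a fixed next-character table.
TOY_TABLE = {"<sos>": "h", "h": "e", "e": "l", "l": "p"}


def toy_model(prev: str) -> str:
    return TOY_TABLE.get(prev, "?")


rng = random.Random(0)
teacher_fed, teacher_out = decode_with_scheduled_sampling("hello", toy_model, 0.0, rng)
sampled_fed, sampled_out = decode_with_scheduled_sampling("hello", toy_model, 1.0, rng)
```

With `sample_prob=0.0` the decoder always sees the correct previous character, even after it has made a mistake. With `sample_prob=1.0` it has to continue from its own mistake, which is the situation it faces at test time.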

Dataset


Five thousand hours of video from BBC news, with aligned subtitles and preprocessing such as locating the lips.

Results


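Interestingly, the authors compared the model against a company that provides professional lip-reading services and found it to be more accurate than the professionals, with an error rate 20 percentage points lower. (It turns out there really are companies that specialize in this.)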

When both audio and lip video are given as input, the error rate can be pushed lower still.
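
Assuming the error rate discussed here is a word error rate (WER), that is, the word-level edit distance between the predicted and reference sentences divided by the number of reference words, it can be computed as in the sketch below. The function name and the example sentences are made up for illustration.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the number of reference words."""
    ref, hyp = reference.split(), hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")
    # prev[j] = edit distance between the reference words seen so far
    # and the first j hypothesis words.
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]
        for j, h in enumerate(hyp, start=1):
            curr.append(min(
                prev[j] + 1,             # reference word missing from the hypothesis
                curr[j - 1] + 1,         # extra word in the hypothesis
                prev[j - 1] + (r != h),  # substitution, or a free match
            ))
        prev = curr
    return prev[-1] / len(ref)


# One substituted word and one missing word against a 5-word reference.
wer = word_error_rate("the weather is fine today", "the weather was fine")
```

Under this definition, a gap of 20 percentage points means the model gets roughly one more word in five right than the baseline, relative to the reference length.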


Reposted from blog.csdn.net/u011239443/article/details/83417820
