Image Caption Notes (4): Image Captioning with Semantic Attention


Programming Languages 2018-12-03 11:19:47

The paper is from CVPR 2016.

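Common approaches to image captioning are top-down and bottom-up. Top-down learns end to end from image to text, while bottom-up first extracts keywords and then assembles them into a sentence. Top-down struggles with fine image details because it takes the whole image as input. Bottom-up is hard to train end to end, and composing the extracted features into a sentence is also difficult. The paper therefore introduces an attention mechanism into an end-to-end model to combine the strengths of both approaches.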

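Like "Show, Attend and Tell", this paper introduces an attention mechanism. The differences are:

In "Show, Attend and Tell", attention is modeled spatially at a fixed resolution. At each recurrent iteration, the algorithm computes a set of attention weights corresponding to predefined spatial locations. In contrast, this paper can use concepts from anywhere in the image at any resolution. In fact, it can even use concepts that have no direct visual presence in the image.

Attention is introduced at both the input and the output of the RNN.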

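There are two ways to select the attributes. I did not follow the first one; the second simply picks the most frequent words in the captions as attributes.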
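The second selection method can be sketched in a few lines. This is only an illustration of "take the most frequent caption words", not the paper's pipeline: the function name `select_attributes`, the stop-word list, and the toy captions are all assumptions made here.

```python
from collections import Counter

# Hypothetical stop-word list; the note does not say how function words are handled.
STOP_WORDS = {"a", "an", "the", "of", "on", "in", "with", "and", "is", "are"}

def select_attributes(captions, k):
    """Return the k most frequent non-stop-words across all captions."""
    counts = Counter()
    for caption in captions:
        for word in caption.lower().split():
            word = word.strip(".,!?")  # drop trailing punctuation
            if word and word not in STOP_WORDS:
                counts[word] += 1
    return [word for word, _ in counts.most_common(k)]

# Toy captions, made up for this example.
captions = [
    "A dog runs on the grass.",
    "A dog catches a frisbee.",
    "Two dogs play with a frisbee on the grass.",
]
attributes = select_attributes(captions, k=3)
```

In practice the attribute vocabulary would be far larger, and a separate detector would still be needed to predict which attributes apply to a given image.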

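Here too, the image feature is fed in only at the initial time step and is not used afterwards.

The key lies in the two attention models, one at the input and one at the output.

First, the input attention model.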

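First, weights are assigned according to how relevant the previous word is to each attribute (both come from the vocabulary, so both are written as y), with E used to reduce the dimensionality.

The paper models this relevance with a bilinear function.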
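A minimal NumPy sketch of such a bilinear score may help. It assumes the weight for attribute i is proportional to exp((E y_prev)^T U (E y_i)), normalized with a softmax over the attributes. The matrix name `U`, the sizes, and the random values are assumptions for illustration; picking a column of `E` stands in for multiplying `E` by a one-hot word vector.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

def input_attention_weights(prev_word_id, attribute_ids, E, U):
    """Bilinear relevance between the previous word and each attribute."""
    prev_vec = E[:, prev_word_id]      # embedded previous word, shape (d,)
    attr_vecs = E[:, attribute_ids]    # embedded attributes, shape (d, k)
    scores = prev_vec @ U @ attr_vecs  # one bilinear score per attribute, shape (k,)
    return softmax(scores)

rng = np.random.default_rng(0)
vocab_size, d = 10, 4                  # toy sizes, chosen for this example
E = rng.normal(size=(d, vocab_size))   # embedding matrix (reduces dimensionality)
U = rng.normal(size=(d, d))            # hypothetical bilinear weight matrix
alpha = input_attention_weights(prev_word_id=3, attribute_ids=[1, 5, 7], E=E, U=U)
```

Because the score is computed in the d-dimensional embedded space, the bilinear matrix is d by d instead of vocabulary-sized.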

The attended attributes are then combined with the previous time step's output to form the current input.
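One plausible form of this combination is x_t = W (E y_prev + w * sum_i alpha_i E y_i), where `w` rescales the attribute summary per dimension and `W` projects into the RNN input space. The sketch below assumes that form; the names `W` and `w` and the toy values are not taken from the note.

```python
import numpy as np

def build_input(prev_word_id, attribute_ids, alpha, E, W, w):
    """Sketch of x_t = W (E y_prev + w * sum_i alpha_i E y_i)."""
    prev_vec = E[:, prev_word_id]        # embedding of the previous word, shape (d,)
    attr_vecs = E[:, attribute_ids]      # attribute embeddings, shape (d, k)
    context = attr_vecs @ alpha          # attention-weighted attribute summary, shape (d,)
    return W @ (prev_vec + w * context)  # projected RNN input, shape (m,)

# Identity matrices make the arithmetic easy to follow by hand.
E = np.eye(3)
W = np.eye(3)
w = np.ones(3)
alpha = np.array([0.25, 0.75])           # attention weights over two attributes
x_t = build_input(prev_word_id=0, attribute_ids=[1, 2], alpha=alpha, E=E, W=W, w=w)
```

With these toy values the previous word contributes the first coordinate and the two attributes contribute 0.25 and 0.75 to the remaining coordinates.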

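The output attention works similarly. First the weights are computed.

Then the softmax output probabilities are computed from the current hidden state and the combined attributes.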
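The two output-side steps can be sketched together. The sketch assumes the weights are scored bilinearly between the hidden state and an activated attribute embedding, and that the word distribution is a softmax over E^T W (h + w * context). Using tanh as the activation, setting the hidden size equal to the embedding size d, and the names `V`, `W`, and `w` are all simplifying assumptions made here.

```python
import numpy as np

def softmax(scores):
    """Numerically stable softmax over a 1-D array."""
    exp = np.exp(scores - np.max(scores))
    return exp / exp.sum()

def output_distribution(h, attribute_ids, E, V, W, w):
    """Return (word probabilities, output attention weights)."""
    attr = np.tanh(E[:, attribute_ids])     # activated attribute embeddings, shape (d, k)
    beta = softmax(h @ V @ attr)            # attention of hidden state over attributes, (k,)
    context = attr @ beta                   # weighted attribute summary, shape (d,)
    logits = E.T @ (W @ (h + w * context))  # one score per vocabulary word
    return softmax(logits), beta

rng = np.random.default_rng(0)
vocab_size, d = 10, 4                       # toy sizes; hidden size assumed equal to d
E = rng.normal(size=(d, vocab_size))
V = rng.normal(size=(d, d))
W = rng.normal(size=(d, d))
w = np.ones(d)
h = rng.normal(size=d)                      # stand-in for the current RNN hidden state
probs, beta = output_distribution(h, [1, 5, 7], E, V, W, w)
```

Reusing `E` to map back to the vocabulary keeps the input and output word representations tied in this sketch.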

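Finally, the loss function has three parts; the last two are constraints on the attention weights. The first part simply asks that, in the predicted softmax probability vector over the vocabulary, the probability at the ground-truth word's position be as close to 1 as possible.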
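A rough sketch of a three-part loss of this shape follows. The first term is the usual negative log-likelihood of the ground-truth words. The regularizer is an assumed form, not confirmed by the note: one term discourages any single attribute from accumulating too much attention over the sentence (exponent p > 1), and one discourages attention that is spread thin within a time step (exponent 0 < q < 1). The values p=2 and q=0.5 are placeholders.

```python
import numpy as np

def cross_entropy(probs, targets):
    """probs: (T, vocab) softmax outputs; targets: (T,) ground-truth word ids."""
    picked = probs[np.arange(len(targets)), targets]  # probability at each gt position
    return -np.log(picked).sum()

def attention_regularizer(weights, p=2.0, q=0.5):
    """weights: (T, k) attention over k attributes at T steps (assumed form)."""
    per_attribute = weights.sum(axis=0)               # total attention per attribute
    over_sentence = (per_attribute ** p).sum() ** (1.0 / p)
    per_step = (weights ** q).sum(axis=1) ** (1.0 / q)
    return over_sentence + per_step.sum()

def total_loss(probs, targets, alpha, beta):
    """Log-likelihood term plus one regularizer each for input and output attention."""
    return (cross_entropy(probs, targets)
            + attention_regularizer(alpha)
            + attention_regularizer(beta))

targets = np.array([0, 1])
confident = np.array([[0.9, 0.1], [0.1, 0.9]])        # high probability on gt words
uncertain = np.array([[0.5, 0.5], [0.5, 0.5]])
peaked = np.array([[1.0, 0.0], [0.0, 1.0]])           # focused attention
uniform = np.array([[0.5, 0.5], [0.5, 0.5]])          # diffuse attention
loss_good = total_loss(confident, targets, peaked, peaked)
loss_bad = total_loss(uncertain, targets, uniform, uniform)
```

Under these assumptions, confident predictions with focused attention give a lower loss than uncertain predictions with diffuse attention.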


Reposted from blog.csdn.net/zlrai5895/article/details/84669031
