《Stacked Cross Attention for Image-Text Matching》

Category: Other | Posted: 2018-08-20 00:41:22

ECCV 2018

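Main idea: apply an attention mechanism to the text and to the image separately to learn better representations of both, then measure text-image similarity in a shared subspace and train with a hard triplet loss.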

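Image features: a Faster R-CNN with a ResNet-101 backbone produces k object regions per image. A feature is extracted for each detected region and mapped by an embedding matrix to an h-dimensional vector.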
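The projection step above can be written out as a single affine map. This is a sketch only: the symbols f_i (raw region feature), W_v, and b_v are assumed names following the paper's usual notation and do not appear in this post.

```latex
v_i = W_v f_i + b_v, \qquad i = 1, \dots, k, \qquad v_i \in \mathbb{R}^{h}
```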

Text features: each word is represented as a one-hot vector and embedded into a 300-dimensional vector; a bi-directional GRU then turns it into an h-dimensional vector.
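As a sketch of the text branch, the steps can be written as follows. The symbols w_i (one-hot word), W_e (embedding matrix), and the arrow notation are assumed names. Averaging the forward and backward hidden states is also an assumption about how the two directions are merged; the post only says the bi-directional GRU yields an h-dimensional vector.

```latex
x_i = W_e w_i \in \mathbb{R}^{300}, \qquad
\overrightarrow{h_i} = \overrightarrow{\mathrm{GRU}}(x_i), \quad
\overleftarrow{h_i} = \overleftarrow{\mathrm{GRU}}(x_i), \qquad
e_i = \frac{\overrightarrow{h_i} + \overleftarrow{h_i}}{2} \in \mathbb{R}^{h}
```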

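In the image-text direction, the cosine similarity between each proposal vector and its attended sentence vector is computed, and the resulting similarities are then combined by average pooling.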

Similarity (cosine similarity):
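The formula that followed here did not survive on this page. Assuming v_i denotes the i-th proposal vector and a_i^t its attended sentence vector (symbol names are assumptions), the standard cosine similarity would read:

```latex
R(v_i, a_i^t) = \frac{v_i^{\top} a_i^t}{\lVert v_i \rVert \, \lVert a_i^t \rVert}
```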

Average pooling:
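The formula is likewise missing here. Writing R(v_i, a_i^t) for the cosine similarity between proposal vector v_i and its attended sentence vector a_i^t, and k for the number of regions (symbol names are assumptions), average pooling over the regions would be:

```latex
S_{AVG}(I, T) = \frac{1}{k} \sum_{i=1}^{k} R(v_i, a_i^t)
```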

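The text-image direction works the other way around. A Faster R-CNN with a ResNet-101 backbone generates multiple proposals per image; each proposal is represented by a proposal vector (a mean-pooled convolutional feature) and each word by its bi-directional GRU feature. The cosine similarity between every word and every proposal is computed, and for each word the proposals are weighted by these similarities to form an image vector.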

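The similarity and the pooling are the same as above, here computed between each word vector and its image vector.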
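Both directions can be sketched with one small helper, since they differ only in which side plays the query role. This is a minimal pure-Python sketch, not the authors' implementation: the function names `cosine`, `attend`, and `avg_pooled_similarity`, the `temperature` parameter, and the toy vectors are all assumptions made for illustration, and any extra normalization the paper applies to the similarities before the softmax is omitted.

```python
import math


def cosine(u, v, eps=1e-8):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v + eps)


def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, context, temperature=1.0):
    """For each query, return a similarity-weighted sum of the context vectors."""
    dim = len(context[0])
    attended = []
    for q in queries:
        weights = softmax([temperature * cosine(q, c) for c in context])
        attended.append(
            [sum(w * c[d] for w, c in zip(weights, context)) for d in range(dim)]
        )
    return attended


def avg_pooled_similarity(queries, context, temperature=1.0):
    """Average the cosine similarity between each query and its attended vector."""
    attended = attend(queries, context, temperature)
    scores = [cosine(q, a) for q, a in zip(queries, attended)]
    return sum(scores) / len(scores)


# Toy 2-D features (made up): two regions and three words.
regions = [[1.0, 0.0], [0.0, 1.0]]
words = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]

# Image-text: regions attend over words. Text-image: words attend over regions.
image_text = avg_pooled_similarity(regions, words, temperature=9.0)
text_image = avg_pooled_similarity(words, regions, temperature=9.0)
```

Swapping the two arguments is all that separates the image-text score from the text-image score in this sketch; a larger `temperature` makes each query focus on its single closest match.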

Loss Function

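The paper uses LogSumExp pooling (LSE), average pooling (AVG), and Sum-Max (SM), among other methods, to measure the similarity between sentence vectors and proposal vectors and between image vectors and word vectors, and then trains with a hard triplet loss.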
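The pooling choices and the loss can be illustrated with a short sketch. It assumes a batch score matrix whose diagonal holds the matched image-sentence pairs and takes the hardest negative in each direction from the same batch; the names `lse_pool`, `avg_pool`, `hard_triplet_loss`, the parameter `lam`, the default margin of 0.2, and the choice to sum over the batch are all assumptions, not values taken from this post.

```python
import math


def lse_pool(relevances, lam=6.0):
    """LogSumExp pooling: (1 / lam) * log(sum(exp(lam * r))), computed stably."""
    m = max(relevances)
    return m + math.log(sum(math.exp(lam * (r - m)) for r in relevances)) / lam


def avg_pool(relevances):
    """Average pooling over per-region (or per-word) relevance scores."""
    return sum(relevances) / len(relevances)


def hard_triplet_loss(scores, margin=0.2):
    """Hinge loss using only the hardest negative per matched pair.

    scores[i][j] is the similarity of image i and sentence j;
    scores[i][i] is the matched pair. Needs a batch of at least two pairs.
    """
    n = len(scores)
    total = 0.0
    for i in range(n):
        positive = scores[i][i]
        # Hardest negative sentence for image i (row i, off-diagonal).
        hardest_sentence = max(scores[i][j] for j in range(n) if j != i)
        # Hardest negative image for sentence i (column i, off-diagonal).
        hardest_image = max(scores[j][i] for j in range(n) if j != i)
        total += max(0.0, margin - positive + hardest_sentence)
        total += max(0.0, margin - positive + hardest_image)
    return total


# Toy batch of two image-sentence pairs (made-up scores).
well_separated = [[0.9, 0.1], [0.2, 0.8]]
confusable = [[0.5, 0.6], [0.1, 0.9]]
loss_easy = hard_triplet_loss(well_separated)
loss_hard = hard_triplet_loss(confusable)
```

LSE pooling never falls below the maximum relevance and approaches it as `lam` grows, whereas average pooling weighs every region or word equally. The loss is zero once every matched pair beats its hardest negative by at least the margin.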

Summary

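Prior work simply aggregates the similarities of all possible region-word pairs without distinguishing the more important words or regions from the less important ones. This paper proposes Stacked Cross Attention, which uses both the image regions and the words of the sentence as context to discover the full latent alignments and infer the image-text similarity.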

Reposted from blog.csdn.net/qq_33373858/article/details/81509636
