X2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks (Paper Notes)

Enterprise Development 2023-12-17 05:16:08

Title: X2-VLM: All-In-One Pre-trained Model For Vision-Language Tasks

1. Motivation

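Methods such as CLIP can only align vision and text at the image level.

Other methods use a pre-trained object detector to align vision and text at the object level, but they only encode the features inside each object and cannot effectively express the contextual relationships among multiple objects.

This paper targets multi-grained (objects, regions, and images) vision-language alignment as its pre-training task.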

2. Model Architecture

[Figure: model architecture]

3. Loss Functions

3.1 contrastive loss

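The similarity between text features and visual features is defined as:

[Equation image: similarity definition]

Vision-to-text similarity:

[Equation image: vision-to-text similarity]

Text-to-vision similarity: the counterpart in the opposite direction.

The ground truth (GT) is one-hot, and the loss is a cross-entropy loss:

[Equation image: contrastive loss]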
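The contrastive objective in this subsection can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes the usual in-batch softmax form, $p^{v2t}(V) = \exp(s(V,T)/\tau) / \sum_i \exp(s(V,T^i)/\tau)$ with a symmetric $p^{t2v}(T)$, and a one-hot target on the matched pair. The function names and the fixed temperature value `0.07` are hypothetical choices made for the example.

```python
import math

def softmax(scores, temperature):
    """Numerically stable softmax over a list of similarity scores."""
    scaled = [s / temperature for s in scores]
    peak = max(scaled)
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def contrastive_loss(sim, temperature=0.07):
    """In-batch contrastive loss for an N x N similarity matrix.

    sim[i][j] is the similarity between visual concept i and text j.
    Matched pairs sit on the diagonal, so the one-hot target for
    row i (and column i) is index i. The temperature is an assumed value.
    """
    n = len(sim)
    loss_v2t = 0.0
    loss_t2v = 0.0
    for i in range(n):
        # visual concept i scored against every text in the batch
        p_v2t = softmax(sim[i], temperature)
        # text i scored against every visual concept in the batch
        p_t2v = softmax([sim[k][i] for k in range(n)], temperature)
        # cross-entropy against a one-hot target reduces to -log p[target]
        loss_v2t += -math.log(p_v2t[i])
        loss_t2v += -math.log(p_t2v[i])
    return 0.5 * (loss_v2t + loss_t2v) / n
```

With a similarity matrix whose diagonal dominates, the loss is close to zero; swapping the matched pairs makes it large, which is the behaviour the one-hot target is meant to enforce.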

3.2 matching loss

For each visual concept in a mini-batch, we sample an in-batch hard negative text following $p^{v2t}(V)$ (texts closer to the current visual feature are more likely to be sampled).
We also sample one hard negative visual concept for each text.
We feed the pairs into the fusion module and use $x_{cls}$, the output [CLS] embedding of the fusion module, to predict the matching probability $p^{match}$, from which the matching loss is computed.
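The two steps above (hard negative sampling, then a match/no-match prediction) can be sketched as follows. This is an illustrative sketch under stated assumptions: the matching loss is taken to be a binary cross-entropy over matched (label 1) and hard negative (label 0) pairs, and `sample_hard_negative` and `matching_loss` are hypothetical helper names, not functions from the paper's code.

```python
import math
import random

def sample_hard_negative(probs, positive_index, rng):
    """Sample a negative index with probability proportional to probs.

    probs plays the role of p^{v2t}(V) for one visual concept; the matched
    text is excluded by giving it zero weight, so more similar (harder)
    negatives are drawn more often.
    """
    weights = [0.0 if j == positive_index else p for j, p in enumerate(probs)]
    if sum(weights) <= 0.0:
        raise ValueError("no negative candidate with non-zero probability")
    return rng.choices(range(len(probs)), weights=weights, k=1)[0]

def matching_loss(p_match, labels):
    """Mean binary cross-entropy between match probabilities and 0/1 labels."""
    eps = 1e-12
    total = 0.0
    for p, y in zip(p_match, labels):
        p = min(max(p, eps), 1.0 - eps)  # clamp to keep log() finite
        total += -(y * math.log(p) + (1 - y) * math.log(1.0 - p))
    return total / len(labels)
```

The same sampler can be reused in the other direction by passing the text-to-vision probabilities for a given text.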

3.3 masked language modeling loss (MLM)

[Equation image: MLM loss]
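As a rough illustration of a masked language modeling loss, the sketch below averages the cross-entropy over masked positions only. It assumes the standard MLM formulation; the mask ratio passed in, and the helper names `choose_mask_positions` and `mlm_loss`, are hypothetical and not taken from the paper.

```python
import math
import random

def choose_mask_positions(num_tokens, mask_ratio, rng):
    """Pick which token positions to mask; at least one is always masked."""
    count = max(1, round(num_tokens * mask_ratio))
    return sorted(rng.sample(range(num_tokens), count))

def mlm_loss(predicted, token_ids, masked_positions):
    """Mean cross-entropy over masked positions only.

    predicted[j] is a probability distribution over the vocabulary for
    position j (in a real model it would be predicted from the visual
    concept and the masked text); token_ids[j] is the original token id.
    """
    total = 0.0
    for j in masked_positions:
        total += -math.log(predicted[j][token_ids[j]])
    return total / len(masked_positions)
```

Unmasked positions contribute nothing to the loss, so the model is only scored on the tokens it had to recover.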

3.4 bbox loss

[Equation image: bbox loss]
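A bounding box loss of this kind can be sketched as below. The sketch assumes the loss is a generalized IoU (GIoU) term plus an L1 term on boxes given as normalized (cx, cy, w, h); that composition, and the helper names `to_corners`, `giou`, and `bbox_loss`, are assumptions made for illustration.

```python
def to_corners(box):
    """Convert (cx, cy, w, h) to (x1, y1, x2, y2)."""
    cx, cy, w, h = box
    return cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2

def giou(pred, target):
    """Generalized IoU of two (cx, cy, w, h) boxes with positive width and height."""
    px1, py1, px2, py2 = to_corners(pred)
    tx1, ty1, tx2, ty2 = to_corners(target)
    inter_w = max(0.0, min(px2, tx2) - max(px1, tx1))
    inter_h = max(0.0, min(py2, ty2) - max(py1, ty1))
    inter = inter_w * inter_h
    union = (px2 - px1) * (py2 - py1) + (tx2 - tx1) * (ty2 - ty1) - inter
    # smallest axis-aligned box enclosing both boxes
    hull = (max(px2, tx2) - min(px1, tx1)) * (max(py2, ty2) - min(py1, ty1))
    return inter / union - (hull - union) / hull

def bbox_loss(pred, target):
    """GIoU loss (1 - GIoU) plus L1 distance between the box coordinates."""
    l1 = sum(abs(p - t) for p, t in zip(pred, target))
    return (1.0 - giou(pred, target)) + l1
```

The GIoU term still gives a useful signal when the boxes do not overlap at all, while the L1 term penalizes coordinate offsets directly.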

Reposted from blog.csdn.net/xijuezhu8128/article/details/132809885
