The following papers were compiled in April of this year.
A keyword search turned up roughly eight VQA-related papers. Two study outside-knowledge visual question answering and one addresses scene-text VQA; these three propose new models. Another two contribute on the data side, one studies robustness, one investigates backdoor attacks on VQA models, and the last proposes a reasoning strategy for model training.
LaTr: Layout-Aware Transformer for Scene-Text VQA https://openaccess.thecvf.com/content/CVPR2022/html/Biten_LaTr_Layout-Aware_Transformer_for_Scene-Text_VQA_CVPR_2022_paper.html Proposes a multimodal model architecture for scene-text visual question answering (ST-VQA).
A Thousand Words Are Worth More Than a Picture: Natural Language-Centric Outside-Knowledge Visual Question Answering https://arxiv.org/abs/2201.05299 Proposes a Transform-Retrieve-Generate (TRiG) framework for the outside-knowledge VQA (OK-VQA) task.
SwapMix: Diagnosing and Regularizing the Over-reliance on Visual Context in Visual Question Answering https://openaccess.thecvf.com/content/CVPR2022/html/Gupta_SwapMix_Diagnosing_and_Regularizing_the_Over-Reliance_on_Visual_Context_in_CVPR_2022_paper.html Proposes the SwapMix perturbation technique for evaluating the robustness of VQA models and for data augmentation.
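The idea of a context-swapping perturbation can be illustrated with a toy sketch. This is not the paper's implementation; the function name, inputs, and the use of string stand-ins for region feature vectors are all illustrative assumptions. The sketch only conveys the diagnostic idea: replace question-irrelevant context regions with features from other images and see whether the model's answer changes.

```python
import random

def swapmix_perturb(region_feats, relevant_idx, feature_pool, rng=random):
    """Toy sketch of a SwapMix-style perturbation (names are illustrative,
    not the paper's API): replace the features of context regions that are
    irrelevant to the question with features drawn from other images."""
    perturbed = list(region_feats)
    for i in range(len(perturbed)):
        if i not in relevant_idx:  # only irrelevant context gets swapped
            perturbed[i] = rng.choice(feature_pool)
    return perturbed

# Stand-ins for region feature vectors; region 0 is question-relevant.
feats = ["dog", "tree", "sky"]
swapped = swapmix_perturb(feats, relevant_idx={0},
                          feature_pool=["car", "cat"],
                          rng=random.Random(0))
assert swapped[0] == "dog"  # the question-relevant region is untouched
```

If a model's prediction flips under such a perturbation, it is relying on visual context rather than on the question-relevant regions, which is the over-reliance the paper diagnoses.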
MuKEA: Multimodal Knowledge Extraction and Accumulation for Knowledge-based Visual Question Answering https://openaccess.thecvf.com/content/CVPR2022/html/Ding_MuKEA_Multimodal_Knowledge_Extraction_and_Accumulation_for_Knowledge-Based_Visual_Question_CVPR_2022_paper.html Proposes the MuKEA framework for the outside-knowledge VQA (OK-VQA) task.
Grounding Answers for Visual Questions Asked by Visually Impaired People https://openaccess.thecvf.com/content/CVPR2022/html/Chen_Grounding_Answers_for_Visual_Questions_Asked_by_Visually_Impaired_People_CVPR_2022_paper.html Proposes the VizWiz-VQA-Grounding dataset.
Maintaining Reasoning Consistency in Compositional Visual Question Answering https://openaccess.thecvf.com/content/CVPR2022/html/Jing_Maintaining_Reasoning_Consistency_in_Compositional_Visual_Question_Answering_CVPR_2022_paper.html Proposes a dialog-like reasoning method that maintains reasoning consistency when answering a compositional question and its sub-questions.