Video Question Answering and Reasoning: Thesis Research


Updated: 2019.12 (first draft)


0. Introduction

This is the first step in learning VQA: pre-thesis research. I survey the papers published at major conferences in recent years, including CVPR, ICCV, ECCV, ACM MM, and AAAI, to understand the progress in this direction. After that, I plan to summarize the commonly used datasets and classic methods.

1. ACM MM

ACM MM (ACM International Conference on Multimedia) is a major international conference in the multimedia area of computer science, focusing on the integration and processing of multi-perspective information generated by different digital media. VQA falls under the Vision and Language branch of its Understanding Multimedia Content theme.

1.1 ACM MM 2019

  • An incomplete count finds 5 papers (including Video / Visual Question Answering)
Paper title | Author affiliation
Multi-interaction Network with Object Relation for VideoQA | Zhejiang University
Learnable Aggregating Net with Divergent Loss for VideoQA | University of Electronic Science and Technology
Question-Aware Tube-Switch Network for VideoQA | University of Science and Technology of China
CRA-Net: Composed Relation Attention Network for Visual QA | University of Electronic Science and Technology
Erasing-based Attention Learning for Visual QA | Institute of Automation, Chinese Academy of Sciences

1.2 ACM MM 2018

  • An incomplete count finds 4 papers (including Video / Visual Question Answering)
Paper title | Author affiliation
Explore Multi-Step Reasoning in Video Question Answering | Tianjin University
Fast Parameter Adaptation for Few-shot Image Captioning and Visual Question Answering | Southern University of Science and Technology
Object-Difference Attention: A Simple Relational Attention for Visual Question Answering | Beijing University of Posts and Telecommunications
Enhancing Visual Question Answering Using Dropout | Institute of Automation, Chinese Academy of Sciences

1.3 ACM MM 2017

  • An incomplete count finds 4 papers (including Video / Visual Question Answering)
Paper title | Author affiliation
VideoQA via Hierarchical Dual-Level Attention Network Learning | Zhejiang University
VideoQA via Gradually Refined Attention over Appearance and Motion | Zhejiang University

2. CVPR

CVPR stands for the Conference on Computer Vision and Pattern Recognition. It is usually held around June every year.

2.1 CVPR 2019

  • An incomplete count finds 12 papers (including Video / Visual Question Answering), but only one appears to be video-based
Paper title | Author affiliation
Heterogeneous Memory Enhanced Multimodal Attention Model for VideoQA | Jingdong Research Institute
MUREL: Multimodal Relational Reasoning for Visual Question Answering
OK-VQA: A Visual Question Answering Benchmark Requiring External Knowledge
Deep Modular Co-Attention Networks for Visual Question Answering
Visual Question Answering as Reading Comprehension
Dynamic Fusion With Intra- and Inter-Modality Attention Flow for Visual Question Answering
Cycle-Consistency for Robust Visual Question Answering
GQA: A New Dataset for Real-World Visual Reasoning and Compositional Question Answering
Progressive Attention Memory Network for Movie Story Question Answering
Transfer Learning via Unsupervised Task Discovery for Visual Question Answering
Explicit Bias Discovery in Visual Question Answering Models
Answer Them All! Toward Universal Visual Question Answering Models

2.2 CVPR 2018

  • An incomplete count finds 15 papers (including Video / Visual Question Answering), but only one appears to be video-based
Paper title | Author affiliation
Motion-Appearance Co-Memory Networks for Video Question Answering
* Tips and Tricks for Visual Question Answering: Learnings From the 2017 Challenge
Don’t Just Assume; Look and Answer: Overcoming Priors for Visual Question Answering
Learning Answer Embeddings for Visual Question Answering
Cross-Dataset Adaptation for Visual Question Answering
Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering
Improved Fusion of Visual and Language Representations by Dense Symmetric Co-Attention for Visual Question Answering
Visual Question Generation as Dual Task of Visual Question Answering
Focal Visual-Text Attention for Visual Question Answering
Visual Question Answering With Memory-Augmented Networks
Visual Question Reasoning on General Dependency Tree
Differential Attention for Visual Question Answering
Learning Visual Knowledge Memory Networks for Visual Question Answering
IVQA: Inverse Visual Question Answering
Customized Image Narrative Generation via Interactive Visual Question Generation and Answering

2.3 CVPR 2017

  • An incomplete count finds 9 papers (including Video / Visual Question Answering); none are video-based
Paper title | Author affiliation
Graph-Structured Representations for Visual Question Answering
Knowledge Acquisition for Visual Question Answering via Iterative Querying
The VQA-Machine: Learning How to Use Existing Vision Algorithms to Answer New Questions
TGIF-QA: Toward Spatio-Temporal Reasoning in Visual Question Answering
End-To-End Concept Word Detection for Video Captioning, Retrieval, and Question Answering
Empirical Evaluation of Visual Question Answering for Novel Objects
Multi-Level Attention Networks for Visual Question Answering
A Dataset and Exploration of Models for Understanding Video Data Through Fill-In-The-Blank Question-Answering
Making the v in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering

2.4 CVPR 2016

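  • An incomplete count finds 8 papers (including Video / Visual Question Answering); none are video-based, and the field appears to have just been getting started
Paper title | Author affiliation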
Stacked Attention Networks for Image Question Answering
Image Question Answering Using Convolutional Neural Network With Dynamic Parameter Prediction
Where to Look: Focus Regions for Visual Question Answering
Ask Me Anything: Free-Form Visual Question Answering Based on Knowledge From External Sources
MovieQA: Understanding Stories in Movies Through Question-Answering
Answer-Type Prediction for Visual Question Answering
Visual7W: Grounded Question Answering in Images
Yin and Yang: Balancing and Answering Binary Visual Questions

3. ICCV

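ICCV stands for the International Conference on Computer Vision. It is held every two years at locations around the world. Its acceptance rate is relatively low, so it is highly regarded in the field and is generally considered the highest-ranked of the three top CV conferences.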

3.1 ICCV 2019

  • An incomplete count finds 5 papers (including Video / Visual Question Answering)
Paper title | Author affiliation
Compact Trilinear Interaction for Visual Question Answering
Why Does a Visual Question Have Different Answers?
Scene Text Visual Question Answering
Multi-Modality Latent Interaction Network for Visual Question Answering
Relation-Aware Graph Attention Network for Visual Question Answering

3.2 ICCV 2017

  • An incomplete count finds 6 papers (including Video / Visual Question Answering)
Paper title | Author affiliation
Learning to Reason: End-To-End Module Networks for Visual Question Answering
Structured Attentions for Visual Question Answering
Multi-Modal Factorized Bilinear Pooling With Co-Attention Learning for Visual Question Answering
An Analysis of Visual Question Answering Algorithms
MUTAN: Multimodal Tucker Fusion for Visual Question Answering
MarioQA: Answering Questions by Watching Gameplay Videos

3.3 ICCV 2015

  • Judging by the title, this looks like the first paper on the topic
Paper title | Author affiliation
VQA: Visual Question Answering

4. AAAI

Origin blog.csdn.net/qq_41341454/article/details/103569017