Summary of Multimodal Papers

Review

  1. Multimodal Machine Learning: A Survey and Taxonomy
    paper URL: https://arxiv.org/pdf/1705.09406.pdf
    Chinese translation: Multimodal Machine Learning: A Survey and Taxonomy (multimodal review)
  2. Multimodal Learning with Transformers: A Survey
    paper URL: https://arxiv.org/pdf/2206.06488.pdf
    Chinese translation: 300+ references! A detailed explanation of the latest advances in Transformer-based multimodal learning (the translation is incomplete; reading the original paper is recommended)
  3. A Survey of Open Dialogue Systems Research
  4. A Survey of Research Progress on Natural Language Generation in Task-Oriented Dialogue Systems

Tutorial

  1. Vision-Language Pretraining: Current Trends and the Future
    URL: https://vlp-tutorial-acl2022.github.io/

Model

  1. Transformer
    Paper: Attention Is All You Need
  2. BERT
    Paper: BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding
    Video explanation (Li Mu): BERT paper read-through, paragraph by paragraph [paper intensive reading]
  3. ViLT
    Paper: ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
    Source code: https://github.com/dandelin/vilt
    Video explanation (bryanyzhu): ViLT paper read-through [paper intensive reading]
    Personal notes: [Paper & Model Explanation] ViLT: Vision-and-Language Transformer Without Convolution or Region Supervision
  4. VL-BEiT
    Paper: VL-BEiT: Generative Vision-Language Pretraining
    Related paper: BEiT: BERT Pre-Training of Image Transformers
  5. CLIP
    Paper: Learning Transferable Visual Models From Natural Language Supervision
    Source code: https://github.com/OpenAI/CLIP
    Video explanation (bryanyzhu): CLIP paper read-through, paragraph by paragraph [paper intensive reading]
    Personal notes: [Paper & Model Explanation] CLIP (Learning Transferable Visual Models From Natural Language Supervision)
  6. VideoBERT
    Paper: VideoBERT: A Joint Model for Video and Language Representation Learning
    Source code: https://github.com/ammesatyajit/VideoBERT
    Personal notes: [Paper & Model Explanation] VideoBERT: A Joint Model for Video and Language Representation Learning
  7. Two-Stream Convolutional Networks for Action Recognition in Videos
    Paper: https://arxiv.org/abs/1406.2199
    Personal notes: [Paper & Model Explanation] Two-Stream Convolutional Networks for Action Recognition in Videos
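Several of the models above (Transformer, BERT, ViLT, VideoBERT) are built on the same core operation: the scaled dot-product attention introduced in Attention Is All You Need. A minimal NumPy sketch of that formula, Attention(Q, K, V) = softmax(QK^T / sqrt(d_k)) V — shapes and variable names here are illustrative, not taken from any of the linked repositories:

```python
import numpy as np

def softmax(x, axis=-1):
    # numerically stable softmax: subtract the row max before exponentiating
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    """Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V."""
    d_k = Q.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)     # (n_queries, n_keys) similarity scores
    weights = softmax(scores, axis=-1)  # each query's weights sum to 1
    return weights @ V                  # weighted average of the values

rng = np.random.default_rng(0)
Q = rng.standard_normal((2, 4))  # 2 queries of dimension 4
K = rng.standard_normal((3, 4))  # 3 keys
V = rng.standard_normal((3, 4))  # 3 values
out = scaled_dot_product_attention(Q, K, V)
print(out.shape)  # (2, 4)
```

The 1/sqrt(d_k) scaling keeps the dot products from growing with the embedding dimension, which would otherwise push the softmax into regions with vanishing gradients.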
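CLIP's zero-shot classification, as described in its paper, reduces to cosine similarity between L2-normalized image and text embeddings, scaled by a temperature. A sketch with random vectors standing in for the real encoders (the function name and temperature value are illustrative assumptions, not the OpenAI implementation):

```python
import numpy as np

def clip_style_logits(image_emb, text_emb, temperature=0.07):
    """Cosine similarity between L2-normalized embeddings, scaled by a temperature."""
    img = image_emb / np.linalg.norm(image_emb, axis=-1, keepdims=True)
    txt = text_emb / np.linalg.norm(text_emb, axis=-1, keepdims=True)
    return img @ txt.T / temperature

rng = np.random.default_rng(1)
image_emb = rng.standard_normal((2, 8))  # 2 images (stand-in for the image encoder)
text_emb = rng.standard_normal((3, 8))   # 3 class prompts (stand-in for the text encoder)
logits = clip_style_logits(image_emb, text_emb)

# zero-shot prediction: for each image, pick the prompt with the highest similarity
probs = np.exp(logits) / np.exp(logits).sum(axis=-1, keepdims=True)
pred = probs.argmax(axis=-1)
print(logits.shape, pred.shape)  # (2, 3) (2,)
```

During pretraining the same similarity matrix is used symmetrically: a cross-entropy loss pushes each image toward its paired caption and each caption toward its paired image.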


Source: blog.csdn.net/Friedrichor/article/details/126939715