Show-1, a SOTA work in text-to-video generation: paper and code interpretation

Diffusion models for video generation: blog summary

Foreword: The recent text-to-video paper Show-1 achieved first place in the FVD and CLIPSIM metrics on the MSR-VTT evaluation dataset, and second place in the FID metric. Its hybrid approach, which combines a pixel-based VDM with a latent-space VDM for text-to-video generation, not only achieves strong generation metrics but also greatly reduces inference resource consumption. This blog explains the paper and code in detail.
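To make the hybrid idea concrete, here is a minimal sketch of such a two-stage pipeline: a pixel-space video diffusion model produces a cheap low-resolution clip with good text alignment, and a latent-space model then upscales it efficiently. The `PixelVDM` and `LatentVDM` classes, their methods, and all sizes below are hypothetical placeholders for illustration, not the actual Show-1 API.

```python
import torch
import torch.nn.functional as F


class PixelVDM:
    """Hypothetical pixel-space video diffusion model (low-resolution stage)."""

    def generate(self, prompt: str, num_frames: int, size: int) -> torch.Tensor:
        # Placeholder for pixel-space denoising; returns a random
        # low-res video tensor of shape (frames, 3, size, size).
        return torch.rand(num_frames, 3, size, size)


class LatentVDM:
    """Hypothetical latent-space video diffusion model (super-resolution stage)."""

    def upscale(self, video: torch.Tensor, prompt: str, scale: int) -> torch.Tensor:
        # In a real latent VDM one would encode to latents, denoise
        # conditioned on the low-res video and the text prompt, then
        # decode. Here we stand in with a simple bilinear upsample.
        return F.interpolate(video, scale_factor=scale, mode="bilinear")


def text_to_video(prompt: str) -> torch.Tensor:
    low_res = PixelVDM().generate(prompt, num_frames=16, size=64)  # cheap pixel stage
    return LatentVDM().upscale(low_res, prompt, scale=4)           # efficient latent stage


video = text_to_video("a panda playing guitar")
print(video.shape)  # torch.Size([16, 3, 256, 256])
```

The design intuition is that pixel-space diffusion aligns well with text but is expensive at high resolution, while latent-space diffusion is cheap at high resolution; running the pixel model only at low resolution keeps inference cost down.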

Table of contents

Contribution overview

Detailed explanation of the method

Source: blog.csdn.net/qq_41895747/article/details/133763751