Interpreting Stable Video Diffusion: A Detailed Look at the Data Curation Techniques Behind Video Generation

Diffusion models for video generation: blog summary

Foreword: Stable Video Diffusion has been open source for more than a week. Its technical report, "Stable Video Diffusion: Scaling Latent Video Diffusion Models to Large Datasets", describes the data curation part in great detail. Although the corresponding source code has not been released, this blogger is attempting to reproduce the operations. This post first walks through the data curation part of Stable Video Diffusion.

Shortcomings of the raw collected dataset

(1) Generative video models are sensitive to motion inconsistencies, such as the scene cuts that raw, unprocessed video data frequently contains.

(2) Missing captions. Raw video data usually comes without descriptive text; ideally, each video has multiple captions associated with it.

Cascaded cut detection

Three cut detectors are run in a cascaded fashion, at different frame rates and with different thresholds, so that both sudden hard cuts and gradual transitions such as fades are caught; a sketch of such a cascade follows below.
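The report does not specify the exact tooling, but such a cascade can be sketched with PySceneDetect. In this minimal, assumed version, the three detector choices, the thresholds, and the frame_skip values are all illustrative rather than taken from the paper:

```python
from scenedetect import open_video, SceneManager
from scenedetect.detectors import AdaptiveDetector, ContentDetector, ThresholdDetector

# Hypothetical cascade: three passes over the same video, each with a
# different detector/threshold and a different effective frame rate
# (frame_skip=N processes every (N+1)-th frame). All values are illustrative.
CASCADE = [
    (ContentDetector(threshold=27.0), 0),    # hard cuts, full frame rate
    (AdaptiveDetector(), 2),                 # cuts robust to camera motion, 1/3 rate
    (ThresholdDetector(threshold=12.0), 4),  # fades to/from black, 1/5 rate
]

def detect_cuts(path: str) -> list[float]:
    """Run each detector in the cascade and merge the resulting cut points (seconds)."""
    cuts: set[float] = set()
    for detector, frame_skip in CASCADE:
        video = open_video(path)
        manager = SceneManager()
        manager.add_detector(detector)
        manager.detect_scenes(video, frame_skip=frame_skip)
        for start, _end in manager.get_scene_list():
            # A scene start at t > 0 marks a detected cut; t == 0 is just the video start.
            if start.get_seconds() > 0:
                cuts.add(start.get_seconds())
    return sorted(cuts)

if __name__ == "__main__":
    print(detect_cuts("input.mp4"))
```

Merging the cut points from all passes is what makes the pipeline cascaded: each pass is cheap on its own, and together they cover both abrupt and slow transitions.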

Keyframe-aligned clipping

The timestamps of keyframes in the source video are extracted, and each detected cut is snapped to the nearest keyframe timestamp that does not intersect the detected cut, so that clips can later be extracted at keyframe boundaries without re-encoding.
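Keyframe timestamps can be read out with ffprobe (a packet whose flags contain "K" is a keyframe). The snapping helper below is one illustrative reading of "does not intersect the detected cut": each clip boundary is moved inward onto a keyframe, so the resulting clip never crosses a cut.

```python
import bisect
import subprocess

def keyframe_timestamps(path: str) -> list[float]:
    """List keyframe timestamps (seconds) via ffprobe packet flags."""
    out = subprocess.run(
        ["ffprobe", "-v", "error", "-select_streams", "v:0",
         "-show_entries", "packet=pts_time,flags",
         "-of", "csv=print_section=0", path],
        capture_output=True, text=True, check=True,
    ).stdout
    ts = []
    for line in out.splitlines():
        pts, _, flags = line.partition(",")
        if "K" in flags and pts and pts != "N/A":
            ts.append(float(pts))
    return sorted(ts)

def snap_clip(start: float, end: float, keyframes: list[float]) -> tuple[float, float] | None:
    """Snap [start, end] inward onto keyframes so the clip stays inside the cut-free span."""
    i = bisect.bisect_left(keyframes, start)      # first keyframe >= start
    j = bisect.bisect_right(keyframes, end) - 1   # last keyframe <= end
    if i >= len(keyframes) or j < 0 or keyframes[i] >= keyframes[j]:
        return None  # no usable keyframe-aligned span inside this clip
    return keyframes[i], keyframes[j]
```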

Optical flow

Dense optical flow is computed at a low sampling rate (2 FPS in the report), and the mean flow magnitude is used as a motion score to filter out clips that are essentially static.
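A minimal sketch of such a motion score using OpenCV's Farneback estimator follows; the choice of Farneback, the sampling rate, and the downscale size are assumptions here, not details confirmed by the report:

```python
import cv2
import numpy as np

def mean_flow_magnitude(path: str, sample_fps: float = 2.0, max_side: int = 256) -> float:
    """Average dense-optical-flow magnitude over frame pairs sampled at ~sample_fps."""
    cap = cv2.VideoCapture(path)
    src_fps = cap.get(cv2.CAP_PROP_FPS) or sample_fps
    step = max(1, round(src_fps / sample_fps))
    prev, mags, idx = None, [], 0
    while True:
        ok, frame = cap.read()
        if not ok:
            break
        if idx % step == 0:
            h, w = frame.shape[:2]
            scale = max_side / max(h, w)
            if scale < 1.0:  # downscale to keep flow estimation cheap
                frame = cv2.resize(frame, (int(w * scale), int(h * scale)))
            gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
            if prev is not None:
                flow = cv2.calcOpticalFlowFarneback(
                    prev, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
                mags.append(float(np.linalg.norm(flow, axis=-1).mean()))
            prev = gray
        idx += 1
    cap.release()
    return float(np.mean(mags)) if mags else 0.0
```

Clips whose score falls below a threshold (which would need tuning on the target data) would be dropped as near-static.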

Origin: blog.csdn.net/qq_41895747/article/details/134547907