Stability AI open source video generation model Stable Video Diffusion

Stability AI has released the open source video generation model Stable Video Diffusion, which is based on the company's existing Stable Diffusion text-to-image model and can generate videos by animating existing images.

Main features

  • text to video
  • image to video
  • 14 or 25 frames, 576 x 1024 resolution
  • Multi-view generation
  • frame interpolation
  • Support 3D scenes
  • Control cameras via LoRA

Stable Video Diffusion provides two models, namely SVD and SVD-XT. Among them, SVD converts still images into 14-frame 576x1024 video, while SVD-XT increases the number of frames to 24 under the same architecture.

Both can produce video at 3 to 30 frames per second. The white paper shows that the two models are initially trained on a dataset of millions of videos and then "fine-tuned" on smaller datasets on the order of hundreds of thousands to millions.

Stability AI says it is developing a new web platform, including a text-to-video interface. This tool will demonstrate the practical application of Stable Video Diffusion in advertising, education, entertainment and other fields.

Open source address

おすすめ

転載: www.oschina.net/news/267563/stable-video-diffusion