[AI Video Generation Tool] Alibaba launches an AI tool for generating images and videos. It is free, available in China without restrictions, and much better than Runway Gen2.

Hello everyone, my name is Long Yi, and I focus on sharing lightweight AI side-project ideas. Today I want to share an open-source AI image-to-video generation tool that Alibaba released recently. It is currently free to use, with no limit on the number of generations, and the results are very good. I have to say it beats Runway Gen2.

Given a static image and a text prompt from the user, it generates a video that preserves the subject and semantics of the input. The generated videos are high definition (1280×720), widescreen (16:9), temporally coherent, and have good texture.


The project is called I2VGen-XL, a foundation model for high-definition video generation developed by Alibaba DAMO Academy. It is designed for the task of generating high-definition videos from input images. The generated video also supports further editing and upscaling: if you are not satisfied with the result, you can regenerate it with a few clicks, or enter prompt words to adjust the video content, camera movement, motion direction, and so on, and output a high-definition video.

The principle, in brief: the model has about 3.7 billion parameters in total and works in two stages. The first stage generates a frame sequence from the input image; the second stage refines it with text guidance and produces a high-definition video.

[Figure: I2VGen-XL two-stage architecture diagram]

Below are some concrete generated examples. Since videos cannot be embedded in this article, I inserted GIF images instead, which greatly reduces the display quality. If you want the full high-definition version, you can watch the latest video introduction on my video account. The left side is the original image, and the right side is the generated result.

[GIF examples: original image on the left, generated video on the right]

It also supports self-deployment, although the hardware requirements are relatively high. I2VGen-XL consists of two models: the image-to-video model MS-Image2Video and the video-to-video model MS-Vid2Vid.

MS-Image2Video is built on Stable Diffusion. As shown in the schematic diagram above, spatio-temporal modeling is performed in latent space through a specially designed spatio-temporal UNet, and the final video is reconstructed by the decoder.

Run under an environment with 1× A100 (a single card is enough: the image-to-video model needs about 20 GB of VRAM, and the video-to-video model needs about 28 GB):

torch 2.0.1 + cu117, python >= 3.8
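Before deploying, you may want to confirm that your GPU actually has enough memory for these figures. The snippet below is a minimal sketch (it assumes torch is already installed, which the steps below take care of):

import torch

# Minimal check: confirm CUDA is available and report total GPU memory.
# The 20 GB / 28 GB figures above are the requirements quoted for the two models.
if torch.cuda.is_available():
    props = torch.cuda.get_device_properties(0)
    total_gb = props.total_memory / 1024 ** 3
    print(f"GPU: {props.name}, total memory: {total_gb:.1f} GB")
else:
    print("No CUDA device found; the models require a GPU.")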

Install miniconda

wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh

Keep pressing [ENTER] through the prompts; answer yes to the last question

sh Miniconda3-latest-Linux-x86_64.sh

conda virtual environment setup

conda create --name ms-sft python=3.8
conda activate ms-sft

Install the latest ModelScope

pip install "modelscope" --upgrade -f https://pypi.org/project/modelscope/

Make sure your system has the ffmpeg command installed. If not, you can install it with the following command

sudo apt-get update && sudo apt-get install -y ffmpeg libsm6 libxext6
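You can also verify from Python that ffmpeg is on the PATH; this is just a small convenience sketch:

import shutil

# shutil.which returns the full path of the executable, or None if not found.
print(shutil.which("ffmpeg") or "ffmpeg not found, please install it first")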

Install dependent libraries

pip install xformers==0.0.20
pip install torch==2.0.1
pip install torchsde
pip install "open_clip_torch>=2.0.2"
pip install opencv-python-headless
pip install opencv-python 
pip install "einops>=0.4"
pip install rotary-embedding-torch
pip install fairscale 
pip install scipy
pip install imageio
pip install pytorch-lightning
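After installing, a quick way to confirm the libraries can all be imported is the sketch below (note that some pip package names differ from their import names):

import importlib

# Import name for each installed package (comments note where pip names differ)
modules = [
    "xformers", "torch", "torchsde",
    "open_clip",               # installed as open_clip_torch
    "cv2",                     # installed as opencv-python / opencv-python-headless
    "einops", "rotary_embedding_torch", "fairscale",
    "scipy", "imageio", "pytorch_lightning",
]
for name in modules:
    importlib.import_module(name)
    print(f"{name} OK")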

Download the model. For the model link, reply "AI video" in the backend of my official account to get it.

Use the following code to download the models and run inference.

Step 1: Image-to-video (requires about 20 GB of VRAM)

from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

pipe = pipeline(task="image-to-video", model='damo/Image-to-Video', model_revision='v1.1.0')

# IMG_PATH: your image path (URL or local file)

IMG_PATH = './example.png'
output_video_path = pipe(IMG_PATH, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
print(output_video_path)
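If you want to quickly check the generated clip without opening a player, the sketch below uses OpenCV (installed above) to read its resolution and frame count; './output.mp4' matches the output path used in the code above:

import cv2

# Open the generated video and print its basic properties.
cap = cv2.VideoCapture('./output.mp4')
width = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
height = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
frames = int(cap.get(cv2.CAP_PROP_FRAME_COUNT))
fps = cap.get(cv2.CAP_PROP_FPS)
print(f"{width}x{height}, {frames} frames @ {fps:.1f} fps")
cap.release()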

Step 2: Video resolution enhancement (requires about 28 GB of VRAM)

pipe =pipeline(task="video-to-video", model='damo/Video-to-Video', model_revision='v1.1.0')

# VID_PATH: your video path
# TEXT : your text description
VID_PATH = './output.mp4'
TEXT = 'A lovely little fox is among the flowers.'
p_input = {
    
    
            'video_path': VID_PATH,
            'text': TEXT
        }

output_video_path = pipe(p_input, output_video='./output.mp4')[OutputKeys.OUTPUT_VIDEO]
print(output_video_path)
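If you want to run both stages in one go, a minimal sketch that simply chains the two official snippets above looks like this (same model IDs and input paths as above; 'stage1.mp4' and 'final.mp4' are just example output paths, and keeping both models loaded needs correspondingly more VRAM):

from modelscope.pipelines import pipeline
from modelscope.outputs import OutputKeys

# Stage 1: image-to-video (about 20 GB of VRAM)
i2v = pipeline(task="image-to-video", model='damo/Image-to-Video', model_revision='v1.1.0')
low_res = i2v('./example.png', output_video='./stage1.mp4')[OutputKeys.OUTPUT_VIDEO]

# Stage 2: text-guided video-to-video refinement (about 28 GB of VRAM)
v2v = pipeline(task="video-to-video", model='damo/Video-to-Video', model_revision='v1.1.0')
result = v2v({'video_path': low_res, 'text': 'A lovely little fox is among the flowers.'},
             output_video='./final.mp4')[OutputKeys.OUTPUT_VIDEO]
print(result)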

If you don't want to deploy it yourself and just want to try it for free, reply "AI video" in the backend of the official account [Long Yi's Programming Life] to get the experience link.

For more interesting AI news, check out my previous posts!

Okay, that's all for today's sharing. I am Long Yi, and I will keep sharing AI + self-media content with you. Please follow and like!
