A complete tutorial on generating videos using Stable Diffusion

This article is a complete guide to generating video with CUDA and Stable Diffusion: CUDA will be used to accelerate generation, and Kaggle's free Tesla GPUs can execute the model at no cost.
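Before starting, it is worth confirming that PyTorch can actually see a CUDA GPU. A minimal sanity check (my own addition, assuming torch is already installed) looks like this:

 # check that a CUDA GPU is visible to PyTorch
 import torch

 print(torch.cuda.is_available())           # True when a GPU is usable
 if torch.cuda.is_available():
     print(torch.cuda.get_device_name(0))   # e.g. "Tesla P100-PCIE-16GB" on Kaggle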

 # install the diffusers package
 # pip install --upgrade pip
 !pip install --upgrade diffusers transformers scipy
 
 # load the model from the stable-diffusion model card
 import torch
 from diffusers import StableDiffusionPipeline
 
 from huggingface_hub import notebook_login

Model loading

The model's weights are released under the CreativeML OpenRAIL-M license. This is an open license that claims no rights to the generated output and prohibits us from knowingly producing illegal or harmful content. If you have questions about the license, you can read it here:

https://huggingface.co/CompVis/stable-diffusion-v1-4

We first have to register as a user of the Hugging Face Hub and use an access token for the code to work. Since we are working in a notebook, we use notebook_login() to log in.

After executing the code, the cell below displays a login screen where the access token must be pasted.

 from pathlib import Path
 
 if not (Path.home()/'.huggingface'/'token').exists():
     notebook_login()

Then just load the model

 model_id = "CompVis/stable-diffusion-v1-4"
 device = "cuda"
 pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 pipe = pipe.to(device)

Generate images from text

 %%time
 # provide the text prompts
 prompts = [
     "a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece by artgerm by wlop by alphonse muhca ",
     "detailed portrait beautiful Neon Operator Girl, cyberpunk futuristic neon, reflective puffy coat, decorated with traditional Japanese ornaments by Ismail inceoglu dragan bibin hans thoma greg rutkowski Alexandros Pyromallis Nekro Rene Maritte Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face",
     "symmetry!! portrait of minotaur, sci - fi, glowing lights!! intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
     "Human, Simon Stalenhag in forest clearing style, trends on artstation, artstation HD, artstation, unreal engine, 4k, 8k",
     "portrait of a young ruggedly handsome but joyful pirate, male, masculine, upper body, red hair, long hair, d & d, fantasy, roguish smirk, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha ",
     "Symmetry!! portrait of a sith lord, warrior in sci-fi armour, tech wear, muscular!! sci-fi, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha",
     "highly detailed portrait of a cat knight wearing heavy armor, stephen bliss, unreal engine, greg rutkowski, loish, rhads, beeple, makoto shinkai and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, tom whalen, alphonse mucha, global illumination, god rays, detailed and intricate environment ",
     "black and white portrait photo, the most beautiful girl in the world, earth, year 2447, cdx"
 ]

Show the results

 %%time
 # show the results
 images = pipe(prompts).images
 images
 
 # show a single result
 images[0]

The image generated from the first prompt ("a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece ...") is shown below.

Display the resulting images together

 # show the results in a grid
 from PIL import Image
 
 def image_grid(imgs, rows, cols):
     w, h = imgs[0].size
     grid = Image.new('RGB', size=(cols * w, rows * h))
     for i, img in enumerate(imgs):
         grid.paste(img, box=(i % cols * w, i // cols * h))
     return grid
 
 grid = image_grid(images, rows=2, cols=4)
 grid
 
 # save the results
 grid.save("result_images.png")

If you have limited GPU memory (less than 4 GB of GPU RAM available), make sure to load the StableDiffusionPipeline in float16 precision instead of the default float32, as we did above. This is done by telling diffusers to expect the weights in float16; enabling attention slicing on top of that trades a little speed for a lower peak memory footprint:

 %%time
 import torch
 
 pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
 pipe = pipe.to(device)
 pipe.enable_attention_slicing()  # lowers peak memory usage at a small speed cost
 
 images2 = pipe(prompts).images
 images2[0]
 
 grid2 = image_grid(images2, rows=2, cols=4)
 grid2

If you want to replace the noise scheduler, you also need to pass it to from_pretrained:

 %%time
 from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
 
 model_id = "CompVis/stable-diffusion-v1-4"
 # use the Euler scheduler here instead
 scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
 pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
 pipe = pipe.to("cuda")
 
 images3 = pipe(prompts).images
 images3[0]

The picture below shows the result of switching the scheduler.

 # show a single result
 images3[0]
 
 grid3 = image_grid(images3, rows=2, cols=4)
 grid3
 
 # save the final output
 grid3.save("results_stable_diffusionv1.4.png")

The grid above displays all of the generated pictures.
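As a further experiment (my own addition, not part of the original article), other schedulers can be swapped onto the existing pipeline in the same way; for example, the DPM-Solver multistep scheduler usually reaches good quality in fewer steps:

 # sketch: swap in a different scheduler on the existing pipeline
 from diffusers import DPMSolverMultistepScheduler

 pipe.scheduler = DPMSolverMultistepScheduler.from_pretrained(model_id, subfolder="scheduler")
 images4 = pipe(prompts, num_inference_steps=25).images  # DPM-Solver needs fewer steps than Euler
 image_grid(images4, rows=2, cols=4)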

Create a video

The basic operations are complete; now let's use Kaggle to generate a video.

First, open the notebook settings and select GPU as the accelerator.
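To confirm the GPU was actually assigned, you can query it from a cell:

 # verify that the Kaggle GPU is active
 !nvidia-smi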

Then install the required packages and set up the pipeline:

 !pip install -U stable_diffusion_videos
 
 from huggingface_hub import notebook_login
 notebook_login()
 
 # making videos
 from stable_diffusion_videos import StableDiffusionWalkPipeline
 import torch
 
 # use "CompVis/stable-diffusion-v1-4" for v1.4
 pipeline = StableDiffusionWalkPipeline.from_pretrained(
     "runwayml/stable-diffusion-v1-5",
     torch_dtype=torch.float16,
     revision="fp16",
 ).to("cuda")
 
 # generate the video: the same prompt is repeated so the walk
 # interpolates between different seeds of one scene
 prompt = ('environment living room interior, mid century modern, indoor garden with fountain, '
           'retro,m vintage, designer furniture made of wood and plastic, concrete table, wood walls, '
           'indoor potted tree, large window, outdoor forest landscape, beautiful sunset, cinematic, '
           'concept art, sunstainable architecture, octane render, utopia, ethereal, cinematic light, '
           '–ar 16:9 –stylize 45000')
 video_path = pipeline.walk(
     prompts=[prompt] * 4,       # one prompt per seed
     seeds=[42, 333, 444, 555],
     num_interpolation_steps=50,
     # height=1280,  # use multiples of 64 if > 512. Multiples of 8 if < 512.
     # width=720,    # use multiples of 64 if > 512. Multiples of 8 if < 512.
     output_dir='dreams',        # where images/videos will be saved
     name='imagine',             # subdirectory of output_dir where images/videos will be saved
     guidance_scale=8.5,         # higher adheres to the prompt more, lower lets the model take the wheel
     num_inference_steps=50,     # number of diffusion steps per generated image; 50 is a good default
 )
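The walk call returns the path of the rendered video file, so the result can be previewed directly in the notebook; a minimal sketch using IPython:

 # preview the generated video inline (video_path is returned by pipeline.walk above)
 from IPython.display import Video

 Video(video_path, embed=True)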
 
 

Upscale the generated frames to 4K so that a high-resolution video can be produced:

 from stable_diffusion_videos import RealESRGANModel
 
 model = RealESRGANModel.from_pretrained('nateraw/real-esrgan')
 model.upsample_imagefolder('/kaggle/working/dreams/imagine/imagine_000000/', '/kaggle/working/dreams/imagine4K_00')
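The upscaled frames still need to be stitched back into a video. One way to do this (my own suggestion, not from the original article; the glob pattern assumes PNG frames in the output folder) is to call ffmpeg:

 # stitch the 4K frames into an mp4 with ffmpeg
 !ffmpeg -framerate 8 -pattern_type glob -i '/kaggle/working/dreams/imagine4K_00/*.png' \
     -c:v libx264 -pix_fmt yuv420p imagine_4k.mp4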

Add music to the video

Audio can be added to the video by providing an audio file.

 %%capture
 !pip install youtube-dl
 !youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "music/thoughts.%(ext)s" https://soundcloud.com/nateraw/thoughts
 
 from IPython.display import Audio
 
 Audio(filename='music/thoughts.mp3')

Here we use youtube-dl to download the audio (pay attention to the audio's copyright), and then add it to the video:

 # seconds in the song
 audio_offsets = [7, 9]
 fps = 8
 
 # convert seconds to frames
 num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]
 
 video_path = pipeline.walk(
     prompts=['blueberry spaghetti', 'strawberry spaghetti'],
     seeds=[42, 1337],
     num_interpolation_steps=num_interpolation_steps,
     height=512,                             # use multiples of 64
     width=512,                              # use multiples of 64
     audio_filepath='music/thoughts.mp3',    # use your own file
     audio_start_sec=audio_offsets[0],       # start second of the provided audio
     fps=fps,                                # set this yourself based on the num_interpolation_steps you defined
     batch_size=4,                           # increase until you run out of memory
     output_dir='dreams',                    # where images will be saved
     name=None,                              # subdir of output dir; a timestamp by default
 )
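As a quick check of the frame math above: with audio_offsets = [7, 9] and fps = 8, the comprehension produces a single segment of (9 - 7) * 8 = 16 interpolation steps, so 16 frames played back at 8 fps span exactly the 2 seconds of audio between the offsets. The same rule extends to longer offset lists (illustrative values of my own):

 # worked example of the seconds-to-frames conversion
 audio_offsets = [7, 9, 14]
 fps = 8
 steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]
 print(steps)  # [16, 40]: 16 frames for 7s-9s, 40 frames for 9s-14s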

You can find the code for this article here:

https://avoid.overfit.cn/post/781a2bd8a4534f7cb2d223c141d37df8

By Bob Rupak Roy

Origin blog.csdn.net/m0_46510245/article/details/128755685