This article is a complete guide to generating video with CUDA and Stable Diffusion. CUDA will be used to accelerate generation, and Kaggle's free Tesla GPUs can be used to run the model at no cost.
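Before running anything, it helps to confirm that a CUDA device is actually visible to PyTorch. A minimal sketch (falling back to CPU so the snippet also runs on machines without a GPU or without torch installed):

```python
# Pick the device for the pipeline: "cuda" when a GPU is visible, else "cpu".
# The try/except lets the snippet run even where torch is not installed.
try:
    import torch
    device = "cuda" if torch.cuda.is_available() else "cpu"
except ImportError:
    device = "cpu"

print(device)
```

On a Kaggle notebook with the GPU accelerator enabled this prints `cuda`; everywhere else it degrades gracefully to `cpu`.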
#install the diffusers package
#pip install --upgrade pip
!pip install --upgrade diffusers transformers scipy
#load the model from stable-diffusion model card
import torch
from diffusers import StableDiffusionPipeline
from huggingface_hub import notebook_login
Model loading
The model's weights are released under the CreativeML OpenRAIL-M license. This is an open license that claims no rights to the generated output but prohibits us from knowingly producing illegal or harmful content. If you have questions about the license, you can read it here:
https://huggingface.co/CompVis/stable-diffusion-v1-4
We first have to register as a user of the Hugging Face Hub and use an access token to get the code to work. Because we are working in a notebook, we use notebook_login() to log in.
After executing the code, the cell will display a login form where the access token must be pasted.
from pathlib import Path

if not (Path.home()/'.huggingface'/'token').exists():
    notebook_login()
Then just load the model
model_id = "CompVis/stable-diffusion-v1-4"
device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)
Show images generated from text
%%time
#Provide the Keywords
prompts = [
"a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece by artgerm by wlop by alphonse muhca ",
"detailed portrait beautiful Neon Operator Girl, cyberpunk futuristic neon, reflective puffy coat, decorated with traditional Japanese ornaments by Ismail inceoglu dragan bibin hans thoma greg rutkowski Alexandros Pyromallis Nekro Rene Maritte Illustrated, Perfect face, fine details, realistic shaded, fine-face, pretty face",
"symmetry!! portrait of minotaur, sci - fi, glowing lights!! intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha, 8 k ",
"Human, Simon Stalenhag in forest clearing style, trends on artstation, artstation HD, artstation, unreal engine, 4k, 8k",
"portrait of a young ruggedly handsome but joyful pirate, male, masculine, upper body, red hair, long hair, d & d, fantasy, roguish smirk, intricate, elegant, highly detailed, digital painting, artstation, concept art, matte, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha ",
"Symmetry!! portrait of a sith lord, warrior in sci-fi armour, tech wear, muscular!! sci-fi, intricate, elegant, highly detailed, digital painting, artstation, concept art, smooth, sharp focus, illustration, art by artgerm and greg rutkowski and alphonse mucha",
"highly detailed portrait of a cat knight wearing heavy armor, stephen bliss, unreal engine, greg rutkowski, loish, rhads, beeple, makoto shinkai and lois van baarle, ilya kuvshinov, rossdraws, tom bagshaw, tom whalen, alphonse mucha, global illumination, god rays, detailed and intricate environment ",
"black and white portrait photo, the most beautiful girl in the world, earth, year 2447, cdx"
]
%%time
#show the results
images = pipe(prompts).images
images
#show a single result
images[0]
The first prompt, "a couple holding hands with plants growing out of their heads, growth of a couple, rainy day, atmospheric, bokeh matte masterpiece", produces the image below.
Display the resulting images together
#show the results in a grid
from PIL import Image

def image_grid(imgs, rows, cols):
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols*w, rows*h))
    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

grid = image_grid(images, rows=2, cols=4)
grid
#Save the results
grid.save("result_images.png")
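The paste coordinates in image_grid are plain modular arithmetic: image i lands in column i % cols and row i // cols. A quick check of that math, assuming 512x512 images in a 2x4 grid:

```python
# Compute the top-left paste position of each image in a 2x4 grid of 512x512 tiles.
w, h, cols = 512, 512, 4
positions = [(i % cols * w, i // cols * h) for i in range(8)]

print(positions[0])  # (0, 0): first image, top-left corner
print(positions[4])  # (0, 512): fifth image starts the second row
```

Images 0 to 3 fill the top row left to right, and images 4 to 7 fill the bottom row.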
If you have limited GPU memory (less than 4 GB of GPU RAM available), make sure to load the StableDiffusionPipeline with float16 precision instead of the default float32 precision, as shown above. This is done by telling diffusers to expect the weights in float16 precision. Enabling attention slicing additionally computes attention in steps rather than all at once, trading a little speed for a large reduction in memory use:
%%time
import torch
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to(device)
pipe.enable_attention_slicing()
images2 = pipe(prompts).images
images2[0]
grid2 = image_grid(images2, rows=2, cols=4)
grid2
If you want to replace the noise scheduler, you also need to pass it to from_pretrained:
%%time
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler

model_id = "CompVis/stable-diffusion-v1-4"
# Use the Euler scheduler here instead
scheduler = EulerDiscreteScheduler.from_pretrained(model_id, subfolder="scheduler")
pipe = StableDiffusionPipeline.from_pretrained(model_id, scheduler=scheduler, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
images3 = pipe(prompts).images
#show a single result
images3[0]
The image below is the result of switching to the Euler scheduler.
grid3 = image_grid(images3, rows=2, cols=4)
grid3
#save the final output
grid3.save("results_stable_diffusionv1.4.png")
View all the generated pictures together.
Create a video
The basic operations are complete; now let's use Kaggle to generate a video.
First open the notebook settings and select GPU as the accelerator. Then install the required package:
!pip install -U stable_diffusion_videos
from huggingface_hub import notebook_login

notebook_login()
#Making Videos
from stable_diffusion_videos import StableDiffusionWalkPipeline
import torch

#"CompVis/stable-diffusion-v1-4" for 1.4
pipeline = StableDiffusionWalkPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5",
    torch_dtype=torch.float16,
    revision="fp16",
).to("cuda")
#Generate the video from prompts
#walk expects one prompt per seed, so we repeat the same prompt four times
prompt = ('environment living room interior, mid century modern, indoor garden with fountain, retro, vintage, '
          'designer furniture made of wood and plastic, concrete table, wood walls, indoor potted tree, '
          'large window, outdoor forest landscape, beautiful sunset, cinematic, concept art, sustainable '
          'architecture, octane render, utopia, ethereal, cinematic light, –ar 16:9 –stylize 45000')
video_path = pipeline.walk(
    prompts=[prompt] * 4,
    seeds=[42, 333, 444, 555],
    num_interpolation_steps=50,
    #height=1280,  # use multiples of 64 if > 512. Multiples of 8 if < 512.
    #width=720,    # use multiples of 64 if > 512. Multiples of 8 if < 512.
    output_dir='dreams',      # Where images/videos will be saved
    name='imagine',           # Subdirectory of output_dir where images/videos will be saved
    guidance_scale=8.5,       # Higher adheres to the prompt more, lower lets the model take the wheel
    num_inference_steps=50,   # Number of diffusion steps per image generated. 50 is a good default
)
Upscale the images to 4K so that a high-resolution video can be generated:
from stable_diffusion_videos import RealESRGANModel

model = RealESRGANModel.from_pretrained('nateraw/real-esrgan')
model.upsample_imagefolder('/kaggle/working/dreams/imagine/imagine_000000/', '/kaggle/working/dreams/imagine4K_00')
Add music to the video
Audio can be added to the video by providing an audio file.
%%capture
!pip install youtube-dl
!youtube-dl -f bestaudio --extract-audio --audio-format mp3 --audio-quality 0 -o "music/thoughts.%(ext)s" https://soundcloud.com/nateraw/thoughts

from IPython.display import Audio
Audio(filename='music/thoughts.mp3')
Here we use youtube-dl to download the audio (pay attention to the audio's copyright), and then add the audio to the video.
# Seconds in the song
audio_offsets = [7, 9]
fps = 8

# Convert seconds to frames
num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]
video_path = pipeline.walk(
    prompts=['blueberry spaghetti', 'strawberry spaghetti'],
    seeds=[42, 1337],
    num_interpolation_steps=num_interpolation_steps,
    height=512,   # use multiples of 64
    width=512,    # use multiples of 64
    audio_filepath='music/thoughts.mp3',   # Use your own file
    audio_start_sec=audio_offsets[0],      # Start second of the provided audio
    fps=fps,              # important to set yourself based on the num_interpolation_steps you defined
    batch_size=4,         # increase until you run out of memory
    output_dir='dreams',  # Where images will be saved
    name=None,            # Subdir of output dir; will be a timestamp by default
)
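The seconds-to-frames conversion above is worth sanity-checking: each adjacent pair of audio offsets (a, b) becomes (b - a) * fps interpolation steps, which keeps the video in sync with the music. A worked example with illustrative offsets:

```python
# Each adjacent pair of offsets (a, b) yields (b - a) * fps frames of interpolation.
audio_offsets = [7, 9, 12]   # illustrative timestamps (seconds) in the song
fps = 8

num_interpolation_steps = [(b - a) * fps for a, b in zip(audio_offsets, audio_offsets[1:])]
print(num_interpolation_steps)  # [16, 24]
```

With n offsets you get n - 1 segments, and walk() interpolates between the n prompts/seeds over exactly those frame counts.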
You can find the code for this article here:
https://avoid.overfit.cn/post/781a2bd8a4534f7cb2d223c141d37df8
By Bob Rupak Roy