Introduction
Recently I have seen impressive AI-generated images shared in many places, so I gathered material from the Internet and deployed a setup to experience the charm of AI painting myself. This article sets up AI painting on Colab using the Hugging Face API.
Usage steps
1. Original Hugging Face environment address
2. Configure Colab GPU resources
Click Connect. After the initialization is complete, click here to allocate GPU resources.
The environment is now initialized. Run !nvidia-smi to view the allocated GPU configuration. Generally, one GPU is allocated at random from K80, T4, P100, and V100. If you are lucky enough to be assigned a V100 machine, image generation will be much faster.
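As a rough sketch, the same check can be done from Python instead of the !nvidia-smi shell escape. The helper name gpu_summary is made up for illustration; it only relies on the standard library and on the nvidia-smi query flags, and it returns None on a runtime without a GPU.

```python
import shutil
import subprocess


def gpu_summary():
    """Return a list with one entry per visible GPU, or None if none is usable."""
    if shutil.which("nvidia-smi") is None:
        return None  # no NVIDIA driver on this runtime (e.g. CPU-only)
    result = subprocess.run(
        ["nvidia-smi", "--query-gpu=name,memory.total", "--format=csv,noheader"],
        capture_output=True,
        text=True,
    )
    if result.returncode != 0:
        return None  # driver present but no usable GPU
    return result.stdout.strip().splitlines()


gpus = gpu_summary()
print(gpus if gpus else "No GPU allocated; check the runtime type.")
```

If the result is None, the GPU allocation step above probably has not taken effect yet.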
3. Install third-party libraries
Install the required libraries: diffusers, scipy, ftfy, transformers, and accelerate.
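One possible way to see which of these libraries still need installing is sketched below. The helper missing_packages is a hypothetical name; it uses only importlib from the standard library and prints a pip command instead of running it.

```python
import importlib.util

# Package names taken from the list above.
REQUIRED = ["diffusers", "scipy", "ftfy", "transformers", "accelerate"]


def missing_packages(names):
    """Return the names that cannot be imported in the current environment."""
    return [name for name in names if importlib.util.find_spec(name) is None]


missing = missing_packages(REQUIRED)
if missing:
    # In a Colab cell, prefix the printed command with "!" to run it.
    print("pip install " + " ".join(missing))
else:
    print("All dependencies are already installed.")
```

Printing the command rather than executing it keeps the check free of side effects; in Colab the printed line can be pasted into a cell with a leading "!".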
4. Load the model
The mainstream models are as follows:
CompVis/stable-diffusion-v1-4
runwayml/stable-diffusion-v1-5
stabilityai/stable-diffusion-2-1-base
stabilityai/stable-diffusion-2-1
You can load any of these models by changing the model ID in the configuration. Comparing several sets of generated pictures, the V2 models give a better overall result and higher image quality than V1, but their resource consumption is higher.
Load the model onto the GPU (cuda).
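A minimal sketch of the loading step might look like the following. It assumes the diffusers StableDiffusionPipeline class and half precision (torch.float16) to save GPU memory; the short keys in MODEL_IDS and the helper load_pipeline are hypothetical names, not part of the original notebook.

```python
# Model IDs copied from the list above; the short keys are hypothetical labels.
MODEL_IDS = {
    "v1-4": "CompVis/stable-diffusion-v1-4",
    "v1-5": "runwayml/stable-diffusion-v1-5",
    "v2-1-base": "stabilityai/stable-diffusion-2-1-base",
    "v2-1": "stabilityai/stable-diffusion-2-1",
}


def load_pipeline(name="v1-5", device="cuda"):
    """Download the chosen checkpoint and move it to the given device."""
    # Imported lazily so the table above can be inspected without a GPU runtime.
    import torch
    from diffusers import StableDiffusionPipeline

    # Assumption: half precision, to reduce GPU memory use on Colab.
    pipe = StableDiffusionPipeline.from_pretrained(
        MODEL_IDS[name], torch_dtype=torch.float16
    )
    return pipe.to(device)


# pipe = load_pipeline("v2-1-base")  # uncomment on a GPU runtime
```

Switching models then only means passing a different key, which makes side-by-side comparisons of V1 and V2 easier.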
5. Image generation
Generate the picture you want by modifying the prompt.
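For illustration, a single generation call could be wrapped as below. This is a sketch that assumes pipe is an already loaded pipeline whose output has an images list of PIL images; the helper generate_image and the file name astronaut.png are made up for the example.

```python
def generate_image(pipe, prompt, out_path=None):
    """Run the pipeline on one prompt and return the first generated image."""
    image = pipe(prompt).images[0]  # the pipeline output exposes a list of images
    if out_path is not None:
        image.save(out_path)  # PIL infers the format from the file extension
    return image


# Example (requires a loaded pipeline):
# image = generate_image(pipe, "a photograph of an astronaut riding a horse", "astronaut.png")
```

Saving to a file is optional; in a notebook the returned image is displayed when it is the last expression in a cell.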
6. Multiple image generation
If you want to generate multiple images for the same text prompt, simply pass the same text multiple times: the model accepts a list of prompts. Let's write a helper function to display multiple images.
from PIL import Image

def image_grid(imgs, rows, cols):
    # Tile equally sized PIL images into a rows x cols grid.
    assert len(imgs) == rows * cols
    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))
    for i, img in enumerate(imgs):
        # Column index is i % cols, row index is i // cols.
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

num_images = 3
prompt = ["a photograph of an astronaut riding a horse"] * num_images
images = pipe(prompt).images
grid = image_grid(images, rows=1, cols=num_images)
grid
You can also generate n*m images:
num_cols = 3
num_rows = 2
prompt = ["a photograph of an astronaut riding a horse"] * num_cols
all_images = []
for i in range(num_rows):
    images = pipe(prompt).images
    all_images.extend(images)
grid = image_grid(all_images, rows=num_rows, cols=num_cols)
grid
7. Parameter configuration
Random seed
Running the above cell multiple times will give you a different image each time. If you want deterministic output, you can pass a random seed to the pipeline. Using the same seed will give the same image result every time.
import torch
generator = torch.Generator("cuda").manual_seed(1024)
image = pipe(prompt, generator=generator).images[0]
image
Number of inference steps
The number of inference steps can be changed with the num_inference_steps parameter. In general, the more steps used, the better the result, at the cost of longer generation time. Stable Diffusion, one of the more recent models, works well with a relatively small number of steps, and the default value of 50 is recommended. You can use a lower number if you want faster results.
import torch
generator = torch.Generator("cuda").manual_seed(1024)
image = pipe(prompt, num_inference_steps=15, generator=generator).images[0]
image
Image height and width
The size of the generated image is controlled by height and width.
import torch
image = pipe(prompt, height=512, width=512).images[0]
image
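One practical detail: the Stable Diffusion pipelines in diffusers expect height and width to be divisible by 8. The sketch below rounds an arbitrary requested size down to an accepted one; the helper names snap_to_multiple and image_size are hypothetical.

```python
def snap_to_multiple(value, multiple=8):
    """Round a requested size down to a multiple, but never below one multiple."""
    return max(multiple, (value // multiple) * multiple)


def image_size(width, height):
    """Return a (width, height) pair whose sides are both divisible by 8."""
    return snap_to_multiple(width), snap_to_multiple(height)


width, height = image_size(770, 515)
print(width, height)  # 768 512
# image = pipe(prompt, height=height, width=width).images[0]
```

Larger sizes also need more GPU memory, so on a smaller Colab GPU it may be safer to stay near 512.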
guidance_scale
The guidance scale is a way to increase adherence to the conditioning signal (in this case, the text) as well as overall sample quality. In simple terms, classifier-free guidance pushes the generated image to match the prompt more closely. The default value is 7.5. If you use a very large value, the image may look good, but the variety will be reduced.
import torch
image = pipe(prompt, guidance_scale=7.5).images[0]
image
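To make the idea concrete, classifier-free guidance is usually described as extrapolating from the unconditional noise prediction toward the text-conditioned one. The toy sketch below applies that combination to plain Python lists; the function name apply_guidance and the numbers are illustrative only and are not real model outputs.

```python
def apply_guidance(noise_uncond, noise_text, guidance_scale=7.5):
    """Combine the unconditional and text-conditioned noise predictions."""
    return [
        u + guidance_scale * (t - u)
        for u, t in zip(noise_uncond, noise_text)
    ]


# Toy numbers, not real model outputs.
uncond = [0.0, 1.0]
text = [1.0, 1.0]
print(apply_guidance(uncond, text, guidance_scale=1.0))  # [1.0, 1.0]
print(apply_guidance(uncond, text, guidance_scale=7.5))  # [7.5, 1.0]
```

With a scale of 1 the result equals the text-conditioned prediction; larger scales amplify the difference between the two predictions, which is why very large values trade variety for prompt adherence.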
Sample prompts
east africa landscape, highly detailed, digital painting, concept art, sharp focus, cinematic lighting, fantasy, intricate, elegant, lifelike, photorealistic, illustration, smooth
Refia from final fantasy 3ds, Refia staring through the window a little bit sadly, highly detailed, digital painting, 8k resolution
European and American beauty, good figure,very beautiful,glamorous,8k
cute mini meka, chibi, disney style, manga style, UHD, HDR, 4K, 8K
Cute and adorable cartoon fluffy baby rhea, fantasy, dreamlike, surrealism, super cute, trending on artstation, 8k
That's all.