AI Painting Deployment: Stable Diffusion (a first experience with Hugging Face API image generation)

Introduction

Recently I have seen impressive AI-generated images shared by various experts all over the internet, so I collected material and deployed the model myself to experience the charm of AI painting. This article builds an AI painting setup on Colab based on the Hugging Face API.

Usage steps

1. Hugging Face original notebook

https://colab.research.google.com/github/huggingface/notebooks/blob/main/diffusers/stable_diffusion.ipynb#scrollTo=AAVZStIokTVv

2. Configure Colab GPU resources

Click Connect. After the initialization is complete, open the runtime settings to allocate GPU resources.
Once the environment is initialized, execute !nvidia-smi to view the allocated GPU configuration. Generally one of K80, T4, P100, and V100 is assigned at random; if you are lucky enough to get a V100 machine, image generation will be much faster.
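For reference, here is a minimal check cell, assuming the standard Colab runtime with PyTorch preinstalled:

# Show the GPU assigned to this runtime (Colab shell command).
!nvidia-smi

import torch

# Confirm PyTorch can see the GPU before loading the model.
print(torch.cuda.is_available())          # True if a GPU was allocated
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # e.g. "Tesla T4"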

3. Install third-party libraries

Install the related dependencies, including diffusers, scipy, ftfy, transformers, and accelerate.


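A minimal install cell covering the dependencies above (version pins omitted; the original notebook may pin specific versions):

# Install the libraries used in this walkthrough (Colab shell command).
!pip install diffusers scipy ftfy transformers accelerate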

4. Load the model

The mainstream models are as follows:
CompVis/stable-diffusion-v1-4
runwayml/stable-diffusion-v1-5
stabilityai/stable-diffusion-2-1-base
stabilityai/stable-diffusion-2-1
You can load any of these models by modifying the configuration. Comparing multiple sets of generated images, the overall effect of the v2 models is better than v1 and the image quality is higher, but resource consumption is correspondingly greater.
Then move the model to the GPU (CUDA), as in the sketch below.
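A minimal loading sketch; model_id can be swapped for any of the model ids listed above, and half-precision weights are assumed in order to fit Colab GPU memory:

import torch
from diffusers import StableDiffusionPipeline

# Pick one of the model ids listed above.
model_id = "runwayml/stable-diffusion-v1-5"

# Load half-precision weights to save GPU memory, then move the pipeline to CUDA.
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")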

5. Image generation

Generate the image you want by modifying the prompt, as in the sketch below.
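A minimal generation cell, assuming the pipe loaded in the previous step:

# Describe the desired image in the prompt, then run the pipeline.
prompt = "a photograph of an astronaut riding a horse"
image = pipe(prompt).images[0]

# In Colab the image is displayed inline; it can also be saved to disk.
image.save("astronaut_rides_horse.png")
image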

6. Multiple image generation

If you want to generate multiple images for the same text prompt, simply repeat the prompt. We can pass a list of prompts to the pipeline, so let's write a helper function to display multiple images in a grid:

from PIL import Image

def image_grid(imgs, rows, cols):
    # Paste the images into a single rows x cols grid for display.
    assert len(imgs) == rows * cols

    w, h = imgs[0].size
    grid = Image.new('RGB', size=(cols * w, rows * h))

    for i, img in enumerate(imgs):
        grid.paste(img, box=(i % cols * w, i // cols * h))
    return grid

# Repeat the same prompt to get several variations in one call.
num_images = 3
prompt = ["a photograph of an astronaut riding a horse"] * num_images

images = pipe(prompt).images

grid = image_grid(images, rows=1, cols=3)
grid


You can also generate an n×m grid of images:

num_cols = 3
num_rows = 2

# Each pipeline call generates one row of images.
prompt = ["a photograph of an astronaut riding a horse"] * num_cols

all_images = []
for i in range(num_rows):
    images = pipe(prompt).images
    all_images.extend(images)

grid = image_grid(all_images, rows=num_rows, cols=num_cols)
grid


7. Parameter configuration

Random seed

Running the cell above multiple times gives a different image each time. If you want deterministic output, pass a seeded random generator to the pipeline; using the same seed yields the same image every time.

import torch

# Fixing the seed makes the output reproducible across runs.
generator = torch.Generator("cuda").manual_seed(1024)

image = pipe(prompt, generator=generator).images[0]

image

Inference steps

The number of inference steps can be changed with the num_inference_steps parameter. In general, the more steps used, the better the result, but Stable Diffusion works well with a relatively small number of steps, so the default value of 50 is recommended. You can use a lower value if you want faster results.

import torch

generator = torch.Generator("cuda").manual_seed(1024)

# Fewer steps run faster at some cost in quality.
image = pipe(prompt, num_inference_steps=15, generator=generator).images[0]

image

The height and width of the generated image

The size of the generated image is controlled by the height and width parameters; both values should be multiples of 8.

import torch

# 512x512 is the native training resolution of the v1 models.
image = pipe(prompt, height=512, width=512).images[0]

image

guidance_scale

The guidance scale is a way to increase adherence to the conditioning signal (in this case, the text prompt) as well as overall sample quality. In simple terms, classifier-free guidance forces the generation to better match the prompt. The default value is 7.5. If you use a very large value, the images may look good, but variety will be reduced.

import torch

# Higher guidance_scale follows the prompt more closely but reduces variety.
image = pipe(prompt, guidance_scale=7.5).images[0]

image
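Putting these parameters together, a single call might look like the following sketch, assuming the pipe and prompt defined earlier:

import torch

# Fix the seed, then set steps, size, and guidance in one call.
generator = torch.Generator("cuda").manual_seed(1024)

image = pipe(
    prompt,
    num_inference_steps=50,
    height=512,
    width=512,
    guidance_scale=7.5,
    generator=generator,
).images[0]

image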

Sample prompts

east africa landscape, highly detailed, digital painting, concept art, sharp focus, cinematic lighting, fantasy, intricate, elegant, lifelike, photorealistic, illustration, smooth

Refia from final fantasy 3ds, Refia staring through the window a little bit sadly, highly detailed, digital painting, 8k resolution

European and American beauty, good figure, very beautiful, glamorous, 8k

cute mini meka, chibi, disney style, manga style, UHD, HDR, 4K, 8K

Cute and adorable cartoon fluffy baby rhea, fantasy, dreamlike, surrealism, super cute, trending on artstation, 8k
That's all.

Origin: blog.csdn.net/qq_43188358/article/details/128756071