Using Stable Diffusion for AI + art design (taking a smart light pole as an example)


Stable Diffusion is a latent diffusion model conditioned on (unpooled) text embeddings from the CLIP ViT-L/14 text encoder.
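For intuition, the text conditioning can be reproduced with the transformers library installed below: the encoder returns one 768-dimensional embedding per token (unpooled), and this 77x768 sequence is what the diffusion UNet cross-attends to. A minimal sketch, assuming the openai/clip-vit-large-patch14 weights on the Hugging Face Hub and an illustrative prompt:

import torch
from transformers import CLIPTokenizer, CLIPTextModel

# Load the CLIP ViT-L/14 tokenizer and text encoder (assumed Hub model id)
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-large-patch14")
text_encoder = CLIPTextModel.from_pretrained("openai/clip-vit-large-patch14")

tokens = tokenizer("a smart light pole at dusk", padding="max_length",
                   max_length=77, truncation=True, return_tensors="pt")
with torch.no_grad():
    # last_hidden_state is the unpooled, per-token embedding the UNet conditions on
    embeddings = text_encoder(tokens.input_ids).last_hidden_state
print(embeddings.shape)  # torch.Size([1, 77, 768])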

1. Install the environment

Create and activate a conda environment named ldm:

conda env create -f environment.yaml
conda activate ldm

Update an existing virtual environment:

conda install pytorch torchvision -c pytorch
pip install transformers==4.19.2 diffusers invisible-watermark
pip install -e .
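A quick sanity check that the environment is usable (a sketch only; it simply prints the installed versions and whether a GPU is visible):

import torch, transformers, diffusers

print("torch:", torch.__version__, "CUDA available:", torch.cuda.is_available())
print("transformers:", transformers.__version__)
print("diffusers:", diffusers.__version__)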

2. Configure the model

2.1 Stable Diffusion v1

Stable Diffusion v1 refers to a specific configuration of the model architecture: an autoencoder with a downsampling factor of 8, an 860M-parameter UNet, and the CLIP ViT-L/14 text encoder conditioning the diffusion model. The model was pretrained on 256x256 images and then fine-tuned on 512x512 images.
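In concrete terms, the factor-8 autoencoder maps a 512x512x3 RGB image to a 64x64 latent with 4 channels, and diffusion runs in that latent space. A rough check using the diffusers VAE component (assumes the CompVis/stable-diffusion-v1-4 layout on the Hugging Face Hub; the random tensor stands in for a preprocessed image):

import torch
from diffusers import AutoencoderKL

# Load only the VAE part of Stable Diffusion v1-4 (assumed Hub model id)
vae = AutoencoderKL.from_pretrained("CompVis/stable-diffusion-v1-4", subfolder="vae")

image = torch.randn(1, 3, 512, 512)  # stand-in for a normalized RGB image
with torch.no_grad():
    latents = vae.encode(image).latent_dist.sample()
print(latents.shape)  # torch.Size([1, 4, 64, 64]) -> spatially downsampled by 8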
Four checkpoints are provided:
1. sd-v1-1.ckpt: 237k steps at 256x256 on laion2B-en, followed by 194k steps at 512x512 on laion-high-resolution.
2. sd-v1-2.ckpt: resumed from sd-v1-1.ckpt; 515k steps at 512x512 on laion-aesthetics v2 5+.
3. sd-v1-3.ckpt: resumed from sd-v1-2.ckpt; 195k steps at 512x512 on laion-aesthetics v2 5+, with 10% dropping of the text conditioning to improve classifier-free guidance sampling.
4. sd-v1-4.ckpt: resumed from sd-v1-2.ckpt; 225k steps at 512x512 on laion-aesthetics v2 5+, with 10% dropping of the text conditioning.


2.2 Run the model and test generation

Once you have the stable-diffusion-v1-*-original weights, link them:

mkdir -p models/ldm/stable-diffusion-v1/
ln -s <path/to/model.ckpt> models/ldm/stable-diffusion-v1/model.ckpt 

Alternatively, place the checkpoint directly under models/ldm/stable-diffusion-v1/ as model.ckpt.
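Before running the scripts you can verify that the checkpoint is readable (a minimal sketch; the path assumes the link or copy created above):

import torch

# Load the checkpoint on the CPU just to confirm it is intact
ckpt = torch.load("models/ldm/stable-diffusion-v1/model.ckpt", map_location="cpu")
state = ckpt.get("state_dict", ckpt)  # weights are usually stored under "state_dict"
print(len(state), "weight tensors loaded")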
Test text-to-image generation:

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms 

By default, 512x512 images are generated with 50 sampling steps.
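If you prefer a Python API over the script, a roughly equivalent text-to-image call can be made through the diffusers library installed earlier (a sketch only; the Hub model id and output filename are assumptions):

import torch
from diffusers import StableDiffusionPipeline

# Load the v1-4 weights from the Hugging Face Hub (assumed model id) and move to the GPU
pipe = StableDiffusionPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

result = pipe("a photograph of an astronaut riding a horse",
              num_inference_steps=50,  # matches the script's 50-step default
              guidance_scale=7.5)
result.images[0].save("astronaut.png")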
Test image-to-image generation:

python scripts/img2img.py --prompt "A fantasy landscape, trending on artstation" --init-img <path-to-img.jpg> --strength 0.8
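For comparison, an image-to-image call via diffusers looks like this (a sketch; the init-image path and model id are placeholders, and older diffusers versions may name the image argument init_image instead of image):

from PIL import Image
from diffusers import StableDiffusionImg2ImgPipeline

pipe = StableDiffusionImg2ImgPipeline.from_pretrained("CompVis/stable-diffusion-v1-4")
pipe = pipe.to("cuda")

init_image = Image.open("sketch.jpg").convert("RGB").resize((512, 512))
result = pipe(prompt="A fantasy landscape, trending on artstation",
              image=init_image,
              strength=0.8)  # same strength as the script example above
result.images[0].save("fantasy_landscape.png")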


Original post: blog.csdn.net/m0_46339652/article/details/128499121