stable-diffusion installation and simple testing

Reference:
https://github.com/CompVis/stable-diffusion
Understanding how DALL·E 2, Stable Diffusion, and Midjourney work
Latent Diffusion Models paper interpretation
[Generative AI] On the principle behind the image-generation Diffusion Model
[Generative AI] The common recipe behind Stable Diffusion, DALL-E, and Imagen

Introduction

Stable Diffusion is a latent diffusion model (LDM) for text-to-image generation. It produces images by iteratively "denoising" data in a latent representation space and then decoding that representation into a complete image. This lets text-to-image generation run on consumer-grade GPUs, producing an image in roughly 10 seconds, which greatly lowers the barrier to adoption. Diffusion Models (DM) are generative models that take a data sample (such as an image) and gradually add noise over time until the data is no longer recognizable; the model then learns to reverse that process, and in doing so learns how to generate images or other data.
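To make the "denoise in latent space, then decode" idea concrete, here is a minimal sketch built from Hugging Face diffusers/transformers components rather than the CompVis scripts; the model id, the 50-step schedule, the fixed 64x64 latent size, and the omission of classifier-free guidance are all simplifying assumptions:

import torch
from diffusers import AutoencoderKL, UNet2DConditionModel, PNDMScheduler
from transformers import CLIPTextModel, CLIPTokenizer

model_id = "runwayml/stable-diffusion-v1-5"  # assumed Hub copy of the v1-5 weights
tokenizer = CLIPTokenizer.from_pretrained(model_id, subfolder="tokenizer")
text_encoder = CLIPTextModel.from_pretrained(model_id, subfolder="text_encoder")
unet = UNet2DConditionModel.from_pretrained(model_id, subfolder="unet")
vae = AutoencoderKL.from_pretrained(model_id, subfolder="vae")
scheduler = PNDMScheduler.from_pretrained(model_id, subfolder="scheduler")

# 1) Encode the prompt into CLIP text embeddings (the conditioning signal).
tokens = tokenizer("a photograph of an astronaut riding a horse",
                   padding="max_length", max_length=tokenizer.model_max_length,
                   return_tensors="pt")
with torch.no_grad():
    text_emb = text_encoder(tokens.input_ids)[0]

# 2) Start from random noise in the latent space and denoise it step by step.
latents = torch.randn(1, unet.config.in_channels, 64, 64) * scheduler.init_noise_sigma
scheduler.set_timesteps(50)
for t in scheduler.timesteps:
    model_input = scheduler.scale_model_input(latents, t)
    with torch.no_grad():
        noise_pred = unet(model_input, t, encoder_hidden_states=text_emb).sample
    latents = scheduler.step(noise_pred, t, latents).prev_sample

# 3) Decode the final latents into a full-resolution image with the VAE.
with torch.no_grad():
    image = vae.decode(latents / 0.18215).sample  # 0.18215 is the SD v1 latent scale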

The problem with DMs is that powerful ones tend to consume enormous GPU resources, and because sampling is sequential, inference is also expensive. To make DMs trainable on limited compute without sacrificing quality or flexibility, Stable Diffusion runs the diffusion model in the latent space of powerful pre-trained autoencoders.

Training the diffusion model under this premise strikes a near-optimal balance between reducing complexity and preserving data detail, and significantly improves visual fidelity. Adding cross-attention layers to the model architecture turns the diffusion model into a powerful and flexible generator for convolution-based, high-resolution image synthesis.
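As a rough illustration of what that cross-attention layer does (a conceptual sketch, not the actual LDM implementation), the flattened U-Net feature map supplies the queries while the text-encoder output supplies the keys and values:

import torch
import torch.nn as nn

class CrossAttention(nn.Module):
    """Conceptual sketch: image latents attend to text embeddings."""
    def __init__(self, latent_dim, text_dim, heads=8):
        super().__init__()
        self.attn = nn.MultiheadAttention(latent_dim, heads,
                                          kdim=text_dim, vdim=text_dim,
                                          batch_first=True)

    def forward(self, latent_tokens, text_emb):
        # latent_tokens: (batch, h*w, latent_dim) flattened feature map
        # text_emb:      (batch, seq_len, text_dim) text-encoder output
        out, _ = self.attn(latent_tokens, text_emb, text_emb)
        return out

layer = CrossAttention(latent_dim=320, text_dim=768)  # example dimensions
x = torch.randn(1, 64 * 64, 320)                      # 64x64 latent feature map
ctx = torch.randn(1, 77, 768)                         # 77-token prompt embedding
print(layer(x, ctx).shape)                            # torch.Size([1, 4096, 320])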

Similar models:

DALL-E 2, developed by OpenAI, generates images from a text description. It uses a GPT-3-style Transformer model with more than 10 billion parameters to interpret natural-language input and generate corresponding images. DALL-E 2 trains two models: the first, the Prior, takes a text caption and produces a CLIP image embedding; the second, the Decoder, takes that CLIP image embedding and generates the image.

Midjourney is another AI-powered tool that generates images from user prompts. It is no longer possible to generate test images for free. https://discord.com/channels/662267976984297473

Installation

stable-diffusion-v1-5 download page
Model weights: v1-5-pruned-emaonly.ckpt

Some packages may fail to install because of network problems; retrying a few times usually works.
Some of the source code needs to be modified because of package version differences:

Error 1:

cannot import name 'rank_zero_only' from 'pytorch_lightning.utilities.distributed'

Reference: https://github.com/AUTOMATIC1111/stable-diffusion-webui/discussions/4111

Fix:

from pytorch_lightning.utilities.rank_zero import rank_zero_only 
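The import that needs changing lives in the ldm package; in most reports the traceback points at ldm/models/diffusion/ddpm.py, but edit whichever file your own traceback names:

# old import, removed from recent pytorch_lightning releases:
# from pytorch_lightning.utilities.distributed import rank_zero_only
# replacement, the same helper at its new location:
from pytorch_lightning.utilities.rank_zero import rank_zero_only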

Error 2:

ImportError: cannot import name 'SAFE_WEIGHTS_NAME' from 'transformers.utils' 

Reference: https://github.com/CompVis/stable-diffusion/issues/627

pip install diffusers==0.12.1
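To confirm the pinned version is actually the one being picked up (assuming pip installed into the active environment), a quick check is:

python -c "import diffusers, transformers; print(diffusers.__version__, transformers.__version__)"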

Sample command:

python scripts/txt2img.py --prompt "a photograph of an astronaut riding a horse" --plms --ckpt /data1/Projects/stable-diffusion/models/ldm/stable-diffusion-v1/v1-5-pruned-emaonly.ckpt
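If you would rather call the model from Python than through the repository script, a minimal sketch using the diffusers pipeline works as well; note that it pulls the runwayml/stable-diffusion-v1-5 weights from the Hugging Face Hub instead of reusing the local .ckpt above:

import torch
from diffusers import StableDiffusionPipeline

# downloads the v1-5 weights from the Hub on first run
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16)
pipe = pipe.to("cuda")

image = pipe("a photograph of an astronaut riding a horse",
             num_inference_steps=50).images[0]
image.save("astronaut_horse.png")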

Result:
(generated image omitted)

For tips on writing prompts, see: (Getting Started) How to write prompts for Stable Diffusion?

See also: What are some good Stable Diffusion prompts to reference?


Original post: blog.csdn.net/weixin_38235865/article/details/129921925