Using Stable Diffusion XL with SDXL LoRA and ControlNet to generate images


Stable Diffusion Quick Kit is a rapid deployment toolkit for Stable Diffusion models. It includes a set of sample code, service deployment scripts, and a front-end UI that help you quickly stand up a Stable Diffusion prototype service. We have previously published Quick Kit articles on the basics, Dreambooth fine-tuning, and LoRA usage and fine-tuning; see the references for links. In this article, we introduce how to use the Stable Diffusion Quick Kit to load Stable Diffusion XL (hereinafter SDXL), together with LoRA and ControlNet models suited to SDXL, for inference.

01

What is Stable Diffusion XL (SDXL)

1.1 Overview of Stable Diffusion XL

Stable Diffusion XL is a new image generation model created by Stability AI. Compared with the earlier Stable Diffusion 1.5 model, it mainly makes the following optimizations and enhancements:

1) The U-Net, VAE, and CLIP text encoder of the original Stable Diffusion 1.5 have been improved;

2) A Refiner model has been added after the original model to improve the fineness of the generated images;

3) Stability AI first released a test version, Stable Diffusion XL 0.9; based on user experience and the generated images, it expanded the training dataset in a targeted way and applied RLHF to iterate and release the official Stable Diffusion XL 1.0.

1.2 Architecture overview

Stable Diffusion XL is a two-stage cascaded diffusion model consisting of a Base model and a Refiner model, designed to improve the quality and detail of generated images. The Base model is consistent with Stable Diffusion and provides text-to-image, image-to-image, and image inpainting capabilities. After the Base model generates an image, the Refiner model is cascaded after it to further refine the latent features produced by the Base model, improving the quality and detail of the final image.

[Figure: SDXL two-stage pipeline with Base and Refiner models]

From the Stability AI paper (https://arxiv.org/pdf/2307.01952.pdf) we can see that the SDXL UNet has 2.6B parameters, far more than the 860M of SD 1.4/1.5 and the 865M of SD 2.0/2.1.

The newly added Spatial Transformer Blocks (self-attention + cross-attention) account for most of the new parameters:

[Figure: UNet parameter comparison across SD versions]
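If you want to sanity-check these numbers yourself, you can count the UNet parameters directly with diffusers; a quick sketch (downloading the checkpoint requires network access and disk space):

import torch
from diffusers import UNet2DConditionModel

# Load only the UNet subfolder of the SDXL base checkpoint
unet = UNet2DConditionModel.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", subfolder="unet", torch_dtype=torch.float16
)
print(f"SDXL base UNet parameters: {sum(p.numel() for p in unet.parameters()) / 1e9:.2f}B")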

1.3 Refiner model

The biggest difference between SDXL and SD 1.5/2.0 is that it provides a separate Refiner model. In SDXL's two-stage inference, a prompt is fed to the Base model (U-Net) to generate latent features; a certain amount of noise is then added to these latents, and the Refiner model denoises them to improve the overall quality and local detail of the image. In essence, the Refiner model is performing an image-to-image task.
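In diffusers, this handoff can also be expressed with the denoising_end/denoising_start arguments (the "ensemble of experts" mode), instead of running the Base model to completion and refining the full image. A minimal sketch; the 0.8 split point is an illustrative value, not one taken from Quick Kit:

import torch
from diffusers import StableDiffusionXLPipeline, StableDiffusionXLImg2ImgPipeline

base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

prompt = "a majestic lion jumping from a big stone at night"

# The Base model denoises the first 80% of the steps and returns latents
latents = base(prompt=prompt, denoising_end=0.8, output_type="latent").images
# The Refiner picks up at the 80% mark and denoises the remaining steps
image = refiner(prompt=prompt, denoising_start=0.8, image=latents).images[0]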

Stability AI provides a set of official comparison images of SDXL with and without the Refiner; you can see that the image details are more realistic after the Refiner is applied:

[Figures: official comparison of SDXL outputs without and with the Refiner]

02

How to use SDXL with diffusers

In Quick Kit, we use HuggingFace diffusers (v0.19.3 or above; see reference 2 for the project link). The new version of diffusers adds two pipeline classes, StableDiffusionXLPipeline and StableDiffusionXLImg2ImgPipeline, for loading SDXL models, and also provides StableDiffusionXLControlNetPipeline for loading SDXL ControlNet models. The basic code is described below:

#DiffusionPipeline
import torch
from diffusers import (
    StableDiffusionXLPipeline,
    StableDiffusionXLImg2ImgPipeline,
    StableDiffusionXLControlNetPipeline,
    ControlNetModel,
    AutoencoderKL,
)

prompt = "Astronaut in a jungle, cold color palette, muted colors, detailed, 8k"

# Load the SDXL base model
pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
pipe.to("cuda")

# Load the SDXL refiner model
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
)
refiner.to("cuda")

# Base model output; output_type must be set to "latent" so the refiner
# can work on the latents directly
images = pipe(prompt=prompt, output_type="latent").images
# Refiner model output
images = refiner(prompt=prompt, image=images).images

# ... save/process images here

# ControlNet pipeline initialization
def init_sdxl_control_net_pipeline(base_model, control_net_model):
    controlnet = ControlNetModel.from_pretrained(
        f"diffusers/controlnet-{control_net_model}-sdxl-1.0-small",
        variant="fp16",
        use_safetensors=True,
        torch_dtype=torch.float16,
    ).to("cuda")
    vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16).to("cuda")
    pipe = StableDiffusionXLControlNetPipeline.from_pretrained(
        base_model,
        controlnet=controlnet,
        vae=vae,
        variant="fp16",
        use_safetensors=True,
        torch_dtype=torch.float16,
    ).to("cuda")
    return pipe
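The helper above can then be used like this; a minimal usage sketch, assuming cv2-based Canny preprocessing (the actual Quick Kit preprocessing may differ):

import cv2
import numpy as np
from PIL import Image
from diffusers.utils import load_image

pipe = init_sdxl_control_net_pipeline("stabilityai/stable-diffusion-xl-base-1.0", "canny")

# Build a Canny edge map to use as the conditioning image
input_image = load_image(
    "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
)
edges = cv2.Canny(np.array(input_image), 100, 200)
canny_image = Image.fromarray(np.stack([edges] * 3, axis=-1))

image = pipe(
    "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting",
    image=canny_image,
    controlnet_conditioning_scale=0.5,
).images[0]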


03

Using SDXL and SDXL (Kohya-style) LoRA, ControlNet in Quick Kit

HuggingFace's diffusers v0.19.3 adds the StableDiffusionXLPipeline and StableDiffusionXLImg2ImgPipeline classes to match the SDXL 0.9/1.0 models, and also supports Kohya-style SDXL LoRA models. Let's demonstrate this in practice.
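For reference, loading a Kohya-style SDXL LoRA in plain diffusers goes through load_lora_weights; a minimal sketch, where the local file name is a placeholder for whatever LoRA checkpoint you downloaded:

import torch
from diffusers import StableDiffusionXLPipeline

pipe = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16, variant="fp16", use_safetensors=True
).to("cuda")

# Load a Kohya-style LoRA checkpoint from the current directory;
# "dragon_sdxl_lora.safetensors" is a placeholder file name
pipe.load_lora_weights(".", weight_name="dragon_sdxl_lora.safetensors")
image = pipe(prompt="a fantasy creature fractal dragon").images[0]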

First, create a SageMaker Notebook following the earlier article Stable Diffusion Quick Kit Hands-on Practice – Basics, clone the latest code with git (https://github.com/aws-samples/sagemaker-stablediffusion-quick-kit), and then open inference/sagemaker/byoc_sdxl/stable-diffusion-on-sagemaker-byoc-sdxl.ipynb.

HuggingFace provides two ControlNet models for SDXL, Canny and Depth, along with small versions (https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small and https://huggingface.co/diffusers/controlnet-depth-sdxl-1.0-small) that are about 7 times smaller than the standard versions and can be loaded quickly in diffusers.

3.1 Create SageMaker Endpoint

In the SageMaker notebook (stable-diffusion-on-sagemaker-byoc-sdxl.ipynb) we need to perform the following steps:

  1. Upgrade boto3 and the SageMaker SDK in the notebook (versions used in this test: boto3 1.28.39, sagemaker 2.182.0)

  2. Build the Docker image; the image name is sdxl-inference-v2

  3. Set the inference service parameters. Because SDXL has more parameters and a larger default image size (1024×1024), we recommend ml.g5.2xlarge. The configuration parameters are explained below.

# The Refiner and LoRA models are set via inference parameters; only the base model needs to be configured here
primary_container = {
    'Image': container,
    'ModelDataUrl': model_data,
    'Environment':{
        's3_bucket': bucket,
        'model_name':'stabilityai/stable-diffusion-xl-base-1.0' # use the SDXL 1.0 base model
    }
}


# InstanceType: the instance type for the inference service, ml.g5.2xlarge
response = client.create_endpoint_config(
    EndpointConfigName=endpoint_config_name,
    ProductionVariants=[
        {
            'VariantName': _variant_name,
            'ModelName': model_name,
            'InitialInstanceCount': 1,
            'InstanceType': 'ml.g5.2xlarge',
            'InitialVariantWeight': 1
        },
    ]
    ,
    AsyncInferenceConfig={
        'OutputConfig': {
            'S3OutputPath': f's3://{bucket}/stablediffusion/asyncinvoke/out/'
        }
    }
)
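The notebook then creates the endpoint itself from this configuration and waits until it is in service; a minimal sketch using the same boto3 client and the endpoint_name variable assumed to be defined in the notebook:

response = client.create_endpoint(
    EndpointName=endpoint_name,
    EndpointConfigName=endpoint_config_name,
)

# Block until SageMaker reports the endpoint as InService
waiter = client.get_waiter('endpoint_in_service')
waiter.wait(EndpointName=endpoint_name)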


04

Test

Description of the SDXL-related request parameters (see the sketch after this list for how these payloads are submitted):

  • sdxl_refiner: whether to enable Refiner inference; set to enable or disable

  • lora_name: the name of the LoRA model; must be unique

  • lora_url: an accessible HTTP download address for the LoRA model; a Civitai download link works

  • control_net_model: the ControlNet model name; currently the canny and depth models are supported
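The predict_async helper used in the tests below is defined in the notebook; a minimal sketch of what it might look like, assuming the payload is staged to S3 (a requirement of SageMaker asynchronous inference) and that bucket is the notebook's S3 bucket variable:

import json
import uuid
import boto3

s3 = boto3.client("s3")
runtime = boto3.client("sagemaker-runtime")

def predict_async(endpoint_name, payload):
    # Async endpoints read their input from S3, so upload the JSON payload first
    key = f"stablediffusion/asyncinvoke/input/{uuid.uuid4()}.json"
    s3.put_object(Bucket=bucket, Key=key, Body=json.dumps(payload))
    response = runtime.invoke_endpoint_async(
        EndpointName=endpoint_name,
        InputLocation=f"s3://{bucket}/{key}",
        ContentType="application/json",
    )
    # The result appears under the configured S3OutputPath when inference finishes
    return response["OutputLocation"]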

4.1 Test SDXL without the Refiner model

payload = {
    "prompt": "a fantasy creature fractal dragon",
    "steps": 20,
    "sampler": "euler_a",
    "seed": 43768,
    "count": 1,
    "control_net_enable": "disable",
    "sdxl_refiner": "disable"
}

predict_async(endpoint_name, payload)


[Generated image: SDXL, Refiner disabled]

4.2 Test SDXL with the Refiner model

Set the sdxl_refiner parameter from 4.1 to enable to use the Refiner model:

payload = {
    "prompt": "a fantasy creature fractal dragon",
    "steps": 20,
    "sampler": "euler_a",
    "seed": 43768,
    "count": 1,
    "control_net_enable": "disable",
    "sdxl_refiner": "enable"
}

predict_async(endpoint_name, payload)


[Generated image: SDXL, Refiner enabled]

4.3 Test SDXL Lora

SDXL LoRA models are not compatible with the earlier SD 1.5 LoRA models, so you must choose a LoRA trained for SDXL. Quick Kit supports dynamic loading of LoRA models; here we chose the Dragon Style LoRA model from Civitai for testing.

The LoRA loading process is roughly as follows: the backend service parses the lora_name and lora_url parameters and first checks whether the model file exists in the /tmp directory; if it does not, the service downloads the model from lora_url and stores it in /tmp. The first inference therefore has to wait for the LoRA model download. To switch LoRA models, simply change the lora_name and lora_url parameters in the inference request.
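A sketch of that caching logic (illustrative only; the actual Quick Kit implementation may differ in its details):

import os
import requests

def load_lora(pipe, lora_name, lora_url, cache_dir="/tmp"):
    lora_path = os.path.join(cache_dir, f"{lora_name}.safetensors")
    # Download only on the first request for this lora_name
    if not os.path.exists(lora_path):
        resp = requests.get(lora_url, timeout=600)
        resp.raise_for_status()
        with open(lora_path, "wb") as f:
            f.write(resp.content)
    pipe.load_lora_weights(cache_dir, weight_name=f"{lora_name}.safetensors")
    return pipe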

payload = {
    "prompt": "a fantasy creature fractal dragon",
    "steps": 20,
    "sampler": "euler_a",
    "count": 1,
    "control_net_enable": "disable",
    "sdxl_refiner": "enable",
    "lora_name": "dragon",
    "lora_url": "https://civitai.com/api/download/models/129363"
}

predict_async(endpoint_name, payload)


[Generated image: SDXL with the Dragon Style LoRA]

4.4 Test SDXL ControlNet

SDXL requires matching ControlNet models; in the notebook we test the Canny and Depth models provided by HuggingFace.

To test Canny, set control_net_model to "canny":

payload = {
    "prompt": "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting",
    "steps": 20,
    "sampler": "euler_a",
    "count": 1,
    "control_net_enable": "enable",
    "sdxl_refiner": "enable",
    "control_net_model": "canny",
    "input_image": "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
}

predict_async(endpoint_name, payload)


[Generated image: SDXL ControlNet Canny]

To test Depth, set control_net_model to "depth":

payload = {
    "prompt": "aerial view, a futuristic research complex in a bright foggy jungle, hard lighting",
    "steps": 20,
    "sampler": "euler_a",
    "count": 1,
    "control_net_enable": "enable",
    "sdxl_refiner": "enable",
    "control_net_model": "depth",
    "input_image": "https://huggingface.co/datasets/hf-internal-testing/diffusers-images/resolve/main/sd_controlnet/hf-logo.png"
}

predict_async(endpoint_name, payload)


[Generated image: SDXL ControlNet Depth]

05

Summary

With Quick Kit we can quickly deploy an inference service for the SDXL Base and Refiner models, dynamically load matching LoRA models, and load the SDXL Canny and Depth ControlNet models provided by HuggingFace. The diffusers community is developing rapidly, and many features of other GUI inference tools have been integrated into diffusers through the efforts of community contributors. However, several important features are still under development: for example, the current version of diffusers does not officially support loading multiple LoRAs at once, which has to be implemented through third-party scripts, and diffusers ships with fewer samplers/schedulers than SD WebUI. These are points to keep in mind when using diffusers.

References

  1. Stable Diffusion XL 

    https://arxiv.org/pdf/2307.01952.pdf

  2. diffusers release notes

    https://github.com/huggingface/diffusers/releases

  3. Canny Model 

    https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small

  4. Depth Model 

    https://huggingface.co/diffusers/controlnet-depth-sdxl-1.0-small

  5. Stable Diffusion Quick Kit Hands-on Practice – Basics

    https://aws.amazon.com/cn/blogs/china/stable-diffusion-quick-kit-hands-on-practice-basics/

  6. Stable Diffusion Quick Kit Hands-on Practice – Optimization practice of using Dreambooth for model fine-tuning on SageMaker

    https://aws.amazon.com/cn/blogs/china/stable-diffusion-quick-kit-series-model-fine-tuning-with-dreambooth-optimization-practices-on-sagemaker/

  7. Stable Diffusion Quick Kit hands-on practice - LoRA fine tuning and inference in SageMaker

    https://aws.amazon.com/cn/blogs/china/lora-fine-tuning-and-reasoning-in-sagemaker/

About the authors


Su Wei

A senior solutions architect at Amazon Cloud Technology focusing on the gaming industry; an open-source enthusiast committed to promoting and implementing cloud-native applications. He has more than 15 years of professional experience in the information technology industry, having served as a senior software engineer and system architect. Before joining Amazon Cloud Technology, he worked at BEA, Oracle, IBM, and other companies.


Yan Jun

A solutions architect at Amazon Cloud Technology, currently responsible for helping customers with cloud architecture design and technical consulting. He has an in-depth understanding of containerization and related technologies, and rich experience in designing and implementing cloud migration solutions.

