Rapid engineering deployment: Amazon Web Services provides a reliable foundation for AIGC productization

This article uses the deployment of the Stable Diffusion Quick Kit on Amazon SageMaker to introduce the basics of the Stable Diffusion model, the HuggingFace Diffusers interface, and how to use the Quick Kit to quickly deploy an inference service on a SageMaker endpoint.

Stable Diffusion Model

In 2022, Stability AI, Runway, and the CompVis group at the University of Munich jointly released the Stable Diffusion model, with both the code and the weights open-sourced. The current mainstream versions are v1.5 (runwayml/stable-diffusion-v1-5), v2, and v2.1 (stabilityai/stable-diffusion-2, stabilityai/stable-diffusion-2-1). Stable Diffusion can generate new images from text prompts describing elements to include or omit, and can redraw existing images with new elements described in a prompt. It also supports making partial changes to an existing image through prompt-guided inpainting and outpainting.

Stable Diffusion is a text-to-image model based on Latent Diffusion Models (LDMs). It consists of three parts: a Variational Autoencoder (VAE), a U-Net, and a text encoder. Stable Diffusion is trained as a latent diffusion model on a subset of the LAION-5B image-text dataset. The model generates images by iteratively "denoising" in a latent representation space and then decoding the latent representation into a complete image. On a GPU, image generation completes in under 10 seconds, which greatly lowers the barrier to adoption and has sparked a wave of activity in text-to-image generation.


Common formats and storage methods

At present, the various Stable Diffusion derivative models in the community use different file formats and storage layouts, and these differences require users to write different loading and inference code. There are two mainstream file formats for Stable Diffusion models, ckpt and safetensors, and two storage layouts: a single file and the diffusers directory structure.
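As a rough sketch, the two storage layouts can be told apart by checking for the `model_index.json` file that marks a diffusers-style directory. The helper below is an illustrative heuristic, not an official API; once classified, a single-file checkpoint is typically loaded with `StableDiffusionPipeline.from_single_file` and a diffusers directory with `from_pretrained`:

```python
import os

def detect_sd_layout(path):
    """Heuristic: classify a local Stable Diffusion model as a single-file
    checkpoint (ckpt/safetensors) or a diffusers directory structure."""
    if os.path.isfile(path) and path.endswith((".ckpt", ".safetensors")):
        return "single-file"  # load via StableDiffusionPipeline.from_single_file(path)
    if os.path.isdir(path) and os.path.isfile(os.path.join(path, "model_index.json")):
        return "diffusers"    # load via StableDiffusionPipeline.from_pretrained(path)
    return "unknown"
```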

Common Inference Methods for the Stable Diffusion Model

Common ways to run inference with the Stable Diffusion model:

  1. Native PyTorch code: CompVis provides txt2img.py and img2img.py with Stable Diffusion stable-diffusion-v1-4, which load the model through PyTorch and run generation.
  2. GUI tools, including Stable-Diffusion-WebUI, InvokeAI, and ComfyUI; these tools usually bundle the UI with the inference service and are deployed on a local GPU.
  3. The HuggingFace Diffusers interface: StableDiffusionPipeline and StableDiffusionImg2ImgPipeline can quickly load third-party or local models, and the Stable Diffusion Quick Kit invokes models through Diffusers.
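A minimal text-to-image sketch using the Diffusers interface (assuming `diffusers`, `transformers`, and `torch` are installed and a CUDA GPU is available; the model id is the v1.5 checkpoint mentioned above, and the prompt, step count, and guidance scale are illustrative):

```python
def generate(prompt, model_id="runwayml/stable-diffusion-v1-5"):
    # Imports kept inside the function so the module can be inspected
    # without diffusers/torch installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe = pipe.to("cuda")
    result = pipe(prompt, num_inference_steps=25, guidance_scale=7.5)
    return result.images[0]  # a PIL.Image

if __name__ == "__main__":
    generate("a photo of an astronaut riding a horse").save("out.png")
```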

During inference, you can choose a sampler (Sampler, called a Scheduler in Diffusers). Common samplers include Euler Discrete, Euler Ancestral Discrete, DDIM, KDPM2 Discrete, LMS Discrete, etc.
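The sampler names above map onto Diffusers scheduler classes roughly as follows (a sketch; verify the class names against your installed diffusers version). On a loaded pipeline, the sampler is usually switched with `pipe.scheduler = NewScheduler.from_config(pipe.scheduler.config)`:

```python
# Common sampler names and the corresponding diffusers Scheduler class names.
SAMPLER_TO_SCHEDULER = {
    "Euler Discrete": "EulerDiscreteScheduler",
    "Euler Ancestral Discrete": "EulerAncestralDiscreteScheduler",
    "DDIM": "DDIMScheduler",
    "KDPM2 Discrete": "KDPM2DiscreteScheduler",
    "LMS Discrete": "LMSDiscreteScheduler",
}

def scheduler_class_for(sampler_name):
    """Return the diffusers scheduler class name for a sampler, or None."""
    return SAMPLER_TO_SCHEDULER.get(sampler_name)
```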

Using Quick Kit to Rapidly Deploy a Stable Diffusion Inference Service on SageMaker

sagemaker-stablediffusion-quick-kit is engineering code created by the Amazon Web Services architect team for cloud-based inference and training of Stable Diffusion models. With it, a model in the diffusers directory structure can be quickly deployed to SageMaker, generating an HTTP API and an interface that separates the front end from the back end, so that Amazon Web Services users can quickly apply Stable Diffusion to their business and products.
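The Quick Kit automates this deployment; purely as an illustration of the underlying pattern, deploying a model container to a SageMaker endpoint with the SageMaker Python SDK looks roughly like this (`image_uri`, `model_data_s3`, and `role_arn` are hypothetical placeholders, not the Quick Kit's actual configuration):

```python
def deploy_sd_endpoint(role_arn, image_uri, model_data_s3,
                       instance_type="ml.g4dn.xlarge"):
    # Imports inside the function so the sketch can be read without
    # the sagemaker SDK installed.
    import sagemaker
    from sagemaker.model import Model

    model = Model(
        image_uri=image_uri,       # inference container image (placeholder)
        model_data=model_data_s3,  # s3:// URI of packed model artifacts (placeholder)
        role=role_arn,             # IAM execution role (placeholder)
        sagemaker_session=sagemaker.Session(),
    )
    # Creates the model, endpoint config, and endpoint; returns a Predictor.
    return model.deploy(initial_instance_count=1, instance_type=instance_type)
```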

Conclusion

To sum up, Stable Diffusion inference is a relatively time-consuming service. When serving clients, the availability and scalability of the service under concurrent requests must be considered. Moreover, compared with ordinary application services, AI inference requires relatively expensive GPU resources, so effectively controlling cost while keeping the service reliable is another important factor to consider.

SageMaker asynchronous inference readily meets these goals. Its internal queue decouples front-end requests from back-end inference and buffers traffic peaks to keep the service available. With auto scaling, asynchronous inference endpoints can automatically add inference nodes under load and reclaim resources during low-traffic periods to save costs. Compared with dedicated V100 GPUs or consumer cards such as the 3090, the more cost-effective ml.g4dn and ml.g5 instance types offered by SageMaker can further control resource costs while maintaining performance. Combining SageMaker with the Stable Diffusion Quick Kit helps users quickly complete the engineering deployment of diffusion models on Amazon Web Services, providing a solid and reliable foundation for AIGC productization.
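As a sketch of the asynchronous pattern described above (assuming the SageMaker Python SDK; `model` is a `sagemaker.model.Model` and `output_s3_uri` is a placeholder bucket path): requests are queued internally and results are written to S3, and the endpoint can then be attached to an auto scaling policy that scales on queue backlog.

```python
def deploy_async_sd_endpoint(model, output_s3_uri,
                             instance_type="ml.g5.xlarge"):
    # Results of each invocation are written to output_s3_uri
    # instead of being returned inline in the HTTP response.
    from sagemaker.async_inference import AsyncInferenceConfig

    async_config = AsyncInferenceConfig(
        output_path=output_s3_uri,
        max_concurrent_invocations_per_instance=1,  # one image job at a time per node
    )
    return model.deploy(
        initial_instance_count=1,
        instance_type=instance_type,
        async_inference_config=async_config,
    )
```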


Origin blog.csdn.net/Discovering_/article/details/130974296