Use ControlNet to generate artistic QR codes – an AI painting solution based on Stable Diffusion


Background introduction

Over the past few months, Amazon Cloud Technology has published a number of blog posts on deploying Stable Diffusion on Amazon Cloud Technology and on combining Amazon SageMaker with Stable Diffusion for model training and inference.

To help customers quickly and securely build, deploy, and manage applications on Amazon Cloud Technology, many partners work closely with Amazon Cloud Technology. They offer a wide variety of services, in-depth technical knowledge, best practices, and solutions, including infrastructure migration, application modernization, security and compliance, data analytics, machine learning and artificial intelligence, cloud hosting, DevOps, consulting, and training.

Recently, eCloudrover, a core service partner of Amazon Cloud Technology, launched imAgine, an AI painting solution based on Stable Diffusion. It combines a widely validated, easy-to-deploy AI algorithm model with rich, cost-effective cloud resources, and is designed to help industries such as gaming, e-commerce, media, film and television, and advertising quickly build AIGC application pipelines and gain a productivity edge in the AI era.

This article mainly shares our practical experience in helping customers use Stable Diffusion, as well as the best practices for generating artistic QR codes using the imAgine product developed based on Stable Diffusion.

We will use a QR code as the input to ControlNet, so that the QR code's data points are blended into the artistic image while the code remains scannable by a QR code reader. With this technique, you can turn any QR code into a unique work of art that expresses and delivers information in a whole new way. Here are some examples:

[Figure: examples of artistic QR code images]

Stable Diffusion practical tips

As the old sayings go, "everything is difficult at the beginning" and "mastery lies in the finest details." These correspond to the two most common problems customers run into when using Stable Diffusion in practice: first, how to choose appropriate prompts to generate images that meet expectations; second, how to refine the details of an image so that the final result is usable in production.

Based on our past experience serving customers who use Stable Diffusion, we have compiled the following recommended best practices, which we hope will serve as a reference for readers creating with Stable Diffusion.

Prompt engineering

As Stable Diffusion versions continue to iterate and the AI's understanding of semantics gets closer and closer to common sense, the demands on prompts keep rising. Many misconceptions about prompts can actually be counterproductive when generating images.

Basic concepts of prompt

  • Prompts are divided into positive prompts and negative prompts, which tell the AI what you want in the image and what you do not.

Common misconceptions about prompts

  • A prompt should be precise, not long; describing the picture with the fewest, most accurate words works better than full natural-language sentences.

  • Quality-boosting descriptors are not something to pile up mindlessly; more is not always better.

  • Common opening moves such as "masterpiece" and "best quality" often become dead weight in a prompt. These words were meaningful in the NovelAI era, because NovelAI relied heavily on them to rate images when training its model; but now that model authors on Civitai have repeatedly refined their models, these words rarely have the effect on the generated results that they once did.

Adjust the weight of prompt words

  • Each prompt term has a default weight of 1, and its effective weight decreases from left to right.

  • Prompt term weights significantly affect the generated image.

  • Use parentheses, a colon, and a number to set a term's weight, written like (one girl:1.5); see the short example after this list.
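
As a minimal illustration of the weighting syntax (the prompt content below is a made-up example, not taken from the article):

```python
# A1111-style prompt weighting: "(term:1.3)" raises a term above the default weight
# of 1.0, "(term:0.8)" lowers it; terms nearer the front also carry more influence.
prompt = "(mountain:1.3), green grassland, (blue sky:1.1), cloud, bird, scenery"
negative_prompt = "easynegative, (lowres:1.2), blurry"
```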

Pay attention to the order of prompt words

  • For example, if scenery tags come first, the character will appear small; if the order is reversed, the character will be larger or shown half-length.

  • Choosing the right order and syntax for prompt words helps produce the desired picture better, faster, and more reliably.

Emoji in Prompt

  • Prompts support emoji, which are quite expressive. For a specific facial expression or action, adding an emoji can achieve the effect.

  • To prevent semantic drift, prefer emoji, and avoid unnecessary complex syntax such as "with".

Recommended perspective prompts

[Figure: recommended perspective and camera-angle prompts]

Image optimization

We often generate an unsatisfactory image and want to optimize it further, but don't know where to start. In that case, you can refer to the following best practices for tuning the generation parameters:

Which parameters need to be adjusted

CFG Scale: how closely the image follows the prompt. The higher the value, the greater the influence of the prompt on the final result and the closer the fit.

  • CFG 2-6: creative, but the result may be too distorted and may not follow the prompt. Can be fun and useful for short prompts.

  • CFG 7-10: recommended for most prompts. A good balance between creativity and guidance.

  • CFG 10-15: use when you are sure the prompt is detailed and very clear, and you have very specific requirements for the image content.

  • CFG 16-20: generally not recommended unless the prompt is very detailed. May hurt coherence and quality.

  • CFG >20: Almost unusable.

Sampling Steps (number of iteration steps): the more steps, the smaller and more precise the adjustment made at each step, and the time needed to generate the image grows roughly in proportion.

  • For most samplers, more iterations help, but beyond 50 steps the gains are minimal.

Sampling method: different samplers have different optimal step counts, which needs to be taken into account when comparing them.

  • Euler a: creative; different step counts produce noticeably different images. It is also an efficient sampler, useful for quickly checking whether a prompt works.

  • DPM2 a Karras: suitable for realistic models, but hard to control beyond 30 steps.

  • DPM++ 2M Karras: performs well at high step counts; the higher the step count, the more detail.

  • DDIM: converges quickly but is relatively inefficient, since it needs many steps to produce good results. It is well suited to redrawing (img2img).

  • Different models and samplers produce different results, so the above is for reference only. When choosing a sampler, it is best to compare with an X/Y/Z plot.

Seed (random seed): the seed often has a huge impact on composition and is the main source of randomness in SD image generation.

  • With the same seed, the same prompt and model, and all other parameters unchanged, you can generate (almost) the same image repeatedly.

  • Once you have found a suitable composition, it is best to fix the seed and then polish the details further; see the sketch after this list.
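
As a hedged sketch of how these parameters fit together, the call below runs one generation through the Automatic1111 WebUI API (this assumes the WebUI was launched with the --api flag on localhost:7860; it is not part of the imAgine product itself, and field names may vary slightly between WebUI versions):

```python
# Sketch: one txt2img call with CFG scale, steps, sampler and a fixed seed.
# With the seed, prompt, model and all other parameters unchanged, repeated calls
# return (almost) the same image, which makes detail polishing reproducible.
import base64
import requests

payload = {
    "prompt": "mountain, green grassland, sky, cloud, blue sky, scenery",
    "negative_prompt": "easynegative",
    "steps": 30,                      # more steps = finer adjustments, longer runtime
    "cfg_scale": 7,                   # 7-10 is the recommended range for most prompts
    "sampler_name": "DPM++ 2M Karras",
    "seed": 3943213078,               # fix the seed to lock the composition
    "width": 512,
    "height": 512,
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()
with open("fixed_seed.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```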

How to compare and find the best parameters

Use the X/Y/Z plot to find the best parameters: with an X/Y/Z plot we can clearly compare the results under different parameters, quickly narrow down a suitable range, and then apply finer control to the generation.
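
The X/Y/Z plot script does this inside the WebUI; purely as an illustration of the same idea, the sketch below sweeps CFG scale and step count through the same WebUI API as the previous sketch (again assuming --api on localhost:7860) and saves one image per combination for side-by-side comparison:

```python
# Sketch: manual X/Y sweep over CFG scale and sampling steps with a fixed seed,
# so that only the swept parameters change between images.
import base64
import itertools
import requests

base_payload = {
    "prompt": "mountain, green grassland, sky, cloud, blue sky, scenery",
    "negative_prompt": "easynegative",
    "sampler_name": "DPM++ 2M Karras",
    "seed": 3943213078,
    "width": 512,
    "height": 512,
}

for cfg, steps in itertools.product([5, 7, 9, 11], [20, 30, 40]):
    payload = {**base_payload, "cfg_scale": cfg, "steps": steps}
    resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
    resp.raise_for_status()
    with open(f"sweep_cfg{cfg}_steps{steps}.png", "wb") as f:
        f.write(base64.b64decode(resp.json()["images"][0]))
```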

[Figure: X/Y/Z plot comparing generation results under different parameters]

Image size optimization

  • Image quality is not directly tied to image size.

  • But size does affect the subject and composition to some extent, because it implicitly signals a category (portrait-orientation characters, landscape-orientation scenery, low-resolution stickers, and so on).

  • When the canvas is too wide, multiple subjects may appear in the picture.

  • Sizes above 1024 may produce undesirable results and put heavy pressure on server (GPU) memory. A small base resolution plus high-resolution upscaling (hires fix) is recommended, as sketched below.
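
A hedged sketch of the "small base resolution + high-resolution upscale" approach via the WebUI API is shown below (the hires-fix field names follow recent Automatic1111 versions and may differ in older ones):

```python
# Sketch: render at 512x512 and let hires fix upscale to 1024x1024,
# instead of generating directly at a large size.
import base64
import requests

payload = {
    "prompt": "mountain, green grassland, sky, cloud, blue sky, scenery",
    "negative_prompt": "easynegative",
    "steps": 30,
    "cfg_scale": 7,
    "width": 512,                 # small base resolution keeps the composition stable
    "height": 512,
    "enable_hr": True,            # turn on hires fix
    "hr_scale": 2,                # 512x512 -> 1024x1024
    "hr_upscaler": "Latent",
    "denoising_strength": 0.5,    # how much the upscale pass may repaint details
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()
with open("hires_fix.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```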

Optimizing multi-character images and wide single-character images

  • txt2img alone cannot effectively control the characteristics of each individual character when multiple characters are present.

  • The recommended approach is to create a rough draft and then use img2img or ControlNet.

  • To generate a wide image with a single character, it is best to sketch a draft, block in the colors, and fix the main subject; or use ControlNet's OpenPose to define the character's skeleton.

  • For multiple characters, it is best to use ControlNet's OpenPose to specify the number of characters; this approach is also suitable for drawing three views of the same character.

Perform hand repair

  • Send the image to img2img inpaint, use roughly the same prompt, put the terms about "hand" at the front, and set the denoising strength according to how much you want the hand to change (if you just want the hand to be more complete, keep it below 0.25); keep the steps and CFG the same as in txt2img. A sketch of this call is shown after this list.

  • Find a reference hand image that meets your expectations and use ControlNet preprocessors and models such as Canny or OpenPose_hands, combined with inpainting, to control the hands more precisely.
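
A minimal sketch of the inpaint call described in the first bullet, using the WebUI img2img API, is shown below (it assumes --api; the file names are placeholders, and the mask should be white where the hand is to be redrawn):

```python
# Sketch: hand repair via img2img inpainting with a low denoising strength,
# keeping steps and CFG the same as the original txt2img pass.
import base64
import requests

def encode(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "init_images": [encode("original.png")],   # image to repair (placeholder name)
    "mask": encode("hand_mask.png"),            # white = region to redraw
    "prompt": "perfect hand, five fingers, mountain, green grassland, scenery",  # hand terms first
    "negative_prompt": "easynegative, extra fingers, fused fingers",
    "denoising_strength": 0.25,   # keep low if you only want the hand completed
    "steps": 30,                  # same as the txt2img pass
    "cfg_scale": 7,               # same as the txt2img pass
    "inpaint_full_res": True,     # redraw only the masked area at full resolution
    "mask_blur": 4,
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/img2img", json=payload, timeout=600)
resp.raise_for_status()
with open("hand_fixed.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```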

Perform facial restoration

  • When the human subject in a picture is small, the face often collapses. This is especially common in the artistic QR code workflow introduced later in this article, where faces are frequently distorted by the QR code's data points.

  • For facial redrawing, we recommend the !After Detailer plug-in, commonly known as ADetailer.

  • The plug-in uses a YOLO-based detector to find objects in the image. We configure it to detect faces and give it a prompt and model for facial redrawing; the plug-in then partially redraws the detected face region to complete the repair.

  • ADetailer can detect and repair both faces and hands.

  • A LoRA model can also be referenced in ADetailer for the partial redraw.

[Figure: ADetailer facial repair example]

Generate artistic QR codes with ControlNet

Step 1: Optimize the QR code

A 2D barcode is a black-and-white graphic that records data in two-dimensional space using specific geometric patterns. There are many different 2D barcode encodings; here we use the most widespread and basic one: QR Code.

[Figure: structure of a standard QR Code]

The input QR code is one of the most important parts of generating artistic QR codes with SD. We mainly care about two characteristics of the input QR code:

1. The amount of information contained in the QR code

Regardless of the encoding, the more character data a QR code carries, the more complex its black-and-white structure becomes visually. A complex structure leaves little creative freedom, because the generated image is heavily constrained by the QR code's own information. So the first step is to find a way to shorten the character content that the QR code has to encode.

In the most common scenario, a QR code contains a web link, so to improve the aesthetics of the generated QR code we first shorten that link. There are many URL-shortening tools available; choose whichever you like. Note, however, that in mainland China you should pick a shortening service whose domain has been properly registered (filed), otherwise the link may be blocked by WeChat, browsers, and other apps.

For example, we have a URL that we want to create into a QR code: https://www.ecloudrover.com/aigc/

After link shortening, it becomes: http://c.suo.nz/7KZrF.

The figure below shows more intuitively how link length affects the visual appearance of the QR code. A shortened link leaves much more room for subsequent creation.

[Figure: QR codes generated from the original link and the shortened link]

2. Presentation form of QR code

As the technology has evolved, QR codes are no longer limited to black-and-white square patterns; the anchor points and code modules can be rendered in many styles, such as the following:

[Figure: QR codes with different anchor-point and module styles]

In practice, we can try a variety of module styles until the rendered image meets our expectations.

The figure below shows the impact of different QR code forms on the final rendering:

[Figure: effect of different QR code styles on the final rendering]

Generation parameters:

Prompt: mountain, green grassland, sky, cloud, bird, blue sky, no human, day, wide shot, flying, border, outdoors, white bird, scenery
Negative prompt: easynegative
Steps: 40, Sampler: DPM++ 2M Karras, CFG scale: 6, Seed: 3943213078, Size: 872x872, Model hash: 876b4c7ba5, Model: cetusMix_Whalefall2, Clip skip: 2, ControlNet: "preprocessor: none, model: control_v1p_sd15_qrcode_monster [a6e58995], weight: 1.35-1.5, starting/ending: (0.05, 1), resize mode: Resize and Fill, pixel perfect: True, control mode: Balanced, preprocessor params: (512, 64, 64)", Version: v1.3.


Step 2: Make the basic QR code

With the above points in mind, we can use a QR code creation tool to generate the basic QR code that will be fed to SD. There are many web-based QR code generators to choose from. For convenience, we have pre-installed a QR code generation plug-in in the blog-specific AMI; as long as you launch the AMI from the correct version, you will see the following QRCode Toolkit directly in the WebUI:

  • Anthony's QR Toolkit: QRCode generation and optimization tool integrated in Webui

    https://github.com/antfu/sd-webui-qrcode-toolkit

Next we demonstrate how to use Anthony's QR Toolkit to generate a QR code. You can refer to the figure below to complete the configuration of the QR code parameters.

[Figure: QR Toolkit parameter configuration]

After the QR code is ready, click "Download" on the right to save it locally, or click "Send to ControlNet" to send it directly to ControlNet for the next step.
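
If you prefer to generate the base QR code locally rather than in the WebUI, a minimal sketch using the Python qrcode package (pip install "qrcode[pil]"; this is our own suggestion, not part of the imAgine toolchain) looks like this:

```python
# Sketch: generate a basic black-and-white QR code for the shortened link.
# High error correction (level H, ~30% damage tolerance) leaves more room for
# ControlNet to blend the code into the artwork while keeping it scannable.
import qrcode

qr = qrcode.QRCode(
    error_correction=qrcode.constants.ERROR_CORRECT_H,
    box_size=16,   # pixels per module; keeps the image close to the SD render size
    border=4,      # quiet-zone width in modules
)
qr.add_data("http://c.suo.nz/7KZrF")   # the shortened link from Step 1
qr.make(fit=True)
qr.make_image(fill_color="black", back_color="white").save("base_qrcode.png")
```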

Step 3: Determine the artistic style

The core of artistic creation with Stable Diffusion is choosing an appropriate model and prompt. Before creating artistic QR codes, we recommend first generating an ordinary image without ControlNet to test the generation effect.

Here, I want the QR code to contain natural scenery such as mountains, blue sky, and white clouds, so I first test the prompt and model with the following parameters.

[Figure: test image generated without ControlNet]

Generation parameters:

Prompt: mountain, green grassland, sky, cloud, bird, blue sky, no human, day, wide shot, flying, border, outdoors, white bird, scenery
Negative prompt: easynegative
Steps: 20, Sampler: Euler a, CFG scale: 7, Seed: 4078355702, Face restoration: CodeFormer, Size: 512x512, Model hash: 876b4c7ba5, Model: cetusMix_Whalefall2, Clip skip: 2, Version: v1.3.2


Step 4: Import the QR code in ControlNet

After confirming the image style, we upload the unprocessed QR code to ControlNet. Pay attention to the following options:

  • "Enable" checkbox: check it to make sure ControlNet takes effect during generation;

  • Model selection box: select "control_v1p_sd15_qrcode_monster" to strengthen control over the QR code;

  • Control weight: for the qrcode_monster model, we recommend a value between 1.1 and 1.6;

  • Guidance start/end timing: a start between 0 and 0.1 and an end of 1 are recommended.

[Figure: ControlNet configuration]

Two groups of values in the txt2img (text-to-image) configuration are also recommended to be adjusted:

  • Number of iteration steps: 30-50 is recommended. The default of 20 is not enough to guide the generation of a high-quality QR code image.

  • Width/height: it is best to send the original QR code image's dimensions from ControlNet directly to the txt2img size settings above. An equivalent API payload is sketched below the figure.

[Figure: txt2img parameter configuration]
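
For readers driving the WebUI programmatically, the sketch below expresses roughly the same configuration as an API payload (assuming --api and the sd-webui-controlnet extension; the ControlNet unit key names, e.g. input_image, vary between extension versions, so treat this as an illustration rather than a drop-in script):

```python
# Sketch: txt2img with a ControlNet unit using the QR Code Monster model.
import base64
import requests

with open("base_qrcode.png", "rb") as f:
    qr_b64 = base64.b64encode(f.read()).decode()

payload = {
    "prompt": ("mountain, green grassland, sky, cloud, bird, blue sky, no human, "
               "day, wide shot, flying, border, outdoors, white bird, scenery"),
    "negative_prompt": "easynegative",
    "steps": 40,                       # 30-50 recommended; the default 20 is too few
    "sampler_name": "DPM++ 2M Karras",
    "cfg_scale": 6,
    "width": 872,                      # match the size of the QR code image
    "height": 872,
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": qr_b64,
                "module": "none",        # no preprocessor for a clean QR code input
                "model": "control_v1p_sd15_qrcode_monster",
                "weight": 1.35,          # 1.1-1.6 recommended for qrcode_monster
                "guidance_start": 0.05,  # intervention timing between 0 and 0.1
                "guidance_end": 1.0,     # termination timing of 1
                "resize_mode": "Resize and Fill",
                "control_mode": "Balanced",
                "pixel_perfect": True,
            }]
        }
    },
}
resp = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
resp.raise_for_status()
with open("artistic_qrcode.png", "wb") as f:
    f.write(base64.b64decode(resp.json()["images"][0]))
```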

Once all the parameters are configured, click Generate. As you can see, the image generated here looks good, and scanning the QR code with a phone works perfectly.

[Figure: generated artistic QR code]

If the generated QR code does not meet expectations, you can fine-tune the following parameters, increase the number of batches, and keep re-rolling to get closer to the desired result:

  • Prompt

  • Sampling method

  • ControlNet control weight

  • ControlNet guidance start/end timing

[Figure: additional generation results after parameter fine-tuning]

If necessary, you can use the X/Y/Z Plot under "Script" to compare the QR codes generated with different parameters. Here we compare ControlNet's control weight and guidance start timing:

[Figure: X/Y/Z plot comparing ControlNet control weight and guidance start timing]

Appendix

Appendix 1: ControlNet QRCode model selection

For your convenience, the ControlNet QRCode model is already included in the blog-specific AMI. As long as you launch the AMI from the correct version, you can select the model directly in ControlNet.

As of this writing, QRCode Monster is, in our tests, the model with the highest success rate at controlling QR codes and the best at blending them into the image. The model can be downloaded from Hugging Face:

https://huggingface.co/monster-labs/control_v1p_sd15_qrcode_monster
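
If you are not using the pre-built AMI, one way to fetch the model is with the huggingface_hub client and then place the file in the WebUI's ControlNet model directory (the file name and target path below are assumptions; check the model card for the exact file list):

```python
# Sketch: download the QR Code Monster ControlNet model from Hugging Face.
from huggingface_hub import hf_hub_download

path = hf_hub_download(
    repo_id="monster-labs/control_v1p_sd15_qrcode_monster",
    filename="control_v1p_sd15_qrcode_monster.safetensors",  # assumed file name
    local_dir="stable-diffusion-webui/extensions/sd-webui-controlnet/models",  # assumed path
)
print("Model saved to:", path)
```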

There is another QR code model available: QR Pattern v2.0. We recommend using it together with IoC Lab's Brightness model as an auxiliary model to improve local contrast, which also produces good results. However, in our tests this model introduces a lot of interference, which may significantly change the image style. These two models can be downloaded from the links below:

https://civitai.com/models/90940/controlnet-qr-pattern-qr-codes

https://huggingface.co/ioclab/ioc-controlnet

Appendix 2: How to use Stable Diffusion AI drawing solution

imAgine is an AI drawing solution customized and developed by Amazon Cloud Technology's core service partner Icrod, built on the Automatic1111 Stable Diffusion WebUI and combined with a range of Amazon Cloud Technology managed services. imAgine is now available on Amazon Cloud Technology Marketplace: users can subscribe with one click, start quickly, and deploy an AI painting environment in the cloud without complex environment configuration.

It also uses Amazon Cloud Technology serverless services such as Amazon API Gateway and Amazon DynamoDB to forward training and inference requests from the WebUI front end to dedicated inference and training servers on Amazon SageMaker in the back end, achieving seamless scaling of compute. On this architectural foundation, front-end/back-end separation and precise cost control are achieved.

Any customer who wants to get started quickly with AIGC technology and receive full-lifecycle maintenance and technical support can subscribe to and test the solution. Due to space limitations, please refer to the Workshop page for the detailed procedure for subscribing to the imAgine solution: https://catalog.us-east-1.prod.workshops.aws/workshops/facdf921-2eea-4638-bc01-522e1eef3dc5

Reference link

  • Stable Diffusion AI solution MarketPlace subscription link: https://aws.amazon.com/marketplace/pp/prodview-ohjyijddo2gka?sr=0-1&ref_=beagle&applicationId=AWSMPContessa

  • Stable Diffusion AI Solution Workshop:

    https://catalog.us-east-1.prod.workshops.aws/workshops/facdf921-2eea-4638-bc01-522e1eef3dc5

  • Stable Diffusion AI solution official website:

    https://www.ecloudrover.com/aigc/

  • AI QR code 101 guide by Anthony Fu, author of the QR Toolkit plug-in:

    https://antfu.me/posts/ai-qrcode-101

  • IoC Lab model display:

    https://mp.weixin.qq.com/s/i4WR5ULH1ZZYl8Watf3EPw

  • IoC Lab Stable Diffusion documentation:

    https://aigc.ioclab.com/

About the authors


Zhuge Ruilin

Solution Architect Manager at Nanjing Ikrode Information Technology Co., Ltd., focusing on Amazon cloud-native architecture design and solution practice. Specializes in the planning and implementation of cloud data lakehouses, data analytics, and machine learning. Currently mainly responsible for the research and development of Icrod's own solutions and their integration with ISV partner capabilities on the cloud.


Su Zhe

Amazon Cloud Technology Solutions Architect, responsible for consulting on and designing Amazon Cloud Technology cloud computing solution architectures, and committed to promoting Amazon Cloud Technology services among e-commerce, education, and developer communities. He previously worked at IBM as an IT solutions architect, responsible for the design and development of cloud-native and container architectures.


Yu Tao

Amazon Cloud Technology Solutions Architect, responsible for consulting on and designing Amazon Cloud Technology cloud computing solutions. He currently focuses on technical research and practice in modern application transformation and machine learning. Before joining Amazon Cloud Technology, he worked for large telecom operators and IT solution providers and accumulated rich project experience in the cross-border e-commerce and FMCG industries.


Su Lijun

A senior solutions architect at Amazon Cloud Technology, committed to the promotion, adoption, and ecosystem development of Amazon Cloud Technology cloud services. He has many years of experience in IT planning and consulting, architecture design, and implementation delivery, and has accumulated numerous digital transformation cases across manufacturing, retail and FMCG, education, healthcare, finance, and other industries.


Jiang Meng

Amazon Cloud Technology Partner Solutions Architect, responsible for consulting on and designing partner solutions, with a focus on building partners' core cloud technology competency systems. He previously worked at IBM as a technical consultant and accumulated solution experience in digital twins and process automation.


Origin: blog.csdn.net/u012365585/article/details/133287122