Amazon Cloud Technology and eCloudrover launch AI painting solution - imAgine

In the past few months, Amazon Cloud Technology has published a series of articles on how to deploy Stable Diffusion on Amazon Cloud Technology and how to combine Amazon SageMaker with Stable Diffusion for model training and inference.

To help customers quickly and securely build, deploy, and manage applications on Amazon Cloud Technology, many partners work closely with Amazon Cloud Technology. They offer a wide variety of services, in-depth technical knowledge, best practices, and solutions covering infrastructure migration, application modernization, security and compliance, data analytics, machine learning, artificial intelligence, cloud hosting, DevOps, consulting, and training.

Recently, eCloudrover, a core service partner of Amazon Cloud Technology, launched imAgine, an AI painting solution based on Stable Diffusion. It combines an advanced AI algorithm model that has been widely validated and is easy to deploy with rich, cost-effective cloud resources. The solution is designed to optimize costs and to help industries such as gaming, e-commerce, media, film and television, and advertising quickly build AIGC application pipelines and achieve leading productivity in the AI era.

Practical Stable Diffusion skills

Old sayings hold that "everything is difficult at the beginning" and that one should "reach for the broad while exhausting the subtle". These correspond to the two most common problems customers encounter when putting Stable Diffusion into practice: first, how to choose appropriate prompts so the generated images meet expectations; second, how to optimize image details so the final results meet the needs of production applications.

Based on our past experience serving customers who use Stable Diffusion, we have compiled the following recommended best practices, in the hope of giving readers a useful reference when creating with Stable Diffusion.

Prompt engineering

As Stable Diffusion continues to iterate and the AI's understanding of semantics moves ever closer to "common sense", the demands placed on prompts keep rising. Common misconceptions about prompts can sometimes be counterproductive for the generated images.

 Basic concepts of Prompt

Prompts are divided into positive prompts and negative prompts, which tell the AI what is wanted and what is not.
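For illustration, a minimal prompt pair might look like this (the tags themselves are only examples):

```
Positive prompt: 1girl, sitting in a garden, sunlight, detailed face
Negative prompt: lowres, bad anatomy, extra fingers, blurry
```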

Common misconceptions about prompts

 • A prompt's power lies in precision, not quantity; describing the picture in the fewest possible words is more effective than natural language.

 • Quality-boosting descriptors are not to be piled up mindlessly; more is not automatically better.

 • Frequently seen openers such as "masterpiece" and "best quality" often become dead weight in a prompt. These words were meaningful in the NovelAI era, because NovelAI used large numbers of them to rate images when training its model; but today, after model authors on Civitai have continually refined their models, these prompt words rarely have the intended effect on the generated results.

 Adjust the weight of prompt words

 • Each prompt term has a default weight of 1, and weight decreases from left to right.

 • Prompt weights significantly affect the generated image.

 • Use parentheses + colon + number to set a prompt term's weight, written like (1girl:1.5).
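As a small illustration of the syntax (the tags are only examples), a weighted positive prompt might look like this; in the AUTOMATIC1111 WebUI, plain parentheses around a term also raise its weight by a factor of about 1.1:

```
(1girl:1.5), (red dress:1.2), garden, sunlight, (blurry background:0.8)
```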

 Pay attention to the order of prompt words

 • For example, if a scenery tag comes first, the character will be small; with the order reversed, the character will be larger or shown half-length.

 • Choosing the right order and syntax for your prompts produces the desired picture better, faster, and more efficiently.

 Emoji in Prompt

 • Prompts support emoji, which are very expressive; for a specific facial expression or action, adding an emoji can achieve the effect.

 • To prevent semantic drift, prefer emoji first, and avoid unnecessary complex syntax such as "with".

Recommended perspective prompts

| Prompt | Meaning |
| --- | --- |
| extreme closeup | face close-up |
| close up | head |
| medium close up | ID-photo framing |
| medium shot | half body |
| cowboy shot | legs out of frame (roughly mid-thigh up) |
| medium full shot | feet out of frame |
| full shot | whole body |

 Image optimization

We often generate an unsatisfactory image and want to optimize it further, but don't know where to start. In that case, the following best practices for parameter tuning may serve as a reference:

 Which parameters need to be adjusted

 • CFG Scale: how closely the image follows the prompt. The higher the value, the greater the influence of the prompt on the final result and the tighter the fit.

CFG 2-6: creative, but possibly too distorted and not following the prompt. Can be fun and useful for short prompts.

CFG 7-10: recommended for most prompts. A good balance between creativity and guidance.

CFG 10-15: use when you are sure the prompt is detailed and very clear, and you have precise requirements for the image content.

CFG 16-20: generally not recommended unless the prompt is very detailed. May hurt consistency and quality.

CFG >20: almost unusable.
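If you generate with the diffusers library instead of the WebUI, CFG Scale corresponds to the guidance_scale parameter. A minimal sketch, assuming the runwayml/stable-diffusion-v1-5 checkpoint and a CUDA GPU (the prompt is only an example):

```python
import torch
from diffusers import StableDiffusionPipeline

# Load a Stable Diffusion checkpoint (example model ID).
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# guidance_scale is the CFG Scale; 7-10 is the recommended range above.
image = pipe("1girl, garden, sunlight", guidance_scale=7.5).images[0]
image.save("cfg_7_5.png")
```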

 • Sampling Steps (number of iterations): the more steps, the smaller and more precise the adjustment made to the image at each step, but the time needed to generate the image also grows proportionally.

For most samplers, more iterations help, but the benefit becomes negligible beyond 50 steps.

 • Sampling method: each sampling method has its own optimal step count, which must be taken into account when comparing them.

Euler a: creative; different step counts produce noticeably different images. It is also an efficient sampler and useful for quickly checking the effect of a prompt.

DPM2 a Karras: well suited to realistic models, but hard to control beyond 30 steps.

DPM++ 2M Karras: excellent at high step counts; the higher the step count, the more detail.

DDIM: converges quickly, but is relatively inefficient because it needs many steps to obtain good results. Well suited to redrawing.

Different models and sampling methods produce different results, so the above is for reference only. When choosing a sampling method, it is best to compare candidates with an X/Y/Z plot.
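In diffusers, the sampling method corresponds to the scheduler, so one way to run such a comparison in code is to swap schedulers on the same pipeline. A sketch under the same assumptions as above (example model ID, CUDA GPU):

```python
import torch
from diffusers import (
    StableDiffusionPipeline,
    EulerAncestralDiscreteScheduler,  # "Euler a"
    DPMSolverMultistepScheduler,      # "DPM++ 2M" (Karras via use_karras_sigmas)
)

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Euler a: quick prompt checks at a low step count.
pipe.scheduler = EulerAncestralDiscreteScheduler.from_config(pipe.scheduler.config)
quick = pipe("1girl, garden", num_inference_steps=20).images[0]

# DPM++ 2M Karras: more detail at a higher step count.
pipe.scheduler = DPMSolverMultistepScheduler.from_config(
    pipe.scheduler.config, use_karras_sigmas=True
)
detailed = pipe("1girl, garden", num_inference_steps=30).images[0]
```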

 • Seed: the random seed often has a huge impact on composition and is the main source of randomness in Stable Diffusion image generation.

With the seed, prompt, model, and all other parameters held constant, the same seed will generate (almost) the same image every time.

Once a suitable composition has been found, the best approach is to fix the seed and polish the details further.
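In code, fixing the seed means fixing the random number generator; with diffusers this is done through a torch.Generator. A minimal sketch (the seed value is arbitrary):

```python
import torch
from diffusers import StableDiffusionPipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Same seed + same prompt + same parameters => (almost) the same image.
generator = torch.Generator(device="cuda").manual_seed(1234)
image = pipe("1girl, garden", generator=generator).images[0]

# Re-seed before each run to reproduce the composition.
generator = torch.Generator(device="cuda").manual_seed(1234)
same_image = pipe("1girl, garden", generator=generator).images[0]
```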

 How to compare and find the best parameters

Use the X/Y/Z plot to find the best parameters: it lets us compare results under different parameters side by side, quickly narrow in on a suitable parameter range, and then exert finer control over generation.
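The X/Y/Z plot is a script built into the AUTOMATIC1111 WebUI; outside the WebUI, you can approximate it with a simple parameter sweep. A hedged sketch that grids CFG against step count with a fixed seed:

```python
import torch
from diffusers import StableDiffusionPipeline
from PIL import Image

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

cfg_values = [4, 7, 10]      # X axis
step_values = [20, 30, 50]   # Y axis
w = h = 512
grid = Image.new("RGB", (w * len(cfg_values), h * len(step_values)))

for yi, steps in enumerate(step_values):
    for xi, cfg in enumerate(cfg_values):
        # Fix the seed so only CFG and steps vary between cells.
        gen = torch.Generator(device="cuda").manual_seed(1234)
        img = pipe("1girl, garden", guidance_scale=cfg,
                   num_inference_steps=steps, generator=gen).images[0]
        grid.paste(img, (xi * w, yi * h))

grid.save("xyz_grid.png")
```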

 Image size optimization

  • Image quality is not directly tied to image size.

 • But size does affect the subject and content to some extent, because it implicitly selects a category (portrait-oriented characters, landscape-oriented scenery, small-resolution stickers, and so on).

 • When the canvas is too wide, multiple subjects may appear in the image.

 • Sizes above 1024 may produce unsatisfactory results and put huge pressure on server memory. A small base resolution plus high-resolution restoration (upscaling) is recommended.
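One way to follow the "small resolution + HD restoration" advice outside the WebUI is to generate at a modest size and then upscale with a dedicated model. The sketch below assumes the stabilityai/stable-diffusion-x4-upscaler checkpoint as the second stage (the WebUI's built-in Hires. fix works differently):

```python
import torch
from diffusers import StableDiffusionPipeline, StableDiffusionUpscalePipeline

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Generate at a modest portrait resolution first.
low_res = pipe("1girl, garden", width=512, height=768).images[0]

# Then upscale 4x; large inputs may need downscaling or tiling to fit memory.
upscaler = StableDiffusionUpscalePipeline.from_pretrained(
    "stabilityai/stable-diffusion-x4-upscaler", torch_dtype=torch.float16
).to("cuda")
high_res = upscaler(prompt="1girl, garden", image=low_res).images[0]
high_res.save("upscaled.png")
```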

Optimizing multi-character and wide single-character images

 • txt2img alone cannot effectively control the characteristics of an individual character when multiple characters are present.

 • The recommended approach is to create a draft and refine it with img2img, or to use ControlNet.

 • For a wide image with a single character, it is best to sketch a draft, apply rough color, and fix the main subject; or use ControlNet's OpenPose to build the character skeleton.

 • To pin down the number of characters in a multi-character image, it is best to specify them with ControlNet's OpenPose; this approach is also suitable for drawing three views of the same character.
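In diffusers, OpenPose-guided ControlNet generation looks roughly like the sketch below; the model IDs are common community checkpoints, and pose.png is assumed to be a prepared skeleton image containing the desired number of figures:

```python
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from PIL import Image

# Load the OpenPose ControlNet and attach it to a base model.
controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-openpose", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet,
    torch_dtype=torch.float16
).to("cuda")

# pose.png: a pre-made OpenPose skeleton (e.g., two figures),
# which pins down the number and placement of characters.
pose = Image.open("pose.png")
image = pipe("two girls standing in a garden", image=pose).images[0]
image.save("two_characters.png")
```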

 Perform hand repair

 • Send the image to img2img inpaint, use roughly the same prompt, move the words about "hands" to the front, and set the denoising strength according to how much you want the hand features to change (if you only want the hands to be more complete, keep it below 0.25); then keep the steps and CFG the same as in txt2img.

 • Find a hand image that matches your expectations, then use a ControlNet preprocessor and model such as Canny or OpenPose_hands, combined with the inpaint operation, to control the hands more precisely.
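Translated to diffusers, the inpainting step might look like the sketch below; hand_mask.png is assumed to be a white-on-black mask covering the hands, and strength plays the role of the WebUI's denoising strength:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image

pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")

image = Image.open("output.png")    # the txt2img result
mask = Image.open("hand_mask.png")  # white = region to redraw (the hands)

# Hand-related words first; steps and CFG kept as in txt2img.
result = pipe(
    prompt="detailed hands, five fingers, 1girl, garden",
    image=image, mask_image=mask,
    strength=0.25,  # low strength: only complete the hands
    num_inference_steps=30, guidance_scale=7.5,
).images[0]
result.save("hands_fixed.png")
```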

 Perform facial restoration

 • When the human subject in an image is small, the face often collapses. This happens especially during the artistic QR code generation introduced later in this article, where character faces often collapse because of the QR code dots.

 • For facial redrawing, we recommend the !After Detailer plug-in, commonly known as ADetailer.

 • The plug-in uses the YOLO algorithm to detect objects in the image. We configure it to recognize faces and supply a prompt and model for facial redrawing; the plug-in then partially redraws the detected facial region to complete the repair.

 • The ADetailer plug-in covers both face and hand detection and repair.

 • A LoRA model can also be referenced in ADetailer for partial redraw generation.
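ADetailer itself is a WebUI extension, but its core idea (detect faces with YOLO, then inpaint just those regions) can be sketched by hand. The sketch below assumes the ultralytics package and a YOLO face-detection weight file such as face_yolov8n.pt, as distributed for ADetailer:

```python
import torch
from diffusers import StableDiffusionInpaintPipeline
from PIL import Image, ImageDraw
from ultralytics import YOLO

# 1. Detect faces with a YOLO model (the weight file is an assumption).
detector = YOLO("face_yolov8n.pt")
image = Image.open("output.png")
boxes = detector(image)[0].boxes.xyxy.tolist()

# 2. Build a mask that is white over each detected face.
mask = Image.new("L", image.size, 0)
draw = ImageDraw.Draw(mask)
for x1, y1, x2, y2 in boxes:
    draw.rectangle([x1, y1, x2, y2], fill=255)

# 3. Inpaint only the face regions with a face-focused prompt.
pipe = StableDiffusionInpaintPipeline.from_pretrained(
    "runwayml/stable-diffusion-inpainting", torch_dtype=torch.float16
).to("cuda")
fixed = pipe(prompt="detailed face, clear eyes",
             image=image, mask_image=mask).images[0]
fixed.save("face_fixed.png")
```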

Original title: Using ControlNet to generate artistic QR codes - AI painting solution based on Stable Diffusion


Source: blog.csdn.net/caijingshiye/article/details/132732161