In the previous blog post, we showed how Stable Diffusion can be applied to e-commerce scenarios. In this post, we introduce some mainstream application scenarios in the game industry. Producing images with a high degree of controllability is a goal many Stable Diffusion users care about, so this article focuses on several ways to use ControlNet to improve controllable output in game/anime scenarios.
"Beautiful Blossoms" series:
The third part (this part): Playing games and animation scenes
Tip: if you are impatient, scroll down to Part 2 to enjoy the game/anime content first, then come back and follow the setup afterwards :)
01
Environment preparation
Before you start, please install the WebUI following the method in the first blog of the series, and install the models bundled with the latest ControlNet 1.1 following the method in the second blog. The installation steps are detailed below:
1.1 Install Stable Diffusion WebUI
Please refer to section 3.4, Solution Deployment, in "Quickly Build a Managed Stable Diffusion AI Painting and Visualization Environment Based on SageMaker Notebook". Note that in step 4 of section 3.4.1, when selecting the WebUI Version, we recommend the latest version, 0601. It bundles the ControlNet 1.1 and OpenPose Editor plug-ins required for this article's experiments, saving you most of the installation steps.
Select version 0601
1.2 Download the latest version of ControlNet 1.1 model
Please refer to section 1.1, ControlNet, in "Using Grounded-SAM to Accelerate E-commerce Creative Generation Based on Amazon SageMaker (Part 1)". In the Terminal of the SageMaker Notebook, download all the ControlNet 1.1 models with the script, slightly modified as follows:
cd SageMaker/sd-webui
./download-controlnet-1.1-models.sh
02
Game/anime character prototype generation (highly controllable output)
Next, we will use different methods to achieve highly controllable output in five game/anime application scenarios. We hope they help you apply Stable Diffusion to art production in your own scenarios. The images in the experiments below are all from the Internet.
2.1 Scenario 1:
Generate game/anime characters based on stylized large models
2.1.1 Selection of stylized models
The stylized model chosen in this article is DreamShaper_6_BakedVae. For a consistent test experience we recommend installing the same model, but you can also use another stylized model you are familiar with; just replace the download link in the script. This step is likewise run from the Terminal of the SageMaker Notebook. Taking this model as an example, the download and file-move commands are as follows:
wget https://civitai.com/api/download/models/94081 --content-disposition
cp dreamshaper_631BakedVae.safetensors /home/ec2-user/SageMaker/sd-webui/data/StableDiffusion/
After the installation completes, return to the Stable Diffusion WebUI and refresh the "Stable Diffusion checkpoint" selector in the upper left corner; the model will appear in the drop-down menu.
Refresh "Stable Diffusion checkpoint"
2.1.2 Generate game/anime characters
First, let's use text-to-image ("txt2img") mode to generate anime characters from a prompt. The configuration is as follows:
Model: DreamShaper_6_BakedVae
(https://civitai.com/models/4384/dreamshaper)
Method: txt2img
Positive Prompt: girl with purple eyes, delicate skin, cat ears, chibi, blue, gold, white, purpple, dragon scaly armor, forest background, fantasy style, (dark shot:1.17), epic realistic, faded, ((neutral colors)), art, (hdr:1.5), (muted colors:1.2), hyperdetailed, (artstation:1.5), cinematic, warm lights, dramatic light, (intricate details:1.1), complex background, (rutkowski:0.8), (teal and orange:0.4), colorfull, (natural skin texture, hyperrealism, soft light, sharp:1.2), (intricate details:1.12), hdr, (intricate details, hyperdetailed:1.15), white hair
Negative Prompt: BadDream, UnrealisticDream, FastNegativeEmbedding
Sampling Method: DPM++ SDE Karras
Sampling Steps: 30
Batch Size: 4
Width: 512
Height: 768
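If you drive the WebUI from a script rather than the browser (the server must be started with the `--api` flag), the configuration above maps to a JSON payload along these lines. This is a minimal sketch: the endpoint path follows the stock automatic1111 API, the long prompt is abbreviated, and the actual POST is left commented out.

```python
import json

# txt2img settings from the configuration above, expressed as an API payload.
# The full positive prompt is abbreviated here for readability.
payload = {
    "prompt": "girl with purple eyes, delicate skin, cat ears, chibi, "
              "dragon scaly armor, forest background, fantasy style, white hair",
    "negative_prompt": "BadDream, UnrealisticDream, FastNegativeEmbedding",
    "sampler_name": "DPM++ SDE Karras",
    "steps": 30,
    "batch_size": 4,
    "width": 512,
    "height": 768,
}

body = json.dumps(payload)

# To actually generate (requires a running WebUI started with --api):
# import requests
# r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", data=body,
#                   headers={"Content-Type": "application/json"})
# images = r.json()["images"]  # base64-encoded result images
```

The checkpoint itself (DreamShaper_6_BakedVae) is selected in the UI or via the options endpoint, not per request.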
Generate stylized game/anime characters
We will save the image below as material for the next steps.
Choose a character to continue the experiment
2.2 Scenario 2:
Use Reference-only to control the output of character details
Many users are familiar with controlling stylized output via a large model plus LoRA. Next, this article shows another controllable-output method: in "img2img" mode, use the new Reference-only preprocessor in ControlNet 1.1 to produce stylized output without adding any LoRA model.
Principle: by feeding the example image's features into the model's attention layers, the Reference preprocessors can preserve more of the example image's content and style. There are currently three modes: reference_only, reference_adain, and reference_adain+attn. reference_adain focuses more on preserving the content of the original image, reference_only focuses more on preserving its style, and reference_adain+attn combines the two to preserve the subject characteristics of the original image to the greatest extent. We first demonstrate reference_only, then compare it with reference_adain and reference_adain+attn.
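For quick reference, the trade-offs just described can be captured in a small lookup table. The module names match the entries in the ControlNet preprocessor drop-down; the descriptions simply restate the behavior noted above.

```python
# Emphasis of each Reference preprocessor mode in ControlNet 1.1,
# summarizing the description above (not an exhaustive specification).
REFERENCE_MODES = {
    "reference_only": "style of the example image",
    "reference_adain": "content of the example image",
    "reference_adain+attn": "both, preserving the subject most fully",
}

def reference_emphasis(module: str) -> str:
    """Return what a given reference mode emphasizes preserving."""
    return REFERENCE_MODES[module]
```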
2.2.1 Base image + prompt + reference_only
Demonstration 1: facial features unchanged, hair color and clothing changed
To keep the model outputting the same skin tone, we suggest only slightly adjusting the skin color, hair color, and similar attributes in the prompt, and placing these keywords as early as possible.
Model: DreamShaper_6_BakedVae
(https://civitai.com/models/4384/dreamshaper)
Method: img2img
Positive Prompt: girl with purple eyes, delicate white skin, cat ears, chibi, blue, gold, white, purpple, dragon scaly armor, forest background, fantasy style, (dark shot:1.17), epic realistic, faded, ((neutral colors)), art, (hdr:1.5), (muted colors:1.2), hyperdetailed, (artstation:1.5), cinematic, warm lights, dramatic light, (intricate details:1.1), complex background, (rutkowski:0.8), (teal and orange:0.4), colorfull, (natural skin texture, hyperrealism, soft light, sharp:1.2), (intricate details:1.12), hdr, (intricate details, hyperdetailed:1.15), green hair
Negative Prompt: BadDream, UnrealisticDream, FastNegativeEmbedding
Sampling Method: DPM++ SDE Karras
Sampling Steps: 30
Batch Size: 4
Width: 512
Height: 768
ControlNet: Reference
Control Mode: Balanced
Resize Mode: Just Resize
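Scripted through the WebUI API, the img2img + Reference setup above corresponds to a payload roughly like the following. The `alwayson_scripts` structure is how the ControlNet extension exposes its units over the API; the unit field names are taken from that extension and should be treated as assumptions that may shift between versions. Base64 image content is elided.

```python
# Sketch: img2img with one ControlNet Reference unit, mirroring the
# settings above. "<base64...>" placeholders stand in for encoded images.
base_image = "<base64 of the selected character image>"

payload = {
    "init_images": [base_image],
    "prompt": "girl with purple eyes, delicate white skin, cat ears, "
              "dragon scaly armor, forest background, green hair",
    "negative_prompt": "BadDream, UnrealisticDream, FastNegativeEmbedding",
    "sampler_name": "DPM++ SDE Karras",
    "steps": 30,
    "batch_size": 4,
    "width": 512,
    "height": 768,
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": base_image,
                "module": "reference_only",  # preprocessor; no model file needed
                "control_mode": "Balanced",
                "resize_mode": "Just Resize",
            }]
        }
    },
}

unit = payload["alwayson_scripts"]["controlnet"]["args"][0]
```

Note that Reference is purely a preprocessor, so the unit carries no `model` entry.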
Green, pink, and white hair of the same character
Demonstration 2: facial features and hair color unchanged, expression changed
1) To ensure the desired expression is generated, we recommend wrapping expression keywords in parentheses with weights, such as (grin smile), (angry:1.3), (cry:2). The effect is shown in the figure.
2) To add control over hairstyle and pose, you can use multiple ControlNet units at the same time (set up as shown in the figure below), for example ControlNet Unit 0 as reference_only and ControlNet Unit 1 as Canny.
3) To improve facial fidelity, check "Restore faces" below the prompt.
Set up multiple ControlNets to use simultaneously
Happy and angry expressions of the same person
2.2.2 Base image + prompt + comparison of the reference modes
In the following experiment, we remove all descriptions of the girl's appearance from the Positive Prompt and keep the other settings unchanged, so we can compare the impact of the different modes on the final result more clearly. The Positive Prompt is modified as follows; everything else stays the same:
Model: DreamShaper_6_BakedVae
(https://civitai.com/models/4384/dreamshaper)
Method: img2img
Positive Prompt: girl standing, bright background, fantasy style, (dark shot:1.17), epic realistic, faded, ((neutral colors)), art, (hdr:1.5), (muted colors:1.2), hyperdetailed, (artstation:1.5), cinematic, warm lights, dramatic light, (intricate details:1.1), complex background, (rutkowski:0.8), (teal and orange:0.4), colorfull, (natural skin texture, hyperrealism, soft light, sharp:1.2), (intricate details:1.12), hdr, (intricate details, hyperdetailed:1.15)
Negative Prompt: BadDream, UnrealisticDream, FastNegativeEmbedding
Sampling Method: DPM++ SDE Karras
Sampling Steps: 30
Batch Size: 4
Width: 512
Height: 768
From left to right:
base image, no ControlNet, reference_adain+attn, reference_adain, reference_only
2.3 Scenario 3: Game/anime character outfit change
From the results above it is easy to see that reference_adain+attn best preserves the girl's appearance, clothing style, and clothing color in the output. This makes it possible to quickly batch-generate game character materials with a consistent style but varying details. In a real design process, however, we often need richer editing capabilities.
For example, we may want to dress game/anime characters in different costumes, such as putting the armor shown in the right picture below onto the warrior in the left picture (generated in the previous section).
beautiful warrior
armor
Here, we want to keep the armor's style completely fixed while controlling the warrior's pose so the overall result looks natural. That takes three steps: make a mask for the armor, adjust the warrior's pose, then perform the outfit change. The specific steps are as follows:
The first step is to use Inpaint to draw the armor mask
Select the "Inpaint" mode of "img2img" and draw the mask of the clothing on the original image, please refer to the third step of the second part of the previous blog (https://aws.amazon.com/cn/blogs/china/ accelerated-e-commerce-ad-material-generation-using-grounded-sam-based-on-amazon-sagemaker-part-one/), the result is shown in the figure below:
Armor original drawing and mask drawing
The second step is to use OpenPose to control the character's pose
Method 1: Inpaint + OpenPose + reference_adain+attn + Canny
You can upload the picture containing the target pose directly to the OpenPose preprocessor in the "ControlNet" section under the "Inpaint" tab, as ControlNet Unit 0, to generate the character's pose. At the same time, stack the reference_adain+attn and Canny units introduced earlier as ControlNet Unit 1 and Unit 2 to control the character's features. However, this often produces results that are out of proportion or hard to align, as shown in the figure below.
Result of applying OpenPose directly
Method 2: Inpaint + OpenPose Editor + reference_adain+attn
Since applying OpenPose directly does not work well, we suggest using the OpenPose Editor plug-in to adjust the character's pose to match the armor before changing the outfit. The clothing image can be used as the background, which makes adjusting the keypoints more intuitive and accurate. The adjusted result can then be synchronized directly to ControlNet in Inpaint, replacing ControlNet Unit 0 from method 1. The OpenPose Editor workflow is shown in the figure below:
How to use OpenPose Editor
The brief OpenPose Editor workflow:
From left to right: character image input, initial pose recognition, target clothing, and the pose adjusted against the target-clothing background
The third step is to combine the clothing mask and character pose to complete the outfit change
After sending the adjusted pose to ControlNet Unit 0 of "img2img", return to the "Inpaint" interface and apply the following configuration to obtain the final result:
Model: DreamShaper_6_BakedVae
Method: img2img
Positive Prompt: girl standing, purple eyes, delicate skin, chibi, blue, gold, white, purpple, dragon scaly armor, dark war background, fantasy style, (dark shot:1.17), epic realistic, faded, ((neutral colors)), art, (hdr:1.5), (muted colors:1.2), hyperdetailed, (artstation:1.5), cinematic, warm lights, dramatic light, (intricate details:1.1), complex background, (rutkowski:0.8), (teal and orange:0.4), colorfull, (natural skin texture, hyperrealism, soft light, sharp:1.2), (intricate details:1.12), hdr, (intricate details, hyperdetailed:1.15), white hair
Negative Prompt: BadDream, UnrealisticDream, FastNegativeEmbedding
Sampling Method: DPM++ SDE Karras
Sampling Steps: 30
Batch Size: 4
Width: 512
Height: 768
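Putting the third step together via the API: an inpaint request is an img2img call that also carries a `mask`, with the pose from OpenPose Editor attached as ControlNet Unit 0. Since the pose is already a rendered skeleton, the unit uses no preprocessor. Field and model names follow the ControlNet extension and are assumptions; image data is elided.

```python
# Sketch: inpaint (outfit change) with the edited pose as ControlNet Unit 0.
pose_image = "<base64 of the pose exported from OpenPose Editor>"

payload = {
    "init_images": ["<base64 of the armor image>"],
    "mask": "<base64 of the armor mask drawn in step 1>",
    "prompt": "girl standing, purple eyes, dragon scaly armor, "
              "dark war background, white hair",
    "negative_prompt": "BadDream, UnrealisticDream, FastNegativeEmbedding",
    "sampler_name": "DPM++ SDE Karras",
    "steps": 30,
    "batch_size": 4,
    "width": 512,
    "height": 768,
    "alwayson_scripts": {
        "controlnet": {
            "args": [{
                "input_image": pose_image,
                "module": "none",  # the pose is already a skeleton image
                "model": "control_v11p_sd15_openpose",
            }]
        }
    },
}
```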
Inpaint configuration
ControlNet configuration
The warrior's AI outfit-change results are as follows:
Outfit change of the warrior
2.4 Scenario 4: Drawing multiple game/anime characters
Sometimes we want to draw different characters together, for example in promotional posters and event banners used in game operations planning. Ideally we want the same degree of control over the position and shape of each subject, which is hard to achieve with prompts alone in text-to-image mode. Here, too, we can use the multiple-ControlNet setup described above to boost production capacity. The specific steps are as follows:
The first step is to select the "txt2img" mode
The second step is to lay out the different character subjects
Select all the required character subjects, adjust their positions according to their layout in the final picture, and use the Canny model as the input for both ControlNet Unit 0 and ControlNet Unit 1, as shown in the figure below. Note that if the input image background is not a solid color, the mask needs to be drawn manually.
input image 1
Input image 2
The third step is to combine the prompt to complete the multi-character drawing
Model: DreamShaper_6_BakedVae
Method: txt2img
Positive Prompt: 2 girls standing together, castle background, fantasy style, (dark shot:1.17), epic realistic, faded, ((neutral colors)), art, (hdr:1.5), (muted colors:1.2), hyperdetailed, (artstation:1.5), cinematic, warm lights, dramatic light, (intricate details:1.1), complex background, (rutkowski:0.8), (teal and orange:0.4), colorfull, (natural skin texture, hyperrealism, soft light, sharp:1.2), (intricate details:1.12), hdr, (intricate details, hyperdetailed:1.15)
Negative Prompt: BadDream, UnrealisticDream, FastNegativeEmbedding
Sampling Method: DPM++ SDE Karras
Sampling Steps: 20
Batch Size: 1
Width: 600
Height: 528
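The two-subject layout above can likewise be scripted as a txt2img call carrying two stacked Canny units, one per character cut-out. The unit field names follow the ControlNet extension and are assumptions; image content is elided.

```python
# Sketch: txt2img with two stacked Canny units, one per character subject.
def canny_unit(image_b64: str) -> dict:
    """Build a ControlNet unit dict for the Canny preprocessor/model."""
    return {
        "input_image": image_b64,
        "module": "canny",
        "model": "control_v11p_sd15_canny",
    }

payload = {
    "prompt": "2 girls standing together, castle background, fantasy style",
    "negative_prompt": "BadDream, UnrealisticDream, FastNegativeEmbedding",
    "sampler_name": "DPM++ SDE Karras",
    "steps": 20,
    "batch_size": 1,
    "width": 600,
    "height": 528,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                canny_unit("<base64 of input image 1>"),
                canny_unit("<base64 of input image 2>"),
            ]
        }
    },
}
```

The WebUI must have Multi-ControlNet enabled (at least two units) in Settings for both units to take effect.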
txt2img configuration
ControlNet configuration
After clicking Generate, you get the following results, well suited to game operations planning:
Multiple game/anime character drawing results
2.5 Scenario 5: Local close-ups / higher resolution
In game/anime generation, we sometimes need to super-resolve a local region to make the image sharper while preserving its features; at other times, to heighten the dramatic effect, we need a finer repaint of local details.
Both Upscale in the Extras tab of the Stable Diffusion WebUI and the Tile model in ControlNet can enlarge the original image. However, because their principles differ, they differ greatly in performance and final effect, and they suit different scenarios.
2.5.1 Upscale
Upscale can be used on its own or combined with pre- and post-processing scripts, but it cannot be combined directly with prompts or ControlNet, so it is better suited to pure local super-resolution. Its output is sharper and preserves the original image's features more fully, and inference is also faster: in the experiment below, total inference time was only 0.82 seconds.
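The Extras upscaler is also exposed over the API as a single-image endpoint. A sketch follows; the upscaler name and resize factor here are illustrative choices, not the exact settings used in the experiment, and the request itself is commented out.

```python
import json

# Sketch: Extras upscale of a single image over the WebUI API.
payload = {
    "image": "<base64 of the region to upscale>",
    "upscaling_resize": 2,                  # enlarge by 2x (illustrative)
    "upscaler_1": "R-ESRGAN 4x+ Anime6B",   # a common anime upscaler (illustrative)
}

body = json.dumps(payload)
# import requests
# r = requests.post("http://127.0.0.1:7860/sdapi/v1/extra-single-image",
#                   data=body, headers={"Content-Type": "application/json"})
# upscaled_b64 = r.json()["image"]
```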
Upscale configuration
Original image and Upscale partial close-up effect
2.5.2 Tile + Reference
Tile can be combined not only with prompts but also with other ControlNet preprocessors, so it better suits scenarios that mix local enlargement with redrawing. To preserve the original image's features as much as possible, we enter only "higher resolution" in the Positive Prompt while also feeding the original image to reference_adain+attn. As the results below show, when zoomed to the same image size the generated image differs from the original in detail and is less sharp. A single inference also takes more than 21 seconds, over 21 times as long as Upscale.
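As an API payload, the Tile + Reference combination above would look roughly like this: a minimal prompt plus two stacked units, Tile and reference_adain+attn. The module and model identifiers are assumptions based on the ControlNet 1.1 release; image content is elided.

```python
# Sketch: img2img combining a Tile unit with reference_adain+attn.
original = "<base64 of the original image>"

payload = {
    "init_images": [original],
    "prompt": "higher resolution",
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {
                    "input_image": original,
                    "module": "tile_resample",
                    "model": "control_v11f1e_sd15_tile",
                },
                {
                    "input_image": original,
                    "module": "reference_adain+attn",  # preprocessor only
                },
            ]
        }
    },
}

units = payload["alwayson_scripts"]["controlnet"]["args"]
```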
img2img configuration
Tile configuration
Reference configuration
Left: original image, right: Tile result
Tile also redraws details; we can see this further in the examples below:
From left to right are:
Original image, Upscale result, Tile result
However, when redrawing is needed, Tile offers more convenient control through prompts. For example, changing the Positive Prompt to "higher resolution, bright blue eyes" while keeping the other parameters unchanged yields the following results.
Original image and Prompt modified result
Epilogue
The scenarios above are the mainstream ones we have encountered in conversations with customers in game and animation design. With the rapid iteration of Stable Diffusion and the WebUI open-source community's plug-ins, AI image generation keeps getting easier to operate and its results keep improving, even surpassing some commercial software. We look forward to discussing relevant technical practices with you; if you have any questions about this blog, feel free to contact us directly.
About the authors
Li Xueqing
GCR AI/ML Solutions Architect.
Liu Chuchu
Amazon Web Services solutions consultant, responsible for exploring the cloud-computing market and providing customers with digital-transformation consulting to help accelerate business development and innovation.
Jansson
Amazon Web Services solutions consultant, dedicated to connecting cloud technology, data, and industry to empower customer innovation and growth.
Zhang Zheng
Amazon Web Services machine learning product technical expert, responsible for consulting and design based on Amazon Web Services accelerated computing and GPU instances. Focusing on large-scale model training and inference acceleration, he has participated in the consulting and design of many machine learning projects in China.
Yang Jiahuan
Amazon Web Services AI/ML product manager, focusing on cloud computing and artificial intelligence technology.