ControlNet allows you to control the generated image via line art, pose recognition, depth information, and more.
1. Installation
Automatic installation
- In `stable-diffusion-webui`, find `Extensions` -> `Install from URL` on the page, enter the plugin's git address, and click Install. The URL is as follows:
`https://github.com/Mikubill/sd-webui-controlnet.git`
- After the page finishes loading, a prompt appears at the bottom: `Installed into xxx. Use Installed tab to restart.`
Manual installation
- Go to the extensions folder:
`cd ./stable-diffusion-webui/extensions`
- Clone the project into the `extensions` folder:
`git clone https://github.com/Mikubill/sd-webui-controlnet.git`
2. Enable ControlNet
- Go to `Extensions` -> `Installed` -> `Apply and restart UI` in turn
- After `webui` restarts, you will see an additional `ControlNet` option in both `txt2img` and `img2img`
- The installation is complete
3. Configure ControlNet
The user provides a reference image ($I_{origin}$). `ControlNet` preprocesses this reference image according to the specified mode and obtains a new image ($I_{new}$), which serves as another reference image. Then, according to the prompt (`Prompt`), combined with these reference images, the final image is drawn, i.e. $I_{origin} + I_{new} = I_{final}$.
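As a rough illustration of this two-stage dataflow, here is a minimal sketch; `run_annotator` and `sd_generate` are hypothetical placeholders standing in for the chosen preprocessor and the ControlNet-guided Stable Diffusion run, not real APIs:

```python
# Conceptual sketch of the two stages described above; the two functions are
# hypothetical placeholders, not part of stable-diffusion-webui or the plugin.
from PIL import Image

def run_annotator(image: Image.Image, mode: str) -> Image.Image:
    """Stage 1: turn I_origin into a control map I_new (edges, pose, depth, ...)."""
    # Placeholder: a real preprocessor (e.g. canny) would run here.
    return image.convert("L").convert("RGB")

def sd_generate(prompt: str, control_image: Image.Image) -> Image.Image:
    """Stage 2: draw I_final from the prompt, constrained by the control map."""
    # Placeholder: Stable Diffusion plus the matching ControlNet model would run here.
    return control_image.copy()

i_origin = Image.open("reference.png")          # the user's reference image
i_new = run_annotator(i_origin, mode="canny")   # I_origin -> I_new
i_final = sd_generate("a cozy bedroom", i_new)  # Prompt + I_new -> I_final
i_final.save("result.png")
```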
- Two types of models are used in the two stages of the above process (see the `ControlNet` official download address):
  - Preprocessor models (annotator)
  - Pre-trained models (models)
- The official `ControlNet` pre-trained model files (models) stored on huggingface bundle the SD `v1-5-pruned-emaonly` model, which is unnecessary in a `stable-diffusion-webui` environment and wastes a lot of hard disk space, so you can download only the cropped version
  - cropped version
- Model storage location
  - Plugin `models` storage directory: `./stable-diffusion-webui/extensions/sd-webui-controlnet/models`
  - Plugin `annotator` storage directory: `./stable-diffusion-webui/extensions/sd-webui-controlnet/annotator`
- Note: preprocessor models (annotators) need to be stored by category (a placement sketch follows after this list), for example:
  - `body_pose_model.pth` and `hand_pose_model.pth` should be saved to the `openpose` directory;
  - `network-bsds500.pth` should be saved to the `hed` directory;
  - `upernet_global_small.pth` should be saved to the `uniformer` directory;
  - For other model files, you can find the corresponding storage directory according to the keyword
- After downloading and saving the preprocessor models and pre-trained models, restart `webui` to use them
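A minimal sketch of sorting downloaded annotator files into those category directories; the `downloads/` staging folder is an assumption, so adjust the paths to your own setup:

```python
# Sketch: move downloaded annotator files into the category folders named above.
# "downloads/" is a hypothetical staging folder; adjust paths to your own setup.
import shutil
from pathlib import Path

ANNOTATOR_DIR = Path("./stable-diffusion-webui/extensions/sd-webui-controlnet/annotator")

# filename -> category directory, taken from the examples in this section
PLACEMENT = {
    "body_pose_model.pth": "openpose",
    "hand_pose_model.pth": "openpose",
    "network-bsds500.pth": "hed",
    "upernet_global_small.pth": "uniformer",
}

for filename, category in PLACEMENT.items():
    src = Path("downloads") / filename
    dst_dir = ANNOTATOR_DIR / category
    dst_dir.mkdir(parents=True, exist_ok=True)  # create the category folder if missing
    if src.exists():
        shutil.move(str(src), dst_dir / filename)
        print(f"placed {filename} -> {dst_dir}")
```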
4. Differences between pre-trained models
- ControlNet with User Scribbles
`control_scribble-fp16` supports creating a new white canvas and drawing lines on it. The model takes the drawing as input and produces the corresponding output, using human scribbles to control SD. The model was trained on boundary edges with very strong data augmentation to simulate boundary lines similar to those drawn by humans.
- examples:
- ControlNet with Canny Edge
`control_canny-fp16` requires an input picture; the preprocessor uses Canny edge detection to extract the edge lines, and the model takes these edge lines as input to generate the corresponding output. It is suitable for coloring line art, or for turning a picture into line art and then recoloring it, and it works especially well for characters.
- examples:
- ControlNet with HED Maps
`control_hed-fp16` requires an input picture. The preprocessor uses HED edge detection to extract soft edges, which are used as input to generate the corresponding output. The extracted object boundaries retain more detail, so this model is suitable for recoloring and stylization.
- examples:
- ControlNet with M-LSD Lines
`control_mlsd-fp16` requires an input picture. The preprocessor uses M-LSD line detection, and the detected lines are used as input to generate the corresponding output. This model basically cannot recognize people, but it is very suitable for buildings: generate an intermediate line image from a base image or a hand-drawn line draft, and then generate the final picture.
- examples:
- ControlNet with Human Pose
`control_openpose-fp16` extracts the corresponding pose skeleton from the provided picture as input and generates the corresponding result: it first produces an intermediate skeleton image from the picture and then generates the final picture. Real-life photos are the most appropriate input, because the model was trained on real-life material.
- examples:
- ControlNet with Depth
`control_depth-fp16` requires an input picture. The preprocessor uses Midas depth estimation to obtain an estimated depth map as input and generates the output picture from it, creating an intermediate image with depth of field; it can be used for all kinds of architecture and figures. Stability's depth model works with 64×64 depth maps, while the ControlNet model can generate a 512×512 depth map.
- examples:
- ControlNet with Normal Map
`control_normal-fp16` generates the corresponding normal map from the provided picture as input. Normal mapping is a technique for simulating the lighting of bumpy surfaces, an implementation of bump mapping. Compared with the Depth Map model, the normal map model seems to preserve details better. It produces an intermediate normal-map-like image from the base image and uses this intermediate image to generate a modeling-style result. This approach works for both character and architectural modeling, but is better suited to characters.
- examples:
- ControlNet with Segmentation
`control_seg-fp16` can segment multiple objects in the image into blocks, such as buildings, sky, flowers, trees, etc., and distinguishes subject and background very well, using semantic segmentation to control SD. You need to provide an input image, and a model called Uniformer will detect the segmentation for you.
- examples:
- Segmentation note: if the semantic segmentation map is not generated, check whether there is an error in the console. Also check whether `upernet_global_small.pth` is saved to the `extensions\sd-webui-controlnet\annotator\uniformer` directory. Because SD also has its own `models\uniformer` directory, if `upernet_global_small.pth` is not manually placed in advance, SD will automatically download a default file to its own `models\uniformer` directory when the Semantic Segmentation model is used. In that case the console will report an error, which you can use to troubleshoot the problem.
- You can see that a colorful picture is generated in the preprocessing stage; this is the semantic segmentation map. Each color in this picture represents a type of item, for example purple (`#cc05ff`) stands for bed, orange (`#ffc207`) stands for cushion, and golden yellow (`#e0ff08`) stands for lamp. The `ControlNet` Semantic Segmentation model uses the ADE20K and COCO conventions, and the meanings of these color values can be found at the following URLs:
https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit#gid=0
https://github.com/CSAILVision/sceneparsing/tree/master/visualizationCode/color150
- Advanced: modify the color of an item to replace that item in the generated picture, for example changing the desk lamp (`#e0ff08`) to a flower (`#ff0000`); a minimal sketch of this trick follows below
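A minimal sketch of the color-swap trick above, assuming the preprocessor's segmentation map has been saved to disk; the file names are placeholders:

```python
# Sketch: swap one semantic color in the segmentation map (lamp -> replacement color).
# "seg_map.png" / "seg_map_edited.png" are placeholder file names.
from PIL import Image

SRC_COLOR = (0xE0, 0xFF, 0x08)   # golden yellow: lamp in the palette described above
DST_COLOR = (0xFF, 0x00, 0x00)   # the replacement color from the example

img = Image.open("seg_map.png").convert("RGB")
pixels = img.load()
for y in range(img.height):
    for x in range(img.width):
        if pixels[x, y] == SRC_COLOR:
            pixels[x, y] = DST_COLOR
img.save("seg_map_edited.png")   # use this edited map as the ControlNet input image
```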
5. Combined application of multiple ControlNets
- By default, ControlNet can only apply one guidance method to generation. By changing the settings, ControlNet can be guided by multiple models at the same time: `Settings` -> `ControlNet` -> `Maximum number of Multi ControlNet models`
- You need to save the settings and restart. After enabling it, you can configure multiple ControlNet units at the same time, allowing joint control by different models and stacked control by the same model. This can be used to constrain multi-person poses, and also for multi-dimensional constraints.
- If you set the value to 3, you can use three ControlNet models to affect the generation of the final image; adjust it as needed (see the sketch after this list)
- Examples of use:
  - Perfect photo (unlimited dress-up method) - Zhihu
  - How can AI painting draw a good picture? - Better image generation results
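If you drive `webui` through its API instead of the UI, the same multi-unit behaviour is available once the setting above has been raised. The sketch below is a hedged example of the sd-webui-controlnet `alwayson_scripts` payload; the field names (for instance `input_image` vs `image`) and the module/model names vary between extension versions, and the reference file names are placeholders:

```python
# Sketch: call txt2img with two ControlNet units via the webui API (start webui with --api).
# Field names follow common sd-webui-controlnet versions and may differ in yours;
# the model names below are placeholders for models you actually have installed.
import base64
import requests

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "prompt": "two people dancing in a sunlit room",
    "steps": 20,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {   # unit 1: pose constraint
                    "input_image": b64("pose_ref.png"),
                    "module": "openpose",
                    "model": "control_openpose-fp16 [placeholder]",
                    "weight": 1.0,
                },
                {   # unit 2: depth constraint on the same generation
                    "input_image": b64("room_ref.png"),
                    "module": "depth",
                    "model": "control_depth-fp16 [placeholder]",
                    "weight": 0.8,
                },
            ]
        }
    },
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload, timeout=600)
r.raise_for_status()
print(len(r.json()["images"]), "image(s) returned")
```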
6. Parameter introduction
- Enable
  - When checked, clicking the generate button will guide image generation through `ControlNet` in real time; otherwise it does not take effect.
- Low memory optimization (Low VRAM)
  - Low memory mode. If your graphics card has less than 4GB of VRAM, it is recommended to check this option; it must be used together with the `--lowvram` startup parameter.
- Pixel Perfect
  - If you enable pixel-perfect mode, you do not need to manually set the preprocessor (annotator) resolution; `ControlNet` automatically calculates the optimal resolution for you so that every pixel matches Stable Diffusion perfectly.
- Allow Preview
  - Allow preview images
- Preprocessor
  - This list selects the preprocessor model; each `ControlNet` model has a different function, as introduced separately above.
- Model
  - The model selected in this list must match the model named in the preprocessor box. If the preprocessor and the model do not match, an image can still be produced, but the effect is unpredictable and not ideal.
- Weight
  - The weight represents the proportion of influence that `ControlNet` has on the generated image.
- Starting Control Step
  - The step at which ControlNet starts to intervene, expressed as a percentage; 0 means it intervenes from the very beginning, and 0.2 means it starts participating in drawing at 20% of the steps.
- Ending Control Step
  - The step at which ControlNet stops intervening, expressed as a percentage; 1 means it keeps intervening until the last step, and 0.9 means it stops participating in drawing at 90% of the steps.
- Control Mode
  - Selects how strongly the ControlNet weights are emphasized relative to the prompt, equivalent to `Guess Mode` in previous versions.
- Resize Mode (image scaling mode)
  - How to adjust the image size; the default scales the image to a suitable size, and the image is adapted automatically.
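For reference, the UI controls above roughly correspond to the per-unit fields used in the extension's API payload. This mapping is an assumption based on common sd-webui-controlnet versions; exact field names and accepted values depend on the version you have installed:

```python
# Rough mapping of the UI controls above to one ControlNet unit in the API payload.
# Field names are approximate and depend on the sd-webui-controlnet version installed.
unit = {
    "enabled": True,            # Enable
    "lowvram": False,           # Low VRAM (pair with the --lowvram startup flag)
    "pixel_perfect": True,      # Pixel Perfect: auto-pick the preprocessor resolution
    "module": "canny",          # Preprocessor
    "model": "control_canny-fp16 [placeholder]",  # Model: must match the preprocessor
    "weight": 1.0,              # Weight
    "guidance_start": 0.0,      # Starting Control Step (0 = intervene from the start)
    "guidance_end": 0.9,        # Ending Control Step (0.9 = stop at 90% of the steps)
    "control_mode": "Balanced", # Control Mode (replaces the old Guess Mode)
    "resize_mode": "Crop and Resize",  # Resize Mode
}
```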
7. Version comparison
- With the original canny model you can subjectively feel the model loading time, while the compressed version loads faster (about 1 second), and the quality of the generated images is indistinguishable to the naked eye, e.g. in character outlines and facial features (this is related to how the model works: canny extracts contours, so as long as the contours do not change much, the difference is negligible)
- The Tencent t2i model beats the compressed version in speed (because the model file is relatively small), and there is no obvious difference in the generated results, only subtle ones such as face shape
- Refer to the articles by other bloggers below to compare the differences between the two. If you want faster loading, use the compressed model or the Tencent t2i model; as for which effect is better, please judge for yourself. I prefer to use the compressed version of the model
- Reference: Precise control for AI image generation, the ice-breaking solutions ControlNet and T2I-Adapter
What is the difference between `ControlNet` and `T2I-Adapter`?
As mentioned in the `ControlNet` paper, training the `Canny Edge detector` model used 3 million edge-image-caption pairs and 600 GPU hours on an A100 80G. The `Human Pose` (human pose skeleton) model used 80,000 pose-image-caption pairs and 400 GPU hours on an A100 80G.
The training of `T2I-Adapter` was completed in only 2 days on 4 Tesla 32G-V100s, covering 3 conditions: sketch (a corpus of 150,000 pictures), semantic segmentation map (160,000 pictures), and `Keypose` (150,000 pictures).
The difference between the two: the pre-trained models currently provided by `ControlNet` are more usable and support more types of condition detectors (9 categories). `T2I-Adapter` is more concise and flexible in engineering design and implementation, and easier to integrate and extend (according to virushuo, who has read its code). In addition, `T2I-Adapter` supports guidance by more than one condition model at the same time, for example using `sketch` and `segmentation map` as input conditions simultaneously, or using `sketch` guidance within a masked region (that is, `inpaint`).