Instructions for using ControlNet in stable-diffusion-webui


ControlNet allows control over the resulting image via line art, pose recognition, depth information, and more.

1. Installation

automatic installation

  • In stable-diffusion-webui, open the Extensions page, choose Install from URL, enter the plugin's git URL, and click Install. The URL is as follows:
https://github.com/Mikubill/sd-webui-controlnet.git
  • Wait for loading to finish; a prompt will appear at the bottom of the page: "Installed into xxx. Use Installed tab to restart."

manual installation

  • Go to the extensions folder
cd ./stable-diffusion-webui/extensions
  • Clone the project files into the extensions folder
git clone https://github.com/Mikubill/sd-webui-controlnet.git

2. Enable Controlnet

  • Go to Extensions -> Installed -> Apply and restart UI
  • After the webui restarts, an additional ControlNet panel will appear in both txt2img and img2img
  • The installation is complete

3. Configure Controlnet

The user provides a reference image ($I_{origin}$). ControlNet preprocesses this reference image according to the selected mode and produces a new control image ($I_{new}$), which serves as an additional reference. The image is then drawn from the prompt combined with this control image, i.e. $I_{origin} + I_{new} = I_{final}$.
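As a minimal illustration of the first stage, the sketch below derives a control image from a reference image with OpenCV's Canny edge detector. This is only an analogue of what the extension's bundled annotators do internally; the file names are hypothetical.

```python
# Minimal sketch of the preprocessing stage: derive a control image (I_new)
# from a reference image (I_origin) using Canny edge detection.
# The extension ships its own annotators; this only illustrates the idea.
import cv2

origin = cv2.imread("reference.png")                     # I_origin (hypothetical file)
gray = cv2.cvtColor(origin, cv2.COLOR_BGR2GRAY)
edges = cv2.Canny(gray, threshold1=100, threshold2=200)  # I_new: the edge map
cv2.imwrite("control_edges.png", edges)
```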

  • Two types of models are applied in the two stages of the above process:

    • Preprocessor models (annotator)
    • Pre-trained models (models)
  • ControlNet official download address

The official ControlNet pre-trained model files (models) hosted on huggingface bundle the SD v1-5-pruned-emaonly weights, which are unnecessary in the stable-diffusion-webui environment and waste a lot of disk space; you can download the cropped (pruned) version instead.

  • Cropped version of the pre-trained models (models): download address

  • Model storage location

    • Plugin models storage directory: ./stable-diffusion-webui/extensions/sd-webui-controlnet/models
    • Plugin annotator storage directory: ./stable-diffusion-webui/extensions/sd-webui-controlnet/annotator
  • Note: preprocessor models (annotators) must be stored in per-category subdirectories, for example:

    • body_pose_model.pth and hand_pose_model.pth are saved to the openpose directory;
    • network-bsds500.pth is saved to the hed directory;
    • upernet_global_small.pth is saved to the uniformer directory;
    • for other model files, the correct subdirectory can be identified from the keyword in the file name.
  • After downloading and saving the preprocessor models and pre-trained models, restart the webui to use them (a helper sketch follows this list).
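The sketch below is a small helper, not part of the plugin, that sorts the annotator files listed above into their subdirectories. It assumes the files were downloaded into a local ./downloads folder (a hypothetical path); adjust both paths to your setup.

```python
# Sort downloaded annotator files into the per-category subdirectories
# expected by sd-webui-controlnet. Paths below are assumptions; adjust them.
import shutil
from pathlib import Path

DOWNLOADS = Path("./downloads")  # hypothetical download location
ANNOTATOR_DIR = Path("./stable-diffusion-webui/extensions/sd-webui-controlnet/annotator")

# Mapping taken from the list above; extend it for other annotator files.
placement = {
    "body_pose_model.pth": "openpose",
    "hand_pose_model.pth": "openpose",
    "network-bsds500.pth": "hed",
    "upernet_global_small.pth": "uniformer",
}

for filename, subdir in placement.items():
    src = DOWNLOADS / filename
    dst_dir = ANNOTATOR_DIR / subdir
    if src.exists():
        dst_dir.mkdir(parents=True, exist_ok=True)
        shutil.move(str(src), str(dst_dir / filename))
        print(f"moved {filename} -> {dst_dir}")
```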

4. Differences between pre-trained models

See the GitHub repository for details

  • ControlNet with User Scribbles
    control_scribble-fp16 supports creating a new white canvas and drawing lines on it. The model takes the drawing as input and produces the corresponding output, using human scribbles to control SD. The model was trained on boundary edges with very strong data augmentation to simulate lines similar to those drawn by humans.
  • examples:

Scribbles

Scribbles2

  • ControlNet with Canny Edge
    control_canny-fp16 requires an input picture; the preprocessor uses Canny edge detection to extract the edge lines, and the model uses these edges as input to control the generated output. It is suitable for colouring line drawings, or for converting a picture into line art and then recolouring it, and works especially well for characters.
  • examples:

Canny

  • ControlNet with HED Maps
    control_hed-fp16 requires an input picture. The preprocessor uses HED edge detection to extract soft edges, which are used as input to control the generated output. The extracted object boundaries retain more detail, so this model is suitable for recolouring and stylisation.
  • examples:

HED

  • ControlNet with M-LSD Lines
    control_mlsd-fp16 requires an input picture; the preprocessor uses M-LSD line detection, and the detected lines are used as input to control the generated output. This model basically cannot recognise people, but it is very well suited to architecture: an intermediate line image is generated from a base image or a hand-drawn line draft, and the final picture is generated from that.
  • examples:

M-LSD

  • ControlNet with Human Pose
    control_openpose-fp16 extracts a pose skeleton diagram from the provided picture and uses it as input: an intermediate skeleton image is generated from the picture first, and the final picture is generated from the skeleton. Real-life photos work best, because the model was trained mainly on real-life material.
  • examples:

openpose

openpose2

  • ControlNet with Depth
    control_depth-fp16 requires an input picture; the preprocessor uses MiDaS depth estimation to obtain a depth map, which is used as input to generate the output picture. It creates an intermediate image with a sense of depth and is well suited to architectural images. Stability's depth model works with 64×64 depth maps, while the ControlNet model can generate a 512×512 depth map.
  • examples:

depth

  • ControlNet with Normal Map
    control_normal-fp16 generates a normal map from the provided picture and uses it as input. A normal map is a technique that simulates the lighting of surface bumps, a form of bump mapping. Compared with the Depth model, the normal-map model seems better at preserving detail. An intermediate normal-map-like image is generated from the base image, and this intermediate image is then used to generate the modelled result. The method suits both character and architectural modelling, but works better for characters.
  • examples:

Normal

Normal

  • ControlNet with Segmentation
    control_seg-fp16 performs block segmentation of the objects in an image, such as buildings, sky, flowers, and trees, distinguishes subject from background very well, and uses semantic segmentation to control SD. You provide an input image, and a model called UniFormer detects the segmentation for you.
  • examples:

Segmentation

Segmentation

  • Segmentation note: if the semantic segmentation map is not generated, check the console for errors, and check whether upernet_global_small.pth is saved to the extensions\sd-webui-controlnet\annotator\uniformer directory. SD also has its own models\uniformer directory: if upernet_global_small.pth has not been placed manually beforehand, then the first time the Semantic Segmentation model is used, SD will automatically download a default file into its own models\uniformer directory. In that case the console will report an error, which you can use to troubleshoot the problem.
  • You can see that a colourful picture is generated in the preprocessing stage; this is the semantic segmentation map. Each colour in this picture represents one class of object: for example, purple (#cc05ff) stands for bed, orange (#ffc207) stands for cushion, and golden yellow (#e0ff08) stands for lamp. The ControlNet Semantic Segmentation model uses the ADE20K and COCO protocols, and the meanings of these colour values can be found at the following URLs:
    https://docs.google.com/spreadsheets/d/1se8YEtb2detS7OuPE86fXGyD269pMycAWe2mtKUj2W8/edit#gid=0
    https://github.com/CSAILVision/sceneparsing/tree/master/visualizationCode/color150
  • Advanced: modify the colour of an item in the segmentation map to replace that item in the generated picture, for example changing the desk lamp (#e0ff08) to a flower (#ff0000); a minimal sketch follows.
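A minimal sketch of that trick, assuming the preprocessor's segmentation map was saved as segmentation.png (a hypothetical file name) and using the colour values quoted above; verify the target class colour against the color150 table linked earlier.

```python
# Recolour one class in the semantic segmentation map so the generated image
# replaces that item. Colour values are taken from the text above; check the
# color150 table for the exact code of the class you want.
import numpy as np
from PIL import Image

seg = np.array(Image.open("segmentation.png").convert("RGB"))
lamp   = np.array([0xE0, 0xFF, 0x08], dtype=np.uint8)  # lamp colour (#e0ff08)
flower = np.array([0xFF, 0x00, 0x00], dtype=np.uint8)  # target colour (#ff0000)

mask = np.all(seg == lamp, axis=-1)   # pixels belonging to the lamp
seg[mask] = flower                    # repaint them with the target class colour
Image.fromarray(seg).save("segmentation_edited.png")
```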

5. Combined application of multiple ControlNets

  • By default, ControlNet applies only one guidance method at a time. By changing the settings, ControlNet can guide the output through several models simultaneously: Settings -> ControlNet -> maximum number of Multi ControlNet networks
  • Save the settings and restart. Once enabled, you can configure several ControlNet units at the same time, allowing joint control with different models or stacked control with the same model. This can be used to constrain multi-person poses, or to apply several kinds of constraints at once.
  • If you set the value to 3, up to three ControlNet models can influence the final image; adjust it as needed. A hedged API sketch with two units follows the examples below.
  • Examples of use:
    "Perfect photos (the unlimited outfit-change method)" on Zhihu
    "How can AI painting produce a good picture?" — better image generation results
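The sketch below shows, in hedged form, how two ControlNet units can be combined in one request through the webui API (started with --api). The field names follow the extension's API at the time of writing and may differ between versions; file names and the prompt are placeholders.

```python
# Hedged sketch: combine two ControlNet units (pose + architecture lines)
# through the webui txt2img API. Field names are assumptions that may vary
# across versions of the sd-webui-controlnet extension.
import base64
import requests

def b64(path: str) -> str:
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode()

payload = {
    "prompt": "a person standing in a modern building",
    "steps": 20,
    "alwayson_scripts": {
        "controlnet": {
            "args": [
                {   # unit 1: constrain the pose
                    "input_image": b64("pose_reference.png"),
                    "module": "openpose",
                    "model": "control_openpose-fp16",
                },
                {   # unit 2: constrain the building lines
                    "input_image": b64("building_reference.png"),
                    "module": "mlsd",
                    "model": "control_mlsd-fp16",
                },
            ]
        }
    },
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
r.raise_for_status()
# Generated images come back base64-encoded in r.json()["images"].
```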

6. Parameter introduction

ControlNet parameters

  1. Enable
  • When checked, clicking the generate button makes ControlNet guide generation through the control image; if unchecked, ControlNet has no effect.
  2. Low memory optimization (Low VRAM)
  • If your graphics card has less than 4 GB of VRAM, it is recommended to check this option; it should be used together with the --lowvram startup parameter.
  3. Pixel Perfect
  • With pixel-perfect mode enabled you do not need to set the preprocessor (annotator) resolution manually; ControlNet automatically computes the optimal resolution so that every pixel matches Stable Diffusion.
  4. Allow Preview
  • Allows previewing the preprocessor (annotator) output image.
  5. Preprocessor
  • Selects the preprocessor model; each ControlNet model has a different function, as introduced in the previous section.
  6. Model
  • The model selected here must match the preprocessor chosen above. If the preprocessor and model do not match, an image can still be produced, but the result is unpredictable and usually not ideal.
  7. Weight
  • Determines how strongly ControlNet influences the generated image.
  8. Starting Control Step
  • The point at which ControlNet starts to intervene, expressed as a fraction of the total steps: 0 means it intervenes from the very beginning, 0.2 means it joins the drawing from 20% of the steps onward (e.g. from step 6 of a 30-step run).
  9. Ending Control Step
  • The point at which ControlNet stops intervening, expressed as a fraction of the total steps: 1 means it keeps intervening until the last step, 0.9 means it stops participating once 90% of the steps are reached.
  10. Control Mode
  • Selects how strongly the ControlNet guidance is weighted relative to the prompt; it replaces the Guess Mode of earlier versions.

Control Mode

  11. Image scaling mode (Resize Mode)
  • Controls how the control image is resized; the default scales it to a suitable size and adapts the image automatically. (The sketch after this list shows how these parameters map onto an API request.)
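As a recap, the sketch below expresses the parameters above as a single ControlNet unit in a txt2img API request. The field names and accepted values are assumptions based on the extension's API and can differ between versions; treat it as an illustration of how the UI options map onto a request, not as the definitive schema.

```python
# Hedged sketch: the UI parameters from this section expressed as one
# ControlNet unit in a txt2img API request. Keys/values are assumptions
# that may differ between versions of the extension.
import base64
import requests

with open("reference.png", "rb") as f:          # hypothetical reference image
    reference_b64 = base64.b64encode(f.read()).decode()

unit = {
    "enabled": True,                 # Enable
    "lowvram": False,                # Low VRAM
    "pixel_perfect": True,           # Pixel Perfect
    "module": "canny",               # Preprocessor
    "model": "control_canny-fp16",   # Model (must match the preprocessor)
    "weight": 1.0,                   # Weight
    "guidance_start": 0.0,           # Starting Control Step
    "guidance_end": 1.0,             # Ending Control Step
    "control_mode": 0,               # Control Mode (assumed: 0 = balanced)
    "resize_mode": 1,                # Resize Mode (assumed: 1 = scale to fit)
    "input_image": reference_b64,
}

payload = {
    "prompt": "a cat sitting on a windowsill",
    "steps": 30,
    "alwayson_scripts": {"controlnet": {"args": [unit]}},
}

r = requests.post("http://127.0.0.1:7860/sdapi/v1/txt2img", json=payload)
r.raise_for_status()
```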

7. Version comparison

  • With the original canny model the loading time is subjectively noticeable, while the compressed version loads much faster (about 1 second), and the quality of the generated images is indistinguishable to the naked eye in terms of character outlines, facial features, and so on (this is related to the nature of the model: canny extracts contours, so as long as the contours do not change much, no difference shows up).
  • The Tencent t2i model beats the compressed version in speed (because its model file is smaller), and there is no obvious difference in the generated results, only subtle ones such as face shape.
  • Refer to the other bloggers' articles below for a comparison of the two. If you want speed, use the Tencent t2i model; as for quality, judge for yourself which is better. The author prefers the compressed version of the models.
  • Precisely controlled ice-breaking solution for AI image generation, ControlNet and T2I-Adapter

What is the difference between ControlNet and T2I-Adapter?
As mentioned in the ControlNet paper, training the Canny Edge detector model used a corpus of 3 million edge-image-caption pairs and 600 GPU-hours on A100 80G. The Human Pose (human pose skeleton) model used 80,000 pose-image-caption pairs and 400 GPU-hours on A100 80G.
Training T2I-Adapter took only 2 days on 4 Tesla V100 32G GPUs, covering 3 conditions: sketch (a 150,000-image corpus), semantic segmentation map (160,000 images) and keypose (150,000 images).
The difference between the two: the pre-trained models currently provided by ControlNet are more readily usable and support more types of condition detector (9 categories).
T2I-Adapter is more concise and flexible in its engineering design and implementation, and easier to integrate and extend (according to virushuo, who has read its code). In addition, T2I-Adapter supports guidance from more than one condition model, for example using sketch and segmentation map as input conditions at the same time, or using sketch guidance inside a masked zone (i.e. inpainting).

Reference


Original article: blog.csdn.net/qq_43377653/article/details/130646734