EasyPhoto: The artistic photo generation plug-in based on SD WebUI is here!

Author : wuziheng

Background introduction

Recently, applications that use generative AI to batch-produce realistic, faithful, and beautiful personal photos have become very popular. At the same time, with the rapid development of the open-source community around Stable Diffusion, open-source projects such as FaceChain have emerged to help developers build personalized portrait-generation applications. More and more developers are paying attention to this direction and want more flexible ways to build on it.

EasyPhoto project introduction

As the development team behind the FaceChain-Inpaint function, we quickly launched EasyPhoto, an open-source plug-in for personalized photo generation built on the SD WebUI plug-in ecosystem. The plug-in lets users quickly train a LoRA model by uploading a few photos of the same person, and then combines it with user-specified template images to generate realistic, faithful, and beautiful portraits.


Figure 1

Project address: https://github.com/aigc-apps/sd-webui-EasyPhoto

Welcome to file issues and contribute optimizations, so that every AIGC enthusiast can have their own AI photo camera!

Principle introduction

AI portrait generation is a pipeline built on Stable Diffusion and face-related AI technologies, combining customized portrait LoRA training with generation against specified template images. Here we briefly introduce the techniques implemented in EasyPhoto. The picture below outlines the EasyPhoto generation pipeline.


Figure 2

EasyPhoto is divided into two stages: training and inference. Figure 3 below shows the training stage in detail, and Figure 2 above shows the generation stage.

EasyPhoto generation

EasyPhoto generation combines the open-source Stable Diffusion model with a person-specific LoRA and ControlNet to produce the artistic photo:

  1. Run face detection on the specified input template, crop and warp the detected face region, and combine it with the user's digital avatar for template replacement.
  2. Use a FaceID model to pick, among the photos the user uploaded, the best ID photo to fuse with the template face.
  3. Use the fused image as the base image and the replaced face as the control condition, load the LoRA corresponding to the digital avatar, and run an image-to-image partial repaint.
  4. Apply Stable Diffusion together with super-resolution to produce a high-definition result image while preserving the identity.
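The four steps above can be sketched as a data-flow pipeline. The sketch below is illustrative only: every stage is a hypothetical stub standing in for the real models (face detector, face fusion, SD + LoRA + ControlNet repaint, super-resolution) and only mimics the shape of the data moving through Figure 2.

```python
import numpy as np

def detect_and_warp_face(template: np.ndarray) -> np.ndarray:
    """Stage 1 stub: 'detect' the face by cropping the centre of the template."""
    h, w, _ = template.shape
    return template[h // 4 : 3 * h // 4, w // 4 : 3 * w // 4]

def face_fusion(template_face: np.ndarray, user_face: np.ndarray) -> np.ndarray:
    """Stage 2 stub: blend the best user ID photo into the template face."""
    user_resized = user_face[: template_face.shape[0], : template_face.shape[1]]
    return (0.5 * template_face + 0.5 * user_resized).astype(template_face.dtype)

def inpaint_with_lora(base: np.ndarray, control: np.ndarray) -> np.ndarray:
    """Stage 3 stub: a real implementation calls SD + LoRA with a ControlNet condition."""
    return base

def super_resolve(img: np.ndarray, scale: int = 2) -> np.ndarray:
    """Stage 4 stub: nearest-neighbour upscale standing in for super-resolution."""
    return img.repeat(scale, axis=0).repeat(scale, axis=1)

def generate(template: np.ndarray, user_face: np.ndarray) -> np.ndarray:
    face = detect_and_warp_face(template)
    fused = face_fusion(face, user_face)
    repainted = inpaint_with_lora(fused, control=face)
    return super_resolve(repainted)

template = np.zeros((128, 128, 3), dtype=np.uint8)
user = np.full((64, 64, 3), 255, dtype=np.uint8)
result = generate(template, user)
print(result.shape)  # (128, 128, 3): the 64x64 face crop after 2x upscaling
```

The point of the sketch is the ordering: fusion happens before the diffusion repaint, so the repaint works from an already-similar base image.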

EasyPhoto training


Figure 3

EasyPhoto training applies a series of face pre-processing techniques to screen and pre-process the images uploaded by the user, and introduces validation and model-fusion steps; see Figure 3.

  1. Cluster and score all pictures using FaceID embeddings and image-quality scores, and filter out photos whose identity differs.
  2. Apply face detection and salient-object segmentation to the filtered images to crop out the face and remove the background.
  3. Apply a skin-retouching model to low-quality faces to improve the quality of the training data.
  4. Label the processed training images with a uniform caption and use them for LoRA training.
  5. During training, run a FaceID-based validation step, save the model at fixed intervals, and finally fuse the saved models weighted by similarity.
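The identity filter in step 1 can be sketched with NumPy: compute each photo's cosine similarity to the mean identity embedding and drop outliers. The 0.6 threshold is a hypothetical value; a real pipeline tunes it per FaceID model.

```python
import numpy as np

def filter_by_face_id(embeddings: np.ndarray, threshold: float = 0.6) -> list:
    """Return indices of images whose embedding matches the mean identity.

    embeddings: (n, d) array of L2-normalised FaceID embeddings.
    The threshold is a hypothetical value; tune it for the real model.
    """
    mean = embeddings.mean(axis=0)
    mean /= np.linalg.norm(mean)
    sims = embeddings @ mean  # cosine similarity to the mean identity
    return [i for i, s in enumerate(sims) if s >= threshold]

# Toy embeddings: three photos of the same person plus one different identity.
raw = np.array([
    [1.0, 0.1, 0.0],
    [0.9, 0.2, 0.1],
    [1.0, 0.0, 0.1],
    [0.0, 0.1, 1.0],  # outlier: a different person slipped into the upload
])
emb = raw / np.linalg.norm(raw, axis=1, keepdims=True)

kept = filter_by_face_id(emb)
print(kept)  # [0, 1, 2]: the outlier (index 3) is filtered out
```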

We will briefly introduce the principles of the related techniques in the following sections. For more details, please refer to the code in the repo. (If you are already familiar with this technical route, feel free to jump directly to the section EasyPhoto & SDWebUI.)

Text-to-image generation (SD/ControlNet/LoRA)

StableDiffusion

Stable Diffusion is an open-source image generation model from Stability AI, usually released in SD1.5/SD2.1/SDXL and other versions. It is a diffusion model with text guidance, trained on massive image-text pairs (LAION-5B). With the trained model, features extracted from the input text guide the diffusion process over multiple iterations to generate high-quality images matching the input semantics. Interested readers can refer to one of the many introductory articles on how Stable Diffusion works. The image below shows sample results posted in the official Stable Diffusion repo.
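To give a feel for this iterative, guided denoising, here is a toy sketch in plain NumPy. The `fake_denoiser` is a hypothetical stand-in for the real U-Net: it simply returns the "clean" image implied by the text features, and the loop shows how repeated small updates turn pure noise into the target. This mimics the structure of a diffusion sampler, not real Stable Diffusion.

```python
import numpy as np

def fake_denoiser(x: np.ndarray, text_target: np.ndarray) -> np.ndarray:
    """Stand-in for the U-Net: predicts the clean image implied by the text."""
    return text_target

def sample(text_target: np.ndarray, steps: int = 50, seed: int = 0) -> np.ndarray:
    rng = np.random.default_rng(seed)
    x = rng.normal(size=text_target.shape)  # start from Gaussian noise
    for _ in range(steps):
        pred = fake_denoiser(x, text_target)
        x = x + 0.2 * (pred - x)            # move a fraction toward the prediction
    return x

target = np.linspace(0.0, 1.0, 16)  # pretend these pixels match the prompt
out = sample(target)
print(np.abs(out - target).max() < 1e-3)  # True: the noise has converged to the target
```

Real samplers differ in the schedule and the noise model, but the shape is the same: many small steps, each guided by the model's prediction.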

ControlNet/Lora

For a text-conditioned model like Stable Diffusion, better control of the generated image content has long been a problem that academia and industry are trying to solve. ControlNet and LoRA, introduced in this section, are two commonly used techniques; they are also the parts of Figure 2 used to control edge coherence and to generate a specified identity.

ControlNet: proposed in "Adding Conditional Control to Text-to-Image Diffusion Models", it extends the Stable Diffusion model with additional trained parameters that process extra input signals, such as skeleton maps, edge maps, depth maps, and human-pose maps, so that these signals can guide the diffusion model to generate matching image content. For example, in the official repo, a Canny edge map is used as a signal to control the output image of a puppy.
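A minimal illustration of what an edge-map control signal looks like, assuming plain NumPy finite differences instead of the cv2.Canny detector that the ControlNet canny model actually consumes:

```python
import numpy as np

def edge_map(img: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Binary edge map from finite-difference gradients.

    A crude stand-in for Canny: real ControlNet pipelines run cv2.Canny
    on the template image to produce this kind of control signal.
    """
    gx = np.zeros_like(img)
    gy = np.zeros_like(img)
    gx[:, 1:] = np.abs(np.diff(img, axis=1))
    gy[1:, :] = np.abs(np.diff(img, axis=0))
    return ((gx + gy) > threshold).astype(np.uint8)

# A simple image with a bright square on a dark background.
img = np.zeros((8, 8))
img[2:6, 2:6] = 1.0
edges = edge_map(img)
print(edges.sum())  # only pixels along the square's border fire
```

The resulting binary map is exactly the kind of sparse structural hint that ControlNet feeds into the diffusion model alongside the text prompt.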

4.png

Figure 4

As Figure 2 shows, Stable Diffusion receives two control inputs, one of which is ControlNet, used here to control edge coherence and face-shape cues; we use a combination of Canny edges and OpenPose human pose.

LoRA: a method proposed in "LoRA: Low-Rank Adaptation of Large Language Models" that uses low-rank matrices to fine-tune a small number of parameters of a large model; it is widely used downstream of various large models. AI portraits must ensure that the final generated image resembles the target person, which requires training a LoRA on a small number of input images to obtain a small model of the specified face (ID). The same technique can, of course, also be used to train LoRA models of styles, objects, and other specified imagery; you can find many such LoRA models on sites like civitai.com.
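The low-rank idea fits in a few lines of NumPy. The dimensions below are illustrative (EasyPhoto's defaults are rank 128 and network alpha 64; a tiny rank is used here to make the parameter savings obvious):

```python
import numpy as np

# Frozen pre-trained weight W (d_out x d_in); LoRA trains only the small
# matrices B (d_out x r) and A (r x d_in), applied as
#   W_eff = W + (alpha / r) * B @ A
d_out, d_in, r, alpha = 768, 768, 4, 2  # illustrative; EasyPhoto defaults to r=128, alpha=64

rng = np.random.default_rng(0)
W = rng.normal(size=(d_out, d_in))     # frozen base weight
B = np.zeros((d_out, r))               # B starts at zero, so W_eff == W at init
A = rng.normal(size=(r, d_in)) * 0.01  # small random init for A

W_eff = W + (alpha / r) * (B @ A)

full_params = d_out * d_in
lora_params = d_out * r + r * d_in
print(f"trainable fraction: {lora_params / full_params:.4f}")  # about 1% of the full matrix
```

Initializing B to zero means the adapted model starts exactly at the pre-trained behaviour, and training only nudges it toward the target face.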

Face related AI model

For AI portraits specifically, training a realistic and faithful face LoRA quickly from as few pictures as possible is the key to producing high-quality results, and there are plenty of articles and videos online about how to train one. Here we introduce the open-source AI models we use in this process to improve the final face LoRA.

In this process, we made extensive use of open-source models from ModelScope and GitHub to implement the following face-related functions.

| Face model | Model card | Function | Use in EasyPhoto |
| --- | --- | --- | --- |
| FaceID | https://github.com/deepinsight/insightface | Extracts features from aligned faces; embeddings of the same person are closer | Preprocessing: filter out faces with different IDs; training: validate model quality; prediction: select base images |
| Face detection | cv_resnet50_face | Outputs face bounding boxes and key points in an image | Training preprocessing: crop and cut out faces; prediction: locate template faces and key points |
| Face segmentation | cv_u2net_salient | Salient-object segmentation | Training preprocessing: remove the image background |
| Face fusion | cv_unet-image-face-fusion | Fuses two input face images | Prediction: fuse the selected base picture with the generated picture to better match the target ID |
| Skin retouching | cv_unet_skin_retouching_torch | Beautifies the input face | Training preprocessing: improve training-image quality; prediction: improve output-image quality |

EasyPhoto & SDWebUI

SDWebUI [Repo] is the community's most widely used Stable Diffusion development tool. Open-sourced since the beginning of the year, it has over 100k stars on GitHub. The features mentioned above, such as text-to-image generation, ControlNet, and LoRA, have all been contributed to this tool by community developers, letting anyone quickly deploy a debuggable text-to-image service. We therefore implemented EasyPhoto as an SDWebUI plug-in, integrating all of the face preprocessing, training, and artistic-photo generation described above.

Project address: https://github.com/aigc-apps/sd-webui-EasyPhoto

Users can refer to the SDWebUI plug-in usage method to install and use it.

Introduction to EasyPhoto plug-in

EasyPhoto is a WebUI plug-in for generating AI portraits; its code can be used to train digital avatars of the user.

  • It is recommended to use 5 to 20 portrait pictures for training, preferably half-length photos without glasses (a few with glasses are acceptable).
  • After training is complete, EasyPhoto can generate images in the inference part.
  • EasyPhoto supports using preset template images and uploading your own images for inference.

Figures 1 and 5 show generation results from the plug-in; judging from them, the generation quality is quite good:

5.png

Figure 5

Behind each picture there is a template, and EasyPhoto modifies the template to match the user's characteristics. Some templates are preset on the Inference tab, so you can try the plug-in with them directly; EasyPhoto also supports custom templates, with another tab on the Inference side where you can upload your own. As shown below.

6.png

Figure 6

Before running inference, we need to train. Training requires uploading a number of the user's personal photos; its output is a LoRA model, which is then used for inference.

In summary, the execution process of EasyPhoto is very simple: 1. upload user pictures and train a user-specific LoRA model; 2. select a template, run prediction, and obtain the results.

Installation method one: SDWebUI interface installation

Make sure your network connection is good!!! In SDWebUI, go to Extensions, select Install from URL, enter https://github.com/aigc-apps/sd-webui-EasyPhoto, and click Install.

Dependent packages are installed automatically during installation, so please wait patiently. After installation, restart the WebUI.

7.png

Figure 7

Installation method two: source code installation

If you want to install from source, go to the extensions folder of the WebUI, open a git tool, and run git clone https://github.com/aigc-apps/sd-webui-EasyPhoto. After the download completes, restart the WebUI; the required libraries will be checked and installed.


Figure 8

Other installation items: ControlNet

EasyPhoto requires SDWebUI to support ControlNet; the relevant plug-in is Mikubill/sd-webui-controlnet, which must be installed before using EasyPhoto.

  • In addition, we need at least three ControlNet units for inference, so you need to set Multi ControlNet: Max models amount (requires restart) accordingly.


Figure 9

EasyPhoto training

EasyPhoto training consists of the following three steps: 1. upload personal pictures; 2. adjust the training parameters; 3. set the user ID and click Start Training. The overall interface is as follows.


Figure 10

Upload training images

The training images are on the left. Click Upload Photos to upload images and Clear Photos to delete the uploaded ones.

Adjust training parameters

After uploading the images, the training parameters are shown on the right. There is no need to adjust them for a first training run.

Figure 11

Here is an introduction to the parameters:

| Parameter | Meaning |
| --- | --- |
| resolution | Size of the images fed to the network during training; default 512 |
| validation & save steps | Interval, in steps, at which validation images are generated and intermediate weights are saved; default 100, meaning validation and saving happen every 100 steps |
| max train steps | Maximum number of training steps; default 800 |
| max steps per photos | Maximum number of training steps per image; default 200; combined with max train steps, the smaller resulting value applies |
| train batch size | Training batch size; default 1 |
| gradient accumulation steps | Number of gradient-accumulation steps; default 4; with train batch size = 1, each optimizer step effectively sees four images |
| dataloader num workers | Number of data-loading workers; has no effect on Windows (setting it there causes an error), but works normally on Linux |
| learning rate | Learning rate for LoRA training; default 1e-4 |
| rank | Rank (feature dimension) of the LoRA weights; default 128 |
| network alpha | Scaling parameter for LoRA training, usually half of rank; default 64 |

The formula for the final number of training steps follows from the table above:

total training steps = min(max train steps, photo number × max steps per photos)

Simply put: when there are few pictures, the number of training steps is photo number × max steps per photos; when there are many, it is max train steps.
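Expressed as code, the rule reads as follows (defaults taken from the parameter table above):

```python
def total_training_steps(photo_num: int,
                         max_train_steps: int = 800,
                         max_steps_per_photos: int = 200) -> int:
    """Effective number of training steps, per the parameter table above."""
    return min(max_train_steps, photo_num * max_steps_per_photos)

# Few photos: limited by the per-photo cap; many photos: capped by max_train_steps.
print(total_training_steps(3))   # 3 * 200 = 600
print(total_training_steps(10))  # 10 * 200 = 2000, capped at 800
```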

Training & setting the ID

After completing the settings, fill in a User ID above (for example, the user's name), then click Start Training below to begin.


Figure 12

Monitoring training

!! During the first training run, some pre-trained model weights are downloaded from the public OSS storage we prepared; please wait patiently and watch the terminal for the download progress.


Figure 13

Logs indicating that training has started normally:


Figure 14

When the terminal output looks like this, training is complete. The last step computes the FaceID gap between the validation images and the user images, then fuses the LoRA checkpoints accordingly to ensure the resulting LoRA is a faithful digital avatar of the user.
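As a hypothetical sketch of similarity-weighted checkpoint fusion (the actual fusion logic lives in the repo), one can average the saved LoRA weights using the FaceID validation scores:

```python
import numpy as np

def fuse_checkpoints(checkpoints: list, similarities: list) -> dict:
    """Average saved checkpoints weighted by their FaceID validation score.

    A hypothetical sketch: checkpoints are dicts of parameter arrays, and
    similarities are each checkpoint's FaceID score on the validation images.
    """
    weights = np.array(similarities, dtype=float)
    weights /= weights.sum()  # normalise so the fused weights stay in scale
    fused = {}
    for key in checkpoints[0]:
        fused[key] = sum(w * ckpt[key] for w, ckpt in zip(weights, checkpoints))
    return fused

# Three toy checkpoints (one tensor each) with different validation similarity.
ckpts = [{"lora.up": np.full((2, 2), v)} for v in (1.0, 2.0, 3.0)]
sims = [0.2, 0.3, 0.5]  # higher = closer to the user's face on validation images

fused = fuse_checkpoints(ckpts, sims)
print(fused["lora.up"][0, 0])  # 0.2*1 + 0.3*2 + 0.5*3 = 2.3
```

Weighting by validation similarity lets checkpoints that best preserve the user's identity dominate the final model.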


Figure 15

EasyPhoto prediction

After training, switch to the Inference tab. Because of how Gradio works, newly trained models are not refreshed automatically; click the blue refresh button next to Used id to refresh the model list.


Figure 16

  • After refreshing, select the model you just trained, then select the corresponding template to start prediction.
  • The first prediction downloads some ModelScope models; just wait patiently.

Some templates are preset; you can also switch to upload image and upload a template yourself for prediction. We then obtain the prediction results.


Figure 17

Prediction parameter description

Description of some parameters on the prediction interface:

| Parameter | Meaning |
| --- | --- |
| After Face Fusion Ratio | Ratio of the second face fusion; the larger it is, the more similar the result |
| First Diffusion steps | Number of steps for the first Stable Diffusion pass |
| First Diffusion denoising strength | Denoising (repaint) strength of the first Stable Diffusion pass |
| Second Diffusion steps | Number of steps for the second Stable Diffusion pass |
| Second Diffusion denoising strength | Denoising (repaint) strength of the second Stable Diffusion pass |
| Crop Face Preprocess | Whether to crop the face before processing; suitable for large pictures |
| Apply Face Fusion Before | Whether to apply the first face fusion |
| Apply Face Fusion After | Whether to apply the second face fusion |

Closing remarks

EasyPhoto is built entirely on models and techniques from the open-source community, and aims to explore Stable Diffusion technology and its applications in the AIGC field. Please credit the source when reprinting!!!

Everyone is very welcome to download and try it out, and participate in the development to create truly beautiful AI photos!

Origin: blog.csdn.net/weixin_48534929/article/details/132737027