Xi Xiaoyao's Science and Technology Sharing
Source | Qubit
Author | Ming Min
In the latest work from a Peking University team, diffusion models can now be used for drag-style photo editing!
With a click and a drag, you can make a snow-capped mountain grow taller:
Or let the sun rise:
This is DragonDiffusion, jointly developed by VILLA (Visual-Information Intelligent Learning LAB), the team led by Zhang Jian at Peking University, together with the Peking University Shenzhen Graduate School-Tuzhan Intelligent AIGC Joint Laboratory and Tencent ARC Lab.
It can be understood as a diffusion-based variant of DragGAN.
DragGAN now has more than 30,000 GitHub stars, and its underlying model is a GAN (generative adversarial network).
GANs have long had shortcomings in generalization ability and image quality.
These happen to be the strengths of diffusion models.
So Zhang Jian's team extended the DragGAN paradigm to diffusion models.
When the work was released, it made Zhihu's trending list.
Some commenters noted that it addresses the problem of partially incomplete objects in images generated by Stable Diffusion, and enables well-controlled redrawing.
Make a lion turn its head in a photo
DragonDiffusion's effects also include reshaping the front of a car:
Gradually lengthening a sofa:
Or slimming a face by hand:
It can also replace objects across photos, such as dropping a donut into another image:
Or turning a lion's head:
The method framework consists of two branches: a guidance branch and a generation branch.
First, the image to be edited is inverted through the diffusion process to find its representation in the diffusion latent space, which serves as the input to both branches.
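This inversion is typically done with deterministic DDIM-style steps that map the image back to noise. A minimal sketch of one such step, under the usual DDIM noise-schedule assumptions (illustrative only, not the authors' exact code):

```python
import numpy as np

def ddim_inversion_step(x_t, eps_pred, alpha_t, alpha_next):
    """One deterministic DDIM inversion step (illustrative sketch).

    Given the noise eps_pred predicted by the denoiser at the current
    timestep, map the latent x_t toward the next (noisier) timestep.
    alpha_t / alpha_next are cumulative noise-schedule products (alpha_bar).
    """
    # Predict the clean sample implied by x_t and the predicted noise.
    x0 = (x_t - np.sqrt(1.0 - alpha_t) * eps_pred) / np.sqrt(alpha_t)
    # Re-noise deterministically toward the next timestep.
    return np.sqrt(alpha_next) * x0 + np.sqrt(1.0 - alpha_next) * eps_pred
```

Because the step is deterministic, applying it with the schedule reversed recovers the original latent, which is what makes inversion-based editing reproducible.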
The guidance branch reconstructs the original image and, during reconstruction, injects information from the original image into the generation branch.
The generation branch's role is to edit the original image under this guidance while keeping the main content consistent with the original.
Exploiting the strong correspondence between the diffusion model's intermediate features, DragonDiffusion passes the latent images of both branches through the same UNet denoiser at each diffusion step, converting them into the feature domain.
Two masks are then used to mark the position of the dragged content in the original image and in the edited image, and the content is constrained to appear in the target region.
The paper uses cosine distance to measure the similarity of the two regions, and normalizes the similarity:
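The idea can be sketched as follows: pool the features inside each masked region and compare them with a normalized cosine score. This is a simplified illustration (the pooling and exact normalization here are assumptions, not the paper's formula):

```python
import numpy as np

def masked_cosine_similarity(feat_orig, feat_edit, mask_src, mask_dst):
    """Illustrative sketch: similarity between the dragged region of the
    original feature map (mask_src) and the target region of the edited
    feature map (mask_dst).

    feat_* have shape (C, H, W); masks are boolean arrays of shape (H, W).
    """
    # Average-pool the features inside each mask into a C-dim vector.
    v_src = feat_orig[:, mask_src].mean(axis=1)
    v_dst = feat_edit[:, mask_dst].mean(axis=1)
    # Cosine similarity between the pooled region descriptors.
    cos = np.dot(v_src, v_dst) / (
        np.linalg.norm(v_src) * np.linalg.norm(v_dst) + 1e-8
    )
    # Map cosine from [-1, 1] to [0, 1] so it can serve as a score.
    return 0.5 * (cos + 1.0)
```

Identical regions score near 1, opposite-signed features near 0, which is the behavior a region-matching loss needs.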
Besides constraining the edited content, the unedited regions must also stay consistent with the original image; this too is enforced via the similarity of corresponding regions. The total loss function is then designed as:
For injecting the editing information, the paper treats the conditional diffusion process as a joint score function, following score-based diffusion:
Building on the strong feature correspondence, the editing signal is converted into a gradient through the score function, which updates the latent variables during the diffusion process.
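As a toy illustration of this gradient-based latent update (a sketch, not the authors' implementation: the real editing energy is built from feature correspondences, while here a simple quadratic stands in):

```python
import numpy as np

def edit_energy_grad(z, z_target):
    """Gradient of a toy editing energy E(z) = ||z - z_target||^2.
    Stands in for the feature-correspondence energy in the paper."""
    return 2.0 * (z - z_target)

def guided_update(z, z_target, step_size=0.1, n_steps=50):
    """Repeatedly convert the editing signal into a gradient and nudge
    the diffusion latent downhill, mimicking score-style guidance."""
    for _ in range(n_steps):
        z = z - step_size * edit_energy_grad(z, z_target)
    return z
```

Each step moves the latent a little toward a state that satisfies the editing constraint, so the edit is realized gradually across the sampling trajectory rather than in one jump.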
To account for both semantic and geometric alignment, the authors add a multi-scale guidance design on top of this strategy.
In addition, to further ensure consistency between the editing result and the original image, DragonDiffusion designs a cross-branch self-attention mechanism.
Specifically, the Key and Value from the guidance branch's self-attention module replace the Key and Value in the generation branch's self-attention module, injecting reference information at the feature level.
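The key/value swap above can be sketched in a few lines: the generation branch keeps its own Query, but attends over the guidance branch's Key and Value, so the output is assembled from reference (original-image) features. A minimal numpy sketch (shapes and names are illustrative):

```python
import numpy as np

def softmax(x, axis=-1):
    """Numerically stable softmax."""
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def cross_branch_attention(q_gen, k_guide, v_guide):
    """Cross-branch attention sketch: Query from the generation branch,
    Key/Value taken from the guidance branch.

    q_gen: (N, d) generation-branch queries
    k_guide, v_guide: (M, d) guidance-branch keys and values
    """
    d = q_gen.shape[-1]
    # Standard scaled dot-product attention, but across branches.
    attn = softmax(q_gen @ k_guide.T / np.sqrt(d), axis=-1)
    return attn @ v_guide
```

Because every output row is a convex combination of guidance-branch values, the generated features stay anchored to the original image's content.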
Ultimately, with this efficient design, the method offers multiple editing modes for both generated and real images:
moving objects within an image, resizing objects, replacing object appearance, and dragging image content.
All editing and content-preservation signals come from the image itself, with no fine-tuning or training of additional modules, which simplifies the editing process.
In experiments, the researchers found that the first UNet layers are too shallow to reconstruct images accurately, while the fourth layer is too deep and performs poorly; features from the second and third layers work best.
Compared with other methods, DragonDiffusion also does a better job of removing the original content after an edit.
From Zhang Jian's team at Peking University, and others
The work was jointly produced by Zhang Jian's team at Peking University, Tencent ARC Lab, and the Peking University Shenzhen Graduate School-Tuzhan Intelligent AIGC Joint Laboratory.
Zhang Jian's team previously led the development of T2I-Adapter, which can precisely control the content generated by diffusion models.
It has over 2,000 stars on GitHub.
This technology has been officially adopted by Stability AI as the core control technique of its sketch-to-image tool Stable Doodle.
The AIGC joint laboratory established by Tuzhan Intelligence and Peking University Shenzhen Graduate School has recently made breakthroughs in image editing and generation, legal AI products, and other areas.
Just a few weeks ago, the Peking University-Tuzhan AIGC Joint Laboratory released ChatLaw, a large language model product that topped Zhihu's trending list.
The joint lab will focus on multi-modal large models centered on CV, continue developing the ChatKnowledge large model behind ChatLaw in the language domain, and tackle anti-hallucination, private deployment, and data security in vertical fields such as law and finance.
It is reported that the laboratory will soon launch an original large model in the vein of Stable Diffusion.
Paper address: https://arxiv.org/abs/2307.02421
Project homepage: https://mc-e.github.io/project/DragonDiffusion/