DragGAN is now open source, and its diffusion-model counterpart, DragDiffusion, is here!


Reprinted from: Heart of the Machine | Editors: Du Wei, Chen Ping

Move the mouse and the picture "comes alive," turning into whatever you want.

In the world of AIGC, we can edit and synthesize the image we want simply by "dragging" on it. For example, making a lion turn its head and open its mouth:

[GIF: a lion turning its head and opening its mouth]

The research behind this effect is the "Drag Your GAN" paper, led by a Chinese first author, which was released last month and has been accepted to SIGGRAPH 2023.

A little over a month later, the research team has released the official code. In just three days it has gained more than 23k GitHub stars, which shows how popular it is.

[Image: the DragGAN GitHub repository]

GitHub address: https://github.com/XingangPan/DragGAN

Coincidentally, a similar piece of research, DragDiffusion, has also come to people's attention today. DragGAN enabled point-based interactive image editing with pixel-level precision, but it has a shortcoming: because it is built on a generative adversarial network (GAN), its generality is limited by the capacity of the pre-trained GAN model.

In the new study, several researchers from the National University of Singapore and ByteDance extend this editing framework to diffusion models and propose DragDiffusion. By leveraging a large-scale pre-trained diffusion model, they greatly improve the applicability of point-based interactive editing in real-world scenarios.

While most existing diffusion-based image editing methods operate on text embeddings, DragDiffusion optimizes the diffusion latent representation directly to achieve precise spatial control.

[Image: the DragDiffusion paper]

  • Paper address: https://arxiv.org/abs/2306.14435

  • Project address: https://yujun-shi.github.io/projects/dragdiffusion.html

The researchers note that although diffusion models generate images iteratively, optimizing the diffusion latent at a single step is sufficient to produce coherent results, which lets DragDiffusion complete high-quality edits efficiently.

They conduct extensive experiments in a variety of challenging scenarios (e.g., multiple objects, different object categories), verifying the versatility and generality of DragDiffusion. The code will also be released soon.

Let's see how DragDiffusion works.

First, suppose we want the kitten in the picture below to raise its head a little more. The user only needs to drag the red point to the blue point:

[GIF: the kitten raising its head]

Next, we want to make the mountain a little higher. No problem: just drag the red handle point:

[GIF: the mountain peak being raised]

We can also turn the sculpture's head with a simple drag:

[GIF: the sculpture's head being turned]

Or let the flowers on the shore bloom over a wider area:

[GIF: the flowers blooming over a wider area]

Method introduction

DragDiffusion, the method proposed in this paper, optimizes the diffusion latent at a specific step to enable interactive, point-based image editing.

To achieve this, the method first fine-tunes a LoRA on the diffusion model to reconstruct the user's input image. Doing so helps keep the style of the input and output images consistent.
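To make this step concrete, here is a minimal sketch of the reconstruction fine-tune, assuming Stable Diffusion v1.5 accessed through the diffusers and peft libraries; the model id, file name, prompt, LoRA rank, learning rate, and step count are illustrative placeholders, not the paper's settings.

```python
import numpy as np
import torch
import torch.nn.functional as F
from PIL import Image
from diffusers import StableDiffusionPipeline, DDPMScheduler
from peft import LoraConfig, inject_adapter_in_model

device = "cuda"
pipe = StableDiffusionPipeline.from_pretrained("runwayml/stable-diffusion-v1-5").to(device)
noise_sched = DDPMScheduler.from_config(pipe.scheduler.config)

# The user's input image as a (1, 3, 512, 512) tensor in [-1, 1]; the file name and the
# short prompt describing the image are placeholders.
img = Image.open("input.png").convert("RGB").resize((512, 512))
image = torch.from_numpy(np.array(img)).float().permute(2, 0, 1)[None] / 127.5 - 1.0
prompt = "a photo of a cat"

# Freeze the base UNet, inject LoRA adapters into its attention projections,
# and train only the LoRA weights.
pipe.unet.requires_grad_(False)
lora_cfg = LoraConfig(r=16, lora_alpha=16,
                      target_modules=["to_q", "to_k", "to_v", "to_out.0"])
unet = inject_adapter_in_model(lora_cfg, pipe.unet)
lora_params = [p for p in unet.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(lora_params, lr=5e-4)

with torch.no_grad():
    text_ids = pipe.tokenizer(prompt, padding="max_length",
                              max_length=pipe.tokenizer.model_max_length,
                              return_tensors="pt").input_ids.to(device)
    text_emb = pipe.text_encoder(text_ids)[0]
    latents = pipe.vae.encode(image.to(device)).latent_dist.sample()
    latents = latents * pipe.vae.config.scaling_factor

for step in range(200):  # a short reconstruction fine-tune on this single input image
    noise = torch.randn_like(latents)
    t = torch.randint(0, noise_sched.config.num_train_timesteps, (1,), device=device)
    noisy = noise_sched.add_noise(latents, noise, t)
    pred = unet(noisy, t, encoder_hidden_states=text_emb).sample
    # standard denoising objective, so the LoRA learns to reconstruct this particular image
    loss = F.mse_loss(pred, noise)
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```

Because only the low-rank adapter weights are updated, the fine-tune touches a small fraction of the UNet's parameters and finishes quickly on a single image.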

Next, DDIM inversion (a technique that runs the deterministic diffusion sampling process in reverse to map an image into the model's latent space) is applied to the input image to obtain the diffusion latent at a specific step t.
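Continuing the sketch above, DDIM inversion amounts to running the deterministic DDIM update in reverse; the choice of 50 sampling steps and inverting 35 of them is an assumption for illustration, not necessarily the paper's configuration.

```python
import torch
from diffusers import DDIMScheduler

@torch.no_grad()
def ddim_invert(unet, sched, text_emb, latents, num_steps=50, invert_steps=35):
    """Map the clean latent x_0 back to a noisy latent x_t by reversing the
    deterministic DDIM update (eta = 0)."""
    sched.set_timesteps(num_steps)
    timesteps = sched.timesteps.flip(0)  # ascend from low noise to high noise
    x = latents
    for i in range(invert_steps):
        t = timesteps[i]
        t_next = timesteps[i + 1] if i + 1 < len(timesteps) else t
        eps = unet(x, t, encoder_hidden_states=text_emb).sample
        a_t = sched.alphas_cumprod[t]
        a_next = sched.alphas_cumprod[t_next]
        # predict x_0 from the current noise estimate, then re-noise it to the next level
        x0_pred = (x - (1 - a_t).sqrt() * eps) / a_t.sqrt()
        x = a_next.sqrt() * x0_pred + (1 - a_next).sqrt() * eps
    return x  # approximately the step-t diffusion latent of the input image

sched = DDIMScheduler.from_config(pipe.scheduler.config)  # reuse the pipeline from above
z_t = ddim_invert(unet, sched, text_emb, latents)
```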

During editing, motion supervision and point tracking are applied iteratively to optimize the previously obtained step-t diffusion latent, "dragging" the content at the handle points toward the target locations. A regularization term is also applied so that the unmasked regions of the image remain unchanged.
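A rough sketch of this optimization loop is below. It assumes a `get_features` callable that returns a UNet feature map for the step-t latent (which UNet layer to use, and that its resolution matches the latent and the mask, are assumptions), and the hyper-parameters are illustrative; the actual DragDiffusion loss details may differ.

```python
import torch
import torch.nn.functional as F

def sample_feature(feat, pt):
    """Bilinearly sample a (C,) feature vector at a float (x, y) location on a (1, C, H, W) map."""
    _, _, H, W = feat.shape
    gx = 2.0 * pt[0] / (W - 1) - 1.0   # grid_sample expects coordinates in [-1, 1]
    gy = 2.0 * pt[1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy]).view(1, 1, 1, 2)
    return F.grid_sample(feat, grid, align_corners=True).view(-1)

def drag_optimize(get_features, z_t, handle_pts, target_pts, mask,
                  n_iters=80, lr=0.01, lam=0.1, track_radius=3):
    """Motion supervision + point tracking on the step-t latent (illustrative sketch).
    handle_pts / target_pts: lists of float tensors [x, y] on the feature-map grid.
    mask: (1, 1, H, W) binary tensor, 1 = region the user allows to change."""
    z0 = z_t.detach().clone()                       # reference latent for the regularizer
    z = z_t.detach().clone().requires_grad_(True)   # the variable being optimized
    opt = torch.optim.Adam([z], lr=lr)
    with torch.no_grad():
        feat_ref = get_features(z0)
        handle_feats = [sample_feature(feat_ref, h) for h in handle_pts]  # used for tracking

    for _ in range(n_iters):
        feat = get_features(z)
        loss = torch.zeros((), device=z.device)
        for h, g in zip(handle_pts, target_pts):
            d = (g - h) / ((g - h).norm() + 1e-8)   # unit step toward the target
            # motion supervision: the feature one step ahead of the handle should match the
            # (detached) feature currently at the handle, pulling the content along d
            f_here = sample_feature(feat, h).detach()
            f_ahead = sample_feature(feat, h + d)
            loss = loss + F.l1_loss(f_ahead, f_here)
        # regularizer: keep the unmasked (mask == 0) part of the latent close to the original
        loss = loss + lam * ((1 - mask) * (z - z0)).abs().mean()
        opt.zero_grad()
        loss.backward()
        opt.step()

        # point tracking: re-locate each handle by nearest-neighbour search in feature space
        with torch.no_grad():
            feat = get_features(z)
            for i, (h, f0) in enumerate(zip(handle_pts, handle_feats)):
                best, best_dist = h, float("inf")
                for dx in range(-track_radius, track_radius + 1):
                    for dy in range(-track_radius, track_radius + 1):
                        cand = h + torch.tensor([dx, dy], dtype=h.dtype, device=h.device)
                        dist = (sample_feature(feat, cand) - f0).abs().sum().item()
                        if dist < best_dist:
                            best, best_dist = cand, dist
                handle_pts[i] = best
        if all((g - h).norm() < 1.0 for h, g in zip(handle_pts, target_pts)):
            break
    return z.detach()
```

Note that only the latent is optimized here; the UNet (with its LoRA) stays frozen.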

Finally, the optimized step-t latent is denoised with DDIM to obtain the edited result. An overview of the whole pipeline is shown below:

[Figure: overview of the DragDiffusion pipeline]
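To close the loop, here is a matching sketch of the final DDIM denoising step, reusing the pipeline, scheduler, and text embedding from the sketches above; the index bookkeeping for step t follows the assumed 50-step / 35-inversion-step schedule.

```python
import torch

@torch.no_grad()
def ddim_denoise(pipe, sched, z_t, text_emb, start_index):
    """Run deterministic DDIM from the optimized step-t latent down to x_0 and decode it."""
    for t in sched.timesteps[start_index:]:
        eps = pipe.unet(z_t, t, encoder_hidden_states=text_emb).sample
        z_t = sched.step(eps, t, z_t).prev_sample
    image = pipe.vae.decode(z_t / pipe.vae.config.scaling_factor).sample  # (1, 3, H, W) in [-1, 1]
    return (image / 2 + 0.5).clamp(0, 1)

# z_edited is the latent returned by the drag optimization sketch; with 50 sampling steps
# and 35 inversion steps, it sits at sched.timesteps[50 - 35 - 1].
edited = ddim_denoise(pipe, sched, z_edited, text_emb, start_index=50 - 35 - 1)
```

Deterministic (eta = 0) DDIM is used in both directions so that inversion and denoising are approximately mutual inverses.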

Experimental results

Given an input image, DragDiffusion "drags" the content at the handle points (red) to the corresponding target points (blue). For example, in image (1) the dog's head is turned, in image (7) the tiger's mouth is closed, and so on.

[Figure: DragDiffusion editing results]

Below are some more examples. In image (4) the mountain peak is made higher, in image (7) the pen is enlarged, and so on.

[Figure: more DragDiffusion editing examples]


"Book of Rites·Xue Ji" has a saying: "Learning alone without friends is lonely and ignorant."

Click on a cup of milk tea and become the frontier waver of AIGC+CV vision! , join  the planet of AI-generated creation and computer vision  knowledge!

Guess you like
