FreeDrag: Semantic Drag Editing Without Point Tracking (University of Science and Technology of China and Shanghai AI Lab)

Click on " Machine Learning and AI Generation Creation " above to follow the star

Get interesting and fun cutting-edge dry goods!

The source of this article is PaperWeekly. The article is only shared

37b29b9164c9f461db3d9d285ca2907b.png

Paper title:

FreeDrag: Point Tracking is Not What You Need for Interactive Point-based Image Editing

Paper link:

https://arxiv.org/abs/2307.04684

Code link:

https://github.com/LPengYang/FreeDrag

Project page:

https://lin-chen.site/projects/freedrag/

Recently, a wildly popular image-editing paradigm has emerged in the broad world of AIGC: dragging the semantic content of a given image from its original position (the handle point) to a target position (the target point) to perform fine-grained, customized edits; for example, the impressive trick of making a cat wink at you.


This amazing effect comes from the paper "Drag Your GAN" (DragGAN for short), published at SIGGRAPH 2023. Once its code was released, DragGAN gained 30K GitHub stars in just a few weeks, triggering a "drag" craze among netizens. With DragGAN, the "Achilles' heel" of AI drawing tools is no longer a weakness: if you are not satisfied with a result, you can simply drag it into shape.


▲ Figure 1 DragGAN loses tracking points due to drastic content changes

Recently, researchers from the University of Science and Technology of China and Shanghai AI Lab released a follow-up study, FreeDrag. The researchers note that DragGAN consists of two alternating iterative processes: 1) motion supervision, which guides the handle point to move toward its corresponding target point; and 2) point tracking, which locates the precise position of the handle point after each move, providing the direction and constraint features for the next move. DragGAN therefore relies heavily on the accuracy of point tracking.
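To make this alternating loop concrete, here is a minimal, self-contained PyTorch sketch of one DragGAN-style iteration. It is only a conceptual illustration, not the official implementation: feat_fn stands in for an intermediate StyleGAN2 feature layer, and the step size, learning rate, search radius, and plain L1 objectives are invented stand-ins for DragGAN's actual hyperparameters and loss.

```python
import torch
import torch.nn.functional as F

def point_feature(feat, p):
    """Bilinearly sample the feature vector at sub-pixel p = (x, y); feat is (C, H, W)."""
    C, H, W = feat.shape
    gx = 2.0 * p[0] / (W - 1) - 1.0          # normalize to [-1, 1] for grid_sample
    gy = 2.0 * p[1] / (H - 1) - 1.0
    grid = torch.stack([gx, gy]).view(1, 1, 1, 2)
    return F.grid_sample(feat.unsqueeze(0), grid, align_corners=True).view(C)

def drag_round(latent, feat_fn, handle, target, f_handle, step=2.0, lr=2e-3, radius=3):
    """One motion-supervision + point-tracking round (simplified DragGAN loop).

    latent must be created with requires_grad=True; feat_fn maps it to a
    (C, H, W) feature map (a stand-in for a StyleGAN2 feature layer).
    """
    # 1) Motion supervision: optimize the latent so that the feature one small
    #    step toward the target matches the (detached) handle feature.
    direction = (target - handle) / (target - handle).norm()
    opt = torch.optim.Adam([latent], lr=lr)
    loss = (point_feature(feat_fn(latent), handle + step * direction)
            - f_handle.detach()).abs().sum()
    opt.zero_grad(); loss.backward(); opt.step()
    # 2) Point tracking: nearest-neighbour feature search in a small window.
    #    DragGAN assumes exactly one point here still matches the handle
    #    feature -- the assumption FreeDrag drops.
    with torch.no_grad():
        feat = feat_fn(latent)
        candidates = [handle + torch.tensor([dx, dy], dtype=handle.dtype)
                      for dx in range(-radius, radius + 1)
                      for dy in range(-radius, radius + 1)]
        handle = min(candidates,
                     key=lambda q: (point_feature(feat, q) - f_handle).abs().sum().item())
    return handle
```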

However, the point-tracking strategy is inherently unstable: it implicitly assumes that, after each move, there is one and only one point in the default search area that perfectly inherits the features of the handle point. This assumption breaks down in two cases: i) tracking loss caused by drastic changes in image content (Figure 1); ii) tracking errors caused by similar points within the search area (Figure 2), such as contour lines or a horse's legs. An incorrect tracking result supplies the wrong direction and constraint features for the next move, so errors accumulate and the editing quality degrades.


▲ Figure 2 DragGAN tracks the wrong point due to the presence of similar points


Method introduction

To prevent the unstable point-tracking process from compromising image-editing quality, researchers from the University of Science and Technology of China and Shanghai AI Lab jointly propose FreeDrag, a feature-oriented point-based interactive editing framework. By introducing adaptively updated template features, fuzzy localization, and linear search, FreeDrag achieves more stable and reliable drag editing without requiring precise point tracking.


▲ Figure 3 Flowchart of FreeDrag


▲ Figure 4 Comparison of DragGAN's point tracking and FreeDrag's point localization

DragGAN requires the precise location of the moved handle point, whereas FreeDrag only constrains the localized point to lie near the handle point by bounding the feature difference, without requiring its exact position.

Dynamically updated template features


The researchers first propose a dynamically updated template feature to alleviate the loss of tracking points. The template feature decides whether, and by how much, to update itself by measuring the quality of each movement: an update coefficient λ controls the ratio of each update, and a larger λ means a greater degree of update, so a higher-quality move updates the template more. Movement quality is measured by the L1 distance d between the feature at the end of the move and the previous template feature value: the smaller d is, the higher the movement quality.

The update of the template feature depends on neither the position nor the features of the handle point, thereby shedding the burden of precise point tracking. At the same time, the smoothness brought by the adaptive update strategy makes the template feature robust to drastic content changes and avoids the abnormal loss of edited content.
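As a rough illustration of this idea (a minimal sketch, not the paper's exact update rule), the template can be maintained as an exponential moving average whose coefficient λ grows as the movement quality improves; alpha and d0 below are invented shape parameters:

```python
import torch

def update_template(template, feat_end, alpha=20.0, d0=0.5):
    """Adaptive template-feature update (illustrative, not the paper's formula).

    d:   L1 distance between the feature at the end of the move and the
         current template -- a small d means a high-quality move.
    lam: update coefficient in (0, 1); it grows as d shrinks, so
         high-quality moves update the template more. alpha and d0 are
         made-up shape parameters.
    """
    d = (feat_end - template).abs().mean()
    lam = torch.sigmoid(alpha * (d0 - d))    # d -> 0 gives lam -> 1 (big update)
    return lam * feat_end + (1.0 - lam) * template
```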

Fuzzy localization and linear search

Next, the researchers propose fuzzy localization and linear search to alleviate ambiguity in the tracking points. For each movement, FreeDrag locates a suitable intermediate target point based on the moving distance and the feature difference, i.e., formula (10) in the paper. Localization falls into three cases: continue moving toward the target point (a high-quality move); keep the current position (an incomplete move); or step back toward the start (an abnormal move).


Compared with the precise point tracking required by DragGAN, the point located by formula (10) is "fuzzy": it does not require the exact position of the handle point, but only ensures, by constraining the feature difference, that the located point lies near the handle point, thus shedding the burden of precise localization. In addition, formula (10) searches only along the straight line through the original handle point and the target point. This linear search strategy effectively suppresses interference from similar points in neighboring regions, guarantees reliable motion supervision, and further stabilizes point movement.
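The following sketch captures the spirit of this fuzzy localization with linear search, under assumed thresholds (tol and the 2*tol split are illustrative constants, not the paper's); the nearest-pixel sampler just keeps the example self-contained:

```python
import torch

def sample(feat, p):
    """Nearest-pixel feature lookup at p = (x, y); feat is (C, H, W)."""
    x, y = int(round(float(p[0]))), int(round(float(p[1])))
    return feat[:, y, x]

def next_point(feat, template, origin, cur, target, step=2.0, tol=1.0):
    """Locate the next intermediate point on the origin -> target line.

    The last move is classified by the feature distance d to the template:
      d <= tol          -> advance one step  (high-quality move)
      tol < d <= 2*tol  -> keep the position (incomplete move)
      d > 2*tol         -> step back         (abnormal move)
    The search never leaves the line, which avoids being distracted by
    similar points in neighbouring regions.
    """
    direction = (target - origin) / (target - origin).norm()
    d = (sample(feat, cur) - template).abs().mean().item()
    if d <= tol:
        nxt = cur + step * direction
    elif d <= 2.0 * tol:
        nxt = cur.clone()
    else:
        nxt = cur - step * direction
    # Clamp so the point never overshoots the final target.
    if (nxt - origin).norm() >= (target - origin).norm():
        nxt = target.clone()
    return nxt
```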


Experimental comparison

The figure below (Figure 5) compares DragGAN and FreeDrag across various scenes. FreeDrag effectively prevents the abnormal disappearance of handle-point content, such as the disappearing mouth in the first example of Figure 5 and the disappearing glasses in the second.

In addition, examples (5)-(8) in Figure 5 show that FreeDrag achieves the intended editing goal more effectively and accurately through stable point movement. Furthermore, extensive experiments across diverse scenes (Figure 6) verify that FreeDrag reaches higher editing quality through stable point movement, pushing interactive point-based image editing to new heights.


▲ Figure 5 Comparison of DragGAN and FreeDrag in various scenarios


▲ Figure 6 Comparison of DragGAN and FreeDrag in more scenarios

The video comparisons are as follows:

The two images on the left show the original image and the editing targets (red: handle points, blue: target points).

On the right are the editing processes of DragGAN and FreeDrag, respectively (GIF).


In the example of dragging the elephant's eyes, an abrupt change in DragGAN's image layout causes the tracking points to be lost while the eyes are being moved. Once tracking is lost, no effective motion supervision can be provided for the subsequent moves, so the intended edit fails. In contrast, thanks to the smoothness of its dynamically updated template features, FreeDrag better avoids sharp changes in image content and drags the eye features to the intended position more reliably.

In the example of dragging the horse's legs, DragGAN tracks the wrong point while the legs are being moved, which gives the subsequent motion supervision a wrong optimization direction and degrades image quality. The error accumulates over multiple iterations and leads to a sharp decline in the quality of the editing result. In contrast, FreeDrag's fuzzy localization and linear search strategies effectively suppress the interference of similar points and provide a reliable supervision signal for point movement, achieving the intended edit with high quality.
