DragGAN arrives: point-dragging editing that may extend to video in the future.

Original | Text by BFT Robot 

On August 14-15, 2023, the 7th GAIR Global Artificial Intelligence and Robotics Conference was held at the Orchard Hotel in Singapore.

At the "AIGC and Generative Content" sub-forum, Assistant Professor Pan Xingang of the School of Science and Engineering at Nanyang Technological University shared the research results on the interactive editing direction of point-drag - DragGAN with the theme of "Interacitve Point-Dragging Manipulation of Visual Contents"

Pan Xingang pointed out that when users create images, they do not stop at coarse-grained editing; they also expect fine control over the spatial attributes of the image. DragGAN was created in response to this demand. With DragGAN, a user can optionally specify an editable region, mark a handle point A and a target point B, and the algorithm then moves the content at point A to the position of point B.
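For intuition, below is a minimal, hypothetical sketch (not the authors' released code) of the idea just described: the image content is produced by a pretrained generator, and a latent code is optimized so that the feature at the handle point A moves toward the target point B. The ToyGenerator, the drag_edit function, and all hyperparameters are illustrative assumptions; a real system would use a GAN such as StyleGAN2, restrict changes to the user's editable region, and re-track the handle point after each step.

```python
# Hypothetical sketch of point-drag editing by latent optimization (illustrative only).
import torch
import torch.nn.functional as F

class ToyGenerator(torch.nn.Module):
    """Stand-in for a pretrained GAN generator: latent (1, 64) -> feature map (1, 8, 32, 32)."""
    def __init__(self):
        super().__init__()
        self.fc = torch.nn.Linear(64, 8 * 32 * 32)

    def forward(self, w):
        return self.fc(w).view(1, 8, 32, 32)

def drag_edit(gen, w, handle, target, steps=50, lr=2e-3):
    """Optimize the latent `w` so the content at `handle` (row, col) moves toward `target`."""
    w = w.clone().requires_grad_(True)
    opt = torch.optim.Adam([w], lr=lr)
    h = torch.tensor(handle, dtype=torch.float)
    t = torch.tensor(target, dtype=torch.float)
    step_dir = (t - h) / ((t - h).norm() + 1e-8)   # unit vector from A toward B
    nxt = (h + step_dir).round().long()            # one pixel along that direction
    for _ in range(steps):
        feat = gen(w)
        # Motion-supervision-style loss: the feature one step toward B should match
        # the (detached) feature currently at the handle point, pulling content along.
        loss = F.l1_loss(feat[..., nxt[0], nxt[1]],
                         feat[..., int(handle[0]), int(handle[1])].detach())
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w.detach()

gen = ToyGenerator()
w0 = torch.randn(1, 64)
w_edited = drag_edit(gen, w0, handle=(10, 10), target=(10, 20))  # drag A to B
```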

More importantly, DragGAN presents not only the final edited image but also the entire intermediate transition, effectively producing a video or animation, which broadens its range of applications.

As a key-point drag-and-drop editing tool, DragGAN is a strong complement to today's popular text-to-image generation methods, and it has attracted considerable attention and use since its release.

What are we missing in image synthesis?

Although generative AI is already very good at generating images from text, more advanced, fine-grained image editing remains challenging. For example, we can give Midjourney or Stable Diffusion a text prompt and have it generate a realistic lion. But in many cases, the creative process does not end there.

A text description is only coarse-grained; users want to keep refining the image at a fine-grained level, for example changing the pose of the generated content, turning the lion's head, enlarging or shrinking an object, moving an object's position, or even changing the lion's expression. All of these operations amount to fine control over the spatial attributes of objects, and achieving such control is still quite challenging.

To achieve such refined editing, users need to provide more detailed and accurate descriptions, including the specific position, size, pose, texture, color, and other attributes of each object in the image. This information is essential for producing more realistic and accurate results.

However, high-quality image editing of this kind is not easy to achieve. Large amounts of data and training are needed to improve the accuracy and effectiveness of generative models, and smarter, more adaptive algorithms are needed to handle different types of input text. In addition, intellectual property and privacy must be protected during generation to avoid infringement.

How should we control spatial properties?

To achieve fine control over the spatial attributes of objects, one option is to follow the text-to-image approach and edit images based on text descriptions. Some methods in the academic community can already modify image content from text, for example "move the lion's nose 30 pixels to the right." However, this way of editing has problems. First, such text-based editing requires a language model that understands every possible way of editing an object's spatial attributes, and there are many edits beyond moving something to the right. Second, it is actually hard for a language model to understand exactly how far 30 pixels is in the current image. Precise editing therefore remains a major challenge for current text-to-image systems.

What is interactive point dragging?

The user adjusts the spatial attributes of an image by clicking two key points: the semantic part of the image marked by the red (handle) point is moved to the blue (target) point.

This approach has several advantages. First, it is simple and easy to use, requiring only the coordinates of two points. Second, the user can precisely specify the positions of, and distance between, the handle point and the target point, enabling highly accurate editing and adjustment. Finally, it is very flexible and applies to many different image editing scenarios, such as changing the size, pose, or position of content in an image.

The result in the interactive point-dragging editing direction: DragGAN

As can be seen, the user optionally specifies an editable region and then marks a red point and a blue point, and the algorithm moves the red point to the position of the blue point. Notably, the output is not just the final edited image but the entire intermediate transition, so the result is effectively a video or animation, which also opens up applications in video and animation.
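Since each optimization step yields a slightly more edited image, those intermediate results can simply be collected and written out as an animation. A minimal sketch, assuming one rendered image per step (here replaced by random arrays) and that numpy and imageio are available:

```python
# Minimal sketch: turn per-step intermediate results of a drag edit into an animation.
import numpy as np
import imageio

# `frames` stands in for the images produced at each optimization step.
frames = [(np.random.rand(64, 64, 3) * 255).astype(np.uint8) for _ in range(30)]

# Write all intermediate frames as a GIF so the edit plays back as an animation.
imageio.mimsave("drag_edit.gif", frames)
```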

Author | Ju Jushou

Typography | Spring Flowers

Review | Cat

If you have any questions about this article, please contact us and we will respond promptly. For more information, please follow BFT Robot.
