A Frontier Direction in CV: The Visual Prompting Paradigm and Visual Prompt Engineering

Prompts are becoming increasingly important in computer vision. In image generation, a prompt acts as a controllable condition that improves interactivity and controllability; in multimodal understanding, instruction prompts make tasks flexible and versatile. Visual prompt engineering has become a cutting-edge direction in CV.

Check out the two recent papers below to learn how visual prompts are being applied.


Visual Instruction Inversion: Image Editing via Visual Prompting

Text-conditioned image editing has become a powerful tool for image editing.

However, verbal descriptions of an edit are often ambiguous and inefficient. In these cases, a visual prompt can convey the desired edit more intuitively and precisely.

This paper presents a method for image editing via visual prompting. Given an example pair of "before" and "after" images that demonstrates an edit, the method learns a text-based editing instruction that performs the same edit on new images. In other words, it inverts the visual prompt into an editing instruction by leveraging the rich editing capabilities of pre-trained text-to-image diffusion models.
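To make the idea concrete, here is a minimal, hedged PyTorch sketch of the inversion loop: a learnable instruction embedding is optimized so that a frozen, instruction-conditioned editor maps the "before" image to the "after" image. The `edit_model` callable, the embedding size, and the hyperparameters are illustrative assumptions, not the paper's actual implementation (which optimizes in the text-embedding space of a pre-trained diffusion editor).

```python
import torch
import torch.nn.functional as F

def invert_visual_prompt(edit_model, before, after, embed_dim=768, steps=200, lr=1e-2):
    """Optimize an instruction embedding so that editing `before` reproduces `after`.

    `edit_model(image, instruction)` is a hypothetical frozen, differentiable
    instruction-conditioned editor (e.g., an InstructPix2Pix-style model).
    """
    instruction = torch.zeros(1, embed_dim, requires_grad=True)   # learnable "text" direction
    optimizer = torch.optim.Adam([instruction], lr=lr)
    for _ in range(steps):
        edited = edit_model(before, instruction)                  # editor weights stay frozen
        loss = F.mse_loss(edited, after)                          # match the "after" example
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
    return instruction.detach()                                   # reusable edit instruction
```

Once learned, the same instruction embedding can be reused to apply the demonstrated edit to new input images.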

The results show that even with only a single example pair, the method achieves results competitive with state-of-the-art text-conditioned image editing frameworks. Project page: https://thaoshibe.github.io/visii/


A Systematic Survey of Prompt Engineering on Vision-Language Foundation Models

Prompt engineering adapts large pre-trained models to new tasks by augmenting the model input with task-specific hints (i.e., prompts). Prompts can be manually written natural language instructions, or automatically generated, either as natural language instructions or as vector representations (soft prompts).
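As a rough illustration of the "vector representation" (soft) prompts mentioned above, here is a minimal CoOp-style sketch in PyTorch. The `text_encoder`, `image_features`, and embedding dimensions are hypothetical stand-ins rather than a specific library API; only the learnable context vectors would be trained while both encoders stay frozen.

```python
import torch
import torch.nn as nn

class SoftPrompt(nn.Module):
    """CoOp-style learnable context vectors shared across all classes (a sketch)."""
    def __init__(self, n_ctx: int = 16, dim: int = 512):
        super().__init__()
        self.ctx = nn.Parameter(torch.randn(n_ctx, dim) * 0.02)  # the only trainable weights

    def forward(self, class_token_embeds: torch.Tensor) -> torch.Tensor:
        # class_token_embeds: (n_classes, n_tokens, dim) frozen class-name embeddings
        n_cls = class_token_embeds.shape[0]
        ctx = self.ctx.unsqueeze(0).expand(n_cls, -1, -1)
        return torch.cat([ctx, class_token_embeds], dim=1)        # prepend context per class

def prompt_logits(text_encoder, image_features, class_token_embeds, soft_prompt):
    # text_encoder: hypothetical frozen encoder mapping token embeddings -> one feature per class
    text_features = text_encoder(soft_prompt(class_token_embeds))  # (n_classes, dim)
    image_features = nn.functional.normalize(image_features, dim=-1)
    text_features = nn.functional.normalize(text_features, dim=-1)
    return image_features @ text_features.t()                      # cosine-similarity logits
```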

Prompt engineering makes it possible to produce predictions by relying on prompts alone, without updating any model parameters, which makes it much easier to apply large pre-trained models to downstream tasks.
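For instance, a hand-crafted natural-language prompt is enough to turn a frozen image-text matching model such as CLIP into a zero-shot classifier, with no parameter updates at all. The sketch below assumes the Hugging Face `transformers` library, the `openai/clip-vit-base-patch32` checkpoint, and a local file `photo.jpg`; none of these come from the survey itself.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

# Load a frozen pre-trained image-text matching model (no fine-tuning anywhere).
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

image = Image.open("photo.jpg")                                   # assumed local image
classes = ["cat", "dog", "car"]
prompts = [f"a photo of a {c}" for c in classes]                  # hand-crafted prompts

inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():                                             # prompts do all the adaptation
    outputs = model(**inputs)
probs = outputs.logits_per_image.softmax(dim=-1)                  # classification from prompts alone
print(dict(zip(classes, probs[0].tolist())))
```

Changing the prompt template (e.g., "a sketch of a {}") re-targets the same frozen model to a different task, which is exactly the adaptation-by-prompting idea described above.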

Over the past few years, prompt engineering has been studied extensively in natural language processing, but there is still no systematic review of prompt engineering for pre-trained vision-language models. This paper provides a comprehensive survey of cutting-edge research on prompt engineering for three types of vision-language models: multimodal-to-text generative models, image-text matching models, and text-to-image generative models. For each type, it summarizes the model background, the prompting methods, prompt-based applications, and the related responsibility and integrity issues.

In addition, it discusses the commonalities and differences of prompt engineering across vision-language models, language models, and vision models, and summarizes the challenges, future directions, and research opportunities to facilitate further work in this direction.



60,000 words! 130 papers across 30 directions! The most complete roundup of CVPR 2023 AIGC papers, read in one go

Stable Diffusion explained simply: an interpretation of the latent diffusion model behind AI painting

An in-depth explanation of ControlNet, the controllable AIGC image generation algorithm!

A classic GAN you must read: StyleGAN



The latest and most complete roundup of 100 papers on generative diffusion models (Diffusion Models)

ECCV 2022 | A roundup of selected generative adversarial network (GAN) papers

CVPR 2022 | The latest 50 GAN papers across 25+ directions

ICCV 2021 | A roundup of GAN papers on 35 topics

Over 110 papers! The most complete roundup of CVPR 2021 GAN papers

Over 100 papers! The most complete roundup of CVPR 2020 GAN papers

Dissecting a new GAN: disentangled representation with MixNMatch

StarGAN v2: diverse image generation across multiple domains

Download | Chinese edition of "Explainable Machine Learning"

Download | "TensorFlow 2.0 Deep Learning Algorithms in Practice"

Download | "Mathematical Methods in Computer Vision"

"A review of surface defect detection methods based on deep learning"

A Survey of Zero-Shot Image Classification: A Decade of Progress

"A Survey of Few-Shot Learning Based on Deep Neural Networks"

"Book of Rites·Xue Ji" has a saying: "Learning alone without friends is lonely and ignorant."

Click on a cup of milk tea and become the frontier waver of AIGC+CV vision! , join  the planet of AI-generated creation and computer vision  knowledge!
