CVPR 2022 | Only 0.2 seconds per image! HKUST & Tencent AI Lab open-source a high-fidelity image editing algorithm based on GAN inversion


Reprinted from: Jishi Platform | Author: Tengfei Wang

Source: Hong Kong University of Science and Technology & Tencent AI Lab

Overview

This article introduces work from HKUST and Tencent AI Lab that has been accepted to CVPR 2022. The paper proposes a novel high-fidelity GAN inversion framework that enables attribute editing while preserving image-specific details such as background, appearance, and lighting. It processes each image in only 0.2 seconds while guaranteeing the high fidelity and high quality of the edited results.

Paper: https://arxiv.org/abs/2109.06590

Code: https://github.com/Tengfei-Wang/HFGI

Homepage: https://tengfei-wang.github.io/HFGI/

Video: https://www.bilibili.com/video/BV1Xq4y1i7ev

With just one photo, this AI algorithm can quickly edit image attributes according to the user's needs. Here is how the model adds a smile to a photo:

[Animated demos: adding a smile to input photos]

No more worrying about failing to catch the right moment to smile when taking pictures. Besides expressions, age and pose can also be changed at will:

[Animated demos: editing age and pose]

And the edited image retains the details of the original with high fidelity, such as background, lighting, and clothing.

Let's try a few familiar celebrity photos from the Internet: Musk ten years later, a smirking LeCun, and... Johnson with lipstick on?

[Image: edited celebrity photos]

The study has been accepted to CVPR 2022.

1. High-fidelity image editing based on GAN Inversion

The GAN inversion technique, which has been extensively studied recently, maps a photo into the latent space of a GAN generator, making it possible to leverage the powerful capabilities of StyleGAN for image editing. Current GAN inversion methods fall into three categories:

1. Encoder-based: editing is very fast (< 1 s per image), but the edited picture loses many of the original's details, so fidelity is low.

2. Optimization-based: the latent code is iteratively optimized for each photo individually, giving high fidelity but very slow speed (a few minutes per image).

3. Hybrid: an encoder first produces an initial latent code, which is then optimized per image. The speed lies between the first two categories (tens of seconds to several minutes per image) but is still slow, which hurts usability. A minimal sketch of the first two paradigms follows this list.
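To make the contrast concrete, here is a minimal sketch of the encoder-based and optimization-based paradigms. `G` (a pretrained StyleGAN generator), `E` (an inversion encoder), and `lpips` (a perceptual loss) are hypothetical handles for illustration, not this paper's actual code:

```python
import torch

def invert_encoder(E, x):
    """Encoder-based: a single forward pass, fast but lossy."""
    return E(x)

def invert_optimize(G, x, lpips, steps=500, lr=0.01):
    """Optimization-based: per-image iteration, high fidelity but slow."""
    # Latent code in W+ space; in practice often initialized to the mean latent.
    w = torch.zeros(1, 18, 512, requires_grad=True)
    opt = torch.optim.Adam([w], lr=lr)
    for _ in range(steps):
        x_hat = G(w)
        # Perceptual + pixel reconstruction losses, a common choice.
        loss = lpips(x_hat, x) + ((x_hat - x) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        opt.step()
    return w
```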

This forces a trade-off when choosing a model: faster speed or higher fidelity? For the indecisive, it is a real dilemma!

So, how does this paper choose between speed and quality? The answer: both. As the Chinese idiom goes, it gets the fish and the bear's paw at the same time: a supplementary encoder recovers the missing details, so the method edits fast (0.2 s per image) while still guaranteeing the high fidelity and high quality of the edited images.

2. Methods

Before introducing the algorithm, the authors first analyze why encoder-based methods yield low fidelity in reconstruction and editing. They invoke the famous rate-distortion theory from information theory: in an encoding-decoding system, the bit-rate of the latent code constrains the fidelity of the reconstructed signal, i.e., how small the distortion between the reconstruction and the source can be.
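For reference, the classic rate-distortion function (the standard textbook form, not necessarily the paper's exact statement) gives the minimum bit-rate $R$ needed to keep the expected distortion below $D$:

```latex
R(D) \;=\; \min_{p(\hat{x} \mid x)\,:\; \mathbb{E}[d(X, \hat{X})] \le D} \; I(X; \hat{X})
```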

[Figure: the paper's rate-distortion analysis]

In other words, the latent code produced by previous encoders is very small (low-rate), usually 1×512 or 18×512, so some information is inevitably lost when the generator reconstructs the image, causing large distortion: the reconstructed or edited image deviates noticeably from the original.

So, can we simply enlarge the encoder's output latent code (high-rate) and call the problem solved? The answer is: yes and no. Doing so does improve the fidelity of the reconstructed image, but our goal is to edit the image, not merely reconstruct it. Low-rate latent codes are highly compressed, so they capture high-level, rich, and disentangled semantics, and image attributes can easily be edited by vector arithmetic on them in the latent space. High-rate latent codes, by contrast, are redundant and entangled, and usually carry low-level rather than semantic information, which makes effective editing difficult.
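As a concrete illustration of editing by vector arithmetic, here is a minimal sketch. `G`, `E`, and the precomputed `direction` (e.g., a smile direction found by InterFaceGAN-style methods) are assumed handles, not this paper's code:

```python
def edit_attribute(G, E, image, direction, alpha=1.0):
    """Edit one attribute by moving the inverted latent along a semantic direction."""
    w = E(image)                    # invert to a low-rate code, e.g. (1, 18, 512)
    w_edit = w + alpha * direction  # vector arithmetic in latent space
    return G(w_edit)                # decode the edited image
```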

[Figure: fidelity vs. editability of low-rate and high-rate latent codes]

To address this issue, the paper proposes a distortion consultation approach that exploits both low-rate and high-rate latent codes. The model contains two encoders: a basic encoder compresses the image into a low-rate latent code to preserve editability, while a consultation encoder additionally encodes the distortion of the low-rate reconstruction, i.e., the missing details, into a high-rate latent code.

[Figure: framework overview with the basic and consultation encoders]
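In pseudocode, the two-encoder idea can be sketched as follows; `E_basic`, `E_consult`, and `G` are hypothetical module names for illustration, not the authors' exact classes:

```python
def invert_with_consultation(E_basic, E_consult, G, x):
    w = E_basic(x)             # low-rate code: compact, editable semantics
    x_coarse = G(w)            # coarse reconstruction from the low-rate code
    distortion = x - x_coarse  # the details the low-rate code failed to keep
    f = E_consult(distortion)  # high-rate code carrying the missing details
    return w, f                # both parts are passed to the generator
```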

The two parts of the latent code are fused inside the generator through a consultation fusion layer and jointly drive image generation. The fusion layer is illustrated in the figure below:

[Figure: the consultation fusion layer]
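One plausible form of such a fusion layer, in the spirit of the figure (the authors' exact design may differ), predicts an element-wise scale and shift from the detail feature and applies them to the generator's feature map:

```python
import torch
import torch.nn as nn

class ConsultationFusion(nn.Module):
    """Sketch: modulate a GAN feature map with a high-rate detail feature."""
    def __init__(self, detail_ch, gan_ch):
        super().__init__()
        self.to_scale = nn.Conv2d(detail_ch, gan_ch, 3, padding=1)
        self.to_shift = nn.Conv2d(detail_ch, gan_ch, 3, padding=1)

    def forward(self, gan_feat, detail_feat):
        scale = torch.sigmoid(self.to_scale(detail_feat))  # bounded gating
        shift = self.to_shift(detail_feat)
        return scale * gan_feat + shift  # element-wise fusion
```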

Since no paired edited images are available for training, the authors also propose a corresponding self-supervised training scheme together with an adaptive distortion alignment (ADA) module.
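A hedged sketch of the training step follows. All module names (`E_basic`, `E_consult`, `ada`, `G`) and the generator's `consult` argument are assumptions for illustration, not the authors' API: without paired edited images, the model is trained to reconstruct the input, while ADA learns to re-align the distortion map so that the details still fit once the latent code is edited at inference time.

```python
def training_step(E_basic, E_consult, ada, G, x, l2, lpips):
    w = E_basic(x)                       # low-rate, editable code
    x_coarse = G(w)
    distortion = x - x_coarse            # details lost at low rate
    aligned = ada(distortion, x_coarse)  # ADA aligns details to the rendered layout
    x_out = G(w, consult=E_consult(aligned))  # fused, detail-rich generation
    return l2(x_out, x) + lpips(x_out, x)     # self-supervised reconstruction losses
```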

3. Experimental results

The paper reports comparisons on photos of faces and vehicles. First, a comparison with encoder-based methods:

[Figure: qualitative comparison with encoder-based methods]

Next, comparisons with optimization-based and hybrid methods:

[Figure: qualitative comparison with optimization-based and hybrid methods]

And a quantitative comparison:

[Table: quantitative comparison]

4. More results

The method can be used not only for image editing but also for video editing. More results are available on the authors' homepage: https://tengfei-wang.github.io/HFGI/

[More results: animated image and video editing examples]

5. Try it online

Want to try this method on photos of yourself or your friends? The authors provide an online demo where you can upload pictures or capture them with a camera for editing.

Online trial address: https://replicate.com/tengfei-wang/hfgi

[Screenshot: the online demo]
 
  
 
  
