Reprinted from: Jishi Platform. Author: Tengfei Wang
Source: Hong Kong University of Science and Technology, Tencent AI Lab
Overview
This article introduces a work from HKUST and Tencent AI Lab accepted to CVPR 2022. The work proposes a novel high-fidelity GAN inversion framework that enables attribute editing while preserving image-specific details such as background, appearance, and lighting. It processes each image in about 0.2 s while keeping the edited images high-fidelity and high-quality.
Paper: https://arxiv.org/abs/2109.06590
Code: https://github.com/Tengfei-Wang/HFGI
Homepage: https://tengfei-wang.github.io/HFGI/
Video: https://www.bilibili.com/video/BV1Xq4y1i7ev
With just one photo, this AI algorithm can quickly edit image attributes according to the user's needs. Here is how it conjures up a smile in a photo:
No more worrying about missing the moment to smile when a photo is taken. Besides expressions, age and pose can also be changed at will:
The edited image faithfully retains details of the original, such as background, lighting, and clothing.
Let's try a few familiar celebrity photos from the Internet: Musk ten years older, a smirking young LeCun, and... Johnson with lipstick on?
The work has been accepted to CVPR 2022.
1. High-fidelity image editing based on GAN Inversion
The GAN inversion technique, which has been extensively studied recently, can map a photo into the latent space of a GAN generator, thereby leveraging the powerful capabilities of StyleGAN for image editing. Current GAN inversion methods fall into three categories:
Encoder-based methods are very fast (< 1 s per image), but the edited image loses many details of the original, so fidelity is low.
Optimization-based methods iterate on each photo individually; fidelity is high, but they are very slow (minutes per image).
Hybrid methods first use an encoder to obtain an initial latent code and then optimize it per image. Their speed lies between the first two categories (tens of seconds to minutes per image), but they are still too slow for practical use.
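To make the contrast concrete, here is a minimal, runnable sketch of optimization-based inversion on a toy linear "generator". The linear map, learning rate, and step count are illustrative stand-ins; real methods optimize a latent code against StyleGAN with perceptual and pixel losses, which is why they take minutes per image.

```python
import numpy as np

# Toy optimization-based inversion: gradient descent on the latent code of a
# fixed linear "generator" so that G(w) matches a target "image" x.
rng = np.random.default_rng(0)
A = rng.normal(size=(64, 8))        # stand-in generator: latent (8,) -> "image" (64,)

def G(w):
    return A @ w

x = G(rng.normal(size=8))           # target "image" with a known latent

w = np.zeros(8)                     # initial latent code
lr = 0.002
for _ in range(500):
    grad = 2 * A.T @ (G(w) - x)     # gradient of ||G(w) - x||^2 w.r.t. w
    w -= lr * grad

print(np.linalg.norm(G(w) - x))     # reconstruction error, driven toward 0
```

Encoder-based methods replace this per-image loop with a single learned forward pass, which is exactly where the speed/fidelity trade-off comes from.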
This forces a trade-off when choosing a model: faster speed or higher fidelity? For the indecisive, it is an agonizing choice!
So how does this paper choose between speed and quality? The answer: both. By adding a supplementary encoder that fishes back the missing details, the method gets both the fish and the bear's paw, as the Chinese idiom goes. It processes each image quickly (0.2 s) while keeping the edited results high-fidelity and high-quality.
2. Methods
Before introducing the algorithm, the authors analyze why encoder-based methods reconstruct and edit with low fidelity. They invoke the classic rate-distortion theory from information theory: for any encoding-decoding system, the bit-rate of the latent code bounds the achievable fidelity of the reconstructed signal, i.e., the distortion between the reconstruction and the source.
In other words, the latent code produced by previous encoders is very small (low-rate), typically 1×512 or 18×512, so information is inevitably lost during the generator's reconstruction, and the reconstructed or edited image is distorted relative to the original.
So can we simply enlarge the encoder's output latent code (high-rate) and solve the problem? The answer: yes and no. Doing so does improve reconstruction fidelity, but the goal is to edit the image, not merely reconstruct it. A low-rate latent code is highly compressed, so it encodes high-level, rich, and disentangled semantics, and image attributes can be edited easily via vector arithmetic in the latent space. A high-rate latent code, however, is redundant and entangled, and usually carries low-level information with little semantics, which makes effective editing difficult.
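The vector arithmetic mentioned above can be sketched in a few lines. The "attribute direction" here is a random placeholder; real directions (smile, age, pose) are found by separate methods such as InterFaceGAN, and the edit strength is not a value from the paper.

```python
import numpy as np

# Editing by vector arithmetic in a latent space: shift the code along a unit
# "attribute direction" while leaving everything orthogonal to it untouched.
rng = np.random.default_rng(1)
w = rng.normal(size=512)                  # low-rate latent code (1 x 512)
direction = rng.normal(size=512)
direction /= np.linalg.norm(direction)    # unit attribute direction (placeholder)

alpha = 3.0                               # user-chosen edit strength
w_edited = w + alpha * direction          # move only along the attribute

print(np.linalg.norm(w_edited - w))       # equals alpha for a unit direction
```

This only-works-if-the-code-is-semantic property is why the paper keeps a low-rate code for editing instead of switching wholesale to a high-rate one.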
To address this, the paper proposes an "information consultation" scheme that exploits both low-rate and high-rate latent codes. The model has two encoders: a basic encoder compresses the image into a low-rate code to guarantee editability, while a reference encoder encodes the distortion of the low-rate reconstruction into a high-rate code that supplements the missing details.
The two latent codes are integrated inside the generator through consultation fusion layers and used jointly for image generation. See the figure below for the fusion layer:
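As a rough schematic of the two-branch idea, here is a runnable numpy sketch. The stub encoders, tensor shapes, and scalar gate are illustrative assumptions, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(2)

def basic_encoder(x):
    # Low-rate branch (stub): a compact 1 x 512 code that keeps editability.
    return rng.normal(size=512)

def reference_encoder(residual):
    # High-rate branch (stub): encode the distortion as a spatial feature map.
    return np.tanh(residual)

def fuse(gen_feature, consult_feature, gate):
    # Gated fusion inside the generator: blend in the consulted details.
    return gate * consult_feature + (1.0 - gate) * gen_feature

x = rng.normal(size=(3, 64, 64))          # input "image"
w_low = basic_encoder(x)                  # low-rate code; editing happens here
x_rec = rng.normal(size=(3, 64, 64))      # coarse reconstruction from w_low (stub)
residual = x - x_rec                      # details the basic encoder missed

consult = reference_encoder(residual)     # high-rate detail features
gen_feat = rng.normal(size=(3, 64, 64))   # an intermediate generator feature map
fused = fuse(gen_feat, consult, gate=0.5) # gate is learned and spatial in practice

print(fused.shape)                        # (3, 64, 64)
```

The key design choice this illustrates: edits are applied to the low-rate code, while the high-rate branch only carries the residual details back, so editability and fidelity no longer compete.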
Because paired edited images are unavailable for training, the authors also propose a corresponding self-supervised training scheme together with an adaptive distortion alignment (ADA) module.
3. Experimental results
The paper provides comparisons on photos of faces and vehicles. First, a comparison with encoder-based methods:
Next, comparisons with optimization-based and hybrid methods:
And a quantitative comparison:
4. More results
This method can be used not only for image editing, but also for video editing. More results can be found on the author's homepage: https://tengfei-wang.github.io/HFGI/
5. Try it online
Want to try this fun method on photos of yourself or your friends? The authors provide an online demo: upload a picture or capture one with your camera, then edit it.
Online trial address: https://replicate.com/tengfei-wang/hfgi