CVPR 2023 | How can rain, fog, snow, and other severe weather be removed all at once?


Author: Oliiiver (Source: Zhihu) | Editor: CVer Public Account

https://zhuanlan.zhihu.com/p/646574198


Context-aware Pretraining for Efficient Blind Image Decomposition

Article link:

https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Context-Aware_Pretraining_for_Efficient_Blind_Image_Decomposition_CVPR_2023_paper.pdf

Authors: Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun and Yi Yang

Code: https://github.com/Oliiveralien/CPNet

Introduction:

Traditional weather restoration tasks such as rain removal, snow removal, and fog removal have achieved strong performance in their respective domains. However, these works usually focus on a single weather type, as shown in the figure:

[Figure: prior methods each target a single weather type]

Severe weather in the real world (raindrops, rain streaks, fog, snow, etc.) often appears in combination, which poses a greater challenge for clean image restoration. In response, J. Han et al. proposed Blind Image Decomposition (BID) [1], which treats the different weather degradations as superimposed layers with random combinations and random intensities, and recovers all superimposed layers (including the clean image) via image decomposition [2]. In effect, BID can be understood as a combined task of deraining, dehazing, raindrop removal, snow removal, and so on.
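To make the setting concrete, here is a minimal sketch of how such mixed-degradation inputs could be synthesized. The layer set, additive blending rule, and intensity range are our illustrative assumptions, not the exact protocol of [1]:

```python
import random
import numpy as np

WEATHER_TYPES = ["rain_streak", "raindrop", "fog", "snow"]  # assumed layer set

def synthesize_bid_input(clean, layer_bank):
    """Superimpose a random combination of weather layers on a clean image.

    clean:      float array in [0, 1], shape (H, W, 3)
    layer_bank: dict mapping weather type -> list of layer images (same shape)
    Returns the degraded image and its multi-hot weather label.
    """
    label = np.zeros(len(WEATHER_TYPES), dtype=np.float32)
    degraded = clean.copy()
    # Pick a random non-empty subset of weather types.
    chosen = random.sample(WEATHER_TYPES, k=random.randint(1, len(WEATHER_TYPES)))
    for w in chosen:
        layer = random.choice(layer_bank[w])
        alpha = random.uniform(0.3, 1.0)  # random intensity
        degraded = np.clip(degraded + alpha * layer, 0.0, 1.0)  # simplified additive blend
        label[WEATHER_TYPES.index(w)] = 1.0
    return degraded, label
```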

Overview:

[Figure: comparison of task paradigms; (e) shows the multi-head decoder used for image decomposition]

Since the BID task was introduced (BIDeN, ECCV 2022), two main problems have remained:

  1. Existing methods require dense supervision, yet high-quality image pairs are often unavailable in harsh real-world scenarios, while synthetic datasets are constrained by handcrafted weather degradation models, which inevitably limits the quality of the final restoration.

  2. Image decomposition aims to restore all layers, including the noisy weather layers, under complete supervision signals. A multi-head decoder with one head per layer constrains the structural design and optimization of the model (see (e) in the figure above), and the "decomposition" paradigm also seems somewhat at odds with the classic image-to-image translation view.


To address the above problems, we draw on the Masked AutoEncoder (MAE) [3] and propose an efficient and simple pre-training scheme, Context-aware Pretraining (CP), which contains two pretext tasks: mixed image separation and masked image reconstruction.
Assuming that image restoration proceeds from structure to texture (coarse-to-fine) [4], our idea is simple: the two pretext tasks reconstruct structural information during the pre-training stage, and the fine-tuning stage can then quickly fill in textures on top of the recovered structures.
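As a rough illustration of the two pretext tasks, the sketch below assembles their inputs and targets; the masking scheme, mixing rule, and all function names are our own simplification, not CPNet's actual code:

```python
import torch

def make_pretext_batch(clean_a, clean_b, rtv_smooth, mask_ratio=0.5, patch=16):
    """Build inputs/targets for the two pretext tasks (simplified reading).

    clean_a, clean_b: (B, 3, H, W) clean images in [0, 1]
    rtv_smooth:       callable returning the RTV-smoothed structure image [5]
    """
    # Pretext task 1: mixed image separation -- superimpose two images
    # and ask the network to recover the structure of each.
    mixed = 0.5 * (clean_a + clean_b)

    # Pretext task 2: masked image reconstruction (MAE-style [3]) -- hide
    # random patches and ask the network to reconstruct the full structure.
    B, _, H, W = clean_a.shape
    keep = (torch.rand(B, 1, H // patch, W // patch) > mask_ratio).float()
    mask = keep.repeat_interleave(patch, dim=2).repeat_interleave(patch, dim=3)
    masked = clean_a * mask

    # Targets are structure images (RTV smoothing), matching the
    # coarse-to-fine view: recover structure first, fill textures later.
    targets = (rtv_smooth(clean_a), rtv_smooth(clean_b))
    return mixed, masked, mask, targets
```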

Method:

To verify the effectiveness of the proposed pretext tasks, we construct a baseline network, the Context-aware Pretrained Network (CPNet). CPNet consists of two transformer-based encoders, an information fusion module, and a prediction decoder.
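A minimal skeleton of that layout is sketched below; the module widths, depths, and names are placeholders of ours, and the real architecture is in the linked repository:

```python
import torch
import torch.nn as nn

class CPNetSkeleton(nn.Module):
    """Two encoders -> information fusion -> prediction decoder (schematic)."""

    def __init__(self, dim=256, patch=16):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True)
        self.encoder_sep = nn.TransformerEncoder(layer, num_layers=4)  # separation branch
        self.encoder_rec = nn.TransformerEncoder(layer, num_layers=4)  # reconstruction branch
        self.fusion = nn.Linear(2 * dim, dim)             # stand-in for the fusion module
        self.decoder = nn.Linear(dim, 3 * patch * patch)  # per-token patch prediction

    def forward(self, tokens_sep, tokens_rec):
        f_sep = self.encoder_sep(tokens_sep)              # (B, N, dim)
        f_rec = self.encoder_rec(tokens_rec)              # (B, N, dim)
        fused = self.fusion(torch.cat([f_sep, f_rec], dim=-1))
        return self.decoder(fused)                        # predicted structure patches
```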

[Figure: CPNet architecture]

During pre-training, we mix the two pretext tasks, obtain context-aware embeddings from the encoders, and apply the decoder to recover the original structural information (RTV-smoothed images [5]) from those embeddings. The information fusion module explicitly exploits correlated features along the spatial and channel dimensions, while the multi-head prediction module produces a texture-guided appearance flow.
Despite its simplicity, the self-supervised pre-trained encoder explicitly facilitates context-based feature learning while reducing the need for annotations. Through Gaussian sampling in the fine-tuning stage, the appearance flow can explicitly exploit the texture features of unoccluded regions in the original image.
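An appearance flow predicts, for each output pixel, where to borrow texture from in the source image; a common way to realize the sampling step is `grid_sample`. Below is a minimal sketch of such warping under our own conventions (pixel-offset flow, bilinear sampling), not CPNet's exact implementation:

```python
import torch
import torch.nn.functional as F

def warp_by_appearance_flow(image, flow):
    """Sample textures from `image` at locations shifted by `flow`.

    image: (B, C, H, W) source image
    flow:  (B, 2, H, W) per-pixel offsets (dx, dy), in pixels
    """
    B, _, H, W = image.shape
    # Base sampling grid in normalized [-1, 1] coordinates.
    ys, xs = torch.meshgrid(
        torch.linspace(-1, 1, H, device=image.device),
        torch.linspace(-1, 1, W, device=image.device),
        indexing="ij",
    )
    base = torch.stack((xs, ys), dim=-1).unsqueeze(0).expand(B, -1, -1, -1)
    # Convert pixel offsets to normalized offsets and shift the grid.
    offsets = torch.stack((flow[:, 0] / (W / 2), flow[:, 1] / (H / 2)), dim=-1)
    return F.grid_sample(image, base + offsets, mode="bilinear", align_corners=True)
```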

For the loss function, in addition to the conventional reconstruction and adversarial losses, we propose a new sampling loss for the appearance flow map:

[Equation: sampling loss for the appearance flow map]

The numerator ensures texture matching after the appearance flow applies an offset of (Δx, Δy), while the denominator imposes a further constraint on the pre-trained structure restoration.
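The exact formula is given only in the image above; as a loose sketch of the ratio structure this description suggests (notation entirely ours, not the paper's), one plausible reading is

$$\mathcal{L}_{\text{sample}} = \frac{\big\| I(p+\Delta x,\; q+\Delta y) - \hat{I}(p, q) \big\|_1}{\mathrm{sim}(S, \hat{S}) + \varepsilon},$$

where the numerator compares textures sampled at the flow offset, and the denominator rewards agreement between the ground-truth structure map $S$ and the pre-trained restoration $\hat{S}$ ($\varepsilon$ for numerical stability).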

In addition, the BID task can also be regarded as a kind of image attribute editing (image translation) task [6], in which the initial attribute label is a random binary encoding (the weather combination is random, and a 1 means the image is affected by that weather type), and the final target is an all-zero attribute label. We therefore additionally introduce a conditional loss:

[Equation: conditional loss]

where P_i(x) denotes the probability that image x exhibits the i-th weather degradation.
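The exact expression appears only in the image above; under a StarGAN-style [6] classification reading (our sketch, not the paper's formula), driving every weather probability of the restored image $\hat{x}$ toward the all-zero target could take the form

$$\mathcal{L}_{\text{cond}} = -\sum_{i} \log\big(1 - P_i(\hat{x})\big).$$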

Experiments:

1. Quantitative experiments:

Fine-tuning our pre-trained model brings clear gains on downstream tasks; the more complex the mixed-weather scene, the more pronounced the benefit of pre-training.

[Table: quantitative comparison on mixed-weather benchmarks]

Compared with the multi-head decoder paradigm, which is likewise trained on mixed multi-weather datasets, our model also remains stable on test sets of a specific single weather type.

[Table: results on single-weather test sets]

2. Qualitative experiments

After training, we observe that the network achieves better restoration across a variety of mixed-weather scenarios, as shown in the figure below.

[Figure: qualitative restoration results under mixed weather]

In addition, we visualize the regions with large activation values in the features of the two encoders. The two pretext tasks, separation and reconstruction, attend to different regions, and by controlling the target attribute label, specific weather layers can be selectively retained or removed, as sketched after the figure below.

7d3813371d214532a3ea3ce4556c9bb8.jpeg

References

  1. Blind Image Decomposition https://arxiv.org/abs/2108.11364

  2. Deep adversarial decomposition: A unified framework for separating superimposed images https://github.com/jiupinjia/Deep-adversarial-decomposition

  3. Masked autoencoders are scalable vision learners https://arxiv.org/abs/2111.06377

  4. StructureFlow: Image Inpainting via Structure-aware Appearance Flow https://arxiv.org/abs/1908.03852

  5. Structure extraction from texture via relative total variation https://dl.acm.org/doi/abs/10.1145/2366145.2366158

  6. StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation https://arxiv.org/abs/1711.09020

 
  
