Author: Oliiiver (Source: Zhihu) | Editor: CVer Public Account
https://zhuanlan.zhihu.com/p/646574198
Context-aware Pretraining for Efficient Blind Image Decomposition
Article link:
https://openaccess.thecvf.com/content/CVPR2023/papers/Wang_Context-Aware_Pretraining_for_Efficient_Blind_Image_Decomposition_CVPR_2023_paper.pdf
Authors: Chao Wang, Zhedong Zheng, Ruijie Quan, Yifan Sun and Yi Yang
Code: https://github.com/Oliiveralien/CPNet
Introduction:
Traditional adverse-weather restoration tasks such as deraining, desnowing, and dehazing have achieved good performance in their respective domains. However, these works usually focus on a single weather type, as shown in the figure:
Severe weather in the real world (raindrops, rain streaks, fog, snow, etc.) often appears in combination, which poses a greater challenge for clean-image restoration. To address this problem, J. Han et al. proposed Blind Image Decomposition (BID) [1], which treats different weather types as randomly combined, randomly weighted superimposed layers and recovers all superimposed layers (including the clean image) via image decomposition [2]. In effect, BID can be understood as a combined task of deraining, dehazing, raindrop removal, snow removal, and so on.
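The layer-superimposition view of BID can be sketched roughly as follows. This is a minimal NumPy illustration of the idea, not the paper's actual data pipeline; the blending scheme, intensity range, and selection probability are all my own illustrative assumptions:

```python
import numpy as np

def make_bid_sample(clean, weather_layers, rng=np.random.default_rng(0)):
    """Superimpose a random subset of weather layers, each with a random
    intensity, onto a clean image, mimicking the BID input formulation."""
    mixed = clean.astype(np.float32).copy()
    # binary attribute label: 1 = this weather type is present in the mix
    label = np.zeros(len(weather_layers), dtype=np.int64)
    for i, layer in enumerate(weather_layers):
        if rng.random() < 0.5:             # random combination of weathers
            alpha = rng.uniform(0.3, 1.0)  # random intensity
            mixed = (1 - alpha) * mixed + alpha * layer
            label[i] = 1
    return np.clip(mixed, 0.0, 1.0), label

clean = np.full((4, 4, 3), 0.8, dtype=np.float32)
layers = [np.random.default_rng(i).random((4, 4, 3)) for i in range(4)]
mixed, label = make_bid_sample(clean, layers)
print(mixed.shape, label)
```

The decomposition task is then the inverse problem: given only `mixed`, recover the clean image (and, in BIDeN's formulation, every weather layer as well).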
Overview:
Since the BID task was proposed (BIDeN, ECCV 2022), two main problems remain:
1. Existing methods require data-rich supervision, yet high-quality image pairs are rarely available in harsh real-world scenarios, while synthetic datasets are constrained by handcrafted weather degradation models, which inevitably limits the quality of the final restoration.
2. Image decomposition aims to restore all layers, including the noisy weather layers, under full supervision. A multi-head decoder pointing to each layer constrains the model's structural design and optimization (as shown in figure (e) above), and the "decomposition" paradigm also seems somewhat at odds with the classic image-to-image translation view.
To address these problems, we draw on the Masked AutoEncoder (MAE) [3] and propose an efficient, simple pre-training scheme: Context-aware Pretraining (CP), which contains two pretext tasks: hybrid image separation and masked image reconstruction.
Assuming that image restoration proceeds from structure to texture (coarse to fine) [4], our idea is simple: the two pretext tasks reconstruct structural information during the pre-training stage, and the fine-tuning stage can then quickly fill in textures on top of the recovered structures.
Method:
To verify the effectiveness of the proposed pretext tasks, we construct a baseline network, the Context-aware Pretrained Network (CPNet). CPNet consists of two transformer-based encoders, an information fusion module, and a prediction decoder.
During pre-training, we mix the two pretext tasks to obtain context-aware embeddings from the encoders, then apply the decoder to recover the original structural information (RTV smoothing [5]) from the embeddings. The information fusion module explicitly exploits correlated features along the spatial and channel dimensions, while the multi-head prediction module facilitates texture-guided appearance flow.
Despite its simplicity, the self-supervised pre-trained encoder explicitly facilitates context-based feature learning while reducing the need for annotation. Through Gaussian sampling in the fine-tuning stage, the appearance flow can explicitly exploit the texture features of unoccluded regions in the original image.
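Constructing inputs for the two pretext tasks could look roughly like the sketch below. This is my own simplification in NumPy: the 50/50 mixing ratio, patch size, and mask ratio are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def pretext_inputs(img_a, img_b, patch=4, mask_ratio=0.5,
                   rng=np.random.default_rng(0)):
    """Build inputs for the two pretext tasks:
    (1) hybrid image separation: a pixel-wise mix of two images;
    (2) masked image reconstruction: the mix with random patches zeroed
        out, in the spirit of MAE-style masking."""
    hybrid = 0.5 * img_a + 0.5 * img_b                  # task (1) input
    h, w = hybrid.shape[:2]
    masked = hybrid.copy()
    n_patches = (h // patch) * (w // patch)
    drop = rng.choice(n_patches, size=int(mask_ratio * n_patches),
                      replace=False)
    for idx in drop:                                    # task (2) input
        r, c = divmod(int(idx), w // patch)
        masked[r * patch:(r + 1) * patch, c * patch:(c + 1) * patch] = 0.0
    return hybrid, masked

a = np.ones((8, 8, 3), np.float32)
b = np.zeros((8, 8, 3), np.float32)
hyb, msk = pretext_inputs(a, b)
print(hyb.mean(), (msk == 0).any())
```

During pre-training, the encoders consume these corrupted inputs and the decoder is supervised to recover the structure map (RTV-smoothed image) rather than the full-texture original.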
For the loss functions, in addition to the conventional reconstruction and adversarial losses, we propose a new sampling loss for the appearance flow map:
The numerator term enforces texture matching after the appearance flow is shifted by the offset (Δx, Δy), while the denominator imposes further constraints on the pre-trained structure restoration.
In addition, the BID task can be regarded as a form of image attribute editing (image translation) [6], in which the initial attribute label is a random binary encoding (the weather combination is random, and a 1 indicates the image is affected by that weather) and the target is the all-zero attribute label. We therefore additionally introduce a conditional loss:
where P_i(x) denotes the probability that image x exhibits the i-th weather type.
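One plausible form for such a conditional loss, consistent with the description above though not necessarily the paper's exact formulation, is a binary cross-entropy that drives every predicted weather probability P_i(x) of the restored image toward the all-zero target label:

```python
import numpy as np

def conditional_loss(probs, target):
    """Binary cross-entropy over per-weather probabilities P_i(x).
    `target` is the desired attribute label; for a fully restored
    image it is the all-zeros vector."""
    eps = 1e-7
    p = np.clip(probs, eps, 1 - eps)  # avoid log(0)
    return float(-np.mean(target * np.log(p)
                          + (1 - target) * np.log(1 - p)))

# Restored image: all weather probabilities should be near 0.
probs = np.array([0.05, 0.02, 0.90, 0.01])  # third weather not yet removed
target = np.zeros(4)
print(conditional_loss(probs, target))
```

The 0.90 entry dominates the loss, penalizing the residual weather layer; intermediate target labels would correspondingly allow retaining a chosen subset of weather effects.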
Experiments:
1. Quantitative experiments:
Fine-tuning our pre-trained model brings clear gains on downstream tasks; the more complex the mixed-weather scene, the more pronounced the benefit of pre-training.
Compared with the multi-head-decoder training mode, trained on the same multi-weather mixed dataset, our model also maintains good stability on test sets of a specific single weather type.
2. Qualitative experiments:
After training, the network achieves better restoration across a variety of mixed-weather scenarios, as shown in the figure below.
We also visualize the regions with the largest activation values in the features of the two encoders. The two pretext tasks, separation and reconstruction, attend to different regions, and by controlling the target attribute label, specific weather layers can be selectively retained or removed.
Reference
[1] Blind Image Decomposition. https://arxiv.org/abs/2108.11364
[2] Deep Adversarial Decomposition: A Unified Framework for Separating Superimposed Images. https://github.com/jiupinjia/Deep-adversarial-decomposition
[3] Masked Autoencoders Are Scalable Vision Learners. https://arxiv.org/abs/2111.06377
[4] StructureFlow: Image Inpainting via Structure-aware Appearance Flow. https://arxiv.org/abs/1908.03852
[5] Structure Extraction from Texture via Relative Total Variation. https://dl.acm.org/doi/abs/10.1145/2366145.2366158
[6] StarGAN: Unified Generative Adversarial Networks for Multi-Domain Image-to-Image Translation. https://arxiv.org/abs/1711.09020