Plug and Play Series | PromptIR: MBZUAI Proposes a Prompt-Based All-in-One Image Restoration Network

Title: PromptIR: Prompting for All-in-One Blind Image Restoration
PDF: arxiv.org/pdf/2306.13…
Code: github.com/va1shn9v/pr…

Overview

Image restoration recovers a high-quality, clean image from its degraded version. Deep learning-based methods have significantly boosted image restoration performance; however, they generalize poorly across different degradation types and levels. This limits their practical usefulness, since a separate model must be trained for each specific degradation, and the degradation type of the input image must be known in order to apply the corresponding model. This paper introduces a prompt-based learning method, called PromptIR, for all-in-one blind image restoration that can efficiently recover images from various types and levels of degradation. Specifically, the method uses prompts to encode degradation-specific information and to dynamically guide the restoration network. This enables generalization across degradation types and levels, achieving state-of-the-art results on image denoising, deraining, and dehazing. Overall, PromptIR provides a general and efficient plug-in module that restores images with various types and levels of damage using only a few lightweight prompts, without prior knowledge of the corruption present in the image.

Introduction

During image acquisition, degradations such as noise, blur, haze, and rain often arise, usually caused by physical limitations of the camera or unfavorable environmental conditions. Deep neural network-based methods take different approaches to the image restoration problem. Some introduce explicit task-specific knowledge into the network to handle a corresponding restoration task, such as denoising, deblurring, or dehazing. However, these methods do not generalize beyond specific degradation types and levels. There is therefore a pressing need for an all-in-one method that can effectively restore images across various degradation types and levels.

A recent approach, AirNet, addresses the all-in-one restoration task with a contrastive learning paradigm, training an additional encoder to distinguish between various types of image degradation. Although AirNet achieves state-of-the-art results, it struggles to model fully decoupled representations of different degradation types. Furthermore, the additional encoder for contrastive learning increases the training burden, since a two-stage training procedure is required.

::: block-1 figure 1.

PromptIR proposes a plug-and-play prompt block that implicitly predicts degradation-conditioned prompts to guide the restoration of input images with unknown degradations. The prompt guidance is injected into multiple decoding stages of the network with only a small number of learnable parameters. This enables a unified all-in-one model that performs well on multiple image restoration tasks such as deraining, dehazing, and denoising. :::

To overcome these challenges, this paper proposes a prompt-learning-based approach to all-in-one image restoration. The method uses prompts, a set of tunable parameters, to encode discriminative information about the various types of image degradation (shown in Figure 2 below). By letting the prompts interact with the feature representations of the main restoration network, the representations are dynamically enriched with degradation-specific knowledge, allowing the network to adjust its behavior and restore images efficiently.

::: block-1 figure 2.

The figure shows t-SNE plots of the degradation embeddings used in PromptIR and in the state-of-the-art AirNet. Different colors indicate different degradation types. The embeddings for each task are more tightly clustered in PromptIR, showing the effectiveness of prompts in learning a discriminative degradation context, which in turn facilitates restoration. :::

Key highlights of this article include:

  • This paper proposes PromptIR, an all-in-one prompt-based restoration framework, which relies only on the input image to recover a clean image, without any prior knowledge of the degradation present in the image.
  • The prompt block is designed as a plug-in module that can be easily integrated into any existing restoration network. It consists of a Prompt Generation Module (PGM) and a Prompt Interaction Module (PIM). The goal of the prompt block is to generate input-conditioned prompts (via the PGM) that carry useful contextual information to guide the restoration network (via the PIM) in efficiently removing the corruption in the input image.
  • This paper experimentally demonstrates the dynamic adaptation behavior of PromptIR, achieving state-of-the-art performance on various image restoration tasks including image denoising, deraining, and dehazing.

Method

::: block-1 figure 3.

The PromptIR framework adopts a UNet architecture built from Transformer blocks in the encoding and decoding stages. Its main component is the prompt block, which consists of two modules: the Prompt Generation Module (PGM) and the Prompt Interaction Module (PIM). The PGM uses the input features F_l and the prompt components to generate an input-conditioned prompt P. The PIM then uses the generated prompt to dynamically adjust the input features through a Transformer block. Prompts interact with the decoder features at multiple levels to enrich them with degradation-specific context. :::

PromptIR uses prompt blocks to produce learnable prompt parameters and exploits these prompts to guide the model during restoration. The framework progressively transforms the input into deep features through a multi-level encoder-decoder, and introduces prompt blocks in the decoder to assist the restoration process. A prompt block is attached at every decoder level, implicitly supplying the features with information about the degradation type to enable guided restoration. Overall, the PromptIR framework performs image restoration through level-by-level encoding and decoding together with the inserted prompt blocks.
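As a rough illustration of this wiring, the following numpy sketch shows where a prompt block sits in a UNet-style decoder: it enriches the features at each level before upsampling and the next Transformer stage. All modules here are placeholder stand-ins, not the paper's implementation.

```python
import numpy as np

def transformer_stage(x):
    return x  # placeholder for a stack of Transformer blocks

def prompt_block(x):
    return x  # placeholder: PGM generates a prompt, PIM fuses it with x

def upsample(x):
    # nearest-neighbour stand-in for the learned 2x upsampling
    return x.repeat(2, axis=1).repeat(2, axis=2)

def decode(latent, skips):
    """latent: deepest features (C, H, W); skips: encoder features, deepest first."""
    x = latent
    for skip in skips:
        x = prompt_block(x)                    # inject degradation context
        x = upsample(x)
        x = np.concatenate([x, skip], axis=0)  # UNet skip connection
        x = transformer_stage(x)
    return x
```

The point of the sketch is only the placement: prompt blocks sit between decoder levels, so the degradation context is injected at multiple scales.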

Prompt Block

The proposed PromptIR method draws on prompt-based techniques used in natural language processing and computer vision tasks, where prompts enable parameter-efficient fine-tuning of large frozen models trained on a source task so that they adapt to a target task. Prompt-based techniques are effective because they encode task-specific contextual information into prompt components. In PromptIR, the prompt components are learnable parameters that interact with the input features to enrich them with information about the degradation type. The prompt block consists of two key components: the Prompt Generation Module (PGM) and the Prompt Interaction Module (PIM).

Prompt Generation Module (PGM)

The prompt components $P_c$ are a set of learnable parameters that interact with the input features to embed degradation information. A straightforward feature-prompt interaction would be to calibrate the features directly with the learned prompts. However, this static approach may yield suboptimal results because it is agnostic to the input content. Therefore, this paper proposes the Prompt Generation Module (PGM), which dynamically predicts attention-based weights from the input features and applies them to the prompt components, producing an input-conditioned prompt $P$. In addition, the PGM creates a shared space that facilitates knowledge sharing among the prompt components.

To generate the prompt weights from the input features $F_l$, the PGM first applies global average pooling (GAP) over the spatial dimensions, producing a feature vector $v \in \mathbb{R}^{\hat{C}}$. Next, $v$ is passed through a channel-reducing convolution layer to obtain a compact feature vector, followed by a softmax operation, yielding the prompt weights $w \in \mathbb{R}^N$. Finally, these weights are used to reweight the prompt components, after which a $3 \times 3$ convolution layer is applied. Overall, the PGM process can be summarized as:

$$P = \text{Conv3x3}\left(\sum_{c=1}^{N} w_c P_c\right), \quad w = \text{Softmax}\left(\text{Conv1x1}\left(\text{GAP}(F_l)\right)\right) \quad (2)$$

Since the restoration network must handle images of different resolutions at inference time, prompt components $P_c$ of a fixed size cannot be used directly. The authors therefore apply bilinear interpolation to upsample the prompt components to the same spatial size as the input features.
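A minimal numpy sketch of the PGM computation follows; it assumes the 1×1 channel-reduction conv can be written as a plain linear map `W_reduce`, and omits the final 3×3 conv and the bilinear resize. Names are illustrative, not from the official code.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def prompt_generation_module(F_l, prompt_components, W_reduce):
    """F_l: input features (C, H, W); prompt_components: N learnable
    components P_c stacked as (N, C_p, H_p, W_p); W_reduce: (N, C) linear
    map standing in for the 1x1 channel-reduction conv."""
    # Global average pooling over the spatial dimensions -> v in R^C
    v = F_l.mean(axis=(1, 2))
    # Channel reduction + softmax -> prompt weights w in R^N
    w = softmax(W_reduce @ v)
    # Input-conditioned prompt: weighted sum of the prompt components
    # (the subsequent 3x3 conv of Eq. 2 is omitted in this sketch)
    P = np.einsum('n,nchw->chw', w, prompt_components)
    return P
```

At inference, $P$ would additionally be bilinearly resized to match the spatial size of $F_l$, as described above.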

Prompt Interaction Module (PIM)

The main goal of the PIM is to enable interaction between the input features $F_l$ and the prompt $P$, so as to achieve a guided restoration process.

In the PIM, the generated prompt is concatenated with the input features along the channel dimension. The concatenated representation is then processed by a Transformer block, which exploits the degradation information encoded in the prompt to transform the input features.

The main contribution of this paper is the prompt block, a plug-in module that is agnostic to the specific architecture. Therefore, instead of developing a new block, the authors use existing Transformer blocks in the proposed PromptIR framework. The Transformer block consists of two sequentially connected sub-modules: Multi-Dconv Head Transposed Attention (MDTA) and the Gated-Dconv Feed-forward Network (GDFN). MDTA applies self-attention over the channel dimension rather than the spatial dimension and has linear complexity. The goal of GDFN is to transform features in a controlled manner, i.e., to suppress less informative features and let only useful ones propagate through the network. The overall PIM process is:

$$\hat{F}_l = \text{Conv3x3}\left(\text{GDFN}\left(\text{MDTA}([F_l; P])\right)\right) \quad (3)$$

where $[\,;\,]$ denotes concatenation. MDTA is formulated as $Y = W_p V \cdot \text{Softmax}(K \cdot Q/\alpha) + X$, where $X$ and $Y$ denote the input and output features, respectively. $Q$, $K$, and $V$ are the query, key, and value projections, obtained by applying $1 \times 1$ point-wise convolutions followed by $3 \times 3$ depth-wise convolutions on the layer-normalized input feature map. $W_p$ is a point-wise convolution, $\alpha$ is a learnable scaling parameter, and $(\cdot)$ denotes the dot-product interaction. GDFN is defined as $Z = W_p^0\left(\phi\left(W_d^1 W_p^1(\text{LN}(Y))\right) \odot W_d^2 W_p^2(\text{LN}(Y))\right) + Y$, where $W_d^{(\cdot)}$ denotes a $3 \times 3$ depth-wise convolution, $\odot$ denotes element-wise multiplication, $\phi$ is the GELU nonlinear activation, and $\text{LN}$ is layer normalization.
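To make the channel-wise (transposed) attention concrete, here is a simplified numpy sketch of the PIM. The learned Q/K/V projections, the GDFN, and the final 3×3 conv are replaced by identity stand-ins, so this illustrates only the transposed-attention idea and the prompt-feature concatenation, not the paper's actual code.

```python
import numpy as np

def channel_softmax(a):
    e = np.exp(a - a.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def mdta(X, alpha=1.0):
    """Transposed (channel) attention on X of shape (C, H*W): the attention
    map is C x C, so the cost is linear in the number of pixels."""
    Q, K, V = X, X, X  # identity stand-ins for the 1x1 + depth-wise convs
    attn = channel_softmax(K @ Q.T / alpha)  # (C, C) channel-attention map
    return attn @ V + X                      # residual connection

def prompt_interaction_module(F_l, P):
    # Concatenate the prompt with the features along the channel dimension,
    # then let the Transformer block (here reduced to MDTA) mix them
    X = np.concatenate([F_l, P], axis=0)
    return mdta(X)
```

Because attention is taken across channels, the prompt channels can directly modulate every feature channel, regardless of image resolution.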

Experiments

::: block-1

Table 1: Comparison under the all-in-one restoration setting: a single model trained on a combined dataset of images with different degradation types. Averaged over the tasks, PromptIR achieves a significant gain of 0.86 dB over the previous all-in-one method AirNet. :::

::: block-1

Figure 4: Dehazing comparison against the all-in-one method on the SOTS dataset. The results generated by PromptIR are visually better than those of the previous state-of-the-art method AirNet. :::

::: block-1

Figure 5: Image deraining comparison against the all-in-one method on the Rain100L dataset. PromptIR effectively removes rain streaks, producing images free of rain marks. :::

Conclusion

Existing deep-neural-network-based image restoration models are usually suited only to specific degradation types and generalize poorly to others. In practice, however, a single unified model is needed to handle multiple degradation types, rather than degradation-specific models that lack generalization and require prior knowledge of the degradation present in the input. To this end, this paper proposes a plug-and-play prompt block that interacts with input features and dynamically adjusts the representations, making the restoration process adaptable to a variety of degradation tasks. By integrating prompt blocks into a state-of-the-art restoration model, the paper demonstrates their utility for all-in-one image restoration, achieving significant improvements on image denoising, deraining, and dehazing.

Origin juejin.im/post/7258526520167252005