NAFNet (ECCV 2022) - Interpretation of Image Restoration Papers


Paper: "Simple Baselines for Image Restoration"
github: https://github.com/megvii-research/NAFNet

Problem addressed

Current SOTA methods in image restoration are fairly complex, which makes analysis and fair comparison between methods difficult. The authors propose a simple network, NAFNet, show that nonlinear activation functions are not necessary, and reach SOTA on the GoPro and SIDD benchmarks.

Algorithm

Background

Inter-block design usually takes one of two forms, as shown in Figure 2:
1. Connections between different feature maps;
2. Multi-stage networks, where a later stage refines the result of the previous stage.
Designs within a block are shown in Figure 3a, for example the multi-Dconv head transposed attention mechanism, the gated-Dconv feed-forward network, the Swin Transformer block, and the HIN block.
To keep the network simple, the authors adopt the conventional U-Net structure shown in Figure 2c.

Simple Baseline

Plain Block

To simplify the neural network, the authors propose PlainNet, built only from the most common modules, as shown in Figure 3b (a rough sketch follows the list below). Transformers are not used because:
1. Some work achieves SOTA results without them, so a Transformer is not strictly necessary;
2. Depthwise convolution is simpler than the self-attention mechanism.
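
A minimal PyTorch sketch of what such a plain block could look like. This is only my reading of Figure 3b, not the authors' released code; the channel count `c` and the exact conv/ReLU arrangement are illustrative.

```python
import torch
import torch.nn as nn

class PlainBlock(nn.Module):
    """Rough sketch of a 'plain block': only convolution, ReLU and a shortcut."""
    def __init__(self, c: int):
        super().__init__()
        self.conv1 = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.conv2 = nn.Conv2d(c, c, kernel_size=3, padding=1)
        self.relu = nn.ReLU(inplace=True)

    def forward(self, x):
        # conv -> ReLU -> conv, plus an identity shortcut
        return x + self.conv2(self.relu(self.conv1(x)))

# x = torch.randn(1, 32, 64, 64); PlainBlock(32)(x).shape == x.shape
```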

Normalization

BN (mean and variance computed over the N/H/W dimensions) is unstable with small-batch statistics. IN (statistics over H/W) avoids this problem, but some work shows that IN does not always bring gains and requires fine-tuning. With the success of Transformers, LN (statistics over C/H/W) is used by more and more methods, so the authors add LN to the plain block to stabilize training.
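
To make the axis conventions concrete, here is a small illustration (mine, not from the paper) of which dimensions each normalization averages over for an NCHW feature map:

```python
import torch

x = torch.randn(4, 8, 16, 16)  # (N, C, H, W) feature map

# Dimensions over which mean/variance are computed for each normalization:
bn_mean = x.mean(dim=(0, 2, 3), keepdim=True)  # BN: over N, H, W (one statistic per channel)
in_mean = x.mean(dim=(2, 3), keepdim=True)     # IN: over H, W (per sample, per channel)
ln_mean = x.mean(dim=(1, 2, 3), keepdim=True)  # LN: over C, H, W (one statistic per sample)
```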

Activation function

Current SOTA methods show a trend of replacing ReLU with GELU, which maintains image denoising performance while also bringing gains in image deblurring.

Attention mechanism

The computational cost of vanilla self-attention grows quadratically with the feature-map size. The Swin Transformer computes attention within fixed-size local windows, which alleviates the cost but loses global information. Vanilla channel attention (as in SENet) meets both requirements: it is computationally efficient and carries global information, so the baseline adopts it.
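
A rough sketch of SE-style channel attention as described above (my own paraphrase; the reduction ratio `r` is illustrative, not a value taken from the paper):

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """SE-style channel attention: global pooling + channel interaction + sigmoid gating."""
    def __init__(self, c: int, r: int = 4):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)        # aggregate global spatial information
        self.fc = nn.Sequential(
            nn.Conv2d(c, c // r, kernel_size=1),   # squeeze: channel interaction
            nn.ReLU(inplace=True),
            nn.Conv2d(c // r, c, kernel_size=1),   # excite
            nn.Sigmoid(),
        )

    def forward(self, x):
        return x * self.fc(self.pool(x))           # re-weight the input channels
```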

Summary

The simple baseline structure is shown in Figures 2c and 3c. Every component is common (LN, GELU, channel attention), yet the baseline built from them surpasses the previous SOTA.

NAFNet

The authors want to further simplify the baseline in Figure 3c while preserving performance, and notice that GLU (gated linear units) appears in SOTA methods.

SimpleGate replaces GELU

GLU is given by Equation 1 and GELU by Equation 2. Comparing them shows that GELU can be regarded as a special case of GLU, and that GLU itself contains nonlinearity even without an activation function. Based on this, the authors propose SimpleGate (Equation 4, Figure 4c), which splits the feature map into two halves along the channel dimension and multiplies them element-wise.
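
For reference, the formulas referenced above take roughly the following form (reconstructed here from the surrounding description, so the notation may differ slightly from the paper):

```latex
\begin{align}
\mathrm{GLU}(\mathbf{X}, f, g, \sigma) &= f(\mathbf{X}) \odot \sigma\bigl(g(\mathbf{X})\bigr) \tag{1}\\
\mathrm{GELU}(x) &= x \, \Phi(x) \tag{2}\\
\mathrm{GELU}(x) &\approx 0.5\, x \Bigl(1 + \tanh\bigl(\sqrt{2/\pi}\,\bigl(x + 0.044715\, x^{3}\bigr)\bigr)\Bigr) \tag{3}\\
\mathrm{SimpleGate}(\mathbf{X}, \mathbf{Y}) &= \mathbf{X} \odot \mathbf{Y} \tag{4}
\end{align}
```

Here Φ is the cumulative distribution function of the standard normal distribution, ⊙ is element-wise multiplication, and X, Y are the two halves of the feature map after the channel-wise split.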

SCA replaces CA

The channel attention (CA) process is shown in Figure 4a and Equation 5; it can be rewritten as Equation 6, which has a GLU-like form. The authors keep only the two essential parts of CA, aggregation of global information and channel-wise information interaction, and propose simplified channel attention (SCA), Equation 7.
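
The referenced equations are roughly as follows (again reconstructed, so notation may differ from the paper):

```latex
\begin{align}
\mathrm{CA}(\mathbf{X}) &= \mathbf{X} \ast \sigma\bigl(W_{2}\,\max\bigl(0,\, W_{1}\,\mathrm{pool}(\mathbf{X})\bigr)\bigr) \tag{5}\\
\mathrm{CA}(\mathbf{X}) &= \mathbf{X} \ast \Psi(\mathbf{X}) \tag{6}\\
\mathrm{SCA}(\mathbf{X}) &= \mathbf{X} \ast W\,\mathrm{pool}(\mathbf{X}) \tag{7}
\end{align}
```

Here pool is global average pooling, * denotes channel-wise multiplication, and Ψ collapses everything applied after the pooling into a single function, which makes Equation 6 look like a GLU in which X is gated by Ψ(X).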

Summary

Starting from the simple baseline,
1. SimpleGate replaces GELU;
2. SCA replaces CA;
and the simplified NAFNet is obtained, which does not contain any nonlinear activation function (ReLU, GELU, Sigmoid).
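
Putting the pieces together, here is a simplified sketch of what a NAFNet-style block could look like after the two replacements. It is based on my reading of Figure 3c, not the official implementation (which, among other details, also has a second FFN-style branch and learnable skip scales), so treat it as illustrative only.

```python
import torch
import torch.nn as nn

class SimpleGate(nn.Module):
    def forward(self, x):
        # Eq. 4: split channels in half and multiply element-wise - no activation needed.
        x1, x2 = x.chunk(2, dim=1)
        return x1 * x2

class NAFBlockSketch(nn.Module):
    """Simplified single-branch sketch of a NAFNet-style block (not the official NAFBlock)."""
    def __init__(self, c: int):
        super().__init__()
        self.norm = nn.GroupNorm(1, c)                    # LN-like: statistics over C/H/W per sample
        self.conv1 = nn.Conv2d(c, 2 * c, kernel_size=1)   # pointwise conv, doubled channels for the gate
        self.dwconv = nn.Conv2d(2 * c, 2 * c, kernel_size=3, padding=1, groups=2 * c)  # depthwise conv
        self.sg = SimpleGate()
        # Eq. 7 (SCA): global pooling + a single pointwise conv, no sigmoid.
        self.sca = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Conv2d(c, c, kernel_size=1))
        self.conv2 = nn.Conv2d(c, c, kernel_size=1)

    def forward(self, x):
        y = self.dwconv(self.conv1(self.norm(x)))
        y = self.sg(y)              # 2c channels -> c channels
        y = y * self.sca(y)         # re-weight channels with globally pooled information
        return x + self.conv2(y)    # residual connection

# x = torch.randn(1, 32, 64, 64); NAFBlockSketch(32)(x).shape == x.shape
```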

Experiments

The ablation from PlainNet to the simple baseline is shown in Table 1.
The ablation from the simple baseline to NAFNet is shown in Table 2: performance improves while the latency also decreases.
Table 4 examines the effect of the number of blocks; when it is increased to 72, the performance gain is no longer obvious.
Table 5 examines the effect of adding different activation functions inside SimpleGate and finds that an activation function is not necessary.

Applications

RGB image denoising

Table 6 compares with SOTA image denoising methods: NAFNet surpasses the previous best result (Restormer) by 0.28 dB while greatly reducing the amount of computation. As shown in Figure 5, the proposed method recovers more detail.

Image deblurring

The comparison with SOTA methods on the GoPro dataset is shown in Table 7, and the qualitative results are shown in Figure 6.

RAW image denoising

The experimental results are shown in Table 8, and the visualization results are shown in Figure 7.

Conclusion

The authors analyzed the baseline and found that nonlinear activation functions are not necessary. The proposed NAFNet contains no nonlinear activations; although its structure is simple, performance does not decline.

Source: blog.csdn.net/qq_41994006/article/details/127859059