Attention-Guided CNN for Image Denoising

Published in: Neural Networks 124 (2020) 117–129. https://doi.org/10.1016/j.neunet.2019.12.024

Abstract: The paper proposes an attention-guided denoising convolutional neural network (ADNet) for image denoising, composed of a sparse block (SB), a feature enhancement block (FEB), an attention block (AB), and a reconstruction block (RB).

SB removes noise using a mix of dilated and ordinary convolutions, making a trade-off between performance and efficiency.

FEB integrates global and local feature information through a long path to enhance the expressive ability of the denoising model.

AB is used to finely extract the noise information hidden in complex backgrounds, which is very effective for complex noise images (real noise images) and blind denoising. In addition, integrating FEB and AB improves efficiency and reduces the complexity of training the denoising model.

RB aims to construct a clean image from the obtained noise map and the given noisy image.

ADNet performs well on three tasks (i.e., synthetic noise images, real noise images, and blind denoising).

Contributions: (1) An SB composed of dilated convolutions and ordinary convolutions is proposed, reducing network depth while improving denoising performance and efficiency.
(2) FEB uses a long path to fuse information from the shallow and deep layers, enhancing the expressive ability of the denoising model.
(3) AB deeply mines the noise information hidden in complex backgrounds of a given noisy image, which is useful for difficult cases such as real noise images and blind denoising.
(4) The integration of FEB and AB improves efficiency and reduces the complexity of training the denoising model.
(5) On six benchmark datasets, ADNet outperforms the state of the art (as of 2020) on synthetic noise images, real noise images, and blind denoising.

As shown in Fig. 1, the 17-layer ADNet consists of four blocks, namely SB, FEB, AB and RB. The 12-layer sparse block SB is used to enhance the performance and efficiency of image denoising.
Fig. 1: The architecture of ADNet.

Loss function: ADNet is trained with a mean squared error loss between the predicted noise map and the true residual. In the paper's notation, l(θ) = 1/(2N) Σ_{j=1..N} ||ADNet(I_j^N) − (I_j^N − I_j^C)||², where I_j^N and I_j^C are the j-th noisy and clean training patches.
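As a sketch of this residual-learning objective (assuming, as in DnCNN-style models, that the network outputs the noise map; `pred_noise` below stands in for the network's output), the loss can be written in NumPy:

```python
import numpy as np

def adnet_loss(pred_noise, noisy, clean):
    """Mean squared error between the predicted noise map and the
    true residual (noisy - clean), averaged over the N patches."""
    residual = noisy - clean          # ground-truth noise I^N - I^C
    n = pred_noise.shape[0]           # number of training patches N
    return np.sum((pred_noise - residual) ** 2) / (2 * n)

# toy check: a perfect noise prediction gives zero loss
clean = np.zeros((4, 1, 50, 50))
noise = np.random.randn(4, 1, 50, 50) * 0.1
noisy = clean + noise
print(adnet_loss(noise, noisy, clean))  # 0.0
```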

SB:
The 12-layer SB uses two layer types: dilated Conv+BN+ReLU and Conv+BN+ReLU. Dilated Conv+BN+ReLU denotes a convolution with a dilation rate of 2, followed by batch normalization (BN) and the ReLU activation function; Conv+BN+ReLU is an ordinary convolution followed by BN and ReLU. The dilated Conv+BN+ReLU layers sit at the 2nd, 5th, 9th, and 12th layers of SB (purple in Fig. 1) and can be regarded as high-energy points. The Conv+BN+ReLU layers sit at the 1st, 3rd, 4th, 6th, 7th, 8th, 10th, and 11th layers (green in Fig. 1), the low-energy points. All convolution filters in layers 1–12 are 3 × 3. The first layer takes c input channels, where c is the number of channels of the input noisy image; layers 2–12 have 64 input and 64 output channels. The alternation of a few high-energy points with many low-energy points is what makes the block sparse. SB is expressed by Eq. (6), where D denotes the dilated convolution, R and B stand for ReLU and BN respectively, and CBR is the Conv+BN+ReLU function:
O^{SB} = R(B(D(CBR(CBR(R(B(D(CBR(CBR(CBR(R(B(D(CBR(CBR(R(B(D(CBR(I^N)))))))))))))))))))) (6)
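Since a stride-1 3 × 3 convolution grows the receptive field by 2 × dilation per layer, the effect of SB's layer layout can be checked with a few lines of Python (the dilation pattern below follows the layer placement described above; this is a sketch of the arithmetic, not the paper's code):

```python
def receptive_field(dilations, kernel=3):
    """Receptive field of a stack of stride-1 convolutions:
    each layer adds (kernel - 1) * dilation pixels."""
    rf = 1
    for d in dilations:
        rf += (kernel - 1) * d
    return rf

# SB: dilation 2 at layers 2, 5, 9, 12; dilation 1 elsewhere
sb_dilations = [1, 2, 1, 1, 2, 1, 1, 1, 2, 1, 1, 2]
print(receptive_field(sb_dilations))  # 33
```

The four dilated layers let the 12-layer block reach the same 33-pixel receptive field that would otherwise require 16 ordinary layers, which is the depth-versus-efficiency trade-off the contribution list refers to.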
FEB: (in a deep network, the influence of shallow layers weakens as depth grows)
FEB takes full advantage of global and local features through a long path, mining more robust features; it is complementary to SB in processing a given noisy image. The 4-layer FEB consists of three types: Conv+BN+ReLU, Conv, and Tanh, where Tanh is the activation function. Conv+BN+ReLU occupies layers 13–15 with filter size 64 × 3 × 3 × 64. Conv is used for the 16th layer with filter size 64 × 3 × 3 × c. The 17th layer uses a concatenation operation to fuse the input noisy image with the 16th layer's output, enhancing the denoising model's expressive ability, so the final output has 2c channels. Tanh then makes the acquired features nonlinear. The process is described by Eq. (7), where C, Cat, and T are the convolution, concatenation, and Tanh functions respectively (Cat denotes the connection operation in Fig. 1); O^{FEB} is also fed to AB:
O^{FEB} = T(Cat(C(CBR(CBR(CBR(O^{SB})))), I^N)) (7)

AB: (complex backgrounds easily hide the important features in image and video applications)
AB uses the current stage to guide the previous stage in learning the noise information, which is very useful for unknown noise, i.e., blind denoising and real noise images. The 1-layer AB contains a single Conv of size 2c × 1 × 1 × c, where c is the number of channels of the given corrupted image. AB implements the attention mechanism in two steps. The first step uses the 1 × 1 convolution on the 17th layer's output to compress the obtained features into a weight vector for the previous stage, which also improves denoising efficiency. The second step multiplies the 16th layer's output by the obtained weights to extract the more prominent noise features. The process can be written as follows.
α = C(O^{FEB}); O^{AB} = α × O^{16}, where C is the 1 × 1 convolution, O^{16} is the output of the 16th layer, and × denotes element-wise multiplication. RB then reconstructs the latent clean image from the obtained noise map and the given noisy image, i.e., I^{clean} = I^N − O^{AB}.
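Under this reading (the 1 × 1 convolution maps the 2c-channel O^{FEB} to c attention weights, which then rescale the 16th layer's c-channel output), the mechanism can be sketched in NumPy with the 1 × 1 convolution written as an einsum over channels; the kernel here is random and purely illustrative:

```python
import numpy as np

c, h, w = 3, 50, 50
o_feb = np.random.randn(2 * c, h, w)     # 2c-channel output of FEB
layer16_out = np.random.randn(c, h, w)   # c-channel 16th-layer output

# step 1: 1x1 convolution (2c -> c) is a per-pixel matmul over channels
conv_1x1 = np.random.randn(c, 2 * c)     # illustrative 1x1 kernel
weights = np.einsum('oi,ihw->ohw', conv_1x1, o_feb)

# step 2: element-wise multiplication re-weights the noise features
o_ab = weights * layer16_out
print(o_ab.shape)  # (3, 50, 50)
```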
Training datasets:
The Gaussian (synthetic) denoising model is trained on 400 images of size 180 × 180 from the Berkeley Segmentation Dataset (BSD) plus 3,859 images from the Waterloo Exploration Database. Since different areas of an image contain different detail, the training images are divided into 1,348,480 patches of size 50 × 50, which promotes more robust features and improves training efficiency. Real-world noise, however, is varied and complex, so 100 real noisy images of size 512 × 512 from the benchmark dataset of Xu, Li, Liang, Zhang, and Zhang (2018) are used to train the real-noise denoising model; to speed up training, these are likewise divided into 211,600 patches of size 50 × 50. In addition, each training patch is randomly augmented in one of eight ways: the original image, rotations by 90°, 180°, or 270°, a horizontal flip of the original, and horizontal flips of the 90°, 180°, and 270° rotations.
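The eight augmentation modes (identity, three rotations, and a horizontal flip of each) can be sketched with NumPy; whether the flip is applied before or after rotation is an implementation detail not fixed by the text:

```python
import numpy as np

def augment_eightfold(img):
    """Return the 8 dihedral variants of a patch: rotations by
    0/90/180/270 degrees, each with and without a horizontal flip."""
    variants = []
    for k in range(4):                       # 0, 90, 180, 270 degrees
        rotated = np.rot90(img, k)
        variants.append(rotated)
        variants.append(np.fliplr(rotated))  # horizontally flipped copy
    return variants

patch = np.arange(2500).reshape(50, 50)
print(len(augment_eightfold(patch)))  # 8
```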

Test datasets:
Six datasets, namely BSD68, Set12, CBSD68, Kodak24, McMaster, and CC (consisting of 68, 12, 68, 24, 18, and 15 images respectively), are used to evaluate the denoising performance of ADNet. BSD68 and Set12 are grayscale; the other datasets are color. BSD68 and CBSD68 contain the same scenes. The real-noise CC dataset was captured with three different cameras, and each of its images is 512 × 512.


Origin blog.csdn.net/qq_35200351/article/details/108962037