【Read the paper】GANMcC

Paper: https://ieeexplore.ieee.org/document/9274337

If there is any infringement, please contact the blogger

In the past few days I read another paper on GAN-based infrared and visible image fusion. Not surprisingly, it comes from the FusionGAN author team. Compared with the earlier GAN-based fusion papers, this one proposes some new ideas. Let's take a look.

A brief introduction

Having read several papers on image fusion, I am starting to find my way around this field and have seen a variety of methods. I have to say the experts in this area are impressive.
The paper I am going to talk about today is based on a GAN. The point it drives home for me is how it handles the preservation of texture detail and contrast. Most papers we have read before only preserve the texture information of the visible image and the contrast of the infrared image, but as the authors argue, the contrast of the visible image and the texture information of the infrared image also deserve attention. In the figure below, the left column shows the visible images and the right column shows the infrared images.
[Figure: visible images (left) and infrared images (right)]
If you look carefully, you will find that in the first row the leaf texture is better preserved in the infrared image on the right, while in the second row the visible image has the stronger contrast. This is where things get interesting. Let's go through the paper bit by bit.

Network structure

Let's first look at the overall network structure.
[Figure: overall network architecture of GANMcC]

Compared with the network structure of DIVFusion, this one is quite simple. Next, we will go through the components of the network one by one.

Generator

[Figure: generator network structure]
The figure above shows the network structure of the generator. The generator input is split into two paths, a gradient path and a contrast path. The gradient path stacks two visible images with one infrared image, and the contrast path stacks two infrared images with one visible image. As in FusionGAN, the input images are padded to 132x132 so that the generated image ends up the same size as the original input.

Each path first passes through four convolutional layers (the kernel sizes, activation functions and batch normalization settings are all given in the figure) to extract features; the features from the two paths are then concatenated, and a 1x1 convolution with an activation produces the fused image.

One interesting detail: the input to the generator is not a single visible image plus a single infrared image, but a stack of several such images. A rough sketch of this structure is given below.
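To make the structure concrete, here is a minimal PyTorch sketch of such a two-path generator. The kernel sizes, channel widths, padding choice and the tanh output are my assumptions for illustration (the exact settings are in the paper's figure, which is not reproduced here); only the overall shape, two stacked-input paths of four convolutions each, concatenation, then a 1x1 convolution, follows the description above.

```python
import torch
import torch.nn as nn

def conv_block(in_ch, out_ch, k=3):
    # Conv -> BatchNorm -> LeakyReLU; "same" padding keeps the spatial size here,
    # whereas the paper instead pads the whole input to 132x132 up front.
    return nn.Sequential(
        nn.Conv2d(in_ch, out_ch, kernel_size=k, padding=k // 2),
        nn.BatchNorm2d(out_ch),
        nn.LeakyReLU(0.2, inplace=True),
    )

class TwoPathGenerator(nn.Module):
    def __init__(self):
        super().__init__()
        # Gradient path is fed [vis, vis, ir]; contrast path is fed [ir, ir, vis]
        self.gradient_path = nn.Sequential(
            conv_block(3, 32), conv_block(32, 32), conv_block(32, 16), conv_block(16, 16)
        )
        self.contrast_path = nn.Sequential(
            conv_block(3, 32), conv_block(32, 32), conv_block(32, 16), conv_block(16, 16)
        )
        # A 1x1 convolution plus activation fuses the concatenated features into one image
        self.fuse = nn.Sequential(nn.Conv2d(32, 1, kernel_size=1), nn.Tanh())

    def forward(self, vis, ir):
        # vis, ir: single-channel image batches of shape (N, 1, H, W)
        grad_in = torch.cat([vis, vis, ir], dim=1)   # two visible + one infrared
        cont_in = torch.cat([ir, ir, vis], dim=1)    # two infrared + one visible
        feats = torch.cat([self.gradient_path(grad_in),
                           self.contrast_path(cont_in)], dim=1)
        return self.fuse(feats)                      # fused image, shape (N, 1, H, W)
```

With dummy inputs of shape (2, 1, 132, 132) for both modalities, the forward pass returns a fused batch of the same shape.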

Discriminator

[Figure: discriminator network structure]
The network structure of the discriminator is shown in the figure above. If you look carefully, you will notice that its final output is different from the GANs we have seen before.

Recall FusionGAN and DDcGAN: in both, each discriminator ultimately outputs only a one-dimensional probability. Even in a dual-discriminator design like DDcGAN, each discriminator's final output is a single probability, whereas the discriminator in GANMcC outputs a two-dimensional vector.

So why is it designed this way?

The authors' reasoning here differs slightly from previous work: the two values output by the discriminator represent the probability that the input image is a visible image and the probability that it is an infrared image, respectively.

So how is this two-dimensional output used?

Think about what this model is supposed to do: generate a fused image that contains more texture information and more contrast information. Cast into the GAN framework, wanting the fused image to carry more texture information means we want the discriminator to assign it a high probability of being a visible image, and likewise a high probability of being an infrared image for the contrast side. In other words, when both probabilities the discriminator outputs for the fused image are large, the fusion works well. We will walk through this process in detail in the loss functions. A small sketch of such a discriminator follows.
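As a rough sketch of what such a discriminator could look like: the convolution stack below is entirely my assumption; only the two-dimensional output head follows the paper's description.

```python
import torch
import torch.nn as nn

class TwoClassDiscriminator(nn.Module):
    """Outputs two scores per image: index 0 ~ "is a visible image", index 1 ~ "is an infrared image"."""

    def __init__(self):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, 3, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.Conv2d(64, 128, 3, stride=2, padding=1), nn.LeakyReLU(0.2, inplace=True),
            nn.AdaptiveAvgPool2d(1),
        )
        # Two independent scores; they are not forced to sum to 1 as a softmax would be
        self.head = nn.Linear(128, 2)

    def forward(self, x):
        # x: (N, 1, H, W) -> scores of shape (N, 2)
        return self.head(self.features(x).flatten(1))
```

Feeding a fused image through such a network and reading out both entries is exactly the "two probabilities" idea discussed above.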

Now go back to the title, and you can see where the "multiclassification" comes from.

Loss function

Generator loss function

[Figure: overall loss function of the generator]
The overall loss function of the generator is shown above. The first part is the content loss for texture and contrast, and the second part is the adversarial loss against the discriminator.
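Written out, the structure is simply a sum of the two parts; treat the trade-off weight λ as my notation rather than necessarily the paper's exact symbol:

```latex
% Generator objective: adversarial term plus weighted content (texture + contrast) term
\mathcal{L}_{G} \;=\; \mathcal{L}_{G\mathrm{adv}} \;+\; \lambda\,\mathcal{L}_{G\mathrm{con}}
```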

L_Gcon here is relatively involved. As mentioned before, on the one hand we need to preserve the texture of the visible image and the contrast of the infrared image; on the other hand we also need to preserve the contrast of the visible image and the texture of the infrared image.

Let's first look at the two main terms of L_Gcon. Their job is to make sure the texture features of the visible image and the contrast information of the infrared image end up in the fused image.

The first formula below pushes the fused image to retain as much of the infrared image's contrast information as possible (pixel intensity is used as a proxy for contrast).
[Equation: intensity loss between the fused image and the infrared image]
The second formula pushes the fused image to retain more of the visible image's texture information (gradients are used as a proxy for texture).
[Equation: gradient loss between the fused image and the visible image]
But that is not the end of it. As mentioned earlier, we also want to retain the texture information of the infrared image and the contrast information of the visible image, so two more loss terms are designed for them. They have the same form as the formulas above, except that the gradient is now computed against the infrared image and the intensity (contrast) against the visible image.
[Equation: auxiliary intensity loss against the visible image and gradient loss against the infrared image]
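Since the equation images are not reproduced here, the following is a hedged LaTeX reconstruction of what the four terms most plausibly look like, using a mean-squared (Frobenius-norm) form over an H×W image; the exact normalization in the paper may differ.

```latex
% Main terms: intensity pulled toward the infrared image, gradients toward the visible image
\mathcal{L}_{\mathrm{int}}^{ir}   = \tfrac{1}{HW}\,\lVert I_{f} - I_{ir} \rVert_F^2, \qquad
\mathcal{L}_{\mathrm{grad}}^{vis} = \tfrac{1}{HW}\,\lVert \nabla I_{f} - \nabla I_{vis} \rVert_F^2

% Auxiliary terms: the roles of the two source images are swapped
\mathcal{L}_{\mathrm{int}}^{vis}  = \tfrac{1}{HW}\,\lVert I_{f} - I_{vis} \rVert_F^2, \qquad
\mathcal{L}_{\mathrm{grad}}^{ir}  = \tfrac{1}{HW}\,\lVert \nabla I_{f} - \nabla I_{ir} \rVert_F^2
```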

Finally, all four terms are combined:
[Equation: combined content loss L_Gcon with weights β1–β4]
where β1 > β4, β2 > β3, {β2, β3} > {β1, β4}
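Using the four terms sketched above, the combined content loss presumably has the form below, with one β per term (my reconstruction; the symbols in the paper may differ):

```latex
\mathcal{L}_{G\mathrm{con}} =
      \beta_1\,\mathcal{L}_{\mathrm{int}}^{ir}
    + \beta_2\,\mathcal{L}_{\mathrm{grad}}^{vis}
    + \beta_3\,\mathcal{L}_{\mathrm{grad}}^{ir}
    + \beta_4\,\mathcal{L}_{\mathrm{int}}^{vis},
\qquad \beta_1 > \beta_4,\;\; \beta_2 > \beta_3,\;\; \{\beta_2,\beta_3\} > \{\beta_1,\beta_4\}
```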

So why are the β values set this way?

The authors explain that β1 weights the loss between the fused image's intensity and the infrared image, while β4 weights the loss between the fused image's intensity and the visible image; since the contrast information we want to retain comes mainly from the infrared image, β1 > β4. By the same reasoning for texture, β2 > β3.

So why set {β2, β3} > {β1, β4}? The authors note that the gradient loss terms tend to be numerically smaller than the intensity loss terms, so to keep texture and contrast balanced during training, the gradient weights are set larger than the intensity weights, i.e. {β2, β3} > {β1, β4}.

That covers the part of the generator loss that preserves gradient and contrast information on its own.

Since the architecture is a GAN, the generator also has to compete with the discriminator. The adversarial loss is as follows.

[Equation: adversarial loss of the generator]
Looking back at the architecture diagram, the discriminator outputs a two-dimensional vector: the first entry is the probability that the input image is a visible image, i.e. D(Ifuse)[1], and the second entry is the probability that it is an infrared image, i.e. D(Ifuse)[2].

With that, the loss function above is easy to understand. Because we want the discriminator to judge the fused image to be a visible image and, at the same time, to be an infrared image, the label d here is set to 1, so that after training the fused image looks like both a visible image and an infrared image.
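A hedged reconstruction of this adversarial term, assuming the least-squares form that FusionGAN-style methods typically use, with N fused images in a batch and d the "real" label discussed above:

```latex
\mathcal{L}_{G\mathrm{adv}} = \frac{1}{N}\sum_{n=1}^{N}
    \Big[\big(D(I_{f}^{(n)})[1] - d\big)^{2} + \big(D(I_{f}^{(n)})[2] - d\big)^{2}\Big]
```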

Discriminator loss function

The overall loss function of the discriminator is as follows.
[Equation: overall loss function of the discriminator]
From left to right, the terms are the visible-image discrimination loss, the infrared-image discrimination loss, and the fused-image discrimination loss.

What is their role?

Simply put, the visible (infrared, fused) discrimination loss helps the discriminator get better at judging whether an input is a visible (infrared, fused) image. Putting the three together gives the discriminator a stronger ability to tell visible images, infrared images and fused images apart.
[Equation: visible-image discrimination loss]
Let's look at the visible-image discrimination loss first. You may wonder what Pvis and Pir are: Pvis corresponds to the D(Ifuse)[1] mentioned for the generator above, and Pir corresponds to D(Ifuse)[2].

Now think about it: if we want the discriminator to recognize images better, then when a visible image is fed in we want the output Pvis to be as close to 1 as possible and Pir as close to 0 as possible. That is exactly what happens here: a1 is set to 1 and a2 is set to 0.
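Under the same least-squares assumption as before, the visible-image discrimination loss would then read roughly as follows (my reconstruction):

```latex
\mathcal{L}_{D}^{vis} = \frac{1}{N}\sum_{n=1}^{N}
    \Big[\big(P_{vis}(I_{vis}^{(n)}) - a_1\big)^{2} + \big(P_{ir}(I_{vis}^{(n)}) - a_2\big)^{2}\Big],
\qquad a_1 = 1,\; a_2 = 0
```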
[Equation: infrared-image discrimination loss]
The loss above helps the discriminator distinguish infrared images. It has the same form as the previous one, except that b1 is set to 0 and b2 is set to 1; the reasoning mirrors the visible-image case.
[Equation: fused-image discrimination loss]
The last loss helps the discriminator recognize fused images. From the discriminator's point of view there are three classes of images, visible, infrared and fused, but it only outputs two probabilities (that the image is visible and that it is infrared). So how can it identify an image as a fused one?

Imagine that, for a given image, both probabilities output by the discriminator are very small. That means the discriminator considers the image to be neither a visible image nor an infrared image, which, out of the three classes, leaves only the third one: the fused image. That is why c is set to 0: the discriminator is taught that a fused image is neither a visible image nor an infrared image, and this is how fused images get identified.

Tips

Note the parameter settings in the paper for a1, a2, b1, b2 and c in the loss functions: soft labels are used. A value that would nominally be 1 is set to a random number between 0.7 and 1.2, and a value that would nominally be 0 is set to a random number between 0 and 0.3. The plain 1s and 0s above were just for ease of understanding.
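Putting the three discriminator terms and the soft labels together, a minimal PyTorch sketch might look like the following; the least-squares penalty and the soft-label ranges follow the description above, while everything else (function names, the simple sum of the three terms) is my assumption.

```python
import torch

def soft(value, like):
    # Soft label: jitter a nominal 1 into [0.7, 1.2] and a nominal 0 into [0.0, 0.3]
    lo, hi = (0.7, 1.2) if value == 1 else (0.0, 0.3)
    return torch.empty_like(like).uniform_(lo, hi)

def discriminator_loss(d_vis, d_ir, d_fused):
    """d_* are 2-dim discriminator outputs of shape (N, 2): column 0 = P(visible), column 1 = P(infrared)."""
    # Visible images: P(visible) -> ~1, P(infrared) -> ~0
    loss_vis = ((d_vis[:, 0] - soft(1, d_vis[:, 0])) ** 2
                + (d_vis[:, 1] - soft(0, d_vis[:, 1])) ** 2).mean()
    # Infrared images: P(visible) -> ~0, P(infrared) -> ~1
    loss_ir = ((d_ir[:, 0] - soft(0, d_ir[:, 0])) ** 2
               + (d_ir[:, 1] - soft(1, d_ir[:, 1])) ** 2).mean()
    # Fused images: both probabilities -> ~0, i.e. "neither visible nor infrared"
    loss_fused = ((d_fused[:, 0] - soft(0, d_fused[:, 0])) ** 2
                  + (d_fused[:, 1] - soft(0, d_fused[:, 1])) ** 2).mean()
    return loss_vis + loss_ir + loss_fused
```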

Summary

This is another rewarding paper. A brief recap of its main points:

  • When extracting texture information, attention is paid not only to the visible image but also to the texture information of the infrared image.
  • When extracting contrast information, attention is paid not only to the infrared image but also to the contrast information of the visible image.
  • The discriminator outputs multi-class probabilities (visible / infrared) instead of a single real-or-fake score.

For interpretations of other image fusion papers, see my paper-reading column:

【Read the paper】DIVFusion: Darkness-free infrared and visible image fusion

【Read the paper】RFN-Nest: An end-to-end residual fusion network for infrared and visible images

【Read paper】DDcGAN

【Read the paper】Self-supervised feature adaption for infrared and visible image fusion

【Read the paper】FusionGAN: A generative adversarial network for infrared and visible image fusion

【Read the paper】DeepFuse: A Deep Unsupervised Approach for Exposure Fusion with Extreme Exposure Image Pairs

【Read the paper】DenseFuse: A Fusion Approach to Infrared and Visible Images

Reference

[1] GANMcC: A Generative Adversarial Network With Multiclassification Constraints for Infrared and Visible Image Fusion

Originally published at blog.csdn.net/qq_43627076/article/details/128034247