[Style Transfer]——MedGAN: Medical Image Translation using GANs

MedGAN: Medical Image Translation using GANs
Author:
Karim Armanious, Chenming Yang, Marc Fischer, Thomas Küstner, Konstantin Nikolaou, Sergios Gatidis, and Bin Yang

Abstract

Image translation is regarded as an emerging research hotspot in medical image analysis, but existing approaches are often tied to a network architecture designed for one specific application, or are constrained by limited training data. MedGAN, proposed in this paper, performs end-to-end image translation directly at the image level. It builds on the currently popular generative adversarial network (GAN) and additionally introduces non-adversarial loss functions so that it can capture the desired features at different levels: the discriminator penalizes the discrepancy between the translated output and the target domain, while the style-transfer losses handle the matching of texture and fine structure. The generator is a new architecture named CasNet, which carries out the translation progressively through a chain of paired encoder-decoder blocks. Finally, the paper demonstrates the effectiveness of MedGAN on tasks such as PET-CT translation, MR deblurring and PET image denoising.

Section I Introduction

In the field of medical imaging, multi-modal images such as CT, MRI and PET are routinely used to obtain spatial information about tissues and organs inside the body. The physical imaging principles differ, and the goal is to produce image information of different dimensions or with different contrasts. This makes translation between different modalities, or between differently contrasted images within the same modality, extremely challenging.


However, a reasonably complete diagnosis often requires combining image information from different modalities, or information provided by images with different contrasts within the same modality, such as hybrid PET/CT imaging, where the CT image is used to correct the PET image. Beyond this, image quality optimization is an indispensable and very important step for extracting useful diagnostic information; in particular, high-quality input images are a prerequisite for automatic analysis tools to produce correct diagnostic results. Therefore, if medical images of different modalities could be translated into one another, diagnosis time could be shortened effectively and some unnecessary scans would no longer be needed.


Part A Related Work



Earlier approaches and building blocks, in shorthand:

- CRF: MR -> CT (for PET/CT)
- kNN: CT -> MR, patch-based
- Generative models:
  - VAE: element-wise loss -> blurry results
  - GANs: G maximizes the probability of fooling the discriminator; D minimizes its loss for separating real from generated samples -> photo-realistic images
  - DC-GANs: G & D realized as CNNs; gradient vanishing / training instability remain
  - Loss-function variants: W-GAN, MMD-GAN; regularization: Spectral Normalization GAN, BE-GAN

Owing to their excellent performance, GANs are widely used in image classification, segmentation, super-resolution, etc., and GANs are also the most commonly used tool for image translation in medical image analysis. In 2016, Isola et al. proposed Pix2Pix, which has become the standard framework for supervised GAN-based image translation: it takes an image (e.g. a grayscale image) as input, and its loss function combines a pixel loss and an adversarial loss;







then in 2018, Wang et al. proposed the PAN framework, which replaces the pixel loss with a feature-matching loss computed from the discriminator; and Fila-sGAN, proposed in 2017, achieves one-to-many translation and additionally computes a style loss through a pre-trained network.








In addition, there are many unsupervised GANs such as CycleGAN and Disco-GAN.

Part B Contributions

Existing GAN-based frameworks for medical image analysis are often limited to specific tasks, or their generation capability is insufficient because of limited training data.








Therefore, this paper proposes a general framework suitable for a variety of medical image translation tasks. The framework proposed by Quan et al. is the closest to this work: they used two cascaded residual networks as the generator for the reconstruction of compressed-sensing (CS) MRI images.








The proposed MedGAN framework can capture the high-frequency and low-frequency information of the image simultaneously through its non-adversarial loss functions (perceptual loss + style-transfer losses). The paper also proposes a new generator architecture, called CasNet, which connects a series of encoder-decoder blocks into a chain, with skip connections, to achieve progressive refinement.








The generator G performs the image translation from the source domain to the target domain, e.g. PET to CT. Each block in CasNet adopts an encoder-decoder structure, and the result is refined from coarse to fine. The discriminator D is not only used to distinguish real from generated images; it also serves as a feature extractor for computing the perceptual loss, while a pre-trained feature extractor provides richer features for computing the style losses.








In order to verify the effectiveness of MedGAN, tests were carried out in three tasks: PET->CT, MR image correction and PET image denoising.

Section II Methods


Fig. 1 shows the structure of MedGAN, which mainly consists of three parts:

- CasNet as the generator network;
- the discriminator network, whose trained features are used to compute the perceptual loss;
- a pre-trained feature extractor used to compute the style-transfer losses.

Preliminaries

GAN

A GAN mainly consists of two parts, a generator and a discriminator. G takes random noise as input and generates samples following a certain data distribution; D is a binary classification network that decides whether an input sample comes from the generator or from the real training data. The two networks learn adversarially and are optimized simultaneously, but this often runs into problems such as vanishing gradients, unstable training, and mode collapse.
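
For reference, the standard (unconditional) GAN objective that this describes is the usual minimax game:

$$\min_G \max_D \; \mathbb{E}_{x \sim p_{\mathrm{data}}}\big[\log D(x)\big] + \mathbb{E}_{z \sim p_z}\big[\log\big(1 - D(G(z))\big)\big]$$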
 









Image-to-Image Translation

One of the main ways to use GANs for image translation is the conditional GAN (cGAN). By conditioning on extra input information, the generator produces samples that follow a specific target-domain distribution, and the discriminator then measures how similar the generated samples are to the real ones. However, samples generated purely under the adversarial loss are very unstable, and the final image may deviate considerably in structure from the original. Some researchers therefore proposed adding a pixel-level reconstruction loss to constrain the degree of deformation during image translation.
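
A common way of writing this combination (Pix2Pix-style; the pairing of the discriminator arguments and the weighting factor are assumptions of this sketch) is

$$\mathcal{L}_{\text{adv}} = \mathbb{E}_{x,y}\big[\log D(x, y)\big] + \mathbb{E}_{y}\big[\log\big(1 - D(G(y), y)\big)\big], \qquad \mathcal{L}_{\text{pixel}} = \mathbb{E}_{x,y}\big[\|x - G(y)\|_1\big],$$

with an overall generator objective $\mathcal{L}_{\text{adv}} + \lambda\,\mathcal{L}_{\text{pixel}}$, where $y$ is the source-domain input and $x$ the target-domain ground truth.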











Part A Perceptual loss

adversarial loss -> pixel loss -> perceptual loss

However, the limitation of the pixel loss is that the generated images are generally blurry: the translated image preserves the overall structure well but loses the fine details. On the one hand, this loss of detail can seriously affect the diagnosis of medical images; on the other hand, it also degrades the visual impression for human observers. To retain as much high-frequency information as possible while still preserving the overall structure and minimizing the loss of detail, the perceptual loss is introduced. It measures the discrepancy between the generated image and the target image in the feature space of the discriminator's hidden layers, where D_i denotes the features extracted by the i-th hidden layer and lambda_i the weight given to that layer. It is worth noting that the perceptual loss does not discard the pixel-based similarity; it is still included as one of its terms.
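
A plausible written-out form of this loss, using the notation above (the L1 distance, the normalization and the exact pairing of the discriminator inputs are assumptions of this sketch rather than details given in the summary):

$$\mathcal{L}_{\text{perceptual}} = \sum_{i=0}^{L} \lambda_{p,i}\,\frac{1}{h_i w_i d_i}\,\big\| D_i(x, y) - D_i(\hat{x}, y) \big\|_{1}$$

where $\hat{x} = G(y)$ is the translated image, $x$ the target ground truth, $y$ the source image, and $h_i, w_i, d_i$ the dimensions of the $i$-th feature map; the $i = 0$ term operates on the raw images and thus plays the role of the pixel loss.
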
Part B Style transfer loss

In style transfer it is necessary to ensure high fidelity of the whole image while also making sure that detailed information is not lost. For example, in PET -> CT translation the generated CT image must contain detailed bone information before it can be used for PET attenuation correction; and when removing MR motion artefacts, soft-tissue information must be retained so that the images can be used for subsequent segmentation and detection tasks.


Therefore, MedGAN adds a non-adversarial loss component and uses style-transfer loss functions to ensure that texture and fine-structure information is preserved during translation.



The style-transfer losses are also computed from features extracted by hidden layers, but instead of the discriminator of the GAN they use a pre-trained network with a deeper structure as the feature extractor. Its stronger feature-extraction capability and larger receptive field yield richer image features, so both the global structure and the local details are better preserved during translation.




Style transfer loss consists of two parts: content loss and style loss.





Style loss: the style loss measures the style difference between the target (style) image and the generated image; the style features are computed from the Gram matrices of the feature maps.
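
In the usual Gram-matrix formulation (a sketch; the normalization and per-layer weights are assumptions), with $F_i(x)$ the feature map of the $i$-th layer of the pre-trained extractor:

$$\mathrm{Gr}_i(x)_{m,n} = \frac{1}{h_i w_i}\sum_{h=1}^{h_i}\sum_{w=1}^{w_i} F_i(x)_{h,w,m}\,F_i(x)_{h,w,n}, \qquad \mathcal{L}_{\text{style}} = \sum_{i} \frac{\lambda_{s,i}}{4\,d_i^{2}}\,\big\| \mathrm{Gr}_i(\hat{x}) - \mathrm{Gr}_i(x) \big\|_{F}^{2}$$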






Content loss: the content loss measures the content similarity between the generated image and the target image in the space of the hidden-layer feature maps; conceptually it is similar to the pixel loss, but computed on features.
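
A matching sketch for the content term (same caveats as above):

$$\mathcal{L}_{\text{content}} = \sum_{i} \frac{\lambda_{c,i}}{h_i w_i d_i}\,\big\| F_i(\hat{x}) - F_i(x) \big\|_{2}^{2}$$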






Part C MedGAN Architecture
U-Blocks
The image translation task can be seen as converting an input tensor into another tensor with a similar structure, i.e. an image-to-image mapping task, so the basic module of MedGAN is the U-Block, an encoder-decoder structure.

Each U-Block is a fully convolutional network with an encoder-decoder structure, similar to U-Net. The encoding path takes a 256x256 input image and stacks eight convolution-BN-ReLU modules to obtain the encoded feature representation; the input contains only real images, no noise. The decoding path mirrors this process to restore the image. As in U-Net, there are skip connections between the encoder and decoder layers at the same level, concatenated directly along the channel dimension; this cross-layer connection is very important for passing on low-level information.
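
Below is a minimal PyTorch sketch of such an encoder-decoder U-Block with channel-wise skip connections. It only illustrates the idea: the paper stacks eight encoding layers, while this sketch uses four levels for brevity, and the channel widths, activations and kernel sizes are assumptions rather than the authors' exact configuration.

```python
# Minimal U-Block sketch (encoder-decoder with skip connections), PyTorch.
import torch
import torch.nn as nn


class UBlock(nn.Module):
    """Encoder-decoder block with channel-wise skip connections (U-Net style)."""

    def __init__(self, in_ch=1, out_ch=1, base=64):
        super().__init__()

        def down(cin, cout):
            # Stride-2 convolution halves the spatial resolution.
            return nn.Sequential(
                nn.Conv2d(cin, cout, 4, stride=2, padding=1),
                nn.BatchNorm2d(cout),
                nn.LeakyReLU(0.2, inplace=True))

        def up(cin, cout, final=False):
            # Transposed convolution doubles the spatial resolution.
            layers = [nn.ConvTranspose2d(cin, cout, 4, stride=2, padding=1)]
            layers += [nn.Tanh()] if final else [nn.BatchNorm2d(cout), nn.ReLU(inplace=True)]
            return nn.Sequential(*layers)

        # Encoding path (256 -> 16 spatial resolution for a 256x256 input).
        self.e1, self.e2 = down(in_ch, base), down(base, base * 2)
        self.e3, self.e4 = down(base * 2, base * 4), down(base * 4, base * 8)
        # Decoding path; d2..d4 receive the concatenation with the skip feature.
        self.d1 = up(base * 8, base * 4)
        self.d2 = up(base * 8, base * 2)            # cat(d1, e3) -> 8*base channels
        self.d3 = up(base * 4, base)                # cat(d2, e2) -> 4*base channels
        self.d4 = up(base * 2, out_ch, final=True)  # cat(d3, e1) -> 2*base channels

    def forward(self, x):
        s1 = self.e1(x)                             # (B, base,   128, 128)
        s2 = self.e2(s1)                            # (B, 2*base,  64,  64)
        s3 = self.e3(s2)                            # (B, 4*base,  32,  32)
        s4 = self.e4(s3)                            # (B, 8*base,  16,  16)
        d = self.d1(s4)                             # (B, 4*base,  32,  32)
        d = self.d2(torch.cat([d, s3], dim=1))      # (B, 2*base,  64,  64)
        d = self.d3(torch.cat([d, s2], dim=1))      # (B, base,   128, 128)
        return self.d4(torch.cat([d, s1], dim=1))   # (B, out_ch, 256, 256)
```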


**CasNet:**



The main structure of the generator is the CasNet architecture newly proposed in this paper. Medical image translation is more challenging than natural image translation in that small regions often contain rich information, and during translation some loss of detail or distortion is inevitable. Therefore, much of the current research is restricted to specific tasks or processes the images through a cascade of separate frameworks. This paper proposes a more general end-to-end network, CasNet, whose basic structure is shown in Fig. 2.

CasNet is inspired by ResNet. In ResNet, deep networks are built by cascading residual blocks, and the residual connections effectively mitigate the vanishing-gradient problem; CasNet borrows this idea and likewise cascades U-Blocks, and the skip connections within each block also reduce the occurrence of vanishing gradients.

However, the cascaded U-Blocks in CasNet still differ from the residual blocks in ResNet: a residual block generally has 2-4 layers, whereas a U-Block has 16 layers, so CasNet has better generalization.
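
A sketch of the cascading idea, reusing the UBlock above (the block count follows the six U-Blocks stated later in the paper; feeding each block's output directly into the next block is this sketch's reading of the progressive-refinement scheme):

```python
class CasNet(nn.Module):
    """Cascade of U-Blocks: each block progressively refines the translation."""

    def __init__(self, in_ch=1, out_ch=1, n_blocks=6):
        super().__init__()
        blocks = [UBlock(in_ch, out_ch)]
        blocks += [UBlock(out_ch, out_ch) for _ in range(n_blocks - 1)]
        self.blocks = nn.ModuleList(blocks)

    def forward(self, x):
        for block in self.blocks:
            x = block(x)   # output of one U-Block is the input of the next
        return x
```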


Discriminator

The discriminator in this paper makes some modifications on the basis of PatchGAN. PatchGAN does not classify the input image as a whole; instead it splits the input image into a series of small patches, classifies each patch, and then outputs the result averaged over all patches.

In this way, D focuses more on comparing the high-frequency information within each patch. The conventional patch size is 70x70; if the changes in detail are more pronounced, the patch size can be reduced further.
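
A minimal PatchGAN-style discriminator sketch (the channel widths and depth are assumptions; this particular layer stack yields the conventional 70x70 receptive field per output score, and the output is a grid of per-patch real/fake logits rather than a single scalar):

```python
class PatchDiscriminator(nn.Module):
    """PatchGAN-style discriminator scoring overlapping patches of the input pair."""

    def __init__(self, in_ch=2, base=64):
        super().__init__()

        def block(cin, cout, stride, norm=True):
            layers = [nn.Conv2d(cin, cout, 4, stride=stride, padding=1)]
            if norm:
                layers.append(nn.BatchNorm2d(cout))
            layers.append(nn.LeakyReLU(0.2, inplace=True))
            return layers

        self.net = nn.Sequential(
            *block(in_ch, base, 2, norm=False),
            *block(base, base * 2, 2),
            *block(base * 2, base * 4, 2),
            *block(base * 4, base * 8, 1),
            nn.Conv2d(base * 8, 1, 4, stride=1, padding=1),  # per-patch logits
        )

    def forward(self, src, tgt):
        # src is the conditioning source image, tgt a real or generated target image,
        # concatenated along the channel dimension as in conditional GAN setups.
        return self.net(torch.cat([src, tgt], dim=1))
```
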
Part D MedGAN overall framework and training strategy

The overall MedGAN framework therefore includes: the CasNet generator, the patch discriminator, and a pre-trained VGG19 network as feature extractor; the discriminator features are used to compute the perceptual loss and the VGG19 features to compute the style-transfer losses. CasNet contains 6 U-Blocks. The overall loss function combines the adversarial loss with the perceptual, style and content losses.

Training details: optimizer ADAM, lr = 0.0002, momentum = 0.5. A series of auxiliary strategies are adopted to stabilize training, for example optimizing the patch discriminator once every three CasNet training iterations, together with a series of hyperparameter optimizations.
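
Assembled from the individual terms described above, the combined generator objective presumably takes a weighted form like the following (the weights are hyperparameters; this is a sketch, not the paper's exact equation):

$$\mathcal{L}_{\text{MedGAN}} = \mathcal{L}_{\text{adv}} + \lambda_{p}\,\mathcal{L}_{\text{perceptual}} + \lambda_{s}\,\mathcal{L}_{\text{style}} + \lambda_{c}\,\mathcal{L}_{\text{content}}$$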

The average training time on a single Titan X GPU is 36 h, and the inference time is 115 ms. The paper also gives pseudo code for the overall algorithm.

Section III Experimental Evaluations

In order to verify the effectiveness of MedGAN, tests were conducted on three tasks: PET/CT translation, MR motion-artefact removal, and PET image denoising.
PET/CT:

Synthesizing CT images from PET requires capturing fairly detailed information, such as bone structure and soft tissue. The dataset used in this paper has an input resolution of 256x256 and contains 1935 training image pairs from 38 patients, plus 414 images for validation.

MR motion artefacts:

The difficulty of removing MR motion artefacts is that pixel-level alignment is hard to achieve. The dataset consists of brain MR images collected from 11 volunteers, each acquired in two states, with motion and motion-free. The training set contains 1445 images and the test set 556 images, at a resolution of 256x256.



PET denoising:

training : test = 11420 : 4411


Part B Experiment setup



The loss function used by MedGAN includes both the adversarial loss and the non-adversarial learning part (perceptual, style, content losses), so that the global structure is retained while high-frequency and low-frequency information are both learned well. To verify that the gain does not simply come from combining the networks above, the paper first tests the effect of the various loss functions separately: the baseline network uses a single U-Block for G, a 16x16 patch discriminator for D, and the same parameter settings as MedGAN, trained for 200 epochs for comparison.







See Table I for the comparison results; Fig. 4 visualizes some of them. Using only the cGAN adversarial loss performs worst. After adding the perceptual loss computed from the discriminator, the result improves to a certain extent, but details are still lost. After adding the style loss, the missing details are recovered. MedGAN, which includes all of the above losses while still using only one U-Block, achieves a better translation result, and increasing the number of U-Blocks in CasNet improves the result further.

In addition, MedGAN is also compared with other state-of-the-art frameworks: pix2pix, the classic image translation framework combining pixel loss and cGAN loss, as well as the PAN network with its perceptual loss and Fila-sGAN with style transfer.
The comparison results are given in Table II. Among them, pix2pix performs worst, and MedGAN performs better than all of the above frameworks.






In order to evaluate the perceptual quality of the translated images, five radiology experts were invited to score and rate the generated images, on a scale of 0-4 ranging from "not realistic at all" to "realistic". Table III shows the experts' averaged results.

Section IV Discussion

This paper proposes an end-to-end medical image translation framework, MedGAN, which organically combines the adversarial loss, perceptual loss and style losses through CasNet, the patch discriminator and the pre-trained feature extractor; this improves the translation result, and both high-frequency and low-frequency information are learned well.


Finally, MedGAN has achieved good results in PET/CT image conversion, MR de-artifacting and PET image de-noising tasks. It is a comprehensive image conversion framework with excellent generalization performance.


In future work, the authors plan to further test the performance of MedGAN in other specific applications, for example testing MedGAN's translation quality on PET-CT images without calibration, and studying whether the synthesized CT can be used for stand-alone PET attenuation correction. In addition, they plan to test the applicability of MedGAN in tasks such as MR image segmentation and organ volume calculation.


Origin blog.csdn.net/qq_37151108/article/details/108387929