Generative Adversarial Network in Medical Imaging: A Review

Abstract

Because generative adversarial networks (GANs) can implicitly model a data distribution, they have excelled at many computer vision tasks: the generator produces large numbers of samples, while the discriminator performs category prediction on large numbers of unlabeled samples. GANs have therefore been explored extensively for domain transfer, segmentation, classification, and cross-modal synthesis. This article reviews research progress on generative adversarial networks in medical imaging, aiming to provide a convenient starting point and ideas for researchers interested in this field.

Section I Introduction

Since 2012, deep learning frameworks have regained vitality in computer vision, and a large number of related studies have appeared in medical imaging journals and conferences. Thanks to the powerful feature learning capability of deep learning, it can be used to enhance the feature representation of medical images for downstream tasks such as classification and segmentation.


GAN is mainly composed of two networks: a generator and a discriminator. The generator produces samples from some distribution, and the discriminator judges whether a sample comes from a real image or a generated one, outputting a category prediction. The two networks are trained simultaneously, learning adversarially.

GAN has achieved state-of-the-art (SOTA) results in many fields, such as text-to-image synthesis, super-resolution, and image-to-image translation.


The main search sources for this review include the following mainstream journals/conferences:

International Conference on Medical Image Computing and Computer Assisted Intervention (MICCAI);
SPIE Medical Imaging;
IEEE International Symposium on Biomedical Imaging (ISBI);
International Conference on Medical Imaging with Deep Learning (MIDL).

Search cutoff: January 1, 2019. The article is organized as follows:



 

Section I: Introduction
Section II: Basics of GAN and related variants
Section III: Applications of GAN in medical imaging, covering tasks such as segmentation, classification, detection, and annotation
Section IV: Summary and outlook

Section II GAN and Its Variants




Part A GAN

The original GAN consists of two networks, a generative network and a discriminative network. There is no need to learn the probability density function of the image data explicitly; training samples are drawn directly from the data distribution of interest.





The input of the Generator is random noise z sampled from some distribution, and the output is a generated image resembling a real image; this nonlinear mapping is parameterized by θg. The Discriminator takes a real or generated sample as input and outputs the probability that the input is real or fake, hence the name discriminator; D can be a simple binary classification network.





The generator strives to deceive D with fake samples, while the discriminator strives to distinguish real inputs from fake ones; the two learn adversarially and are trained at the same time. The loss function is expressed as:

min_G max_D V(D, G) = E_{x~p_data(x)}[log D(x)] + E_{z~p_z(z)}[log(1 − D(G(z)))]

Limitations of GAN

GAN training can be regarded as searching for a saddle point of the loss function, but convergence of the two jointly trained networks is hard to guarantee. For example, if the discriminator D becomes strong enough to easily tell whether an image is real or generated, D reaches a local optimum and it becomes difficult to keep optimizing G; this situation is especially common when generating high-resolution images. Another problem is that GAN training can suffer mode collapse: the generator always produces outputs from a limited part of the distribution, so the output modes are limited — for example, it can generate pictures of dogs but not of cats or rabbits.
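As an illustration (a toy sketch, not from the reviewed papers), the two objectives in this minimax game can be written as scalar functions of the discriminator's outputs, which also makes the vanishing-gradient problem above concrete:

```python
import math

def d_loss(d_real, d_fake):
    # Discriminator term of V(D, G): maximize log D(x) + log(1 - D(G(z))),
    # written here as a quantity to minimize.
    return -(math.log(d_real) + math.log(1.0 - d_fake))

def g_loss_saturating(d_fake):
    # Original generator objective: minimize log(1 - D(G(z))).
    return math.log(1.0 - d_fake)

def g_loss_nonsaturating(d_fake):
    # Non-saturating variant commonly used in practice: maximize log D(G(z)).
    return -math.log(d_fake)

# When D confidently rejects fakes (d_fake near 0), the saturating loss barely
# changes, while the non-saturating loss still gives a strong training signal.
```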

Related variants of GAN

Discriminator:

Improvements to D mainly aim at stabilizing training or preventing mode collapse. Related variants include f-GAN, LS-GAN, WGAN, EBGAN, BEGAN, ALI, BiGAN, InfoGAN, and ACGAN.




Generator:

The generator network maps random noise to outputs following a specific distribution and is usually a decoder network; for example, VAEGAN uses features learned by a variational autoencoder to perform pixel-level reconstruction.




When first constructed, a GAN places no restriction on the modes it generates, but feeding additional auxiliary information as input can guide the GAN to generate a specific mode; this is the conditional GAN (cGAN).
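A minimal sketch of how conditioning is typically wired in (assuming the common one-hot label encoding; the function names are illustrative, not from any specific paper):

```python
import random

def one_hot(label, num_classes):
    # Encode the class label as a one-hot condition vector.
    v = [0.0] * num_classes
    v[label] = 1.0
    return v

def conditional_input(z_dim, label, num_classes, seed=0):
    # cGAN: the generator receives noise z concatenated with the condition,
    # so the same noise with a different label yields a different mode.
    rng = random.Random(seed)
    z = [rng.gauss(0.0, 1.0) for _ in range(z_dim)]
    return z + one_hot(label, num_classes)

x = conditional_input(z_dim=8, label=2, num_classes=3)
# the last num_classes entries carry the condition
```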




In image translation, a reconstruction loss is often used as auxiliary supervision; in segmentation tasks, a Dice loss is added to GAN training. Both approaches require paired training images. By contrast, CycleGAN-style methods do not require image pairs: they add a cycle-consistency constraint to limit deformation during domain transfer. UNIT likewise combines two VAEGANs to train without image pairs.

Architecture:





In DCGAN, the FC layers of the original GAN are replaced with up-sampling or down-sampling convolutional layers, making it a fully convolutional neural network; batch normalization and LeakyReLU improve training stability, and some works use residual connections in GAN, which makes it possible to build deeper networks.






Directly generating high-resolution images from noise is very difficult. LAPGAN therefore uses a stacked strategy that successively adds high-frequency details to the image; SGAN cascades a series of GANs, with different GANs producing feature representations at different depths; PGGAN grows progressively, continuously adding new layers to enlarge the G/D networks; StyleGAN's strategy is not to feed the noise z directly as input but to first transform it into an intermediate form. Fig 3 shows the structures of GAN, CatGAN, EBGAN/BEGAN, ALI/BiGAN, InfoGAN, ACGAN, VAEGAN, CGAN, LAPGAN, and SGAN; Fig 4 shows the structures of cGAN, CycleGAN, and UNIT.







Section III Applications of GAN in Medical Imaging

The application of generative adversarial networks to medical image analysis proceeds mainly from two directions. First, from the generative perspective, GAN can learn the internal structure of the training data and generate new images, alleviating the scarcity of medical images and helping protect patient privacy. Second, from the discriminative perspective, the Discriminator can be used to identify abnormal images.

Fig 5 shows applications of GAN in medical imaging; (a)-(f) focus on generating images, and (g) focuses on discriminating images. According to the task to be solved, the applications fall mainly into the following categories: image reconstruction, image synthesis, image segmentation, image classification, organ detection, image registration, etc.


(a) CT image denoising; (b) generating CT from MR; (c) generating fundus images from retinal vessel segmentation maps; (d) generating skin lesion images from random noise; (e) heart and lung segmentation in chest X-rays; (f) brain lesion segmentation; (g) anomaly detection in retinal OCT.


Part A Reconstruction

Because of limitations of the imaging equipment itself, medical images often come with noise or artifacts that hinder observation and analysis: the images may have insufficient resolution, contain noise, be undersampled, or exhibit aliasing. Early data-driven training methods usually produce the reconstructed output directly from the input, as one step of image processing. MR images are an exception: the raw k-space data can be converted into the reconstructed image through the Fourier transform.



To date, the pix2pix framework has been used for CT and PET image denoising as well as MR reconstruction. Other studies focus on framework optimization, such as using a pre-trained VGGNet to ensure perceptual-level similarity of the generated image, using a detection network to guide denoising of low-resolution regions, and using local saliency maps to improve super-resolution reconstruction of retinal images so that the network focuses on key regions. When processing MR images, one must also consider how to correctly incorporate k-space data into the reconstruction.


There are also issues on the dataset side. For example, performance evaluation of medical image reconstruction networks currently relies mostly on subjective assessment by observers and lacks objective comparison; existing large open-source image reconstruction datasets are not suited to medical image reconstruction analysis, and this area still awaits large open-source datasets.

Table 1 lists GAN-related work on medical image reconstruction; the corresponding losses, evaluation metrics, and datasets are listed in Tables 2, 3, and 7, respectively. The check or cross marks in the table indicate whether image pairs were used during training.

It can be seen that most medical image reconstruction tasks, such as CT and MR, use the pix2pix framework; MR data is somewhat special in that the Fourier transform is involved in processing. In addition, the more training data, the better. Adversarial loss generally yields better visual reconstruction than pixel-wise loss but may introduce some distortion; pixel-wise loss does not have this problem but requires image pairs as training data and reconstructs poorly under domain mismatch. Existing studies suggest that, until rigorously validated by experts, GAN-reconstructed images are not suitable for direct use in patient diagnosis.

Table 2 - Loss functions:
Optimizations of the loss function include assigning different weights to different pixels via saliency computation, and using CycleGAN for super-resolution reconstruction of ventricular CT when paired training images are hard to obtain; some studies found that the fidelity loss in pix2pix can even be removed when performing low-dose CT denoising.

Table 3 - Evaluation metrics
Table 7 - Datasets

Part B Medical Image Synthesis

According to the regulations of the relevant institutions, a patient's wishes must be fully considered before their medical images are published publicly. GAN is widely used in medical image generation, which effectively sidesteps patient privacy issues and alleviates the shortage of pathological images. However, expert-annotated medical images for training supervised learning frameworks remain very scarce, although many organizations are dedicated to building such large datasets, e.g., Biobank, the National Biomedical Imaging Archive (NBIA), The Cancer Imaging Archive (TCIA), and the Radiological Society of North America (RSNA).



Traditional data augmentation includes scaling, rotation, mirroring, affine transformation, elastic transformation, etc., but these do not change the richness of samples within a class — only pose and size change — whereas images generated by GAN are more diverse; they have been used for data augmentation in many works with good results.


B-1 Unconditional image generation

Unconditional means feeding random noise to generate images without any constraining information. Commonly used frameworks for medical image generation are DCGAN, WGAN, and PGGAN because they train more stably; the first two handle resolutions only up to about 256x256 and are inadequate at higher resolutions. If the images to generate do not differ much from the original data, such as lung nodules or liver lesions, one can try directly reusing the authors' released code for most downstream tasks (pre-training followed by fine-tuning). To address data scarcity, most generators produce images of a specific pattern, e.g., different types of liver lesion images; augmenting the dataset with generated images can improve the model's specificity (Sp) and sensitivity (Se) to some extent.
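Sp and Se here refer to specificity and sensitivity; for reference, a minimal sketch of how they are computed from confusion-matrix counts (the example counts are hypothetical):

```python
def sensitivity(tp, fn):
    # Se = TP / (TP + FN): fraction of diseased samples correctly detected.
    return tp / (tp + fn)

def specificity(tn, fp):
    # Sp = TN / (TN + FP): fraction of healthy samples correctly rejected.
    return tn / (tn + fp)

# Hypothetical example: 45 of 50 lesions found, 90 of 100 healthy scans kept.
se = sensitivity(tp=45, fn=5)
sp = specificity(tn=90, fp=10)
```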



B-2 Cross-modality image generation

Cross-modality generation of medical images is very useful; for example, generating CT from MR can effectively reduce the time and cost of acquiring CT. Another advantage is that the generated samples can preserve structural constraints from the source-modality image. When the two modalities are highly similar, methods are often based on the pix2pix framework; when the modality gap is large, CycleGAN-based methods can be used. Fig 5 lists GAN-related work on cross-modal image generation; most of it is based on the CycleGAN framework.


For example, Zhang et al. found that using only the cycle loss in cross-modal generation is insufficient to suppress geometric deformation of the generated images, so they additionally added a shape-consistency loss: two segmentors extract semantic shape representations, which are added as constraints to the loss computation to suppress image distortion. Both CycleGAN and UNIT are suitable for cross-modal generation.
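The cycle-consistency constraint that Zhang et al. found insufficient on its own can be sketched as follows (a toy version with images as flat lists and hypothetical translator functions g_ab/g_ba; lam is the usual weighting factor):

```python
def l1(a, b):
    # Mean absolute difference between two flattened images.
    return sum(abs(p - q) for p, q in zip(a, b)) / len(a)

def cycle_loss(x_a, x_b, g_ab, g_ba, lam=10.0):
    # Cycle consistency: translating A->B->A (and B->A->B) should
    # reproduce the original image.
    return lam * (l1(g_ba(g_ab(x_a)), x_a) + l1(g_ab(g_ba(x_b)), x_b))

# With identity "translators" the cycle loss is exactly zero; generators
# that warp anatomy without warping it back are penalized.
identity = lambda img: img
zero = cycle_loss([0.1, 0.9], [0.4, 0.6], identity, identity)
```

Note that a pair of generators that consistently warps anatomy in one direction and un-warps it in the other still satisfies this constraint, which is why the shape-consistency term above is needed.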




B-3 Conditional generation

Additional conditional information can come from segmentation maps, text, target locations, or generated images; for example, a GAN or a pre-trained segmentation network produces the constraining information, which is then fed into the GAN generator, splitting the whole process into two stages.



Table 6 lists conditional generation (cGAN) works and the different sources of their constraint information.




Part C Segmentation

Segmentation tasks typically use pixel-level loss functions such as cross-entropy. Although U-Net effectively combines shallow and deep features, it cannot guarantee spatial consistency of the final segmentation map; CRFs or graph cuts are usually used to add spatial constraints, but in regions of very low contrast the edge segmentation may still be poor. The adversarial loss in GAN can be regarded as a shape regularizer; it is especially effective when the region of interest is relatively compact (such as heart and lung segmentation) but less so for deformable structures such as catheters and vessels. Applying it to intermediate feature layers enforces domain invariance.



The adversarial loss computes a similarity measure between the predicted segmentation map and the ground-truth annotation, but instead of computing it between pixels, the inputs are mapped into a low-dimensional space where their similarity is computed. This is close to the perceptual loss, except that the perceptual loss is computed with a pre-trained classification network, whereas the discriminator evolves gradually through adversarial learning with the generator. Xue uses a multi-scale L1 loss in D so that information at different scales is included; Zhang feeds both labeled and unlabeled images into D to confuse D and improve its discrimination ability. Most of the above works use adversarial training to preserve the structural information of the final segmentation map, and some studies use GAN to improve model robustness and reduce overfitting on small datasets.
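A toy sketch of the multi-scale L1 idea attributed to Xue above (illustrative only; real implementations compare discriminator feature maps, not flat lists):

```python
def multiscale_l1(feats_a, feats_b):
    # Average the per-scale L1 distances between two sets of features
    # extracted at several scales of the discriminator.
    per_scale = [
        sum(abs(p - q) for p, q in zip(fa, fb)) / len(fa)
        for fa, fb in zip(feats_a, feats_b)
    ]
    return sum(per_scale) / len(per_scale)

# Two scales: coarse features differ by 1.0 on average, fine ones match.
loss = multiscale_l1([[1.0, 1.0], [2.0, 2.0]], [[0.0, 0.0], [2.0, 2.0]])
```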




Part D Classification

Classification is undoubtedly an area where deep learning has achieved great success: the network extracts features at different levels through its successive layers to obtain the image's category label. GAN is also widely used in classification tasks; some works use part of G/D as a feature extractor, while others use D alone as the classifier, with some additional conditional information.




Some scholars have combined WGAN and InfoGAN for unsupervised classification of cytopathological images; semi-supervised training has been used for abnormality diagnosis in chest X-rays, retinal vessels, and cardiac disease, achieving results similar to supervised CNNs while requiring an order of magnitude less training data.




As for using GAN for data augmentation, as introduced above it involves two stages: stage 1 learns features for data augmentation, and stage 2 performs classification with a conventional classification network. The two stages are independent of each other; the advantage is that either can be directly replaced with a better-performing framework, while the disadvantage is that each G can only augment one class of images, so N classes require N rounds of generation, which is very expensive in memory and computation. Dynamically generating multiple modes from a single model is therefore a popular research direction. In this regard, Frid-Adar found that using separate DCGANs was more effective than a single ACGAN for skin lesion detection.




Part E Detection

GAN's discriminator enables anomaly detection by learning the probability distribution of the training images: if an image falls outside this distribution, it may be abnormal.





Schlegl et al. detect abnormalities in OCT images by computing an anomaly score for each image; Alex et al. use GAN for MR brain lesion detection, where G models the distribution of input patches and D computes the posterior probability of an input patch. As can be seen, most current detection work operates on abnormal images, and it is difficult to list it all.
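Schlegl et al.'s scoring can be sketched roughly as follows (a toy AnoGAN-style version; x_hat stands for the closest image the trained generator can produce for the query x, f_* are discriminator features, and lam is an assumed weighting between the two terms):

```python
def anomaly_score(x, x_hat, f_x, f_xhat, lam=0.1):
    # Residual term: pixel difference between the query image x and its
    # closest GAN reconstruction x_hat. Discrimination term: difference
    # between discriminator features of the two. lam balances the terms.
    residual = sum(abs(a - b) for a, b in zip(x, x_hat))
    discrimination = sum(abs(a - b) for a, b in zip(f_x, f_xhat))
    return (1.0 - lam) * residual + lam * discrimination

# A patch the generator reproduces perfectly scores 0 (in-distribution);
# large scores flag patches the learned distribution cannot explain.
normal = anomaly_score([0.2, 0.8], [0.2, 0.8], [1.0], [1.0])
odd = anomaly_score([0.2, 0.8], [0.9, 0.1], [1.0], [0.3])
```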






In terms of image reconstruction, some studies have found that if the distribution of pathological images was never seen during training and CycleGAN is used for unpaired translation, lesions in the generated image may even be removed because of the distribution-matching effect; if data of the same modality but of different normal/abnormal categories are used, this side effect can in turn be exploited for anomaly detection.





Part F Registration

cGAN can also be used for multi-modal or mono-modal image registration, where G outputs either the transformation parameters or the transformed image directly, and D judges whether an image is aligned or not. Usually a spatial transformer network or an intermediate transformation layer is inserted between the two networks to allow end-to-end training. In the related research, most works are based on CycleGAN to align images of different modalities such as CT and MR.



 







 



Part G Other Works

GAN has other applications as well, such as using cGAN to enhance specific preoperative images of a patient, highlighting the most likely diseased areas, or recoloring endoscopic images.



Section IV Discussion




 



Part A Summary



 



2017-2018 saw an explosive growth of GAN research applications; the related references can be found on this article's GitHub page.








 




Fig 1 shows (a) GAN publications across different tasks (image synthesis, reconstruction, segmentation, classification, detection, annotation, etc.), (b) across different medical image modalities (MR, CT, pathology images, fundus images, X-ray, ultrasound, dermoscopy, PET, mammography, etc.), and (c) publications by year, from which we can see that 2018 brought a big explosion.



 








 




46% of the research focuses on medical image generation, because cross-modal data generation is very important in medical image analysis. For example, MR analysis involves sequences of images, and GAN can generate the remaining sequences from existing ones, reducing MR acquisition time. Another reason for the enthusiasm for MR research may be that more open-source MR datasets are available.



 




Another 37% of related research focuses on image segmentation and reconstruction, owing to the maturity of image-to-image translation frameworks: by adding shape or texture constraints to adversarial training, G can output very good images. For example, in 3D CT image segmentation, adding an adversarial loss effectively improves segmentation of low-contrast regions.




 




About 8% study classification tasks, mainly to address domain shift. When GAN is used for data augmentation, most works focus on images with small foreground objects, such as lung nodules and cells — perhaps because generation of such images is more stable at the current state of the art, or for reasons of computational cost, since high-resolution images consume too many computing resources. Although some studies have successfully classified chest X-rays, the task scale was relatively small (thousands of images) and the task relatively simple, namely detection of ventricular abnormalities.




 




With the open-sourcing of some large datasets such as CheXpert, the demand for GAN-based data augmentation may decline, but it remains useful in the following two respects:



 




First, increasing the diversity of data augmentation strategies: these are currently limited to hand-designed cropping, rotation, affine transformations, and the like, and GAN can increase the diversity of transformations;



 




Second, as we know, medical image training samples are very imbalanced: most samples are of common diseases, while training data for uncommon diseases such as rheumatoid arthritis and sickle cell anemia is very scarce. cGAN and similar methods could therefore generate training samples for uncommon diseases from expert descriptions or hand-drawn sketches.



 




Part B Challenges

Although GAN has many applications in medical image analysis, many challenges remain. For example, the evaluation metrics used in image reconstruction or cross-modal generation are still PSNR, SSIM, and the like, which do not relate directly to perceived visual quality: optimization driven by a pixel loss can yield a suboptimal result whose metrics look good while the actual images remain quite blurry. This problem can be addressed with downstream tests, for example running segmentation or classification on the reconstructed/generated images to measure reconstruction quality.
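For reference, PSNR, the most common of these metrics, is a simple function of pixel-wise mean squared error, which is precisely why it can look good on blurry outputs (a minimal sketch):

```python
import math

def psnr(img_a, img_b, max_val=255.0):
    # Peak signal-to-noise ratio over two flattened images. Higher means
    # lower pixel-wise error, but not necessarily better perceived quality.
    mse = sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)
    if mse == 0.0:
        return float("inf")
    return 10.0 * math.log10(max_val ** 2 / mse)

# A uniform error of 16 gray levels on an 8-bit image gives MSE = 256:
val = psnr([16.0, 32.0], [0.0, 16.0])
```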



 





Recently, Zhang et al. proposed LPIPS (learned perceptual image patch similarity) as an evaluation metric, and it has been used in MedGAN. Another problem concerns image translation between domains: some methods require image pairs and some do not. When image pairs are not used, the fidelity of image details cannot be guaranteed, and some studies have shown that systematic bias is introduced when using CycleGAN for image translation. This bias also exists in cGANs trained with paired images, but it mostly occurs under domain shift, such as training on normal data and testing on abnormal examples.



 





Part C Interesting future applications

The power of GAN is that it can learn in an unsupervised or weakly supervised manner, which can simplify the image analysis pipeline and improve patient care.



 





For example, cGAN has been used to eliminate motion artifacts in MR images, reducing the number of repeated acquisitions; some studies use GAN to semi-automatically generate medical analysis reports; and CycleGAN, which can remove makeup from face photos, could analogously be used to remove artifacts from medical images. Beyond anomaly detection, detection could be further extended to implanted devices such as pacemakers and artificial valves.



 






Research on StyleGAN opens the possibility of text-to-image generation, so that training samples could be generated from expert descriptions of rare diseases to compensate for scarce training data; it might even be used to predict disease progression or drug mechanisms.



 







In medical image analysis, different types of medical images of the same tissue usually need to be analyzed jointly so that they complement each other. In supervised learning, only one modality can be trained at a time; even though the network framework is the same, training must be repeated. GAN avoids this repetitive process and reduces wasted labor.




 







Up to this point, we have introduced the fascinating application prospects of GAN in medical image analysis, but it must be admitted that the use of GAN for medical image analysis is still in its infancy, with no truly mature clinical applications yet.


Origin blog.csdn.net/qq_37151108/article/details/107848373