Low-dose CT Image Synthesis for Domain Adaptation Imaging Using a Generative Adversarial Network

Low-Dose CT Image Domain Adaptive Synthesis with Generative Adversarial Networks Based on Noise-Encoded Transfer Learning (IEEE TMI 2023)


Paper address: https://ieeexplore.ieee.org/document/10081080

Project address: https://github.com/nightastars/GAN-NETL

Abstract

Deep learning (DL)-based image processing methods have been successfully applied to low-dose X-ray images based on the assumption that the feature distribution of the training data coincides with that of the test data. However, low-dose computed tomography (LDCT) images from different commercial scanners may contain different amounts and types of image noise, which violates this assumption. In addition, when applying DL-based image processing methods to LDCT, the feature distributions of LDCT images from simulated and clinical CT examinations may be quite different. Therefore, a network model trained with simulated image data or LDCT images from one particular scanner may not work well for another CT scanner and image processing task. To address this domain adaptation problem, this study proposes a novel generative adversarial network (GAN) with noise-encoded transfer learning (NETL), or GAN-NETL, to generate paired datasets with different noise styles. Specifically, we propose a noise encoding operator and incorporate it into the generator to extract noise patterns. Meanwhile, through transfer learning (TL), the noise encoding operator converts the noise style of the source domain into that of the target domain to generate realistic noise. The proposed method is evaluated using one public dataset and two private datasets. Experimental results demonstrate the feasibility and effectiveness of the proposed GAN-NETL model in LDCT image synthesis. Furthermore, we conduct additional image denoising studies using synthetic clinical LDCT data, validating the merits of the proposed synthesis in improving the performance of DL-based LDCT processing methods.

I. INTRODUCTION

The widespread and growing use of x-ray computed tomography (CT) for medical examination and diagnosis raises major concerns for patients about potential health risks from CT radiation doses. Previous studies have shown that excessive CT x-ray radiation increases the risk of genetic diseases and cancer [1-2]. In addition, children are more sensitive than adults to x-ray radiation exposure during CT examinations [3]. Therefore, reducing the radiation dose of CT has become a major focus of the medical imaging research community. There are two main technical approaches to reducing the radiation dose: lowering the x-ray intensity and reducing the number of sampled projection views. The former reduces the number of photons used for imaging, which increases noise and artifacts in the reconstructed image. The latter leads to undersampled projection data and incomplete reconstruction. As a result, both dose reduction approaches degrade the reconstructed CT image quality. To address these issues, low-dose CT (LDCT) imaging methods based on deep learning (DL) have recently attracted increasing research interest [4-10]. Through extensive experimental studies on the LDCT dataset provided by AAPM [11], some DL-based network models have demonstrated their potential to preserve image details while improving the SNR of LDCT images. These networks [12-16] learn a mapping function with network parameters from paired training data of LDCT and normal-dose CT (NDCT) images. Other work has instead combined DL-based image processing methods with iterative image reconstruction methods to improve the performance of LDCT imaging [17-24].

In clinical practice, it is difficult to collect paired training data directly from CT equipment due to the patient's breathing motion and radiation dose limitations in data acquisition. Therefore, noise modeling approaches should be considered to generate LDCT data from NDCT scans. Under the assumption that the noise in LDCT data follows an approximately normally distributed stochastic process, LDCT data with arbitrary noise levels can be easily synthesized using noise modeling methods [25-27]. However, real CT imaging involves many data processing procedures, so the noise distribution of real CT data may be more complex than that of noise-modeled CT data (Fig. 1). Furthermore, CT scanners from different manufacturers often have different hardware designs, data preprocessing procedures, reconstruction algorithms, and low-dose scanning protocols. As a result, the noise patterns of LDCT images acquired on different scanners may differ.
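The simulated-noise idea described above can be illustrated with a minimal sketch. It assumes the simplest possible model, additive zero-mean Gaussian noise applied directly in the image domain; the function name `simulate_low_dose`, the noise level, and the toy image are hypothetical, and the cited noise modeling methods [25-27] may work quite differently (for example, in the projection domain).

```python
import random

def simulate_low_dose(ndct, sigma, seed=0):
    """Return a noisy copy of a 2-D image (list of rows).

    Assumption: noise is additive, zero-mean Gaussian with standard
    deviation `sigma`, independent per pixel. This is a simplification
    and not the noise model used in the paper.
    """
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    return [[pixel + rng.gauss(0.0, sigma) for pixel in row] for row in ndct]

# Toy "NDCT" image: a uniform 64x64 patch with value 40.
ndct = [[40.0] * 64 for _ in range(64)]
ldct = simulate_low_dose(ndct, sigma=25.0)
```

Such a model only controls the noise level. It does not reproduce the scanner-specific noise texture that motivates the learned approach discussed in this paper.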


Generative adversarial networks (GANs) [29] have played an increasingly important role due to their excellent performance in image synthesis tasks. A GAN is a special type of DL model that simultaneously trains a generative model and a discriminative model through adversarial learning between the two. Recently, a series of new GAN-based network models have been derived. In general, these models can be divided into two types. The first type consists of supervised network models that use paired training data to learn a mapping from a source domain to a target domain. Isola et al. [30] proposed a conditional GAN (CGAN) called pix2pix and demonstrated that it is useful for solving paired image-to-image translation problems. Wang et al. [31] further improved pix2pix for high-resolution image synthesis tasks. To avoid washing away the semantic information in the above conditional image synthesis models, Park et al. [32] proposed a Spatially-Adaptive Normalization (SPADE) layer, which can control the style and semantics of synthesized images. To reduce the extra parameters and computational cost introduced by SPADE, Tan et al. [33] designed Class-Adaptive Normalization (CLADE) for efficient semantic image synthesis. The second type consists of unsupervised network models for image-to-image translation with unpaired datasets. Lu et al. [34] proposed an image deblurring method using disentangled representations and adversarial learning. To address noise or artifact suppression in medical images, several disentanglement networks [35-37] derived from CycleGAN [38] have recently been proposed. Liao et al. [35] designed specialized loss functions to separate out metal artifacts in CT images in the latent space. Huang et al. [36] developed a noise-driven disentangled representation learning method for suppressing speckle noise in optical coherence tomography images. To better accomplish unpaired image-to-image translation, Zhang et al. [28] proposed a new GAN structure with two discriminators, which can transfer the image style while preserving structural details. Furthermore, Zhu et al. [38] proposed an unsupervised image-to-image translation network architecture by introducing a cycle consistency loss. Wolterink et al. [39] proposed an unsupervised MR-to-CT synthesis method using a GAN with cycle consistency loss. Zhang et al. [40] added a shape consistency loss to alleviate geometric distortion in cross-modal image synthesis.

In addition, other researchers have proposed transfer learning (TL) methods [41-46] to achieve image style transfer. Gatys et al. [41] proposed a method to separate the content and style representations of natural images for advanced image synthesis. Liu et al. [42] proposed a unified style method that learns the style differences among different cameras to suppress image style inconsistency. Lv et al. [43] proposed a novel adaptive method to lift a segmentation model from the synthetic domain to the realistic target domain. Kim et al. [44] proposed an adaptive instance normalization network (AINDNet), which helps a model trained on synthetic noise generalize to real noisy images. Most of these methods were proposed for natural image synthesis and style transfer, where large training datasets are readily available. They face new challenges when used for LDCT image synthesis, where datasets are small.

This study designs, develops and evaluates a DL method for generating paired LDCT datasets with realistic noise styles in target images. To address the noise adaptation problem, a new image synthesis model, GAN-NETL, is designed, which effectively separates the content representation and noise style of CT images. Specifically, we design a noise encoding network and incorporate it into the generator to generate noise feature kernels. Meanwhile, to lift the noise pattern of the synthesized image from the source domain to the target domain, we propose a TL method that transfers the noise encoding operator from a simulated CT noise domain to a real CT noise domain. One public dataset and two private datasets are used to evaluate the proposed GAN-NETL method. Furthermore, we conduct additional image denoising studies using synthetic clinical LDCT data to verify that the proposed GAN-NETL network can be used to improve the domain-adaptive imaging performance of DL-based methods.

In summary, the main contributions of this work include:

1) A novel DL-based approach is implemented to generate paired datasets with different noise styles for domain-adaptive LDCT imaging problems.

2) A new image synthesis model is designed, which can separately represent the content and noise style of CT images.

3) In order to effectively transfer the noise pattern, a noise encoding network is designed to extract the noise pattern, and a TL method is introduced in the network training to transfer the noise pattern of the synthesized LDCT image.

4) Extensive experimental results on public and private datasets demonstrate that the GAN-NETL network can generate high-fidelity LDCT data. Furthermore, we demonstrate that the proposed method indeed helps to improve the image denoising performance of DL-based LDCT processing methods.

II. METHODS AND MATERIALS

A. General Network Models for Image Synthesis

In typical deep learning-based LDCT image processing algorithms [4, 12], LDCT images are considered to be obtained through a degradation process applied to NDCT images. Let $y \in \mathbb{R}^{H \times W}$ represent the LDCT image and $x \in \mathbb{R}^{H \times W}$ denote the corresponding NDCT image. The degradation process can be described as:
$$y=G(x)+n, \tag{1}$$
In Eq. (1), G is the degradation function caused by factors such as high quantum noise, and n is the additional image noise. The LDCT image synthesis problem can then be viewed as seeking the mapping function G: x → y using a convolutional neural network (CNN), which estimates an LDCT image y given an NDCT image x.

To synthesize clinical LDCT image data with realistic noise style, we design a new LDCT image synthesis model. The model explicitly represents the noise pattern of CT images, and performs noise extraction and coding. The image synthesis model proposed in this paper is defined as follows:
$$E:x\to k, \tag{2}$$

$$y=G(z;k;\theta), \tag{3}$$

where E represents the noise extraction and encoding operation, k represents the specific noise feature kernel, and y represents the LDCT image synthesized by the proposed model. z is the image content of the NDCT image x, which can be obtained by pre-denoising the NDCT image with a stepwise network. The function G represents the image synthesis model with trainable parameters θ.

B. GAN-NETL network architecture

Based on the proposed image synthesis model, we design a novel content and noise complementary learning network structure to transfer the noise style of an image to the target image domain while preserving the CT attenuation content. Figure 2 shows a schematic diagram of the architecture of GAN-NETL. First, a Data Preprocessing Module (DPM) is introduced to separate the content image from the NDCT input. Then, two encoders E1 and E2 are designed in the generator to independently encode the content component and the noise component respectively. Here, we preprocess NDCT images using DPM to generate content images. Since this content image mainly consists of tissue attenuation information, the content encoder (E1) should be a good content extractor. Furthermore, since our supervised label images contain high levels of noise information, the noise encoder (E2) will be forced to perform noise extraction and encoding from the NDCT input. Through complementary learning of content and noise, GAN-NETL can achieve better image synthesis performance. At the same time, the design of the two encoders facilitates the complementary utilization of feature information from different datasets. Finally, to exploit the characteristics of different datasets, we also design a two-stage transfer learning (TL) training strategy. In the following, we describe each of its components in detail.


1) DPM module : The content image is generated using the DPM module (Fig. 2 (A)) in the proposed GAN-NETL network model. The modularized adaptive processing neural network (MAP-NN) proposed by Shan et al. [5] processes LDCT and NDCT images and outputs clinical images with different degrees of denoising. The NDCT input is preprocessed by a pre-trained MAP-NN model to obtain a content image, which is used as the semantic content input of the content encoding network in the generator. At the same time, the NDCT image itself is used as the input of the noise encoding network in the generator.

2) Generator module : As shown in Figure 2 (A), the generator in the GAN-NETL network consists of three parts, namely Encoder1 (E1), Encoder2 (E2) and Decoder1 (D1). Both E1 and E2 use the same preprocessing block (PrB) and three downsampling operations. As shown in Figure 2 (B), PrB performs convolution and downsampling operations on the input, followed by 5 residual block (ResB) operations. The difference is that E2 additionally contains a reflection padding and 7×7 convolution operation. E2 then applies the 4 ResB operations shown in Figure 2 (A) to the obtained features, completing the estimation of the noise feature kernel k. The noise feature kernel k is merged with the encoded content features obtained by E1 to form the input of Decoder1 (D1). To restore the feature map to the original input size, D1 performs three deconvolution operations on the fused information. Furthermore, we add a post-processing block (PoB) to D1. As shown in Figure 2 (B), PoB first performs 10 ResB operations on its input, followed by a convolution operation. Pixel shuffling is then used for upsampling, with the upscaling factor set to 2. Finally, a "convolution plus Leaky Rectified Linear Unit (LeakyReLU)" operation and a convolution operation produce the final output. The PrB module and downsampling operations extract multi-scale feature information from the input image, while the deconvolution operations and the PoB module convert low-resolution feature maps into high-resolution feature maps through convolution and multi-channel reconstruction.
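As a sanity check on the generator description, the following sketch traces the spatial size of a training patch through the encoder and decoder. It assumes that every downsampling step halves the size, that every deconvolution doubles it, and that convolutions and ResBs preserve it; these strides are assumptions for illustration and are not stated in the text (Table 1 of the paper lists the actual parameters). The function name `trace_generator_shapes` is hypothetical.

```python
def trace_generator_shapes(patch_size=80):
    """Track the side length of a square patch through the generator.

    Assumptions (not confirmed by the source): stride-2 downsampling,
    stride-2 deconvolution, size-preserving convolutions and ResBs.
    """
    sizes = [("input", patch_size)]
    size = patch_size

    # PrB: convolution plus one downsampling step.
    size //= 2
    sizes.append(("PrB", size))

    # Three further downsampling operations in E1 / E2.
    for i in range(3):
        size //= 2
        sizes.append((f"down{i + 1}", size))

    # D1: three deconvolution operations.
    for i in range(3):
        size *= 2
        sizes.append((f"deconv{i + 1}", size))

    # PoB: pixel shuffle with upscaling factor 2.
    size *= 2
    sizes.append(("PoB pixel shuffle", size))
    return sizes

for name, side in trace_generator_shapes(80):
    print(f"{name:>18}: {side} x {side}")
```

Under these assumptions, the four size reductions in the encoder are matched by the three deconvolutions plus the pixel shuffle in the decoder, so the output returns to the input patch size.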

3) Discriminator module : The multi-layer discriminator module [32] uses a combination of convolution, BatchNorm, and LeakyReLU to perform three downsampling operations, followed by further convolution operations, to distinguish real LDCT samples from the pseudo-LDCT (synthetic LDCT) samples produced by the generator.

For a more detailed understanding of the generator and discriminator and their components in the GAN-NETL network, Table 1 lists the details of the parameters used in the network.


C. Loss function for GAN-NETL training

The choice of loss function determines the level of feature information used during network training and is crucial to the performance of the proposed model. The pixel-wise loss measures the similarity between pixel intensities in the predicted and target images, compensating for their differences in pixel space [47]. However, a pixel-wise loss only captures low-level image features and ignores high-level image structure. Therefore, we add a perceptual loss to the objective function to incorporate knowledge of high-level perceptual and semantic image differences. The Wasserstein GAN loss from [48] is used as the adversarial loss. The total loss function is a weighted sum of the three losses:
$$L_{total}= \min_{G}\max_{D}L_{WGAN}(D,G)+\lambda_{cb}L_{cb}+\lambda_{p}L_{p}, \tag{4}$$
where $L_{WGAN}$, $L_{cb}$ and $L_{p}$ denote the adversarial loss, pixel-wise loss and perceptual loss, respectively. D and G represent the discriminator and generator. $\lambda_{cb}$ and $\lambda_{p}$ are hyperparameters that balance the contributions of the three terms.
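The weighting in the total loss can be written as a one-line helper. This is a minimal sketch: the function name `total_loss` is hypothetical, the three input values are assumed to have been computed elsewhere, and the default weights of 1.0 and 0.5 are the values reported in the training details of this paper.

```python
def total_loss(adversarial, charbonnier, perceptual, lambda_cb=1.0, lambda_p=0.5):
    """Weighted sum of the three loss terms for one training step.

    `adversarial`, `charbonnier` and `perceptual` are scalar loss values
    computed elsewhere; only the weighting is shown here.
    """
    return adversarial + lambda_cb * charbonnier + lambda_p * perceptual

# Example with made-up loss values.
print(total_loss(adversarial=0.2, charbonnier=0.05, perceptual=0.3))
```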

1) Wasserstein GAN Loss : Arjovsky et al. [48] proposed using the earth mover's (EM) distance to improve GAN training, resulting in the Wasserstein GAN (WGAN). In this work, we use the adversarial loss function of WGAN:
$$\min_{G}\max_{D}L_{WGAN}(D,G)=-E_y[D(y)]+E_x[D(G(x))], \tag{5}$$
where the two terms E[·] on the right-hand side of Eq. (5) are expectation operators. During network training, optimizing the generator G reduces the EM distance between y and G(x), thereby shortening the distance between the generated distribution and the real distribution. The goal of the discriminator network D, in contrast, is to distinguish between these two distributions.
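To make the adversarial term concrete, the sketch below evaluates the right-hand side of Eq. (5) with the expectations replaced by mini-batch means. The function name `wgan_objective` and the score values are hypothetical. How this quantity is signed and assigned to the generator and discriminator updates varies between implementations and is deliberately not shown.

```python
from statistics import fmean

def wgan_objective(real_scores, fake_scores):
    """Sample estimate of -E_y[D(y)] + E_x[D(G(x))].

    real_scores: discriminator outputs D(y) for real LDCT patches.
    fake_scores: discriminator outputs D(G(x)) for synthesized patches.
    """
    return -fmean(real_scores) + fmean(fake_scores)

# Made-up discriminator outputs for a batch of four patches.
value = wgan_objective([0.9, 1.1, 1.0, 1.2], [0.1, -0.2, 0.0, 0.3])
print(value)
```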

2) Pixel-wise Loss : To make the synthesized LDCT image closer to the original LDCT image in pixel space, inspired by Lai et al. [49], we add the Charbonnier loss (CB loss) to the objective function as a pixel-level loss. It measures the distance between the synthesized LDCT image G(x) and the corresponding original LDCT image y:
$$L_{cb}=\sum_{i}^{M}\sqrt{(y-G(x))^2+\varepsilon^2}, \tag{6}$$
where $i \in M$, M is the total number of training pairs, and $\varepsilon$ is a constant, which we empirically set to 1e-3.
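The Charbonnier loss is a smooth variant of the absolute difference. A minimal sketch follows, assuming the square-root term is evaluated per pixel and summed over the image; the function name `charbonnier_loss` and the toy images are hypothetical, and the reduction used by the authors (sum versus mean) is not specified here.

```python
import math

def charbonnier_loss(target, synthesized, eps=1e-3):
    """Sum of sqrt((y - G(x))^2 + eps^2) over all pixels of a 2-D image pair."""
    total = 0.0
    for row_t, row_s in zip(target, synthesized):
        for t, s in zip(row_t, row_s):
            total += math.sqrt((t - s) ** 2 + eps ** 2)
    return total

# Toy 2x2 example: one pixel differs by 3, the others match.
y = [[10.0, 20.0], [30.0, 40.0]]
g_x = [[10.0, 20.0], [30.0, 43.0]]
print(charbonnier_loss(y, g_x))
```

For large differences the loss behaves like the absolute error, while the small constant keeps it differentiable when the two images agree.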

3) Perceptual loss : We add a perceptual loss to the objective function so that the synthesized LDCT image has structural details consistent with the original LDCT image according to a metric of human visual perception. Zhang et al. [50] proposed the learned perceptual image patch similarity (LPIPS) as a perceptual metric for similarity judgment. Here, we use LPIPS as the perceptual loss:
$$L_{p}=E_{(G(x),y)}[LPIPS(G(x),y)], \tag{7}$$
where the LPIPS distance for an individual patch pair is:
$$d(a,a_0)=\sum_l\frac{1}{H_lW_l}\sum_{h,w}\|w_l\odot(\hat{y}_{hw}^l-\hat{y}_{0hw}^l)\|_2^2, \tag{8}$$
where $a_0$ and $a$ denote the reference patch from y and the patch generated by G(x), respectively. l indexes the network layers, $H_l$ and $W_l$ are the height and width of the feature map at layer l, $\hat{y}^l,\hat{y}_0^l\in\mathbb{R}^{H_l\times W_l\times C_l}$ are the feature stacks, and $w_l\in\mathbb{R}^{C_l}$ is a vector used to scale the activations channel-wise.
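The structure of the LPIPS distance in Eq. (8) can be sketched on toy data. In the real metric, the feature stacks come from a pretrained network and are unit-normalized along the channel dimension, and the channel weights are learned; here both are simply passed in as hypothetical nested lists, so the sketch shows only the weighting and averaging, not a usable perceptual metric.

```python
def lpips_distance(feats_a, feats_ref, weights):
    """Evaluate the layer-wise sum of Eq. (8) on precomputed features.

    feats_a, feats_ref: one entry per layer, each an H x W x C nested list.
    weights: one entry per layer, each a length-C list of channel weights.
    """
    total = 0.0
    for layer_a, layer_ref, w in zip(feats_a, feats_ref, weights):
        height, width = len(layer_a), len(layer_a[0])
        layer_sum = 0.0
        for row_a, row_ref in zip(layer_a, layer_ref):
            for vec_a, vec_ref in zip(row_a, row_ref):
                # Squared L2 norm of the channel-weighted difference.
                layer_sum += sum(
                    (wc * (fa - fr)) ** 2 for wc, fa, fr in zip(w, vec_a, vec_ref)
                )
        total += layer_sum / (height * width)  # spatial average
    return total

# One layer, 1 x 2 spatial grid, 2 channels (made-up values).
a = [[[[1.0, 0.0], [0.0, 1.0]]]]
ref = [[[[0.0, 0.0], [0.0, 0.0]]]]
print(lpips_distance(a, ref, weights=[[1.0, 2.0]]))
```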

D. Data preparation

We use one public dataset and two private datasets to evaluate the versatility and practicality of the proposed network.

1) AAPM dataset [11]: Contains clinical NDCT images of 10 patients and corresponding simulated “quarter dose” LDCT images, each image size is 512×512 pixels. We randomly select 4793 pairs of images with a slice thickness of 1mm as the training set of the proposed network model in the first stage, and randomly select 500 pairs of images in the remaining dataset as validation and test sets to evaluate the proposed GAN-NETL network model performance.

2) Private phantom dataset : The data collection uses the anthropomorphic torso Phantom CTU-41 (Kyoto Kagaku, Japan). The Private phantom dataset was obtained by scanning the phantom with a CT scanner at six different dose levels. We selected 2360 LDCT images, the slice thickness was 1.25mm, the tube voltage was 120kVp, and the tube current was 20mAs ~ 60mAs. We took the CT images with a slice thickness of 1.25 mm, a tube voltage of 120 kVp, and a tube current of 120 mAs as the corresponding NDCT images. Each image contains 512×512 pixels. We randomly select 2000 image pairs as the training set of the second-stage GAN-NETL network, and 360 image pairs as the validation and test datasets.

3) Private clinical dataset : This dataset contains clinical NDCT images of 18 patients with a tube voltage of 120 kVp and a tube current ranging from 100 mAs to 150 mAs. The slice thickness of the reconstructed images was 1.25 mm, and the image size was 512 × 512 pixels. Among them, 2 patients also had low-tube-current data. For patient 1, CT images were acquired at tube currents of 80 mAs and 120 mAs with a slice thickness of 1.25 mm. For patient 2, CT images were acquired at tube currents of 40 mAs and 120 mAs with a slice thickness of 1.25 mm. In this study, we use 384 images from these two patients as a clinical test dataset to evaluate the performance of the trained GAN-NETL.

In addition, the phantom dataset and the clinical dataset were acquired using a ScintCare CT128 scanner (Minfind Medical Co., Ltd., China). For the phantom dataset, we chose the axial acquisition protocol to align paired LDCT and NDCT images. However, for clinical datasets, axial or helical acquisition protocols are used.

Considering that the AAPM dataset and the Private phantom dataset are paired, in the following image synthesis experiments, we denote their LDCT as "ground truth". Due to differences in respiratory motion in private clinical CT data obtained from different patients, we were unable to obtain "ground truth" images from CT scanners. Instead, we selected relatively similar LDCT images in the comparative analysis as "reference" images.

E. Training strategy and implementation rules

1) The training strategy of the GAN-NETL model : the AAPM public dataset is defined as the source domain, and the clinical private dataset is defined as the target domain. This work adopts a two-stage training scheme. In the first stage, the network is pre-trained using the AAPM dataset. In the second stage, the parameters of the E1 module in GAN-NETL are fixed and the network is fine-tuned using our private Phantom dataset. The main reason for the fixed parameters in the E1 module is based on the assumption that the NDCT images in the AAPM dataset come from patient studies with background content closer to the target domain. However, the noise in LDCT images is artificially added and may have a different noise style from the target domain. In contrast, the LDCT images in our private phantom dataset were obtained with reduced tube current using the same CT scanner. The noise pattern of this dataset is consistent with the target domain, while the image content information comes from the same torso phantom, which is relatively simple. Therefore, after completing the two-stage training process, we expect E1 to be able to mine the content information of clinical datasets in the target domain, and E2 to achieve the transformation of noise patterns from the source domain to the target domain.

2) Training details of the GAN-NETL model : The proposed network model contains a DPM module and an improved GAN network structure. The former mainly consists of a pre-trained MAP-NN network, whose main function is to denoise NDCT images and generate the corresponding clean content images. First, 2000 pairs of images are randomly selected from each of the AAPM and private phantom datasets. Then, we uniformly mix the two datasets to pre-train MAP-NN. The training parameters and details of the pre-trained MAP-NN in the DPM module are shown in Table 2. For the two stages of improved GAN network training, the generator and discriminator are optimized using the AdamW optimization algorithm [51]. The hyperparameters $\beta_1$ and $\beta_2$ are set to 0.9 and 0.999, respectively, and the weight decay is set to 0. In addition, to increase the nonlinearity of the network, we use LeakyReLU as the activation function with a negative slope of 0.2. In the first stage, the initial learning rate of the generator and discriminator is set to $8.0\times10^{-5}$, and the learning rate is decayed every 10 epochs with a decay rate of 0.5. The mini-batch size is set to 16 and the patch size is 80×80. In the second stage, the learning rate of both networks is set to $1.0\times10^{-5}$, and the learning rate is decayed every 20 epochs with a decay rate of 0.5. The mini-batch size is set to 20, and the patch size is 80×80. The hyperparameters $\lambda_{cb}$ and $\lambda_p$ are experimentally set to 1.0 and 0.5, respectively. The total number of training epochs in each of the first and second stages of the GAN-NETL network is set to 100. All network models are implemented in Python based on the PyTorch [52] DL library, and NVIDIA Titan V GPUs are used for network training and validation.
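The learning-rate schedules described for the two training stages can be sketched as a step decay. This assumes the rate is multiplied by 0.5 at fixed epoch intervals starting from epoch 0; the function name `step_decay_lr` is hypothetical and the exact scheduler used by the authors is not stated in the text.

```python
def step_decay_lr(epoch, initial_lr, step_size, gamma=0.5):
    """Learning rate after `epoch` completed epochs under step decay."""
    return initial_lr * gamma ** (epoch // step_size)

# Stage 1: 8.0e-5, halved every 10 epochs. Stage 2: 1.0e-5, halved every 20 epochs.
for epoch in (0, 10, 20, 50, 99):
    stage1 = step_decay_lr(epoch, initial_lr=8.0e-5, step_size=10)
    stage2 = step_decay_lr(epoch, initial_lr=1.0e-5, step_size=20)
    print(f"epoch {epoch:3d}: stage 1 lr = {stage1:.3e}, stage 2 lr = {stage2:.3e}")
```

With 100 epochs per stage, this gives nine halvings in the first stage and four in the second, so the second stage fine-tunes with a much gentler schedule.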

III. RESULTS

A. Image synthesis results

In the following experiments, we compare our proposed GAN-NETL network model with three state-of-the-art supervised image style transfer methods (namely, SPADE [32], CLADE [33] and AINDNet [44]) for LDCT image synthesis. To implement these three methods, we use the open-source code provided in the respective studies. Since these methods were originally trained on natural images, we tune some of their hyperparameters in both training stages so that they perform at their best and the comparison is fair. Qualitative and quantitative evaluations are performed on the AAPM and private phantom datasets, which contain paired NDCT and LDCT image data. For our private clinical dataset, only a qualitative comparison is performed due to the lack of labeled images. To quantitatively evaluate the performance of the network models for LDCT image synthesis, we choose the Fréchet Inception Distance (FID) [53] to measure the feature vector distance between the synthesized and original LDCT images, since it is often used to evaluate the quality of generated images and has been shown to correlate well with human assessments of visual quality. The smaller the FID value, the higher the quality of the synthesized image. In addition, the structural similarity index (SSIM) and root mean square error (RMSE) are employed. The larger the SSIM value or the smaller the RMSE value, the closer the result is to the target image.
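Two of these metrics are simple enough to sketch directly. The code below computes RMSE and a single-window (global) SSIM over flattened pixel lists. Standard SSIM implementations average the index over local sliding windows, so this global version is a simplification; the function names, the `data_range` default, and the toy values are assumptions. FID is omitted because it requires features from a pretrained Inception network.

```python
import math
from statistics import fmean

def rmse(reference, synthesized):
    """Root mean square error between two equally sized pixel lists."""
    return math.sqrt(fmean([(r - s) ** 2 for r, s in zip(reference, synthesized)]))

def global_ssim(reference, synthesized, data_range=1.0):
    """SSIM computed once over the whole image (no sliding window)."""
    c1 = (0.01 * data_range) ** 2
    c2 = (0.03 * data_range) ** 2
    mu_r, mu_s = fmean(reference), fmean(synthesized)
    var_r = fmean([(r - mu_r) ** 2 for r in reference])
    var_s = fmean([(s - mu_s) ** 2 for s in synthesized])
    cov = fmean([(r - mu_r) * (s - mu_s) for r, s in zip(reference, synthesized)])
    numerator = (2 * mu_r * mu_s + c1) * (2 * cov + c2)
    denominator = (mu_r ** 2 + mu_s ** 2 + c1) * (var_r + var_s + c2)
    return numerator / denominator

# Made-up pixel values scaled to [0, 1].
ref = [0.10, 0.40, 0.35, 0.80]
syn = [0.12, 0.38, 0.30, 0.85]
print(rmse(ref, syn), global_ssim(ref, syn))
```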

1) Results from AAPM and Private Phantom datasets : All network models are retrained using the same dataset for qualitative and quantitative evaluation, and the trained models are applied to the same test set. Table 3 presents the average image quality metrics over all synthesized images for each tested network. On the AAPM dataset, AINDNet performs poorly in LDCT image synthesis. In contrast, both the SPADE and CLADE network models provide more accurate synthetic LDCT images than AINDNet, while the GAN-NETL network model achieves the best scores and the highest accuracy (Table 3).


On the other hand, the proposed GAN-NETL network also achieves better scores than the other network models when the private phantom dataset is used as the test dataset and the same network models are applied after the second-stage TL. The quantitative results for the two representative synthetic CT slices shown in Fig. 3 and Fig. 4 are listed in Table 4. Here too, our proposed network achieves the highest SSIM and the smallest RMSE and FID. Overall, the quantitative experimental results show that the GAN-NETL network model produces synthetic LDCT images with the best structural similarity to the ground truth.


In addition to the quantitative comparison and analysis above, we also evaluate the performance of the different network models through qualitative comparisons of the synthesized LDCT images. Figure 3 shows a comparison of representative CT slices from the AAPM dataset. From left to right, the images in the top row of Figure 3 are the “ground truth” and the synthetic LDCT images from the four tested network models (AINDNet, SPADE, CLADE, GAN-NETL), respectively. The difference images show that the LDCT image structures synthesized by the AINDNet and SPADE network models differ considerably from the “ground truth”, while those synthesized by the CLADE and GAN-NETL models differ only slightly. In summary, the low-fidelity synthetic LDCT images produced by the AINDNet model differ substantially from the “ground truth” images in both CT intensity values and structural details. Both the SPADE and CLADE models outperform the AINDNet model in preserving structural detail, but their CT intensity values are higher than those of the “ground truth” images. The proposed GAN-NETL outputs the synthetic LDCT image closest to the “ground truth” image, i.e., a high-fidelity image consistent with the “ground truth”.


Figure 4 shows a similar qualitative comparison of the different network models using representative CT slices from the private phantom dataset. To show the tissue structure more clearly, the region marked by the blue box in the “ground truth” image is enlarged in the top row for each of the four network models (AINDNet, SPADE, CLADE and GAN-NETL). Compared with the other network models, the enlarged region from GAN-NETL shows the best LDCT image synthesis details. The next row of Figure 4 shows the absolute difference images between the “ground truth” and the synthetic LDCT images from the four tested network models. The difference images again show how close each synthetic LDCT image is to the “ground truth” image. Overall, the comparison shows that our proposed image synthesis method achieves better synthesis performance with high accuracy and fidelity.

2) Results on Clinical Datasets : To further verify the adaptability of the GAN-NETL model in target domain LDCT image synthesis, we conduct experimental studies using private clinical datasets. Figure 5 depicts the comparison results of the LDCT images synthesized by the four network models with the corresponding “reference” images.


Among the four network models, the noise patterns of the images generated by AINDNet differ considerably from the corresponding reference images. In contrast, the LDCT images synthesized by SPADE and CLADE have noise patterns similar to the corresponding reference images, but show significant differences in the average gray value. For a better comparison, we select three regions of interest, ROI 1, ROI 2, and ROI 3, from the three sample slice images in Figure 5, and quantitatively evaluate the performance of the different models. Table 5 shows the quantitative evaluation of the four network models by comparing the mean and standard deviation (STD) of the three ROIs.

The mean CT values of the synthetic LDCT images obtained by the SPADE and CLADE models differ significantly from those of the reference images, which is consistent with the visual observation in Fig. 5. In addition, the mean and STD values of the synthetic LDCT images obtained by the GAN-NETL model are closest to those of the reference images. Overall, the GAN-NETL network model proposed in this paper provides high-fidelity synthetic LDCT images that are consistent with the reference images in terms of image detail and average CT value.
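
The ROI statistics used in this comparison can be computed along the following lines. This is a sketch under stated assumptions: the function name `roi_stats` and the rectangular ROI parameters are hypothetical, and the population standard deviation is used because the text does not say which estimator the authors chose.

```python
from statistics import mean, pstdev


def roi_stats(image, top, left, height, width):
    """Return (mean, STD) of the pixels in a rectangular ROI.

    `image` is a 2-D nested list of CT values. The ROI is given by its
    top-left corner and its size; this parameterization is an assumption.
    """
    if top < 0 or left < 0 or top + height > len(image) \
            or left + width > len(image[0]):
        raise ValueError("ROI lies outside the image")
    pixels = [
        image[row][col]
        for row in range(top, top + height)
        for col in range(left, left + width)
    ]
    # Population STD (pstdev) is an assumption; use stdev for the sample STD.
    return mean(pixels), pstdev(pixels)


# Toy 4x4 image with a 2x2 ROI in the centre.
toy_image = [
    [0, 0, 0, 0],
    [0, 10, 20, 0],
    [0, 30, 40, 0],
    [0, 0, 0, 0],
]
roi_mean, roi_std = roi_stats(toy_image, top=1, left=1, height=2, width=2)
```

Applying the same ROI to the reference image and to each synthetic image gives mean and STD pairs that can be compared directly.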

B. Ablation Study of GAN-NETL

We conduct ablation studies to analyze the effectiveness of the different components of our network. To verify the impact of the multiple loss functions and of DPM, we conduct qualitative and quantitative evaluations using the AAPM dataset and a private phantom dataset. To examine the effectiveness of the noise encoding network E2 and of the transfer learning training strategy, we perform qualitative comparisons using our private phantom and clinical datasets. In the following, we discuss in detail the impact of these design choices in the proposed GAN-NETL network model.

1) Effects of loss functions: In this section, we study the effects of the adversarial loss, the CB loss and the LPIPS loss on the GAN-NETL network model. We use a network model trained with only the adversarial loss as the base network and then add different losses to it. The variants are: 1) Base+Lcb, which adds the CB loss; 2) Base+Lp, which adds the LPIPS loss; and 3) GAN-NETL, which adds both the LPIPS loss and the CB loss to the base network. Table 6 summarizes the quantitative results of the ablation studies.
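
The four ablation variants can be summarized as switches on the loss terms. The sketch below is illustrative only: `LossConfig`, `total_loss` and the weights `lambda_cb` and `lambda_p` are hypothetical names and values, since this section does not give the authors' weighting of the terms.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class LossConfig:
    """Which auxiliary losses are added to the adversarial base loss."""
    use_cb: bool
    use_lpips: bool


# The four variants compared in the ablation study.
ABLATION_VARIANTS = {
    "Base": LossConfig(use_cb=False, use_lpips=False),
    "Base+Lcb": LossConfig(use_cb=True, use_lpips=False),
    "Base+Lp": LossConfig(use_cb=False, use_lpips=True),
    "GAN-NETL": LossConfig(use_cb=True, use_lpips=True),
}


def total_loss(config, adv, cb, lpips, lambda_cb=1.0, lambda_p=1.0):
    """Combine already-computed loss values according to `config`.

    The weights lambda_cb and lambda_p are placeholders (assumed to be
    1.0 here), not values taken from the paper.
    """
    total = adv
    if config.use_cb:
        total += lambda_cb * cb
    if config.use_lpips:
        total += lambda_p * lpips
    return total


# Example with made-up loss values for a single training step.
losses = {
    name: total_loss(cfg, adv=0.5, cb=0.25, lpips=0.125)
    for name, cfg in ABLATION_VARIANTS.items()
}
```

Keeping the variants in one table of switches makes it clear that only the loss terms change between runs, while the network itself stays the same.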

Compared with the base network, both Base+Lcb and Base+Lp show clear improvements in the three metrics SSIM, RMSE and FID, which shows that both the CB loss and the LPIPS loss contribute to our proposed GAN-NETL network. In addition, the FID value of Base+Lp is significantly smaller than that of Base+Lcb, which indicates that using LPIPS as a perceptual loss reduces the feature-vector distance between the synthetic and real data distributions. When the two loss functions are combined, the proposed GAN-NETL obtains the best values on all evaluation metrics, indicating that it exploits the advantages of both loss functions to further improve network performance. Figure 6 presents the results of the four tested methods. The absolute difference images in the bottom row of Figure 6 show that the LDCT image synthesized by the proposed GAN-NETL network (Base+Lp+Lcb) has the smallest difference and the best visual image quality. These results show that the CB loss and the LPIPS loss indeed improve the performance of the proposed GAN-NETL network.

2) Effect of DPM: We use DPM to separate content images from the NDCT input, so that the encoders E1 and E2 in the generator can independently encode complementary feature information. Here, we test its effect by removing DPM from the full network. In Table 7, "With DPM" (the full network) achieves the best scores in terms of SSIM, RMSE and FID. Furthermore, as shown in Figure 7, when compared against the ground truth (Figure 7(a)) through the error heatmaps, our full GAN-NETL network (Figure 7(b)) outperforms the modified network (Figure 7(c)) and yields the smallest differences in noise pattern and background texture. These results show that DPM does help the overall network.

3) Effect of Encoder E2: Encoder E2 is used to further improve the transfer performance of the GAN-NETL network. Here, encoder E2 is removed from the full network to test its effect. Figure 8 compares results from the full network (with encoder E2) and the modified network (without encoder E2) on the phantom dataset. The FID values computed for the images generated by the two networks are shown in the upper left corner of each image. The absolute difference image of the region of interest (indicated by the red wireframe) is enlarged and placed within the red rectangle of the same image. By comparison, the image obtained by our GAN-NETL network is closer to the reference image. The qualitative and quantitative evaluations are therefore consistent.

Figure 9 presents results for the modified network and our full network on the private clinical dataset. By visual comparison, the clinical LDCT images synthesized by the modified network have more blurred tissue details than those synthesized by the full network. These results show that encoder E2 indeed improves the performance of the transfer network. To better explain why E2 improves the transfer performance of the proposed GAN-NETL, we show 25 feature maps from each of the encoders E1 and E2 when NDCT images are used as input.

Visual observation from Fig. 10 shows that encoders E1 and E2 work well in extracting content and noise feature information. Meanwhile, the design of the encoder E2 can help the GAN-NETL network to learn the complementary feature information of the simulation dataset (AAPM) and the target phantom dataset. Therefore, the full GAN-NETL network can synthesize high-fidelity clinical LDCT images of the target domain.

4) Effect of transfer learning: Noise style conversion is achieved through transfer learning (TL). In this study, TL is removed from the training of the GAN-NETL network model. Figure 11 presents the results of the model trained with TL (Figure 11(b)) and the model trained without TL (Figure 11(c)). Figure 11(b) shows a noise pattern similar to that of Figure 11(a), whereas the noise types in Figure 11(c) differ from those in Figure 11(a), indicating that TL indeed enhances the performance of the network in terms of noise type transfer.

C. Application research in LDCT image denoising

The purpose of the GAN-NETL-based image synthesis method is to transfer the noise feature distribution of LDCT images from the source domain to a new target domain, so that existing DL-based denoising methods can be applied to realistic clinical denoising tasks. In this study, we compare the image denoising performance of network models on the target domain with and without domain adaptation training. A network model trained on the public AAPM dataset is referred to as the model without domain adaptation, while a model trained on the synthetic clinical dataset is referred to as the model with domain adaptation. In the following experiments, the synthetic clinical dataset contains LDCT and NDCT image pairs from 18 patients. To train the network models, 5000 pairs of CT images are selected from each of the AAPM and synthetic clinical datasets as training sets, while 1000 pairs of images are selected from the remaining data in the synthetic clinical dataset as the testing set. Taking MAPNN [5] as the example denoising network, the models trained with and without domain adaptation are denoted "MAPNN-W" and "MAPNN-O", respectively. The training parameters and details of MAPNN-W and MAPNN-O are shown in Table II.
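
Drawing a training set and a disjoint testing set from the same pool of image pairs, as described above for the synthetic clinical dataset, can be sketched as follows. The function name `split_pairs`, the fixed seed and the pool size of 8000 pairs are hypothetical. The text also does not say whether the authors split by slice or by patient, so a simple slice-level random split is assumed here.

```python
import random


def split_pairs(n_pairs, n_train, n_test, seed=0):
    """Return disjoint lists of training and testing pair indices.

    The test indices are drawn only from the pairs that were not
    selected for training.
    """
    if n_train + n_test > n_pairs:
        raise ValueError("not enough image pairs for the requested split")
    rng = random.Random(seed)  # fixed seed so the split is reproducible
    shuffled = list(range(n_pairs))
    rng.shuffle(shuffled)
    train_idx = shuffled[:n_train]
    test_idx = shuffled[n_train:n_train + n_test]
    return train_idx, test_idx


# 5000 training pairs and 1000 testing pairs, as in the text; the pool
# size of 8000 pairs is a made-up figure for illustration.
train_idx, test_idx = split_pairs(n_pairs=8000, n_train=5000, n_test=1000)
```

If slices from the same patient are strongly correlated, a patient-level split would be a stricter alternative to the slice-level split sketched here.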

In Fig. 12 and Fig. 13, the maximum number of iterations D is the number of cloned modules used in MAPNN network training, which can help radiologists optimize the denoising depth in a task-specific manner [5]. Here, we set D to 5, following the original paper. Figure 12 shows representative results for two independent groups of image slices from the synthetic clinical dataset, obtained with the "MAPNN-W" and "MAPNN-O" network models.

In the first group, as the denoising level increases, small blood vessel details (indicated by blue arrows) gradually disappear in the MAPNN-O images (see row 1), whereas in the MAPNN-W images they are still preserved. In the second group, in the bone region (indicated by the green arrow), the MAPNN-W model retains detailed information between high-density bone tissues, while MAPNN-O loses some tissue information as the number of iterations increases. In low-contrast regions of the LDCT images (indicated by red circles), MAPNN-W preserves more structures. To quantitatively evaluate the effect of network models with and without domain adaptation on LDCT image denoising, we use two commonly used image quality metrics, PSNR and SSIM.
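
For reference, PSNR follows the standard definition 10·log10(MAX² / MSE). The sketch below is a minimal dependency-free version: the function name `psnr` is hypothetical, and the `data_range` argument (the value used for MAX) is an assumption, because this section does not state which intensity range the authors used.

```python
import math


def psnr(reference, test, data_range):
    """Peak signal-to-noise ratio, in dB, between two 2-D images.

    `data_range` is the maximum possible intensity span of the images.
    Its value depends on how the CT data are scaled.
    """
    ref_pixels = [value for row in reference for value in row]
    test_pixels = [value for row in test for value in row]
    if len(ref_pixels) != len(test_pixels):
        raise ValueError("images must have the same number of pixels")
    mse = sum((r - t) ** 2 for r, t in zip(ref_pixels, test_pixels))
    mse /= len(ref_pixels)
    if mse == 0:
        return math.inf  # identical images
    return 10 * math.log10(data_range ** 2 / mse)


# Toy example: every pixel is off by 1 on a scale of 0..10.
clean = [[0, 0], [0, 0]]
denoised = [[1, 1], [1, 1]]
score = psnr(clean, denoised, data_range=10)
```

A higher PSNR means the denoised image is closer to the reference. SSIM is more involved and is usually taken from an image-processing library rather than reimplemented.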

The results in Table 8 show that the denoising performance of the MAPNN-O network model degrades as the number of iterations increases, while MAPNN-W significantly improves the step-by-step denoising performance in terms of PSNR and SSIM.

To further verify the domain adaptability of MAPNN-W, we conduct another image denoising experiment using the real clinical LDCT dataset of the target imaging scene. The resulting image is shown in Figure 13.

Similar to the above results for the synthetic clinical dataset, the comparison also shows that the MAPNN-W network model outperforms the MAPNN-O network model in preserving image contrast and details while reducing the image noise level. The qualitative and quantitative results show that the clinical dataset synthesized by the GAN-NETL network can help improve the domain adaptation performance of DL-based imaging methods.

IV. DISCUSSION AND CONCLUSION

To address domain adaptation, this paper proposes a novel DL-based approach to generate paired datasets for new imaging scenarios with realistic noise styles. For clinical LDCT image synthesis, the mapping function cannot be learned directly because paired training data are lacking. To address this problem, an LDCT image synthesis network model based on a generative adversarial network (GAN) and noise-encoded transfer learning (NETL), namely GAN-NETL, is proposed. The proposed GAN-NETL network model provides an explicit representation of noise patterns in CT images. Meanwhile, a two-stage training scheme is designed using transfer learning (TL) so that the proposed network model can simultaneously represent the content and the noise distribution of realistic clinical scenarios. Our main motivation is to treat the background content information learned from public datasets as common knowledge, and the noise distribution learned from target datasets as a given style. By using TL, we hypothesize that the proposed GAN-NETL network model is capable of learning new noise patterns while preserving the underlying content representation.

Due to the flexibility of GAN-NETL, the proposed image synthesis method can be easily extended to more applications in the field of CT imaging. First, besides LDCT image synthesis, GAN-NETL can also be used to solve the domain adaptation problem in LDCT processing tasks. Second, since GAN-NETL does not make any assumptions about the noise distribution, it can be extended to other image synthesis problems, such as metal artifact synthesis, motion artifact synthesis and projection data synthesis. In addition, CT manufacturers can use GAN-NETL to generate private paired datasets with realistic noise styles, promoting the use of DL-based imaging methods in more applications. Although the proposed GAN-NETL has many advantages, it also has some limitations. For example, it is designed for supervised LDCT image synthesis, for which experimental image data from phantom studies are a prerequisite. For new imaging scenarios that lack suitable phantoms, new methods for semi-supervised or unsupervised domain adaptation need to be developed.

In conclusion, we design a new LDCT image synthesis method and extend the GAN-NETL network model to a new LDCT image synthesis task by transferring the noise synthesis operator from a simulated noise domain to a real noise domain. To demonstrate the effectiveness of the proposed method, extensive experiments are conducted on one public dataset and two private datasets. The experimental results on the AAPM and phantom datasets verify the good performance of the GAN-NETL network model in LDCT image synthesis. More importantly, with the proposed GAN-NETL network model after TL, LDCT image pairs with a realistic target-domain noise style can be generated, thus enabling effective processing of LDCT images. Although many existing methods have tried to solve the LDCT imaging problem by modifying or designing new network architectures, loss functions and so on, our work takes a new perspective and focuses on the possibility of obtaining paired training data for LDCT imaging applications in the target domain.
