1. Introduction

In recent years, deep convolutional neural networks (CNNs) have been one of the main driving forces for CT image denoising [1-7]. Most existing CNN-based denoising methods are supervised: they learn the mapping between a low-quality image (e.g., low dose) and its high-quality (e.g., high dose) counterpart [1-5]. For a CNN denoiser to generalize well to new patient data, a large number of low-/high-quality image pairs from many patient and/or phantom scans are needed to sufficiently cover the data distribution. However, this training process is costly, and a model trained on one dataset may not generalize well to another dataset acquired or reconstructed under different conditions. Inter-patient differences can also make it challenging to learn a model that generalizes well across patients. To tackle this challenge, we propose a self-trained deep CNN (ST_CNN) method for noise reduction in CT that does not rely on pre-existing training datasets. The method trains the network directly on the data itself through extensive data augmentation (random rotation and noise addition) in the projection domain, and inference is then applied to the same data. We demonstrate that this method can achieve performance similar to that of conventional deep CNN denoising methods trained on external datasets. The method offers three major potential benefits. First, by removing the need for a large pre-existing training dataset, it can be applied to any CT data, even if the data condition was not previously trained. Second, the self-training mechanism eliminates the generalizability issue that may occur when network models are applied to datasets different from the training datasets. Third, the trained model can be applied to, and fine-tuned for, each individual patient if repeated CT exams are expected, which may maximize the benefit of image quality improvement and radiation dose reduction.
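The projection-domain augmentation mentioned above can be illustrated with a minimal NumPy sketch. The mono-energetic Poisson photon-count model, the incident flux `i0`, the dose fraction, and the circular-shift implementation of rotation are all assumptions for illustration, not details taken from this work:

```python
import numpy as np

rng = np.random.default_rng(0)

def add_projection_noise(sino, dose_fraction=0.25, i0=1e5):
    # Hypothetical mono-energetic Poisson photon-count model:
    # convert line integrals to expected counts at the reduced dose.
    counts = dose_fraction * i0 * np.exp(-sino)
    noisy = rng.poisson(counts).clip(min=1)  # avoid log(0)
    # Back to line integrals: a noisier sinogram simulating a lower-dose scan.
    return -np.log(noisy / (dose_fraction * i0))

def random_view_shift(sino):
    # For a 360-degree parallel-beam sinogram, an in-plane rotation of the
    # object is equivalent to a circular shift along the view (angle) axis.
    shift = int(rng.integers(0, sino.shape[0]))
    return np.roll(sino, shift, axis=0)
```

Reconstructing the noise-injected, view-shifted sinograms would then yield augmented low-quality images paired with reconstructions of the original data.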
2. Methods

The proposed ST_CNN method belongs to the family of image-domain supervised deep learning techniques, but its training scheme differs distinctly from that of existing approaches, as described in Figs. 1 and 2. The availability of a sufficient number of patient cases for training is a key factor in the performance of conventional supervised deep learning methods. In contrast, the proposed ST_CNN method is trained on data acquired from a single patient by generating a large number of paired low-quality and high-quality images from that same patient (Figure 2). The trained model is then used to denoise the data acquired from the same patient. The training scheme is as follows:

A. Low-quality image generation and augmentation:
B. High-quality image generation and augmentation:
C. Low-/high-quality image pair generation: Matched high- and low-quality patches spanning multiple slices (e.g., 64×64×7 voxels) are generated from the reconstructed images in the first three groups for model training. The images in the fourth group are used to generate matched patches for model validation.

D. Model training: The CNN denoising model can be based on many popular network architectures. Here we employed a recently developed 2D residual-based CNN denoiser [3] for both the ST_CNN and the conventional deep CNN method. The identical network architecture (Figure 3) was used for both methods so that any performance difference can be attributed to the different training schemes. To optimize the performance of the CNN model, we used 7 adjacent CT slices as the channel input of the 2D residual CNN model [5]. The CNN inputs were first standardized (by subtracting the mean value and dividing by the standard deviation) and then passed through an initial 2D convolutional layer that generated 128 feature maps. The feature maps were further processed by a series of 2D residual blocks, each consisting of repeated 2D convolution, batch normalization, and rectified linear unit (ReLU) activation layers. The output of the residual blocks was then projected back to a single-channel image by a single convolutional layer with linear activation. This single-channel image was the estimated noise, which was subtracted from the central input slice to obtain the final denoised result.

3. Results

Figure 4 compares full-dose (FD) images reconstructed and denoised using four different methods: (a) filtered backprojection (FBP), (b) iterative reconstruction (IR), (c) conventional CNN, and (d) ST_CNN. The images were from a patient case in the Mayo/AAPM Low-Dose CT Grand Challenge data library (case number: L291). In Figure 4 and the following figures of this article, "CNN" refers to the conventional residual CNN method [3].
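The residual denoiser described in step D above can be sketched as follows. This is a minimal PyTorch sketch: the 7-slice input, 128 feature maps, residual blocks of convolution/batch-normalization/ReLU, and the final linear convolution predicting the noise follow the text, while the number of residual blocks, kernel sizes, and convolutions per block are assumptions rather than details from [3]:

```python
import torch
import torch.nn as nn

class ResidualBlock(nn.Module):
    """Conv -> BN -> ReLU repeated, with an identity skip connection."""
    def __init__(self, ch=128):
        super().__init__()
        self.body = nn.Sequential(
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
            nn.Conv2d(ch, ch, 3, padding=1), nn.BatchNorm2d(ch), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return x + self.body(x)

class ResidualCNNDenoiser(nn.Module):
    """7 adjacent slices in, estimated noise out; the denoised central
    slice is the central input channel minus the predicted noise."""
    def __init__(self, in_slices=7, features=128, n_blocks=5):  # n_blocks assumed
        super().__init__()
        self.head = nn.Conv2d(in_slices, features, 3, padding=1)
        self.blocks = nn.Sequential(*[ResidualBlock(features) for _ in range(n_blocks)])
        self.tail = nn.Conv2d(features, 1, 3, padding=1)  # linear activation

    def forward(self, x):
        # x: (N, 7, H, W), already standardized (zero mean, unit std)
        noise = self.tail(self.blocks(self.head(x)))
        central = x[:, x.shape[1] // 2 : x.shape[1] // 2 + 1]
        return central - noise
```

Under this scheme the network learns the noise component rather than the clean image directly, which is the usual motivation for residual denoising formulations.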
The FBP and IR reconstructions used matched kernels (B30 and I30) with an IR strength setting of 3. The conventional CNN was trained and validated using FBP FD and FBP quarter-dose (QD) image pairs from a subset of 30 patient cases in total (17 patients for training and 5 for validation). The residual network architecture was identical to that used in the ST_CNN. The trained conventional CNN model was then applied to denoise the FBP FD images of the remaining patients (e.g., L291). The ST_CNN was trained and validated using augmented FBP QD and FBP FD image pairs of a specific patient (e.g., L291) and then applied to denoise the original FBP FD images of the same patient. The performance of the two CNN models was assessed visually by an experienced radiologist. For overall image quality, the radiologist ranked the ST_CNN method better than the conventional one because of more homogeneous liver parenchyma and better low-contrast lesion visibility (arrows in the figure point to two subtle malignant liver tumors). To establish a reference standard for quantitative evaluation, the ST_CNN was trained and validated using augmented 10%-dose (FBP) and QD (FBP) image pairs and then applied to denoise the FBP QD images of the same patient. In this way, the original FD images could serve as the reference standard. The previously trained conventional CNN was used to denoise the same FBP QD images for comparison. Figure 5 compares images reconstructed and denoised under four conditions: (a) QD+FBP, (b) QD+IR, (c) QD+FBP+CNN, and (d) QD+FBP+ST_CNN; two FD reconstructions served as the reference standard: (e) FD+FBP and (f) FD+IR. The performance of the two CNN models was assessed visually by the same radiologist. In terms of low-contrast lesion visibility, the conventional and self-trained CNNs appeared to perform similarly (arrows in the figure point to two subtle malignant liver tumors).
For overall image quality, the radiologist ranked the self-trained CNN method better than the conventional one because of more homogeneous liver parenchyma and fewer false-positive structures (a zoomed-in ROI of the liver parenchyma, corresponding to the green box, is shown at the bottom right). Using FD+FBP as the reference, the root mean square error (RMSE), peak signal-to-noise ratio (PSNR), and structural similarity index (SSIM) were calculated for the conventional CNN- and ST_CNN-denoised QD images (Table 1). The results provide clear evidence that the ST_CNN method performs similarly to conventional deep CNN denoising methods without the need for a large amount of training data.

TABLE I. Quantitative results (mean ± SD) for the conventional and self-trained CNN methods on patient case L291.
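Reference-based metrics of the kind reported in Table 1 can be computed as below. Note that the SSIM here is a simplified single-window (global) form for illustration; published evaluations typically use a local sliding-window SSIM (e.g., as in scikit-image):

```python
import numpy as np

def rmse(ref, img):
    """Root mean square error against the reference image."""
    return float(np.sqrt(np.mean((ref - img) ** 2)))

def psnr(ref, img):
    """Peak signal-to-noise ratio in dB, using the reference's dynamic range."""
    data_range = ref.max() - ref.min()
    mse = np.mean((ref - img) ** 2)
    return float(20 * np.log10(data_range) - 10 * np.log10(mse))

def ssim_global(ref, img):
    """Simplified single-window SSIM computed over the whole image."""
    data_range = ref.max() - ref.min()
    c1, c2 = (0.01 * data_range) ** 2, (0.03 * data_range) ** 2
    mu_x, mu_y = ref.mean(), img.mean()
    var_x, var_y = ref.var(), img.var()
    cov = ((ref - mu_x) * (img - mu_y)).mean()
    return float(((2 * mu_x * mu_y + c1) * (2 * cov + c2)) /
                 ((mu_x ** 2 + mu_y ** 2 + c1) * (var_x + var_y + c2)))
```

In the study's setup, `ref` would be the FD+FBP image and `img` a denoised QD image, so lower RMSE and higher PSNR/SSIM indicate denoising closer to the full-dose reference.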
4. Conclusions

We have designed a patient-specific self-trained CNN denoising method, aided by data augmentation in the projection domain. Preliminary clinical evaluation demonstrated that the proposed method may achieve image quality similar to that of conventional deep CNN denoising methods pre-trained on a large number of patient cases. This new technique has the potential to overcome the generalizability issue of conventional training methods and to provide optimized noise reduction for each individual patient.

Acknowledgement

The authors acknowledge the computing facility (mForge) provided by Mayo Clinic for research computing. Dr. Zhou was supported by a Mayo Radiology Research Fellowship.

References

1. H. Chen, Y. Zhang, W. Zhang, P. Liao, K. Li, J. Zhou, and G. Wang,
"Low-dose CT via convolutional neural network," Biomed. Opt. Express 8(2), 679–694 (2017). https://doi.org/10.1364/BOE.8.000679
2. H. Chen, Y. Zhang, M. K. Kalra, F. Ling, Y. Chen, P. Liao, J. Zhou, and G. Wang, "Low-dose CT with a residual encoder-decoder convolutional neural network," IEEE Trans. Med. Imag. 36(12), 2524–2535 (2017). https://doi.org/10.1109/TMI.2017.2715284
3. N. R. Huber, A. D. Missert, L. Yu, S. Leng, and C. H. McCollough, "Evaluating a convolutional neural network noise reduction method when applied to CT images reconstructed differently than training data," J. Comput. Assist. Tomogr. 45(4), 544–551 (2021). https://doi.org/10.1097/RCT.0000000000001150
4. W. Yang, H. Zhang, J. Yang, J. Wu, X. Yin, Y. Chen, H. Shu, L. Luo, G. Coatrieux, and Z. Gui, "Improving low-dose CT image using residual convolutional network," IEEE Access 5, 24698–24705 (2017). https://doi.org/10.1109/ACCESS.2017.2766438
5. Z. Zhou, N. R. Huber, A. Inoue, C. H. McCollough, and L. Yu, "Residual-based convolutional-neural-network (CNN) for low-dose CT denoising: impact of multi-slice input," in SPIE Medical Imaging (2022).
6. J. M. Wolterink, T. Leiner, M. A. Viergever, and I. Isgum, "Generative adversarial networks for noise reduction in low-dose CT," IEEE Trans. Med. Imag. 36(12), 2536–2545 (2017). https://doi.org/10.1109/TMI.2017.2708987
7. Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, and G. Wang, "Low dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss," IEEE Trans. Med. Imag. 37(6), 1348–1357 (2018). https://doi.org/10.1109/TMI.2018.2827462