In response to the critical need for timely and precise detection of lung lesions, we explored an innovative active learning approach for optimally selecting training data for deep-learning segmentation of computed tomography scans from nonhuman primates. Our guiding hypothesis was that by maximizing the information within a training set—accomplished by choosing images uniformly distributed in n-dimensional radiomic feature space—we may attain similar or superior segmentation results to random dataset selection, despite the use of fewer labeled images. To test this hypothesis, we compared segmentation models trained on different subsets of the available training data. Subsets that maximized the diversity among datasets (i.e., diverse data) were compared with subsets that minimized diversity among datasets (i.e., concentrated data) and randomly chosen subsets (i.e., random data). A two-tiered feature-selection technique was used to reduce the radiomic feature space to reliable, relevant, and non-redundant features. We generated learning curves to assess the model performance as a function of the number of training dataset samples. We found that models trained on uniformly distributed data consistently outperformed those trained on concentrated data, achieving higher median test Dice scores with less variance. These results suggest that active learning and intelligent selection of data that are diverse and uniformly distributed within a radiomic feature space can significantly enhance segmentation model performance. This improvement has substantial implications for optimizing lung lesion characterization, disease management, and evaluation of treatments and underscores the potential benefit of active learning and intelligent data selection in medical imaging segmentation tasks.
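The contrast between diverse, concentrated, and random training subsets can be illustrated with a minimal sketch. The abstract does not state which selection algorithm was used, so the greedy farthest-point (max-min) rule below is only a stand-in for "uniformly distributed in feature space". The function names, the `start` index, and the Euclidean distance are all assumptions made for illustration, and each scan is assumed to be already reduced to a short numeric feature vector.

```python
import math
import random


def farthest_point_subset(features, k, start=0):
    """Greedy max-min selection: pick k samples that spread out in feature space.

    Illustrative stand-in for "diverse data"; not the authors' published method.
    """
    if not 0 < k <= len(features):
        raise ValueError("k must be between 1 and the number of samples")
    selected = [start]
    # Distance from every sample to its nearest already-selected sample.
    nearest = [math.dist(f, features[start]) for f in features]
    while len(selected) < k:
        # Take the sample that is farthest from everything chosen so far.
        nxt = max(range(len(features)), key=lambda i: nearest[i])
        selected.append(nxt)
        nearest = [min(d, math.dist(f, features[nxt]))
                   for d, f in zip(nearest, features)]
    return selected


def concentrated_subset(features, k, start=0):
    """Opposite extreme ("concentrated data"): the k samples closest to `start`."""
    order = sorted(range(len(features)),
                   key=lambda i: math.dist(features[i], features[start]))
    return order[:k]


def random_subset(features, k, seed=0):
    """Baseline ("random data"): k sample indices drawn without replacement."""
    return random.Random(seed).sample(range(len(features)), k)


# Toy 2-D "radiomic" feature vectors (made-up numbers, for illustration only).
toy_features = [(0.0, 0.0), (0.1, 0.0), (0.2, 0.0), (10.0, 0.0), (5.0, 0.0)]
diverse = farthest_point_subset(toy_features, 3)
concentrated = concentrated_subset(toy_features, 3)
randomly_chosen = random_subset(toy_features, 3)
```

On the toy data, the diverse subset spans the full range of the first feature, while the concentrated subset stays clustered near the starting sample. A learning curve in the sense of the abstract would repeat this selection for increasing `k` and train one model per subset.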
KEYWORDS: Computed tomography, Data modeling, Lung, Education and training, Viruses, Pulmonary disorders, Deep learning, Medical imaging, Image segmentation, Image enhancement
Agile development of reliable and accurate segmentation models during an infectious disease outbreak has the potential to reduce the need for already-strained human expertise. Global research and data-sharing efforts during the COVID-19 pandemic have shown how rapidly Deep-Learning (DL) models can be developed when public datasets are available for training. However, these efforts have been rare, usually limited by the unavailability of Computed Tomography (CT) imaging datasets from patients in the clinical setting. In the absence of human data, animal models faithful to human disease are used to investigate the imaging phenotype of high-consequence and emerging pathogens. As simultaneous access to both human and Nonhuman Primate (NHP) data for the same respiratory infection is unusual, we were interested in whether the inclusion of NHP data might enhance DL image segmentation of lung lesions associated with severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) infection. Thus, we set out to evaluate DL performance and generalizability to a human test set. We found that combining human and NHP data and utilizing pretrained NHP models to initialize model training outperformed a model trained solely on human CT data. By studying the interaction between human and NHP CT imaging in developing these models, we can assess the potential value of NHP datasets for known or novel viruses that emerge in settings where medical imaging capacity is limited. Understanding and leveraging NHP datasets to improve the agility and quality of model development capabilities could better prepare us to respond to disease outbreaks in the human population.
Purpose: We describe a method to identify repeatable liver computed tomography (CT) radiomic features, suitable for detection of steatosis, in nonhuman primates. Criteria used for feature selection exclude nonrepeatable features and may be useful to improve the performance and robustness of radiomics-based predictive models. Approach: Six crab-eating macaques were equally assigned to two experimental groups, fed regular chow or an atherogenic diet. High-resolution CT images were acquired over several days for each macaque. First-order and second-order radiomic features were extracted from six regions in the liver parenchyma, either with or without liver-to-spleen intensity normalization from images reconstructed using either a standard (B-filter) or a bone-enhanced (D-filter) kernel. Intrasubject repeatability of each feature was assessed using a paired t-test for all scans and the minimum p-value was identified for each macaque. Repeatable features were defined as having a minimum p-value among all macaques above the significance level after Bonferroni’s correction. Features showing a significant difference with respect to diet group were identified using a two-sample t-test. Results: A list of repeatable features was generated for each type of image. The largest number of repeatable features was achieved from spleen-normalized D-filtered images, which also produced the largest number of second-order radiomic features that were repeatable and different between diet groups. Conclusions: Repeatability depends on reconstruction kernel and normalization. Features were quantified and ranked based on their repeatability. Features to be excluded for more robust models were identified. Features that were repeatable but different between diet groups were also identified.
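The repeatability criterion described in this abstract (the minimum p-value across all macaques must stay above the Bonferroni-corrected significance level) can be sketched as follows. This is a hedged illustration only. It assumes the paired t-test p-values have already been computed elsewhere. The input layout, the function name `repeatable_features`, the default `alpha`, the choice of the number of comparisons, and ranking by minimum p-value are all assumptions not given in the abstract.

```python
def repeatable_features(p_values, alpha=0.05, n_comparisons=None):
    """Return (feature, minimum p-value) pairs for features judged repeatable.

    p_values: {feature: {subject: [p, p, ...]}} where each p is assumed to come
    from a paired t-test between two scans of the same subject.
    n_comparisons: Bonferroni divisor; if omitted, the number of tests run for
    the feature is used (an assumption, the abstract does not specify it).
    """
    kept = {}
    for feature, by_subject in p_values.items():
        # Minimum p-value per subject, then the minimum among all subjects.
        min_per_subject = {s: min(ps) for s, ps in by_subject.items()}
        overall_min = min(min_per_subject.values())
        m = n_comparisons or sum(len(ps) for ps in by_subject.values())
        threshold = alpha / m  # Bonferroni-corrected significance level
        # Repeatable: even the most extreme test shows no significant change.
        if overall_min > threshold:
            kept[feature] = overall_min
    # Rank so that the feature with the largest minimum p-value comes first.
    return sorted(kept.items(), key=lambda kv: kv[1], reverse=True)


# Made-up p-values for two hypothetical features and two hypothetical subjects.
toy_p_values = {
    "mean": {"A": [0.4, 0.6], "B": [0.3, 0.9]},
    "entropy": {"A": [0.001, 0.5], "B": [0.2, 0.7]},
}
ranked = repeatable_features(toy_p_values)
```

With four tests per feature, the corrected threshold in the toy example is 0.05 / 4 = 0.0125, so the hypothetical "entropy" feature is excluded by its single p-value of 0.001 while "mean" is retained.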
Evaluation of the intra-subject reproducibility of radiomic features is pivotal but challenging because it requires multiple replicate measurements, which are typically lacking in the clinical setting. Radiomics analysis based on computed tomography (CT) has been increasingly used to characterize liver malignancies and diffuse liver diseases. However, radiomic features are greatly affected by scanning parameters and reconstruction kernels, among other factors. In this study, we examined the effects of diets, reconstruction kernels, and liver-to-spleen normalization on the intra-subject reproducibility of radiomic features. The final goal of this work is to create a framework that may help identify reproducible radiomic features suitable for further diagnosis and grading of fatty liver disease in nonhuman primates using radiomics analysis. As a first step, the identification of reproducible features is essential. To accomplish this aim, we retrospectively analyzed serial CT images from two groups of crab-eating macaques, fed a normal or atherogenic diet. Serial CT examinations resulted in 45 high-resolution scans. From each scan, two CT images were reconstructed using a standard B kernel and a bone-enhanced D kernel, with and without normalization relative to the spleen. Radiomic features were extracted from six regions in the liver parenchyma. Analysis of intra-subject variability showed that many features are fully reproducible regardless of liver disease status, whereas others are significantly different in a limited number of tests. Features significantly different between the normal and atherogenic diet groups were also investigated. Reproducible features were listed, with normalized images having more reproducible features.