Joint Head and Neck Radiotherapy-MRI Development Cooperative, Travis Salzillo, M. Alex Dresner, Ashley Way, Kareem Wahid, Brigid McDonald, Sam Mulder, Mohamed Naser, Renjie He, Yao Ding, Alison Yoder, Sara Ahmed, Kelsey Corrigan, Gohar Manzar, Lauren Andring, Chelsea Pinnix, R. Jason Stafford, Abdallah S. Mohamed, John Christodouleas, Jihong Wang, Clifton David Fuller
Purpose: To improve segmentation accuracy in head and neck cancer (HNC) radiotherapy treatment planning for the 1.5T hybrid magnetic resonance imaging/linear accelerator (MR-Linac), three-dimensional (3D), T2-weighted, fat-suppressed magnetic resonance imaging sequences were developed and optimized.
Approach: After initial testing, spectral attenuated inversion recovery (SPAIR) was chosen as the fat suppression technique. Five candidate SPAIR sequences and a nonsuppressed, T2-weighted sequence were acquired for five HNC patients using a 1.5T MR-Linac. MR physicists identified persistent artifacts in two of the SPAIR sequences, so the remaining three SPAIR sequences were further analyzed. The gross primary tumor volume, metastatic lymph nodes, parotid glands, and pterygoid muscles were delineated by five segmentors. A robust image quality analysis platform was developed to objectively score the SPAIR sequences on the basis of qualitative and quantitative metrics.
Results: Sequences were analyzed for signal-to-noise ratio, contrast-to-noise ratio relative to fat and muscle, conspicuity, pairwise distance metrics, and segmentor assessments. In this analysis, the nonsuppressed sequence was inferior to each of the SPAIR sequences for the primary tumor, lymph nodes, and parotid glands, but it was superior for the pterygoid muscles. The SPAIR sequence that received the highest combined score among the analysis categories was recommended to Unity MR-Linac users for HNC radiotherapy treatment planning.
Conclusions: Our study led to two developments: an optimized, 3D, T2-weighted, fat-suppressed sequence that can be disseminated to Unity MR-Linac users and a robust image quality analysis pathway that can be used to objectively score SPAIR sequences and can be customized and generalized to any image quality optimization protocol.
Improved segmentation accuracy with the proposed SPAIR sequence will potentially lead to improved treatment outcomes and reduced toxicity for patients by maximizing the target coverage and minimizing the radiation exposure of organs at risk.
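The signal-to-noise and contrast-to-noise ratios named in the Results can be illustrated with a minimal sketch. The abstract does not state the exact definitions used in the study, so the formulas below (mean region intensity over noise standard deviation, and absolute mean difference over noise standard deviation) are one common convention assumed here for illustration. The function names and the intensity samples are hypothetical.

```python
from statistics import mean, pstdev


def snr(roi, noise):
    """Mean ROI intensity divided by the standard deviation of a noise region.

    Assumed definition; the study's own formula is not given in the abstract.
    """
    return mean(roi) / pstdev(noise)


def cnr(roi, reference, noise):
    """Absolute difference of mean intensities of two regions, scaled by noise.

    `reference` would be a fat or muscle region in the setting described above.
    """
    return abs(mean(roi) - mean(reference)) / pstdev(noise)


# Made-up voxel intensities, for illustration only.
tumor = [100.0, 110.0, 90.0, 100.0]
fat = [40.0, 50.0, 45.0, 45.0]
noise = [1.0, 3.0, 1.0, 3.0]

tumor_snr = snr(tumor, noise)
tumor_vs_fat_cnr = cnr(tumor, fat, noise)
```

With fat suppression, the mean intensity of the fat region drops, so a contrast-to-noise ratio measured against fat would be expected to rise under this definition.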
The performance of Deep Learning (DL) segmentation algorithms is routinely determined using quantitative metrics like the Dice score and Hausdorff distance. However, these metrics show a low concordance with humans’ perception of segmentation quality. The successful collaboration of health care professionals with DL segmentation algorithms will require a detailed understanding of experts’ assessment of segmentation quality. Here, we present the results of a study on expert quality perception of brain tumor segmentations of brain MR images generated by a DL segmentation algorithm. Eight expert medical professionals were asked to grade the quality of segmentations on a scale from 1 (worst) to 4 (best). To this end, we collected four ratings for a dataset of 60 cases. We observed a low inter-rater agreement among all raters (Krippendorff’s alpha: 0.34), which potentially is a result of different internal cutoffs for the quality ratings. Several factors, including the volume of the segmentation and model uncertainty, were associated with high disagreement between raters. Furthermore, the correlations between the ratings and commonly used quantitative segmentation quality metrics ranged from no to moderate correlation. We conclude that, similar to the inter-rater variability observed for manual brain tumor segmentation, segmentation quality ratings are prone to variability due to the ambiguity of tumor boundaries and individual perceptual differences. Clearer guidelines for quality evaluation could help to mitigate these differences. Importantly, existing technical metrics do not capture clinical perception of segmentation quality. A better understanding of expert quality perception is expected to support the design of more human-centered DL algorithms for integration into the clinical workflow.
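For readers unfamiliar with the two quantitative metrics named above, the following is a minimal sketch of the Dice score and the (symmetric) Hausdorff distance, computed on segmentations represented as sets of voxel coordinates. This is a brute-force illustration, not the implementation used in the study. The function names and the toy masks are assumptions.

```python
import math


def dice(a, b):
    """Dice score: 2 * |A ∩ B| / (|A| + |B|), from 0 (disjoint) to 1 (identical)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 1.0  # convention assumed here: two empty masks agree perfectly
    return 2 * len(a & b) / (len(a) + len(b))


def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets.

    Brute force, O(|A| * |B|); fine for a toy example, slow for real volumes.
    """
    def directed(src, dst):
        # Largest distance from any point in src to its nearest point in dst.
        return max(min(math.dist(p, q) for q in dst) for p in src)

    return max(directed(a, b), directed(b, a))


# Toy 2D masks: a 2x2 square and the same square shifted by one voxel.
reference = {(0, 0), (0, 1), (1, 0), (1, 1)}
predicted = {(0, 1), (1, 1), (0, 2), (1, 2)}

overlap = dice(reference, predicted)        # 0.5
boundary = hausdorff(reference, predicted)  # 1.0
```

The Dice score summarizes volumetric overlap while the Hausdorff distance reports the single worst boundary deviation, so neither encodes where an error lies or how clinically relevant it is.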