The software “Multireader sample size program for diagnostic studies,” written by Kevin Schartz and Stephen Hillis, performs sample size computations for diagnostic reader-performance studies. The program computes the sample size needed to detect a specified difference in a reader-performance measure between two imaging modalities when using the analysis methods initially proposed by Dorfman, Berbaum, and Metz, and Obuchowski and Rockette, and later unified and improved by Hillis and colleagues. A commonly used reader-performance measure is the area under the receiver-operating-characteristic curve. The program has an easy-to-use step-by-step intuitive interface that walks the user through the entry of the needed information. It can be used with several different study designs, inference procedures, hypotheses, and input and output formats. The program is functional in Windows, OS X, and Linux. The methodology underlying the software is discussed for the most common diagnostic study design, where each reader evaluates each case using each modality.
Our goal was to ascertain how fatigue affects performance in reading computed tomography (CT) examinations of patients with multiple injuries. CT images with multiple fractures from a previous study of satisfaction of search (SOS) were read by radiologists after a day of clinical work. Performance in this study with fatigued readers was compared with that in a previous study in which readers were not fatigued. Detection accuracy for obvious injuries was not affected by fatigue, but accuracy for subtle fractures was reduced (P=0.016). An SOS effect on decision thresholds was evident, mirroring recent studies. Without fatigue, readers spent more time interpreting and reporting findings as the number of injuries increased. When fatigued, readers did not increase reading time as the number of fractures increased. Without fractures, reading time did not differ between not-fatigued and fatigued readers (P=0.493), but it differed significantly (P=0.016) with an added subtle fracture. The difference increased with a major injury (P=0.003) and increased further with both a major injury and a subtle fracture (P=0.0007). Fatigue and multiple abnormalities have independent effects on detection performance but do interact in determining search time.
Previous studies have demonstrated that fatigue impacts diagnostic accuracy, especially for those in training. We continued this line of investigation to determine whether fatigue has any impact on a common source of errors: satisfaction of search (SOS). Studying SOS requires subjects to participate in two sessions (SOS and non-SOS), as does studying fatigue (fatigued and not fatigued), so we ran subjects in only the fatigued condition and used a previous non-fatigued study as the comparison. Sixty-four chest computed radiographs, half demonstrating various “test” abnormalities, were read twice by 20 radiologists, once with and once without the addition of a simulated pulmonary nodule. Receiver-operating characteristic detection accuracy and decision thresholds were analyzed to study the effects of adding the nodule on detecting the test abnormalities. Adding nodules did not influence detection accuracy (ROC AUC SOS = 0.667; non-SOS = 0.679), but did induce a reluctance to report them. Adding nodules did not affect inspection time, so the reluctance to report was not associated with reduced search. Fatigue did not appear to exacerbate the SOS effect. A second study with fractures revealed the same shift in performance but showed reduced viewing times when readers were fatigued. The results of these two studies suggest that the impact of fatigue on SOS is more complicated than expected and thus may require more investigation to fully understand its impact in the clinic.
The recently released software Multi- and Single-Reader Sample Size Program for Diagnostic Studies, written by Kevin Schartz and Stephen Hillis, performs sample size computations for diagnostic reader-performance studies. The program computes the sample size needed to detect a specified difference in a reader-performance measure between two modalities when using the analysis methods initially proposed by Dorfman, Berbaum, and Metz (DBM) and Obuchowski and Rockette (OR), and later unified and improved by Hillis and colleagues. A commonly used reader-performance measure is the area under the receiver-operating-characteristic curve. The program can be used with common reader-performance measures, which can be estimated parametrically or nonparametrically. The program has an easy-to-use step-by-step intuitive interface that walks the user through the entry of the needed information. Features of the software include the following: (1) choice of several study designs; (2) choice of inputs obtained from either OR or DBM analyses; (3) choice of three different inference situations: both readers and cases random, readers fixed and cases random, and readers random and cases fixed; (4) choice of two types of hypotheses: equivalence or noninferiority; (6) choice of two output formats: power for specified case and reader sample sizes, or a listing of case-reader combinations that provide a specified power; (7) choice of single or multi-reader analyses; and (8) functionality in Windows, Mac OS, and Linux.
KEYWORDS: Data modeling, Statistical analysis, Interference (communication), Statistical modeling, Contamination, Medical imaging, Information visualization, Error analysis, Electronic filtering, Space operations
Introduction. Perception experiments collecting rating-method ROC data sometimes result in operating points at only relatively high specificities for some treatment-reader combinations. In the extreme, no operating points are internal to the feasible space of many parametric models (i.e., for all points, FP = 0). Dorfman & Berbaum [1] developed a contaminated binormal model (CBM) to account for ROC data that have few false-positive reports even though many healthy subjects are sampled. Unfortunately, CBM can give very different ROC curve shapes for similar ROC points, and when there are no internal operating points, the ROC curve shape will often differ substantially from that obtained when there are internal operating points. Materials and Methods. We eliminate the CBM limiting case by adding a small constant to each cell of the rating data matrix [2,3] and by setting μ, the difference between the visible signal and noise distributions, to the same high value for all conditions [1]. Results. We illustrate the resulting ROC curves using an example dataset from Schartz et al. [4] All observed ROC points become internal. The fitted ROC curves are similar to those of the limiting CBM and the empirical ROC, but all curves using the same μ have the same shape and never cross. ROC accuracy parameters such as area, partial area, and sensitivity at any fixed specificity correspond perfectly. Conclusions. Constraining the CBM to a fixed large μ provides a more effective way to apply it to difficult-to-fit data.
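The two steps described in the Materials and Methods can be illustrated with a minimal Python sketch. It is not the authors' implementation. It assumes the usual CBM parametrization, in which a proportion α of abnormal cases is "visible" and drawn from N(μ, 1) while the remaining abnormal cases and all normal cases are drawn from N(0, 1). The constant eps = 0.1, the value μ = 4.0, the example counts, and all function names are hypothetical choices for illustration; the abstract does not state the values actually used.

```python
import math


def phi(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def add_constant(counts, eps=0.1):
    """Add a small constant to every cell of a rating-count table.

    eps is a hypothetical value; the abstract only says 'a small constant'.
    """
    return [[n + eps for n in row] for row in counts]


def operating_points(noise_counts, signal_counts):
    """Empirical (FPF, TPF) pairs from counts ordered low to high confidence."""
    n_total = sum(noise_counts)
    s_total = sum(signal_counts)
    points = []
    for k in range(1, len(noise_counts)):
        fpf = sum(noise_counts[k:]) / n_total
        tpf = sum(signal_counts[k:]) / s_total
        points.append((fpf, tpf))
    return points


def cbm_point(t, alpha, mu):
    """(FPF, TPF) on the assumed CBM curve at decision threshold t."""
    fpf = phi(-t)
    tpf = (1.0 - alpha) * phi(-t) + alpha * phi(mu - t)
    return fpf, tpf


def cbm_auc(alpha, mu):
    """Area under the assumed CBM curve for visibility alpha and separation mu."""
    return 0.5 * (1.0 - alpha) + alpha * phi(mu / math.sqrt(2.0))


if __name__ == "__main__":
    # Hypothetical 5-category data with no false positives at all.
    noise = [50, 0, 0, 0, 0]
    signal = [20, 5, 5, 10, 10]
    print("raw points:     ", operating_points(noise, signal))
    print("adjusted points:", operating_points(*add_constant([noise, signal])))
    # With mu fixed, only alpha varies between conditions.
    for alpha in (0.5, 0.8):
        print(alpha, cbm_auc(alpha, mu=4.0))
```

Under this parametrization, fixing μ leaves α as the only free accuracy parameter, so for any threshold a larger α gives a larger TPF at the same FPF. That is one way to see why curves sharing the same μ do not cross and why area and sensitivity at a fixed specificity rank conditions identically.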
Satisfaction of search (SOS) occurs when an abnormality is missed because another abnormality has been detected in radiology examinations. This research includes our study of whether the severity of a detected fracture determines whether subsequent fractures are overlooked. Each of 70 simulated multitrauma patients presented radiographs of three anatomic areas. Readers evaluated each patient under two experimental conditions: when the images of the first anatomic area included a severe fracture (the SOS condition), and when it did not (the control condition). The SOS effect was measured on detection accuracy for subtle test fractures presented on examinations of the second or third anatomic areas. SOS reduction in ROC area for detecting subtle test fractures with the addition of a major fracture to the first radiograph was not observed. The same absence of SOS that had been observed when high-morbidity added fractures were presented on CT was replicated with the high-morbidity added fractures presented on radiographs. This finding rules out the possibility that there was no SOS in the prior study with CT because SOS effects do not extend from one imaging modality to another. Taken together, the evidence rejects the hypothesis that the severity of a detected fracture determines the SOS for subsequently viewed fractures.
Radiologists are reading more cases with more images, especially in CT and MRI, and thus working longer hours than ever before. Concerns have been raised regarding fatigue and whether it impacts diagnostic accuracy. This study measured the impact of reader visual fatigue by assessing symptoms, visual strain via dark focus of accommodation, and diagnostic accuracy. Twenty radiologists and 20 radiology residents were given two diagnostic performance tests searching CT chest sequences for a solitary pulmonary nodule before (rested) and after (tired) a day of clinical reading. Ten cases used free search and navigation, and the other 100 cases used preset scrolling speed and duration. Subjects filled out the Swedish Occupational Fatigue Inventory (SOFI) and the oculomotor strain subscale of the Simulator Sickness Questionnaire (SSQ) before each session. Accuracy was measured using ROC techniques. Using Swensson's technique yields an ROC area = 0.86 rested vs. 0.83 tired, p (one-tailed) = 0.09. Using Swensson's LROC technique yields an area = 0.73 rested vs. 0.66 tired, p (one-tailed) = 0.09. Using Swensson's Loc Accuracy technique yields an area = 0.77 rested vs. 0.72 tired, p (one-tailed) = 0.13. Subjective measures of fatigue increased significantly from early to late reading. To date, the results support our findings with static images and detection of bone fractures. Radiologists at the end of a long work day experience greater levels of measurable visual fatigue or strain, contributing to a decrease in diagnostic accuracy. The decrease in accuracy was not as great, however, as with static images.
Collecting clinical cases for medical imaging perception studies is often challenging. We have developed a suite of software tools for manipulating medical tomographic image sets that overcome these difficulties. In our initial development, abnormalities were removed or inserted on a slice-by-slice basis. To circumvent the problem of potential artifacts in orthogonal views, we have redesigned the tools so that they operate in three dimensions. An operator-controlled ellipsoid mask region is used to select the removal and replacement areas. This new approach has been validated on PET data sets and has also been implemented for CT studies.
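The idea of a three-dimensional ellipsoid mask can be sketched in a few lines of Python. This is a simplified illustration, not the tools described above. It assumes an axis-aligned ellipsoid, a volume stored as nested lists indexed [z][y][x], and a plain voxel copy from a replacement region of the same size with no edge blending. The function names `ellipsoid_mask` and `replace_region` are hypothetical.

```python
def ellipsoid_mask(shape, center, radii):
    """Return the set of (z, y, x) voxel indices inside an axis-aligned ellipsoid."""
    nz, ny, nx = shape
    cz, cy, cx = center
    rz, ry, rx = radii
    inside = set()
    for z in range(nz):
        for y in range(ny):
            for x in range(nx):
                d = ((z - cz) / rz) ** 2 + ((y - cy) / ry) ** 2 + ((x - cx) / rx) ** 2
                if d <= 1.0:
                    inside.add((z, y, x))
    return inside


def replace_region(volume, removal_center, source_center, radii):
    """Copy voxels from the ellipsoid at source_center into the one at removal_center.

    Returns a new volume; the input is left unchanged. Voxels whose source
    position falls outside the volume are left as they were.
    """
    shape = (len(volume), len(volume[0]), len(volume[0][0]))
    offset = tuple(s - r for s, r in zip(source_center, removal_center))
    result = [[row[:] for row in plane] for plane in volume]
    for z, y, x in ellipsoid_mask(shape, removal_center, radii):
        sz, sy, sx = z + offset[0], y + offset[1], x + offset[2]
        if 0 <= sz < shape[0] and 0 <= sy < shape[1] and 0 <= sx < shape[2]:
            result[z][y][x] = volume[sz][sy][sx]
    return result


if __name__ == "__main__":
    # Hypothetical 8x8x8 volume with one bright "abnormality" voxel.
    vol = [[[0 for _ in range(8)] for _ in range(8)] for _ in range(8)]
    vol[2][2][2] = 100
    cleaned = replace_region(vol, (2, 2, 2), (5, 5, 5), (1.5, 1.5, 1.5))
    print(vol[2][2][2], cleaned[2][2][2])
```

Because the mask is defined in all three dimensions at once, the same voxels are replaced whichever plane the volume is later viewed in, which is the motivation given above for moving away from slice-by-slice editing.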
KEYWORDS: Medical imaging, Software development, Computed tomography, Inspection, Radiography, Control systems, Medical research, Java, Data transmission, Bone
We developed image presentation software that mimics the functionality available in the clinic, but also records time-stamped, observer-display interactions and is readily deployable on diverse workstations making it possible to collect comparable observer data at multiple sites. Commercial image presentation software for clinical use has limited application for research on image perception, ergonomics, computer-aids and informatics because it does not collect observer responses, or other information on observer-display interactions, in real time. It is also very difficult to collect observer data from multiple institutions unless the same commercial software is available at different sites. Our software not only records observer reports of abnormalities and their locations, but also inspection time until report, inspection time for each computed radiograph and for each slice of tomographic studies, window/level, and magnification settings used by the observer. The software is a modified version of the open source ImageJ software available from the National Institutes of Health. Our software involves changes to the base code and extensive new plugin code. Our free software is currently capable of displaying computed tomography and computed radiography images. The software is packaged as Java class files and can be used on Windows, Linux, or Mac systems. By deploying our software together with experiment-specific script files that administer experimental procedures and image file handling, multi-institutional studies can be conducted that increase reader and/or case sample sizes or add experimental conditions.
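As a rough illustration of how time-stamped observer-display interactions can yield per-slice inspection times, the following Python sketch logs events and sums viewing time per slice. It is not the ImageJ-based software described above, which is written in Java. The class name `InteractionLog`, the event kinds `show_slice`, `window_level`, and `end_case`, and the injectable clock are all hypothetical.

```python
import time


class InteractionLog:
    """Minimal time-stamped event log for one observer reading one case."""

    def __init__(self, clock=time.monotonic):
        # The clock is injectable so the log can be tested with fixed times.
        self._clock = clock
        self.events = []

    def record(self, kind, **details):
        """Append an event with the current timestamp and any extra details."""
        self.events.append({"t": self._clock(), "kind": kind, **details})

    def time_per_slice(self):
        """Total display time per slice, from 'show_slice' and 'end_case' events."""
        totals = {}
        current = shown_at = None
        for e in self.events:
            if e["kind"] not in ("show_slice", "end_case"):
                continue
            if current is not None:
                totals[current] = totals.get(current, 0.0) + e["t"] - shown_at
            current = e.get("slice") if e["kind"] == "show_slice" else None
            shown_at = e["t"]
        return totals


if __name__ == "__main__":
    log = InteractionLog()
    log.record("show_slice", slice=1)
    log.record("window_level", window=400, level=40)
    log.record("show_slice", slice=2)
    log.record("end_case")
    print(log.time_per_slice())
```

Keeping the raw event list and deriving summaries afterwards means other measures, such as time until the first report, can be computed from the same log without changing how events are recorded.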