In many fields of medical imaging, image segmentation is required as a basis for further analysis and diagnosis. Convolutional neural networks are a promising approach that provides high accuracy; however, large-scale annotated datasets are necessary to train these networks. As expert annotations are costly, crowdsourcing has been shown to be an adequate alternative. In previous work, we examined how the workforce of a crowd should be distributed to obtain annotations with an optimal trade-off between quantity and quality. In this work, we present a gamification approach that transforms the tedious task of image segmentation into a game. The approach aims to motivate users through fun while still generating annotations of adequate quality. To this end, this work presents a gamified crowdsourcing concept for medical image segmentation. We give an overview of incentives applied in the state-of-the-art literature and propose two different gamification approaches for realizing the image segmentation task as a game. Finally, we propose an integrated game concept that combines both approaches with the following incentives: (a) points/scoring to reward users instantly for accurate segmentations, (b) leaderboards/rankings to let users accumulate scores for long-term motivation, (c) badges/achievements to give users a visual representation of their "strength" in segmentation, and (d) levels to visualize users' learning curves in performing the segmentation. We give details on a first prototype implementation and describe how the game concept complies with the guidelines from our prior work.
For medical image segmentation, deep learning approaches using convolutional neural networks (CNNs) are currently superseding classical methods. Good accuracy requires large annotated training data sets. As expert annotations are costly to acquire, crowdsourcing (obtaining several annotations from a large group of non-experts) has been proposed. Medical applications, however, require high accuracy of the segmented regions. It is agreed that a larger training set yields increased CNN performance; however, it is unclear which quality standards the annotations must meet for sufficient accuracy. For crowdsourcing, this translates to the question of how many annotations per image need to be obtained. In this work, we investigate the effect of the annotation quality used for model training on the predicted results of a CNN. Several annotation sets with different quality levels were generated by applying the Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm to crowdsourced segmentations. CNN models were trained using these annotations, and the results were compared to a ground truth. We found that increasing annotation quality improves CNN performance logarithmically. Furthermore, we evaluated whether a higher number of annotations can compensate for lower annotation quality by comparing CNN predictions from models trained on differently sized training data sets. We found that once a minimum quality of at least 3 annotations per image can be acquired, it is more efficient to distribute crowdsourced annotations over as many images as possible. The results can serve as a guideline for the image assignment mechanism of future crowdsourcing applications. This motivates the use of gamification, i.e., getting users to segment as many images of a data set as possible for fun.
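Comparing CNN predictions to a ground-truth mask requires an overlap metric; the Dice coefficient is the standard choice for segmentation. The abstract does not name the metric used, so the following is only an illustrative sketch:

```python
import numpy as np

def dice(pred, ref):
    """Dice overlap between two binary masks (1.0 = perfect agreement)."""
    pred = np.asarray(pred, dtype=bool)
    ref = np.asarray(ref, dtype=bool)
    intersection = np.logical_and(pred, ref).sum()
    total = pred.sum() + ref.sum()
    # Two empty masks agree perfectly by convention.
    return 1.0 if total == 0 else 2.0 * intersection / total
```

For example, a prediction that recovers one of two foreground pixels scores `dice([1, 1, 0, 0], [1, 0, 0, 0]) == 2/3`.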
The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm is frequently used in medical image segmentation when no ground truth (GT) is available. In this paper, we investigate the number of inexperienced users required to establish a reliable STAPLE-based GT and the number of vertices a user shall place for a point-based segmentation. We employ “WeLineation”, a novel web-based system for crowdsourcing segmentations. Within the study, 2,060 masks were delivered by 44 users on 75 different photographic images of the human eye, where users had to segment the sclera. For all masks, GT was estimated using STAPLE. Then, STAPLE was computed using fewer user contributions, and the results were compared to the GT. Requiring an error rate lower than 2%, the same segmentation performance is obtained with 13 experienced or 22 rather inexperienced users. More than 10 vertices shall be placed on the delineation contour in order to reach an accuracy above 95%. On average, a vertex shall be placed every 81 pixels along the segmentation contour. The results indicate that knowledge about users' performance can reduce the number of segmentation masks per image needed to estimate a reliable GT. Therefore, gathering performance parameters of users during a crowdsourcing study and applying this information to the assignment process is recommended. In this way, the cost-effectiveness of a crowdsourced segmentation study can be improved.
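STAPLE is an expectation-maximization procedure that jointly estimates a consensus mask and each rater's sensitivity and specificity. The sketch below is a simplified binary version for illustration only; the initialization values and the fixed scalar prior are our assumptions, not details from the paper (Warfield et al.'s original formulation is more general):

```python
import numpy as np

def staple(masks, max_iter=50, tol=1e-6):
    """Simplified binary STAPLE via EM.

    masks: (R, N) array of R rater masks over N pixels (values 0/1).
    Returns (w, p, q): per-pixel consensus foreground probabilities,
    per-rater sensitivities, and per-rater specificities.
    """
    D = np.asarray(masks, dtype=float)   # R x N rater decisions
    R, N = D.shape
    w = D.mean(axis=0)                   # initial consensus: vote fraction
    p = np.full(R, 0.9)                  # initial sensitivities (assumed)
    q = np.full(R, 0.9)                  # initial specificities (assumed)
    prior = w.mean()                     # fixed scalar foreground prior
    for _ in range(max_iter):
        # E-step: P(pixel is foreground | all rater decisions)
        a = prior * np.prod(np.where(D == 1, p[:, None], 1 - p[:, None]), axis=0)
        b = (1 - prior) * np.prod(np.where(D == 0, q[:, None], 1 - q[:, None]), axis=0)
        w_new = a / np.clip(a + b, 1e-12, None)
        # M-step: update each rater's performance parameters
        p = (D @ w_new) / np.clip(w_new.sum(), 1e-12, None)
        q = ((1 - D) @ (1 - w_new)) / np.clip((1 - w_new).sum(), 1e-12, None)
        if np.max(np.abs(w_new - w)) < tol:
            w = w_new
            break
        w = w_new
    return w, p, q
```

Thresholding `w` at 0.5 yields the estimated GT mask, and the returned `p`, `q` are exactly the kind of per-user performance parameters the study recommends feeding back into the assignment process.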
Crowdsourcing is a concept that encourages humans all over the world to generate ground truth for classification data such as images. While frameworks for binary and multi-label classification exist, crowdsourcing of medical image segmentation is covered by only a few works. In this paper, we present a web-based platform supporting scientists of various domains in obtaining segmentations that are close to ground-truth references. The system is composed of frontend, authentication, management, processing, and persistence layers, which are implemented by combining various JavaScript tools, the Django web framework, an asynchronous Celery task queue, and a PostgreSQL database, respectively. It is deployed on a Kubernetes cluster. A set of image data accompanied by a task instruction can be uploaded. Users can be invited or subscribe to join in. After passing a guided tutorial of pre-segmented example images, segmentations can be obtained from non-expert users from all over the world. The Simultaneous Truth and Performance Level Estimation (STAPLE) algorithm generates estimated ground-truth segmentation masks and continuously evaluates the users' performance in the backend. As a proof of concept, a test study with 75 photographs of human eyes was performed by 44 users. In just a few days, 2,060 segmentation masks with a total of 52,826 vertices along the mask contours were collected.