For deep learning-based machine learning, not only are large and sufficiently diverse data crucial, but their quality is equally important. However, in real-world applications, raw source data commonly contain incorrect, noisy, inconsistent, improperly formatted, and sometimes missing elements, particularly when the datasets are large and sourced from many sites. In this paper, we present our work toward preparing and making image data ready for the development of AI-driven approaches for studying various aspects of the natural history of oral cancer. Specifically, we focus on two aspects: 1) cleaning the image data; and 2) extracting the annotation information. Data cleaning includes removing duplicates, identifying missing data, correcting errors, standardizing data sets, and removing personal sensitive information, toward combining data sourced from different study sites. These steps are often collectively referred to as data harmonization. Annotation information extraction includes identifying crucial or valuable text related to the image paths/names that was manually entered by clinical providers, and standardizing the label text. Both are important for successful deep learning algorithm development and data analysis. Specifically, we provide details on the data under consideration, describe the challenges and issues we observed that motivated our work, and present the specific approaches and methods we used to clean and standardize the image data and extract labeling information. Further, we discuss ways to increase the efficiency of the process and the lessons learned. Research ideas on automating the process with ML-driven techniques are also presented and discussed. Our intent in reporting and discussing this work in detail is to help provide insights into automating or, minimally, increasing the efficiency of these critical yet often under-reported processes.
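As an illustration of one of the cleaning steps mentioned above (removing duplicates), the sketch below groups exact byte-level duplicate image files by hash. This is an assumption made for illustration; near-duplicate detection in practice may require perceptual hashing rather than exact digests, and the directory name is hypothetical.

```python
# Sketch: find exact duplicate image files by MD5 digest (illustrative only).
import hashlib
from pathlib import Path

def find_duplicates(image_dir):
    """Group files under image_dir by MD5 digest; groups with >1 file are duplicates."""
    groups = {}
    for path in Path(image_dir).rglob("*"):
        if not path.is_file():
            continue
        digest = hashlib.md5(path.read_bytes()).hexdigest()
        groups.setdefault(digest, []).append(path)
    return {d: paths for d, paths in groups.items() if len(paths) > 1}

# duplicates = find_duplicates("oral_cancer_images/")  # hypothetical directory
```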
Oral cavity cancer is a common cancer that can cause problems with breathing, swallowing, drinking, and eating, as well as speech impairment, and advanced-stage disease carries high mortality. Its diagnosis is confirmed through histopathology. It is of critical importance to determine the need for biopsy and identify the correct location. Deep learning has demonstrated great promise in several image-based medical screening and diagnostic applications. However, automated visual evaluation of oral cavity lesions has received limited attention in the literature. Since the disease can occur in different parts of the oral cavity, a first step is to identify the images of different anatomical sites. We automatically generate labels for six sites, which will help in lesion detection in a subsequent analytical module. We apply a recently proposed network called ResNeSt, which incorporates channel-wise attention with multi-path representation, and demonstrate high performance on the test set. The average F1-score for all classes and the accuracy are both 0.96. Moreover, we provide a detailed discussion of class activation maps obtained from both correct and incorrect predictions to analyze algorithm behavior. The highlighted regions in the class activation maps generally correlate well with the regions of interest perceived and expected by expert human observers. The insights and knowledge gained from the analysis are helpful not only for algorithm improvement, but also for developing the other key components in the process of computer-assisted oral cancer screening.
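A minimal sketch of fine-tuning a ResNeSt backbone for six-class anatomical site classification follows, assuming the timm and torch libraries. The specific variant (resnest50d), input size, and hyperparameters are illustrative assumptions, not values from the paper.

```python
# Sketch: fine-tune a pre-trained ResNeSt model for 6 anatomical site classes.
import timm
import torch
import torch.nn as nn

NUM_SITES = 6  # six oral cavity anatomical sites

model = timm.create_model("resnest50d", pretrained=True, num_classes=NUM_SITES)
criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=1e-4)

def train_step(images, labels):
    """One optimization step on a batch of (N, 3, 224, 224) image tensors."""
    model.train()
    optimizer.zero_grad()
    logits = model(images)           # (N, NUM_SITES)
    loss = criterion(logits, labels)
    loss.backward()
    optimizer.step()
    return loss.item()
```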
Cervical cancer disproportionately affects underserved women from disadvantaged communities. Automated visual evaluation (AVE), which analyzes white-light cervical images using machine learning, is being considered for management of screen-positive patients. Gaussian noise was identified as degrading AVE performance. Two noise correction approaches were tested on images from historic data with added Gaussian noise. One denoising method (VDNet) was based on neural networks; the other used conventional Gaussian blur filtering. Images were evaluated by an object detection network (RetinaNet) and by a binary pathology ResNeSt classifier. VDNet filtering limited AVE performance degradation at higher noise levels, whereas Gaussian blur was effective only at low noise levels.
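The sketch below shows the noise-injection step and the conventional Gaussian blur baseline described above, assuming OpenCV and NumPy. The noise standard deviation and kernel size are illustrative assumptions, not values reported by the study.

```python
# Sketch: add Gaussian noise to an image and apply a Gaussian blur baseline.
import cv2
import numpy as np

def add_gaussian_noise(image, sigma=15.0):
    """Add zero-mean Gaussian noise to an 8-bit BGR image."""
    noise = np.random.normal(0.0, sigma, image.shape)
    noisy = image.astype(np.float32) + noise
    return np.clip(noisy, 0, 255).astype(np.uint8)

def gaussian_blur_denoise(image, ksize=5):
    """Conventional Gaussian blur filtering used as the non-learning baseline."""
    return cv2.GaussianBlur(image, (ksize, ksize), 0)

image = cv2.imread("cervix_example.png")          # hypothetical input image
noisy = add_gaussian_noise(image, sigma=25.0)     # simulate a higher noise level
denoised = gaussian_blur_denoise(noisy, ksize=5)  # baseline correction
```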
Liquid-Based Cytology (LBC) is an effective technique for cervical cancer screening through the Papanicolaou (Pap) test. Currently, most LBC screening is done by cytologists, which is very time consuming and expensive. Reliable automated methods are needed to assist cytologists in quickly locating abnormal cells. State-of-the-art methods in cell classification assume that cells have already been segmented. However, clustered cells are very challenging to segment. We noticed that, in contrast to cells, nuclei are relatively easier to segment, and according to The Bethesda System (TBS), the gold standard for cervical cytology reporting, cervical cytology abnormalities are often closely correlated with nucleus abnormalities. We propose a two-step algorithm that avoids cell segmentation. We train a Mask R-CNN model to segment nuclei, and then classify cell patches, roughly the size of a healthy cell, centered at the segmented nuclei. Evaluation with a dataset of 25 high-resolution NDPI whole slide images shows that nuclei segmentation followed by cell patch classification is a promising approach to building practically useful automated Pap test applications.
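As a hedged sketch of the second step described above, the code below crops fixed-size patches centered at segmented nucleus centroids for downstream classification. The patch size and the structure of the segmentation output are assumptions made for illustration.

```python
# Sketch: extract cell-sized patches centered at nucleus centroids.
import numpy as np

PATCH = 128  # assumed patch size, roughly the extent of a healthy cell

def extract_patches(slide_region, nucleus_masks):
    """slide_region: (H, W, 3) image; nucleus_masks: list of boolean (H, W) masks."""
    patches = []
    h, w = slide_region.shape[:2]
    for mask in nucleus_masks:
        ys, xs = np.nonzero(mask)
        if len(ys) == 0:
            continue
        cy, cx = int(ys.mean()), int(xs.mean())       # nucleus centroid
        top = np.clip(cy - PATCH // 2, 0, h - PATCH)  # keep the crop inside the image
        left = np.clip(cx - PATCH // 2, 0, w - PATCH)
        patches.append(slide_region[top:top + PATCH, left:left + PATCH])
    return patches  # fed to the cell-patch classifier
```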
For automated evaluation of changes on the uterine cervix, the external os (here simply os) is a primary anatomical landmark for locating the transformation zone (T-zone). Any abnormal tissue changes typically occur at or within the T-zone. This makes localizing the os on cervical images of great interest for detecting and classifying changes. However, there has been very limited work reported on segmentation of the os region in digitized cervix images, and to our knowledge no work has been done on sets of cervix images acquired from independent data collections exhibiting variabilities due to collection devices, environments, and procedures. In this paper, we present a process pipeline consisting of deep learning os region segmentation over such multiple datasets, followed by comprehensive evaluation of the performance. First, we evaluate two state-of-the-art deep learning-based localization and classification algorithms, viz., Mask R-CNN and MaskX R-CNN, on multiple datasets. Second, considering that the os is small and irregularly shaped, and that image quality varies, we use performance measurements beyond the commonly used Dice/IoU scores. We obtain higher performance, on a larger dataset, than the work reported in the literature, achieving a detection rate as high as 99.1% and an average minimal distance of 1.02 pixels. Furthermore, the network models obtained in this study show potential for use in quality control of data acquisition.
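Below is a sketch of two of the kinds of measures discussed above: the Dice score and a minimal pixel distance between predicted and ground-truth os regions. This is an illustrative formulation under the assumption of small regions, not the exact measures used in the paper.

```python
# Sketch: Dice overlap and minimal point-to-point distance between two masks.
import numpy as np
from scipy.spatial.distance import cdist

def dice(pred, truth):
    """pred, truth: boolean (H, W) masks."""
    inter = np.logical_and(pred, truth).sum()
    denom = pred.sum() + truth.sum()
    return 2.0 * inter / denom if denom else 0.0

def minimal_distance(pred, truth):
    """Smallest pixel distance between any predicted point and any ground-truth point.
    Intended for small regions such as the os; cdist is quadratic in region size."""
    p = np.argwhere(pred)
    t = np.argwhere(truth)
    if len(p) == 0 or len(t) == 0:
        return np.inf
    return cdist(p, t).min()
```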
In this paper, we present a method for automatically identifying the gender of an imaged person from their frontal chest x-ray image. Our work is motivated by the need to determine missing gender information in some datasets. The proposed method employs convolutional neural network (CNN)-based deep learning and transfer learning to overcome the challenge of developing handcrafted features with limited data. Specifically, the method consists of four main steps: pre-processing, CNN feature extraction, feature selection, and classification. The method is tested on a combined dataset obtained from several sources with varying acquisition quality, resulting in different pre-processing steps applied to each. For feature extraction, we tested and compared four CNN architectures, viz., AlexNet, VggNet, GoogLeNet, and ResNet. We applied a feature selection technique, since the feature length is larger than the number of images. Two popular classifiers, SVM and Random Forest, are used and compared. We evaluated the classification performance by cross-validation and used seven performance measures. The best performer is the VggNet-16 feature extractor with the SVM classifier, with an accuracy of 86.6% and an ROC area of 0.932 for 5-fold cross-validation. We also discuss several misclassified cases and describe future work for performance improvement.
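A hedged sketch of this kind of transfer-learning pipeline follows: a pre-trained VGG-16 used as a fixed feature extractor, followed by an SVM classifier. The library choices (a recent torchvision and scikit-learn), the layer cut point, and the variable names are assumptions for illustration.

```python
# Sketch: pre-trained VGG-16 features fed to an SVM classifier.
import torch
from torchvision import models
from sklearn.svm import SVC

vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
vgg.eval()
feature_net = torch.nn.Sequential(vgg.features, vgg.avgpool, torch.nn.Flatten())

def extract_features(batch):
    """batch: (N, 3, 224, 224) pre-processed CXR tensors -> (N, 25088) features."""
    with torch.no_grad():
        return feature_net(batch).numpy()

# X_train, y_train would hold extracted features and gender labels (hypothetical data).
# clf = SVC(kernel="linear", probability=True).fit(X_train, y_train)
```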
Chest radiography (CXR) has been used as an effective tool for screening tuberculosis (TB). Because of the lack of radiological expertise in resource-constrained regions, automatic analysis of CXR is appealing as a "first reader". In addition to screening the CXR for disease, it is critical to highlight the locations of disease in abnormal CXRs. In this paper, we focus on the task of locating TB in CXRs, which is more challenging because of the intrinsic difficulty of localizing abnormalities. The method is based on applying a convolutional neural network (CNN) to classify the superpixels generated from the lung area. Specifically, it consists of four major components: lung ROI extraction, superpixel segmentation, multi-scale patch generation/labeling, and patch classification. The TB regions are located by identifying those superpixels whose corresponding patches are classified as abnormal by the CNN. The method is tested on a publicly available TB CXR dataset which contains 336 TB images showing various manifestations of TB. The TB regions in the images were marked by radiologists. To evaluate the method, the images are split into training, validation, and test sets with all the manifestations represented in each set. The performance is evaluated at both the patch level and the image level. The classification accuracy on the patch test set is 72.8% and the average Dice index for the test images is 0.67. The factors that may contribute to misclassification are discussed and directions for future work are addressed.
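The sketch below illustrates the superpixel generation and patch labeling components, assuming a recent scikit-image (with channel_axis for a grayscale lung ROI) and SLIC as a stand-in superpixel algorithm; the segment count and the overlap threshold for calling a superpixel abnormal are assumptions, not values from the paper.

```python
# Sketch: generate lung superpixels and label them using radiologist-marked TB regions.
import numpy as np
from skimage.segmentation import slic

def label_superpixels(lung_roi, lung_mask, tb_mask, n_segments=200, thresh=0.5):
    """lung_roi: 2-D grayscale CXR crop; lung_mask / tb_mask: boolean masks.
    A superpixel is labeled abnormal (1) if most of it lies in the marked TB region."""
    segments = slic(lung_roi, n_segments=n_segments, start_label=1, channel_axis=None)
    labels = {}
    for seg_id in np.unique(segments):
        inside_lung = (segments == seg_id) & lung_mask
        area = inside_lung.sum()
        if area == 0:
            continue
        overlap = (inside_lung & tb_mask).sum() / area
        labels[seg_id] = 1 if overlap >= thresh else 0
    return segments, labels  # patches around each superpixel are fed to the CNN
```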
According to the World Health Organization (WHO), tuberculosis (TB) remains the deadliest infectious disease in the world. In the 2015 global annual TB report, 1.5 million TB-related deaths were reported. Conditions worsened in 2016, with 1.7 million reported deaths and more than 10 million people infected with the disease. Analysis of frontal chest X-rays (CXR) is one of the most popular methods for initial TB screening; however, the method is impacted by the lack of experts for screening chest radiographs. Computer-aided diagnosis (CADx) tools have gained significance because they reduce the human burden in screening and diagnosis, particularly in countries that lack substantial radiology services. State-of-the-art CADx software is typically based on machine learning (ML) approaches that use hand-engineered features, demanding expertise in analyzing the input variances and accounting for changes in the size, background, angle, and position of the region of interest (ROI) on the underlying medical imagery. More automated deep learning (DL) tools have demonstrated promising results in a wide range of ML applications. Convolutional Neural Networks (CNN), a class of DL models, have gained research prominence in image classification, detection, and localization tasks because they are highly scalable and deliver superior results with end-to-end feature extraction and classification. In this study, we evaluated the performance of CNN-based DL models for population screening using frontal CXRs. The results demonstrate that pre-trained CNNs are a promising feature extraction tool for medical imagery, including the automated diagnosis of TB from chest radiographs, but emphasize the importance of large datasets for the most accurate classification.
Tuberculosis (TB) is a severe comorbidity of HIV, and chest x-ray (CXR) analysis is a necessary step in screening for the infectious disease. Automatic analysis of digital CXR images for detecting pulmonary abnormalities is critical for population screening, especially in medical resource-constrained developing regions. In this article, we describe steps that improve previously reported performance of NLM's CXR screening algorithms and help advance the state of the art in the field. We propose a local-global classifier fusion method in which two complementary classification systems are combined. The local classifier focuses on subtle and partial presentations of the disease, leveraging information in radiology reports that roughly indicates the locations of the abnormalities. In addition, the global classifier models the dominant spatial structure in the gestalt image using the GIST descriptor for semantic differentiation. Finally, the two complementary classifiers are combined using linear fusion, where the weight of each decision is calculated from the confidence probabilities of the two classifiers. We evaluated our method on three datasets in terms of the area under the Receiver Operating Characteristic (ROC) curve, sensitivity, specificity, and accuracy. The evaluation demonstrates the superiority of our proposed local-global fusion method over either single classifier.
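A minimal sketch of confidence-weighted linear fusion is shown below. The exact weighting scheme is an assumption: here each classifier's weight is its own confidence, normalized so the two weights sum to one.

```python
# Sketch: linear fusion of local and global classifier scores weighted by confidence.
def fuse(p_local, conf_local, p_global, conf_global):
    """p_*: abnormality probabilities; conf_*: classifier confidences in [0, 1]."""
    w_local = conf_local / (conf_local + conf_global)
    w_global = 1.0 - w_local
    return w_local * p_local + w_global * p_global

# Example: the local classifier is fairly confident the CXR is abnormal, the
# global classifier mildly disagrees; fusion leans toward the more confident one.
score = fuse(p_local=0.80, conf_local=0.9, p_global=0.40, conf_global=0.6)
```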
We present a technique to annotate multiple organs shown in 2-D abdominal/pelvic CT images using content-based image retrieval (CBIR). This annotation task is motivated by our research interests in visual question-answering (VQA). We aim to apply results from this effort in Open-i, a multimodal biomedical search engine developed by the National Library of Medicine (NLM). Understanding the visual content of biomedical images is a necessary step for VQA. Though sufficient annotation information about an image may be available in related textual metadata, not all of it may be useful as descriptive tags, particularly for the anatomy in the image. In this paper, we develop and evaluate a multi-label image annotation method using CBIR. We evaluate our method on two 2-D CT image datasets we generated from 3-D volumetric data obtained from a multi-organ segmentation challenge hosted at MICCAI 2015. Shape and spatial layout information is used to encode visual characteristics of the anatomy. We adapt a weighted voting scheme to assign multiple labels to the query image by combining the labels of the images identified as similar by the method. Key parameters that may affect the annotation performance, such as the number of images used in label voting and the threshold for excluding labels with low weights, are studied. The method uses a coarse-to-fine retrieval strategy that integrates classification with nearest-neighbor search. Results from our evaluation (using the MICCAI CT image datasets as well as figures from Open-i) are presented.
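The sketch below illustrates the weighted label-voting step described above: the top-k retrieved images vote for their organ labels with weights derived from visual similarity, and labels whose total weight falls below a threshold are dropped. The similarity-to-weight mapping, the default k, and the threshold are illustrative assumptions.

```python
# Sketch: similarity-weighted multi-label voting over retrieved images.
from collections import defaultdict

def vote_labels(retrieved, k=10, weight_threshold=0.2):
    """retrieved: list of (similarity, labels) pairs sorted by decreasing similarity,
    where labels is a set of organ names for that retrieved image."""
    votes = defaultdict(float)
    top_k = retrieved[:k]
    total = sum(sim for sim, _ in top_k) or 1.0
    for sim, labels in top_k:
        for label in labels:
            votes[label] += sim / total          # similarity-weighted vote
    return {lab for lab, w in votes.items() if w >= weight_threshold}

# Example (hypothetical similarities and labels):
# vote_labels([(0.9, {"liver", "spleen"}), (0.8, {"liver"}), (0.3, {"kidney"})])
```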
The National Library of Medicine (NLM) has made a collection of over 1.2 million research articles containing 3.2 million figure images searchable using the Open-i multimodal (text+image) search engine. Many images are visible light photographs, some of which are images containing faces ("face images"). Some of these face images are acquired in unconstrained settings, while others are studio photos. To extract the face regions in the images, we first applied one of the most widely used face detectors, a pre-trained Viola-Jones detector implemented in Matlab and OpenCV. The Viola-Jones detector was trained for unconstrained face image detection, but the results for the NLM database included many false positives, which resulted in a very low precision. To improve this performance, we applied a deep learning technique, which reduced the number of false positives and, as a result, significantly improved the detection precision. (For example, the classification accuracy for identifying whether the face regions output by the Viola-Jones detector are true positives is about 96% on a test set.) By combining these two techniques (Viola-Jones and deep learning) we were able to increase the system precision considerably, while avoiding the need to construct a large training set by manual delineation of the face regions.
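An illustrative two-stage sketch of this kind of approach is shown below: an OpenCV Viola-Jones cascade proposes face regions, and a CNN verifier rejects false positives. The verifier model file ("face_verifier.pt"), its input size, and the decision threshold are hypothetical placeholders, not artifacts from the paper.

```python
# Sketch: Viola-Jones candidate detection followed by CNN-based false-positive rejection.
import cv2
import torch

cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
verifier = torch.jit.load("face_verifier.pt")  # hypothetical trained binary CNN
verifier.eval()

def detect_faces(image_bgr, threshold=0.5):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    candidates = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    kept = []
    for (x, y, w, h) in candidates:
        crop = cv2.resize(image_bgr[y:y + h, x:x + w], (64, 64))
        tensor = torch.from_numpy(crop).permute(2, 0, 1).float().unsqueeze(0) / 255.0
        with torch.no_grad():
            prob_face = torch.sigmoid(verifier(tensor)).item()
        if prob_face >= threshold:               # keep only CNN-verified faces
            kept.append((x, y, w, h))
    return kept
```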
This study proposes a novel automated method for cardiomegaly detection in chest X-rays (CXRs). The algorithm has two main stages: i) heart and lung region localization on CXRs, and ii) radiographic index extraction from the heart and lung boundaries. We employed a lung detection algorithm and extended it to automatically compute the heart boundaries. The typical models of heart and lung regions are learned using a public CXR dataset with boundary markings. The method estimates the location of these regions in candidate ('patient') CXR images by registering models to the patient CXR. For the radiographic index computation, we implemented the traditional and recently published indexes in the literature. The method is tested on a database with 250 abnormal and 250 normal CXRs. The radiographic indexes are combined through a classifier, and the method successfully classifies the patients with cardiomegaly with 0.77 accuracy, 0.77 sensitivity, and 0.76 specificity.
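As one example of a traditional radiographic index of the kind referred to above, the sketch below computes the cardiothoracic ratio (CTR) from heart and lung boundary masks. Using the CTR here is an illustrative assumption; the paper combines several indexes through a classifier.

```python
# Sketch: cardiothoracic ratio from heart and lung boundary masks.
import numpy as np

def cardiothoracic_ratio(heart_mask, lung_mask):
    """heart_mask, lung_mask: boolean (H, W) masks from the localization stage.
    CTR = maximal horizontal heart width / maximal horizontal thoracic width."""
    heart_cols = np.nonzero(heart_mask.any(axis=0))[0]
    lung_cols = np.nonzero(lung_mask.any(axis=0))[0]
    heart_width = heart_cols.max() - heart_cols.min() + 1
    thoracic_width = lung_cols.max() - lung_cols.min() + 1
    return heart_width / thoracic_width  # values above ~0.5 suggest cardiomegaly
```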
Modality filtering is an important feature in biomedical image search systems and may significantly improve the retrieval performance of the system. This paper presents a new method for extracting endoscopic image figures from photograph images in the biomedical literature, which are found to have highly diverse content and large variability in appearance. Our proposed method consists of three main stages: tissue image extraction, endoscopic image candidate extraction, and ophthalmic image filtering. For tissue image extraction, we use image patch-level clustering and Markov random field (MRF) relabeling to detect images containing skin/tissue regions. Next, we find candidate endoscopic images by exploiting the round shape characteristics that commonly appear in these images. However, this step needs to compensate for images where the endoscopic regions are not entirely round. In the third step, we filter out ophthalmic images, which have shape characteristics very similar to those of endoscopic images. We do this by using text information, specifically anatomy terms, extracted from the figure caption. We tested and evaluated our method on a dataset of 115,370 photograph figures, and achieved promising precision and recall rates of 87% and 84%, respectively.
The accuracy of content-based image retrieval is affected by image resolution, among other factors. Higher resolution images enable extraction of image features that more accurately represent the image content. In order to improve the relevance of search results for our biomedical image search engine, Open-I, we have developed techniques to extract and label high-resolution versions of figures from biomedical articles supplied in PDF format. Open-I uses the open-access subset of biomedical articles from the PubMed Central repository hosted by the National Library of Medicine. Articles are available in XML and in publisher-supplied PDF formats. As these PDF documents contain little or no metadata to identify the embedded images, the task includes labeling images according to their figure number in the article after they have been successfully extracted. For this purpose we use the labeled small-size images provided with the XML web version of the article. This paper describes the image extraction process and two alternative approaches to image labeling: one measures the similarity between two images based upon the projection of image intensity onto the coordinate axes, and the other is based upon the normalized cross-correlation between the intensities of the two images. Using image identification based on intensity projection, we were able to achieve a precision of 92.84% and a recall of 82.18% in labeling the extracted images.
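A minimal sketch of the intensity-projection matching idea follows: each image is summarized by its row and column intensity sums, and two images are compared by correlating those projections. Resizing both images to a common size before comparison is an assumption made for this sketch, not a detail taken from the paper.

```python
# Sketch: compare two grayscale images via correlation of their intensity projections.
import cv2
import numpy as np

def projections(gray, size=256):
    g = cv2.resize(gray, (size, size)).astype(np.float64)
    return g.sum(axis=1), g.sum(axis=0)   # projections onto the y and x axes

def projection_similarity(gray_a, gray_b):
    """Mean Pearson correlation of the vertical and horizontal projections."""
    ra, ca = projections(gray_a)
    rb, cb = projections(gray_b)
    return 0.5 * (np.corrcoef(ra, rb)[0, 1] + np.corrcoef(ca, cb)[0, 1])

# A PDF-extracted figure would be assigned the figure number of the XML thumbnail
# with the highest projection_similarity score (hypothetical matching rule).
```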
“Imaging signs” are a critical part of radiology's language. Not only are they important for conveying a diagnosis, but they may also aid in indexing the radiology literature and retrieving relevant cases and images. Here we report our work toward representing and categorizing imaging signs of abdominal abnormalities in figures in the radiology literature. Given a region of interest (ROI) from a figure, our goal was to assign the correct imaging sign label to that ROI from the following seven: accordion, comb, ring, sandwich, small bowel feces, target, or whirl. As training and test data, we created our own “gold standard” dataset of regions containing imaging signs. We computed 2997 feature attributes to represent imaging sign characteristics for each ROI in the training and test sets. Following feature selection, these were reduced to 70 attributes and input to a Support Vector Machine (SVM) classifier. We applied image-enhancement methods to compensate for the variable quality of the images in radiology articles. In particular, we developed a method for automatic detection and removal of pointers/markers (arrows, arrowheads, and asterisk symbols) on the images. These pointers/markers are valuable for approximately locating ROIs; however, they degrade the classification because they are often (partially) included in the training ROIs. On a test set of 283 ROIs, our method achieved an overall accuracy of 70% in labeling the seven signs, which we believe is a promising result for using imaging signs to search and retrieve radiology literature. This work is also potentially valuable for the creation of a visual ontology of biomedical imaging entities.
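A hedged sketch of the feature-selection and classification stage follows, assuming scikit-learn. Reducing 2997 attributes to 70 matches the numbers in the abstract; the particular selector (ANOVA F-score) and the SVM settings are assumptions for illustration only.

```python
# Sketch: select 70 of 2997 ROI attributes and classify the seven imaging signs.
from sklearn.feature_selection import SelectKBest, f_classif
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

sign_classifier = make_pipeline(
    StandardScaler(),
    SelectKBest(f_classif, k=70),   # reduce 2997 ROI attributes to 70
    SVC(kernel="rbf"),              # 7-way imaging sign classifier
)

# X: (n_rois, 2997) feature matrix, y: sign labels (hypothetical training data)
# sign_classifier.fit(X, y); predictions = sign_classifier.predict(X_test)
```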
Articles in the literature routinely describe advances in Content-Based Image Retrieval (CBIR) and its potential for improving clinical practice, biomedical research, and education. Several systems have been developed to address particular needs; however, surprisingly few are found to be in routine practical use. Our collaboration with the National Cancer Institute (NCI) has identified a need to develop tools to annotate and search a collection of over 100,000 cervigrams and related, anonymized patient data. One such tool, developed for a projected need for retrieving similar patient images, is the prototype CBIR system called CervigramFinder, which retrieves images based on the visual similarity of particular regions on the cervix. In this article we report the outcomes of a usability study conducted at a primary meeting of practicing experts. We used the study not only to evaluate the system for software errors and ease of use, but also to explore its "user readiness" and to identify obstacles that hamper practical use of such systems in general. Overall, the participants in the study found the technology interesting and of great potential; however, several challenges need to be addressed before the technology can be adopted.
Uterine cervix image analysis is of great importance to the study of uterine cervix cancer, which is among the leading cancers affecting women worldwide. In this paper, we describe our proof-of-concept, Web-accessible system for automated segmentation of significant tissue regions in uterine cervix images, which also demonstrates our research efforts toward promoting collaboration between engineers and physicians on medical image analysis projects. Our design and implementation unify the merits of two commonly used languages, MATLAB and Java. The system circumvents the heavy workload of recoding into Java the sophisticated segmentation algorithms originally developed in MATLAB, while allowing remote users who are not experienced programmers or algorithm developers to apply those processing methods to their own cervicographic images and evaluate the algorithms. Several other practical issues of the system are also discussed, such as the compression of images and the format of the segmentation results.
The National Library of Medicine (NLM), in collaboration with the National Cancer Institute (NCI), is creating a large digital repository of cervicographic images for the study of uterine cervix cancer prevention. One of the research goals is to automatically detect diagnostic bio-markers in these images. Reliable bio-marker segmentation in large biomedical image collections is a challenging task due to the large variation in image appearance. Methods described in this paper focus on segmenting mosaicism, which is an important vascular feature used to visually assess the degree of cervical intraepithelial neoplasia. The proposed approach uses support vector machines (SVM) trained on a ground truth dataset annotated by medical experts (which circumvents the need for vascular structure extraction). We have evaluated the performance of the proposed algorithm and experimentally demonstrated its feasibility.
In this paper, we propose a new method for automated detection and segmentation of different tissue types in digitized uterine cervix images using mean-shift clustering and support vector machine (SVM) classification on cluster features. We specifically target the segmentation of precancerous lesions in an NCI/NLM archive of 60,000 cervigrams. Due to large variations in image appearance in the archive, the color and texture features of a tissue type in one image often overlap with those of a different tissue type in another image. This makes reliable tissue segmentation in a large number of images a very challenging problem. In this paper, we propose the use of powerful machine learning techniques such as SVMs to learn, from a database with ground truth annotations, critical visual signs that correlate with important tissue types, and to use the learned classifier for tissue segmentation in unseen images. In our experiments, the SVM performs better than unsupervised methods such as Gaussian mixture clustering, but it does not scale well to large training sets and does not always guarantee improved performance given more training data. To address this problem, we combine the SVM with clustering so that the features we extract for classification are features of the clusters returned by the mean-shift clustering algorithm. Compared to classification using individual pixel features, classification by cluster features greatly reduces the dimensionality of the problem, making it more efficient while producing results of comparable accuracy.
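The sketch below illustrates the cluster-feature idea, assuming scikit-learn: pixels are grouped by mean-shift, each cluster is summarized by its mean color and position, and an SVM classifies clusters rather than pixels. The feature choice (mean Lab color and normalized position) and the use of a downsampled image are assumptions made for illustration.

```python
# Sketch: mean-shift clustering of pixel features, then cluster-level features for an SVM.
import numpy as np
from sklearn.cluster import MeanShift
from sklearn.svm import SVC

def cluster_features(image_lab):
    """image_lab: (H, W, 3) Lab image (assumed downsampled for speed).
    Returns a label map and one feature vector (mean color + position) per cluster."""
    h, w, _ = image_lab.shape
    ys, xs = np.mgrid[0:h, 0:w]
    pixels = np.column_stack([image_lab.reshape(-1, 3),
                              ys.ravel() / h, xs.ravel() / w])
    ms = MeanShift(bin_seeding=True).fit(pixels)
    feats = np.array([pixels[ms.labels_ == c].mean(axis=0)
                      for c in np.unique(ms.labels_)])
    return ms.labels_.reshape(h, w), feats

# A tissue-type SVM would be trained on cluster features from annotated images:
# clf = SVC(kernel="rbf").fit(train_cluster_feats, train_cluster_tissue_labels)
```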
Content-based image retrieval (CBIR) is the process of retrieving images by directly using image visual characteristics. In this paper, we present a prototype system implemented for CBIR for a uterine cervix image (cervigram) database. This cervigram database is a part of data collected in a multi-year longitudinal effort by the National Cancer Institute (NCI), and archived by the National Library of Medicine (NLM), for the study of the origins of, and factors related to, cervical precancer/cancer. Users may access the system with any Web browser. The system is built with a distributed architecture which is modular and expandable; the user interface is decoupled from the core indexing and retrieving algorithms, and uses open communication standards and open source software. The system tries to bridge the gap between a user's semantic understanding and image feature representation, by incorporating the user's knowledge. Given a user-specified query region, the system returns the most similar regions from the database, with respect to attributes of color, texture, and size. Experimental evaluation of the retrieval performance of the system on "ground truth" test data illustrates its feasibility to serve as a possible research tool to aid the study of the visual characteristics of cervical neoplasia.
Cervicography is a technique for visual screening of uterine cervix images for cervical cancer. One of our research goals is the automated detection in these images of acetowhite (AW) lesions, which are sometimes correlated with cervical cancer. These lesions are characterized by the whitening of regions along the squamocolumnar junction on the cervix when treated with 5% acetic acid. Image preprocessing is required prior to invoking AW detection algorithms on cervicographic images for two reasons: (1) to remove Specular Reflections (SR) caused by camera flash, and (2) to isolate the cervix region-of-interest (ROI) from image regions that are irrelevant to the analysis. These image regions may contain medical instruments, film markup, or other non-cervix anatomy or regions, such as vaginal walls. We have qualitatively and quantitatively evaluated the performance of alternative preprocessing algorithms on a test set of 120 images. For cervix ROI detection, all approaches use a common feature set, but with varying combinations of feature weights, normalization, and clustering methods. For SR detection, while one approach uses a Gaussian Mixture Model on an intensity/saturation feature set, a second approach uses Otsu thresholding on a top-hat transformed input image. Empirical results are analyzed to derive conclusions on the performance of each approach.
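A minimal sketch of the second SR detection approach mentioned above, a white top-hat transform followed by Otsu thresholding, is shown below using OpenCV. The structuring element size and the inpainting follow-up are illustrative assumptions.

```python
# Sketch: detect specular reflections via top-hat transform and Otsu thresholding.
import cv2

def detect_specular_reflections(image_bgr, kernel_size=15):
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    kernel = cv2.getStructuringElement(cv2.MORPH_ELLIPSE, (kernel_size, kernel_size))
    tophat = cv2.morphologyEx(gray, cv2.MORPH_TOPHAT, kernel)  # bright small structures
    _, sr_mask = cv2.threshold(tophat, 0, 255, cv2.THRESH_BINARY + cv2.THRESH_OTSU)
    return sr_mask  # binary mask of candidate specular highlights

# The SR pixels would then be removed or filled before acetowhite lesion detection,
# e.g. cv2.inpaint(image_bgr, sr_mask, 3, cv2.INPAINT_TELEA) (hypothetical step).
```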
Digital colposcopy is an emerging technology, replacing the traditional colposcope for diagnosis of cervical lesions. Incorporating automated algorithms within a digital colposcopy system can improve the reliability and diagnostic accuracy for cervical precancer and cancer. An automated computer-aided diagnosis (CAD) system can assess the three important cervical diagnostic cues: the color, the vascular patterns, and the lesion margins, with quantitative measures, similar to the way colposcopists use Reid's index in traditional colposcopy. In this work we present a novel way to analyze and classify the global and local features of one of the three major components in colposcopy diagnosis: the lesion margins. The margins of cervical lesions can be described as 'feathered,' 'geographic,' 'satellite,' 'regular or smooth,' and 'margin-in-margin,' or they can be of mixed type. As margin characterization is a complex task, we use irregularity descriptors such as compactness indices and curvature descriptors. To address the complexity of the problem, and the dependency on scale and on the position of the lesion in the cervical image, our method uses novel Fourier energy descriptors. The conceptually complex analysis of describing lesions as 'satellite' lesions or lesions with multiple margins is performed using descriptors in which the distance, the position, and local statistical estimates of image intensity play an important role. We trained this new algorithm to classify and diagnose the cervix, evaluating only the lesions. The accuracy of the results is assessed against a 'ground truth' scheme using colposcopists' annotations and pathology results. We report the resulting accuracy of the classification method assessed against this scheme.
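The sketch below gives a hedged example of two of the descriptor families named above: a compactness index and Fourier-based energy descriptors computed from a lesion boundary contour. The normalization and the number of harmonics are assumptions, not the paper's exact formulation.

```python
# Sketch: compactness index and Fourier energy descriptors of a lesion margin contour.
import numpy as np

def compactness(contour):
    """contour: (N, 2) array of boundary points (closed curve). The value
    4*pi*A / P^2 equals 1 for a circle and decreases as the margin becomes irregular."""
    x, y = contour[:, 0], contour[:, 1]
    area = 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))
    perimeter = np.sum(np.linalg.norm(np.diff(contour, axis=0, append=contour[:1]), axis=1))
    return 4.0 * np.pi * area / (perimeter ** 2)

def fourier_energy(contour, n_harmonics=10):
    """Energy in the low-order Fourier harmonics of the complex boundary signal,
    normalized by the first harmonic to reduce dependence on scale."""
    z = contour[:, 0] + 1j * contour[:, 1]
    coeffs = np.fft.fft(z - z.mean())
    mags = np.abs(coeffs[1:n_harmonics + 1])
    return (mags / (mags[0] + 1e-12)) ** 2  # per-harmonic energy descriptor
```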