Oral cavity cancer is a common cancer that can impair breathing, swallowing, drinking, eating, and speech, and mortality is high in its advanced stages. Its diagnosis is confirmed through histopathology, so it is critically important to determine whether a biopsy is needed and to identify the correct location for it. Deep learning has shown great promise in several image-based medical screening and diagnostic applications. However, automated visual evaluation of oral cavity lesions has received limited attention in the literature. Since the disease can occur in different parts of the oral cavity, a first step is to identify which anatomical site an image shows. We automatically generate labels for six sites, which will help lesion detection in a subsequent analytical module. We apply a recently proposed network, ResNeSt, which combines channel-wise attention with multi-path representation, and demonstrate high performance on the test set: the F1-score averaged over all classes and the accuracy are both 0.96. Moreover, we provide a detailed discussion of class activation maps obtained from both correct and incorrect predictions to analyze the algorithm's behavior. The regions highlighted in the class activation maps generally correlate well with the regions of interest that expert human observers perceive and expect. The insights gained from this analysis are helpful not only for improving the algorithm but also for developing the other key components of computer-assisted oral cancer screening.
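To make the two reported metrics concrete, the following is a minimal sketch of how a class-averaged F1-score and an accuracy can be computed from a confusion matrix. It assumes the average is an unweighted (macro) mean over classes, which the text does not state explicitly; the function names and the toy two-class matrix are hypothetical and are not taken from the study.

```python
def per_class_f1(confusion):
    """Return one F1-score per class.

    confusion[i][j] is the number of images whose true class is i
    and whose predicted class is j (an assumed layout).
    """
    n = len(confusion)
    scores = []
    for k in range(n):
        tp = confusion[k][k]
        fp = sum(confusion[i][k] for i in range(n)) - tp  # predicted k, true class differs
        fn = sum(confusion[k]) - tp                       # true k, predicted class differs
        denom = 2 * tp + fp + fn
        # F1 = 2TP / (2TP + FP + FN); defined as 0.0 for a class that never occurs
        scores.append(2 * tp / denom if denom else 0.0)
    return scores


def macro_f1(confusion):
    """Unweighted mean of the per-class F1-scores (assumed averaging scheme)."""
    scores = per_class_f1(confusion)
    return sum(scores) / len(scores)


def accuracy(confusion):
    """Fraction of all images that lie on the diagonal of the matrix."""
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[k][k] for k in range(len(confusion)))
    return correct / total


# Toy two-class example (illustrative numbers only, not study data).
toy = [[2, 0],
       [1, 1]]
toy_macro_f1 = macro_f1(toy)
toy_accuracy = accuracy(toy)
```

For a six-site classifier the same functions would take a 6 x 6 matrix. Macro averaging weights every anatomical site equally regardless of how many test images it has, which is why it can differ from accuracy when classes are imbalanced.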