Detecting the background is often one of the first steps in analyzing whole slide images. However, background detection methods vary greatly and are generally not reported in enough detail for subsequent studies to reproduce them. We therefore sought to determine how the choice of background detection method affects downstream performance metrics. Specifically, we applied attention-based multiple instance learning, CLAM, and Attention2majority to classify whole slide images from Camelyon16 using four different background detection methods. Our results show that performance metrics (i.e., accuracy and area under the curve, AUC) differ depending on which background detection method is used. Furthermore, all classification methods perform worse when hand-drawn tissue annotations are used for background detection. Finally, we show that the best-performing classification method changes depending on the background detection method. We conclude that background detection methods must be fully reported so that results can be accurately reproduced and techniques can be fairly compared.