In computational pathology, random sampling of patches during training of Multiple Instance Learning (MIL) methods is computationally efficient and serves as a regularization strategy. Despite its promising benefits, questions concerning performance trends for varying sample sizes and its influence on model interpretability remain. Addressing these, we reach an optimal performance enhancement of 1.7% using thirty percent of patches on the CAMELYON16 dataset, and 3.7% with only eight samples on the TUPAC16 dataset. We also find interpretability effects are strongly dataset-dependent, with interpretability impacted on CAMELYON16, while remaining unaffected on TUPAC16. This reinforces that both the performance and interpretability relationships with sampling are closely task-specific. End-to-end training with 1024 samples reveals improvements across both datasets compared to pre-extracted features, further highlighting the potential of this efficient approach.
|