Biological Spatial Priors Regularize Foundation Model Representations for Cross-Site MSI Generalization in Colorectal Cancer
arXiv:2605.02660v1 Announce Type: cross
Abstract: Predicting microsatellite instability (MSI) status from routine hematoxylin and eosin (H&E) whole slide images (WSIs) offers a practical alternative to molecular testing, but models trained at one institution tend to generalize poorly to slides acquired at a different site. Foundation model representations, despite their generality, still encode site-specific texture alongside the conserved biological morphology underlying MSI. We investigate whether tile-level spatial priors derived from known MSI histology can guide these representations toward more site-invariant features. We introduce a biologically motivated spatial prior based on peripheral distance encoding, reflecting the Crohn's-like peripheral lymphocytic reaction at the tumor invasive margin, and evaluate a secondary local immune neighborhood encoding reflecting the lymphocyte-to-tumor ratio in each tile's immediate spatial neighborhood. Both priors are injected into a TransMIL aggregator before self-attention, allowing the transformer to integrate spatial biological context with UNI2-h or Virchow2 features across all attention layers. We evaluate six foundation model and MIL aggregator combinations as a reference, then assess the effect of each spatial prior. Training on TCGA-COAD (137 slides) and evaluating externally on TCGA-READ (50 slides) without retraining, peripheral distance encoding achieves MSI AUC 0.959 +/- 0.012 on COAD and MSS specificity 1.000 on READ, compared to 0.957 and 0.939 for the strongest reference configuration. Local immune neighborhood encoding achieves comparable internal AUC but lower cross-site specificity, suggesting margin proximity encodes a more site-invariant biological signal than local immune density. Results suggest biologically grounded spatial priors act as regularizers that reduce reliance on site-specific imaging patterns.