Adapting Medical Vision Foundation Models for Volumetric Medical Image Segmentation via Active Learning and Selective Semi-supervised Fine-tuning

arXiv:2509.10784v3 Announce Type: replace-cross Abstract: Medical vision foundation models (Med-VFMs) remain limited in downstream tasks, particularly volumetric medical image segmentation. While fine-tuning on labeled target-domain data improves performance, existing approaches typically rely on randomly selected samples, which may fail to identify the most informative data and thus hinder adaptation. To address these limitations, we propose an Active Selective Semi-supervised Fine-tuning (ASSFT) framework for efficiently adapting Med-VFMs to volumetric medical image segmentation. ASSFT integrates a novel active learning strategy with selective semi-supervised learning to maximize adaptation performance under a limited annotation budget, without requiring access to source data. First, we introduce an Active Test-Time Sample Query strategy that identifies informative samples from the target domain using two complementary query metrics: Diversified Knowledge Divergence (DKD) and Anatomical Segmentation Difficulty (ASD). DKD quantifies both the knowledge gap between the pre-training and target domains and the semantic diversity within the target dataset, enabling the selection of samples that contain previously unlearned knowledge while maintaining intra-domain diversity. ASD estimates the segmentation difficulty of target anatomical structures by measuring predictive uncertainty within foreground regions of interest, allowing the model to prioritize samples with complex anatomical patterns rather than those dominated by background uncertainty. Second, we propose a Selective Semi-supervised Fine-tuning strategy that further improves adaptation by leveraging unlabeled target samples. Instead of using all pseudo-labeled data, it selectively incorporates reliable unlabeled samples based on predictive confidence and semantic distance to labeled samples, enabling stable semi-supervised training while avoiding noisy pseudo-labels.
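The two sample-scoring ideas in the abstract can be sketched in a few lines of NumPy. This is a minimal illustration, not the paper's implementation: the function names, the foreground threshold, and the confidence-only pseudo-label filter (the semantic-distance term is omitted) are all assumptions for the sake of a runnable example.

```python
import numpy as np

def anatomical_segmentation_difficulty(probs, fg_threshold=0.5):
    """Rough ASD-style score: mean voxel-wise predictive entropy,
    restricted to voxels predicted as foreground (illustrative only).

    probs: (C, D, H, W) softmax probabilities, class 0 = background.
    """
    entropy = -np.sum(probs * np.log(probs + 1e-12), axis=0)  # (D, H, W)
    fg_mask = probs[1:].sum(axis=0) > fg_threshold  # non-background voxels
    if not fg_mask.any():
        return 0.0  # no predicted foreground -> nothing to score
    return float(entropy[fg_mask].mean())

def select_reliable_pseudo_labels(probs_list, conf_threshold=0.9):
    """Keep unlabeled samples whose mean max-class probability exceeds
    a confidence threshold (semantic-distance criterion omitted)."""
    return [i for i, probs in enumerate(probs_list)
            if probs.max(axis=0).mean() >= conf_threshold]
```

Restricting the entropy average to the predicted foreground is what lets the score reflect anatomical difficulty rather than the large, easy background, mirroring the motivation stated in the abstract.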
