From Attribution to Abstention: Training-Free Attention-Based Auditing for Clinical Summarization

arXiv:2601.16397v2 Announce Type: replace

Abstract: Deploying multimodal large language models (MLLMs) for clinical summarization demands not only fluent generation but also transparency about where each statement originates, and a mechanism to flag statements that lack evidential support. We present ClinTrace, a training-free framework that extracts two clinically useful signals from the decoder attention weights that every transformer-based MLLM already produces during generation: (i) fine-grained source attributions linking each output sentence to supporting text spans or images, and (ii) per-sentence groundedness scores that identify poorly supported claims as candidate hallucinations. Both signals are derived from the same attention tensors in a single pass, requiring no retraining, no auxiliary models, and no additional inference cost. We evaluate on two clinical summarization tasks, doctor-patient dialogue summarization (CliConSummation) and radiology report summarization (MIMIC-CXR), using a general-purpose MLLM (Qwen3-8B) and a medical-finetuned model (HuatuoGPT-Vision-7B). For source attribution, ClinTrace achieves over 92% text F1 on radiology and 88% on dialogue summarization, substantially outperforming embedding-based and self-attribution baselines. For hallucination detection, groundedness scores achieve 0.77 AUROC with the medical-finetuned model, competitive with embedding-based confidence at zero additional cost, and enable an abstention mechanism that improves faithfulness from 61.7% to 72.6% by withholding the least-grounded 20% of output for clinician review. Notably, medical finetuning substantially improves the reliability of attention-based hallucination detection, suggesting that domain adaptation produces more semantically structured attention patterns amenable to self-auditing.
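The abstract's two signals, per-sentence groundedness from attention mass and a bottom-20% abstention rule, can be illustrated with a minimal sketch. Everything here is an assumption for illustration: the paper does not specify how attention is aggregated over layers and heads, how sentences map to token spans, or the exact scoring rule, so the function names and the mean-attention-mass heuristic below are hypothetical, not ClinTrace's actual method.

```python
# Illustrative sketch of attention-based groundedness scoring and abstention.
# Assumptions (not from the paper): attention is pre-averaged over layers and
# heads into a single [output_tokens, input_tokens] matrix, and a sentence's
# groundedness is the mean attention mass its tokens place on source tokens.
import numpy as np

def groundedness_scores(attn, sentence_spans, source_len):
    """attn: [out_tokens, in_tokens] decoder attention, rows roughly sum to 1.
    sentence_spans: list of (start, end) output-token ranges, one per sentence.
    source_len: number of input tokens belonging to the source document.
    Returns one score per sentence: mean attention mass on source tokens."""
    scores = []
    for start, end in sentence_spans:
        # Per-token mass directed at the source, averaged over the sentence.
        mass = attn[start:end, :source_len].sum(axis=1)
        scores.append(float(mass.mean()))
    return scores

def abstain_mask(scores, frac=0.2):
    """Flag the least-grounded `frac` of sentences for clinician review."""
    k = max(1, int(round(frac * len(scores))))
    order = np.argsort(scores)  # ascending: least grounded first
    flagged = set(order[:k].tolist())
    return [i in flagged for i in range(len(scores))]
```

For example, a two-sentence output where the second sentence attends mostly to non-source tokens would receive the lower score and be withheld for review.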
