Online Conformal Abstention for Factuality Control Under Adversarial Bandit Feedback

arXiv:2506.14067v4 Announce Type: replace Abstract: As interactive generative systems are increasingly deployed in real-world applications, their tendency to generate unreliable or false responses raises serious concerns. Conformal abstention mitigates this risk by ensuring that the system answers only when confident. However, real-world deployments typically provide only partial user feedback (e.g., thumbs up/down) on the selected response and often operate in non-stationary or adversarial environments, for which effective learning methods are largely missing. To bridge this gap, we propose ExAUL, a novel online learning framework for conformal abstention with adversarial and partial feedback. Technically, we introduce (i) a novel conversion lemma that translates the regret of any bandit algorithm into an FDR bound, and (ii) feedback unlocking, a strategy that exploits the structure of conformal abstention to extract additional learning signals from partial feedback. We prove that ExAUL achieves a regret bound of $O(\sqrt{T \ln |\mathcal{H}|})$, which translates into an $O(\sqrt{T})$ bound on FDR risk control, matching the controllability of full-information settings despite receiving only partial feedback. While applicable to general generative tasks, we demonstrate the efficacy of ExAUL in ensuring the reliability of Large Language Models (LLMs) through empirical validation on question-answering tasks across diverse non-stationary and adversarial settings. Our results show that ExAUL robustly controls the FDR while maintaining competitive answering coverage.
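To make the setting concrete, here is a minimal sketch of the general problem the abstract describes: an adversarial bandit learner (plain Exp3 here, purely as an illustrative stand-in, not the ExAUL algorithm) selects a confidence threshold from a finite hypothesis class, the system answers only when its confidence clears the chosen threshold, and the only learning signal is thumbs up/down feedback on answered queries. All names, thresholds, and the simulated feedback model are assumptions for illustration.

```python
import math
import random


class Exp3Abstention:
    """Toy online conformal abstention: an Exp3 learner over a finite
    set of confidence thresholds H. Illustrative sketch only -- not ExAUL."""

    def __init__(self, thresholds, horizon):
        self.H = list(thresholds)
        k = len(self.H)
        # standard Exp3 learning-rate tuning for a known horizon
        self.eta = math.sqrt(2 * math.log(k) / (horizon * k))
        self.weights = [1.0] * k

    def _probs(self):
        z = sum(self.weights)
        return [w / z for w in self.weights]

    def act(self, confidence):
        """Draw a threshold index; answer iff confidence >= threshold."""
        p = self._probs()
        i = random.choices(range(len(self.H)), weights=p)[0]
        return i, confidence >= self.H[i], p[i]

    def update(self, i, p_i, loss):
        """Importance-weighted Exp3 update on the chosen arm only
        (partial feedback: we never observe losses for other thresholds)."""
        estimate = loss / p_i
        self.weights[i] *= math.exp(-self.eta * estimate)


# usage: loss = 1 on a thumbs-down (a false discovery), 0 otherwise;
# abstentions yield no feedback at all.
random.seed(0)
learner = Exp3Abstention(thresholds=[0.5, 0.7, 0.9], horizon=1000)
for t in range(1000):
    conf = random.random()                            # model confidence score
    i, answered, p_i = learner.act(conf)
    if answered:
        # simulated user feedback: low-confidence answers are often wrong
        wrong = conf < 0.6 and random.random() < 0.8
        learner.update(i, p_i, loss=1.0 if wrong else 0.0)
```

Under this toy feedback model, the learner shifts probability mass away from the permissive threshold (0.5), which keeps incurring thumbs-down losses, toward the stricter ones. The abstract's "feedback unlocking" idea goes further than plain Exp3 by exploiting the monotone structure of abstention (if a response at one threshold is judged wrong, every more permissive threshold would have answered it too), but that refinement is beyond this sketch.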
