Subramanyam Sahoo - Provide.ai

Calibration Collapse Under Sycophancy Fine-Tuning: How Reward Hacking Breaks Uncertainty Quantification in LLMs

Subramanyam Sahoo / April 14, 2026

arXiv:2604.10585v1 Announce Type: cross
Abstract: Modern large language models (LLMs) are increasingly fine-tuned via reinforcement learning from human feedback (RLHF) or related reward optimisation schemes. While such procedures improve perceived hel…
