How Language Models Process Out-of-Distribution Inputs: A Two-Pathway Framework
arXiv:2605.00269v1
Abstract: Recent white-box out-of-distribution (OOD) detection methods for LLMs (including CED, RAUQ, and WildGuard confidence scores) appear effective, but we show they are structurally confounded by sequence length (|r| ≥ 0.61) …
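The abstract's central diagnostic is a correlation between a detector's score and input length. Below is a minimal sketch of such a check. It assumes each input has one scalar OOD score and one token count, and it uses plain Pearson correlation. It is not necessarily the authors' exact procedure. The helper names `pearson_r` and `length_confound` are hypothetical, and the toy numbers in the usage block are invented for illustration, not taken from the paper.

```python
import math
from typing import Sequence, Tuple


def pearson_r(xs: Sequence[float], ys: Sequence[float]) -> float:
    """Pearson correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance numerator and the two variance terms (shared 1/n factors cancel).
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation undefined for a constant sequence")
    return cov / math.sqrt(var_x * var_y)


def length_confound(
    scores: Sequence[float],
    lengths: Sequence[float],
    threshold: float = 0.61,
) -> Tuple[float, bool]:
    """Return (r, flagged), where flagged means |r| >= threshold.

    Hypothetical helper: threshold=0.61 only echoes the lower bound quoted in
    the abstract and is used as an illustrative cut-off, not a recommended one.
    The absolute value is used because a strong negative correlation with
    length is just as much a confound as a positive one.
    """
    r = pearson_r(lengths, scores)
    return r, abs(r) >= threshold


if __name__ == "__main__":
    # Invented toy data: scores that grow almost linearly with token count.
    lengths = [12, 40, 85, 150, 300]
    scores = [0.10, 0.22, 0.41, 0.70, 1.35]
    r, flagged = length_confound(scores, lengths)
    # flagged is True here because the toy scores track length almost exactly.
    print(f"r = {r:.3f}, length-confounded: {flagged}")
```

A score that passes this check is not thereby shown to be a valid OOD signal. The check only indicates that length alone does not explain it linearly. A nonlinear dependence on length would need a rank correlation or a length-matched comparison instead.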