cs.AI, cs.CL, cs.SE

Revisiting the Reliability of Language Models in Instruction-Following

arXiv:2512.14754v2 Announce Type: replace-cross
Abstract: Advanced LLMs have achieved near-ceiling instruction-following accuracy on benchmarks such as IFEval. However, these impressive scores do not necessarily translate to reliable services in real-…