How Long Reasoning Chains Influence LLMs’ Judgment of Answer Factuality
arXiv:2604.06756v1 Announce Type: new
Abstract: Large language models (LLMs) have been widely adopted as a scalable surrogate for human evaluation, yet such judges remain imperfect and susceptible to surface-level biases. One possible reason is that th…