From Laboratory to Real-World Applications: Benchmarking Agentic Code Reasoning at the Repository Level
arXiv:2601.03731v3
Abstract: As large language models (LLMs) evolve into autonomous agents, evaluating repository-level reasoning, the ability to maintain logical consistency across massive, real-world, interdependent file…