Beyond Verifiable Rewards: Rubric-Based GRM for Reinforced Fine-Tuning SWE Agents
arXiv:2604.16335v1 Announce Type: new
Abstract: Despite recent progress in Large Language Model (LLM) Agents for Software Engineering (SWE) tasks, end-to-end fine-tuning typically relies on verifiable terminal rewards such as whether all unit tests pa…