cs.AI, cs.LG

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

arXiv:2604.15149v1 Announce Type: new
Abstract: As Reinforcement Learning with Verifiable Rewards (RLVR) has become the dominant paradigm for scaling reasoning capabilities in LLMs, a new failure mode emerges: LLMs gaming verifiers. We study this phen…

cs.CL, cs.LG

AdaSplash-2: Faster Differentiable Sparse Attention

arXiv:2604.15180v1 Announce Type: new
Abstract: Sparse attention has been proposed as a way to alleviate the quadratic cost of transformers, a central bottleneck in long-context training. A promising line of work is $\alpha$-entmax attention, a differ…

cs.AI

AI-Assisted Peer Review at Scale: The AAAI-26 AI Review Pilot

arXiv:2604.13940v1 Announce Type: new
Abstract: Scientific peer review faces mounting strain as submission volumes surge, making it increasingly difficult to sustain review quality, consistency, and timeliness. Recent advances in AI have led the commu…
