cs.AI, cs.LG

LLMs Gaming Verifiers: RLVR can Lead to Reward Hacking

arXiv:2604.15149v1 Announce Type: new
Abstract: As Reinforcement Learning with Verifiable Rewards (RLVR) has become the dominant paradigm for scaling reasoning capabilities in LLMs, a new failure mode emerges: LLMs gaming verifiers. We study this phen…

cs.LG

TOPCELL: Topology Optimization of Standard Cell via LLMs

arXiv:2604.14237v1 Announce Type: new
Abstract: Transistor topology optimization is a critical step in standard cell design, directly dictating diffusion sharing efficiency and downstream routability. However, identifying optimal topologies remains a …

cs.CL, cs.LG

AdaSplash-2: Faster Differentiable Sparse Attention

arXiv:2604.15180v1 Announce Type: new
Abstract: Sparse attention has been proposed as a way to alleviate the quadratic cost of transformers, a central bottleneck in long-context training. A promising line of work is $\alpha$-entmax attention, a differ…
