cs.LG

Data-Efficient RLVR via Off-Policy Influence Guidance

arXiv:2510.26491v2 Announce Type: replace
Abstract: Data selection is a critical aspect of Reinforcement Learning with Verifiable Rewards (RLVR) for enhancing the reasoning capabilities of large language models (LLMs). Current data selection methods a…

cs.LG

Guided Transfer Learning for Discrete Diffusion Models

arXiv:2512.10877v4 Announce Type: replace
Abstract: Discrete diffusion models (DMs) have achieved strong performance in language and other discrete domains, offering a compelling alternative to autoregressive modeling. Yet this performance typically d…
