cs.CL, cs.LG

Sample-efficient LLM Optimization with Reset Replay

arXiv:2508.06412v3 Announce Type: replace-cross
Abstract: Recent advancements in LLM post-training, particularly through reinforcement learning and preference optimization, are key to boosting the reasoning capabilities of LLMs. However, these methods often…