Exploration-Driven Optimization for Test-Time Large Language Model Reasoning
arXiv:2605.09853v1 Announce Type: new
Abstract: Post-training techniques combined with inference-time scaling significantly enhance the reasoning and alignment capabilities of large language models (LLMs). However, a fundamental tension arises: infere…