Efficient Test-Time Inference via Deterministic Exploration of Truncated Decoding Trees
arXiv:2604.20500v1 Announce Type: new
Abstract: Self-consistency boosts inference-time performance by sampling multiple reasoning traces in parallel and voting. However, in constrained domains like math and code, this strategy is compute-inefficient b…
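For readers unfamiliar with the baseline the abstract refers to, the following is a minimal sketch of self-consistency as described there: draw several answers and take a majority vote. It is not the paper's proposed method. The names `self_consistency` and `sample_answer` are hypothetical, and the sampler stands in for whatever model call produces one final answer per reasoning trace; traces are drawn sequentially here for simplicity rather than in parallel.

```python
from collections import Counter
from typing import Callable, Tuple


def self_consistency(sample_answer: Callable[[], str], n_samples: int) -> Tuple[str, float]:
    """Draw n_samples answers and return the most common one with its vote share.

    sample_answer is a hypothetical stand-in for one sampled reasoning trace
    reduced to its final answer (e.g. the number at the end of a math solution).
    """
    if n_samples <= 0:
        raise ValueError("n_samples must be positive")
    # Each call is treated as an independent sample from the model.
    answers = [sample_answer() for _ in range(n_samples)]
    # Majority vote; ties are broken in favor of the answer seen first.
    winner, count = Counter(answers).most_common(1)[0]
    return winner, count / n_samples


if __name__ == "__main__":
    # Toy sampler that replays a fixed list of answers instead of calling a model.
    canned = iter(["42", "41", "42", "42", "7"])
    answer, share = self_consistency(lambda: next(canned), n_samples=5)
    print(answer, share)
```

In this toy run three of the five samples agree, and the duplicated work on the agreeing traces illustrates the kind of redundancy that makes plain sampling costly.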