The Model Knows, the Decoder Finds: Future Value Guided Particle Power Sampling
arXiv:2605.02427v1 Announce Type: new
Abstract: A recurring pattern in “reasoning without training” is that base LLMs already assign non-trivial probability mass to correct multi-step solutions; the bottleneck is locating these modes efficiently at in…