interwhen: A Generalizable Framework for Steering Reasoning Models with Test-time Verification
arXiv:2602.11202v3 Announce Type: replace-cross
Abstract: Reasoning models produce long traces of intermediate decisions and tool calls, making test-time verification important for ensuring correctness. Existing approaches either verify only the final…