Certain Head, Uncertain Tail: Expert-Sample for Test-Time Scaling in Fine-Grained MoE
arXiv:2602.02443v2 Announce Type: replace
Abstract: Test-time scaling improves LLM performance by generating multiple candidate solutions, yet token-level sampling requires temperature tuning that trades off diversity against stability. Fine-grained M…