Pyramid MoA: A Probabilistic Framework for Cost-Optimized Anytime Inference
arXiv:2602.19509v3 Announce Type: replace
Abstract: We observe that LLM cascading and routing implicitly solve an anytime computation problem: anytime algorithms, a class well studied in classical AI, improve their solutions as additional computation is …
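
To make the anytime framing concrete, the following is a minimal sketch of a model cascade written as an anytime procedure. It is an illustration only, not the method of the paper: the function `anytime_cascade`, the `TOY_TIERS` list, the per-tier costs, and the confidence threshold are all hypothetical, and the "models" are stand-in functions that return a fixed answer and confidence.

```python
from typing import Callable, List, Optional, Tuple

# Hypothetical tier: (name, cost per call, model).
# A model maps a query to (answer, confidence in [0, 1]).
Model = Callable[[str], Tuple[str, float]]
Tier = Tuple[str, float, Model]


def anytime_cascade(
    query: str,
    tiers: List[Tier],
    budget: float,
    threshold: float = 0.9,
) -> Tuple[Optional[str], float, float]:
    """Query tiers from cheapest to most expensive.

    The best answer found so far is always available, so the loop can
    stop at any point: when the budget cannot cover the next tier, or
    when the confidence reaches the threshold.
    Returns (best_answer, best_confidence, cost_spent).
    """
    best_answer: Optional[str] = None
    best_conf = 0.0
    spent = 0.0
    for _name, cost, model in tiers:
        if spent + cost > budget:
            break  # out of budget: return what we have
        answer, conf = model(query)
        spent += cost
        if conf > best_conf:
            best_answer, best_conf = answer, conf
        if best_conf >= threshold:
            break  # confident enough: skip the larger tiers
    return best_answer, best_conf, spent


# Stand-in models with assumed costs and confidences (not real LLM calls).
TOY_TIERS: List[Tier] = [
    ("small", 1.0, lambda q: ("4", 0.6)),
    ("large", 10.0, lambda q: ("4", 0.95)),
]

if __name__ == "__main__":
    for budget in (0.5, 1.0, 20.0):
        print(budget, anytime_cascade("2+2", TOY_TIERS, budget))
```

With these toy tiers, a larger budget lets the cascade reach the more expensive tier and return a higher-confidence answer, which is the anytime property the abstract refers to: the result improves as more computation is allowed.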