cs.AI

A Survey of Scaling in Large Language Model Reasoning

arXiv:2504.02181v2 Announce Type: replace
Abstract: The rapid advancements in large language models (LLMs) have significantly enhanced their reasoning capabilities, driven by various strategies such as multi-agent collaboration. However, unlike the we…

cs.LG

Super Apriel: One Checkpoint, Many Speeds

arXiv:2604.19877v1 Announce Type: new
Abstract: We release Super Apriel, a 15B-parameter supernet in which every decoder layer provides four trained mixer choices — Full Attention (FA), Sliding Window Attention (SWA), Kimi Delta Attention (KDA), and …
