Super Apriel: One Checkpoint, Many Speeds
arXiv:2604.19877v1
Abstract: We release Super Apriel, a 15B-parameter supernet in which every decoder layer provides four trained mixer choices — Full Attention (FA), Sliding Window Attention (SWA), Kimi Delta Attention (KDA), and …
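If each decoder layer's mixer can be chosen independently, one supernet checkpoint covers 4^L per-layer configurations for an L-layer model. The following is a minimal Python sketch of that counting. It assumes a hypothetical encoding in which a placement is a tuple of mixer names, one per layer. This excerpt gives neither the layer count nor the fourth mixer, so the example uses a toy layer count and a labeled placeholder name. The function names are illustrative only and are not taken from the Super Apriel release.

```python
import random
from itertools import product
from typing import Iterator, Optional, Tuple

# Mixer choices named in the abstract. The fourth is elided ("…") in the
# source, so it is a clearly labeled placeholder, not a guess.
MIXERS: Tuple[str, ...] = ("FA", "SWA", "KDA", "MIXER_4_ELIDED")


def num_placements(num_layers: int, num_choices: int = len(MIXERS)) -> int:
    """Count distinct per-layer assignments, assuming independent choices."""
    return num_choices ** num_layers


def sample_placement(num_layers: int, seed: Optional[int] = None) -> Tuple[str, ...]:
    """Draw one hypothetical placement: one mixer name per decoder layer."""
    rng = random.Random(seed)
    return tuple(rng.choice(MIXERS) for _ in range(num_layers))


def all_placements(num_layers: int) -> Iterator[Tuple[str, ...]]:
    """Enumerate every placement (only practical for tiny layer counts)."""
    return product(MIXERS, repeat=num_layers)


if __name__ == "__main__":
    # Toy 3-layer example; the real layer count is not stated in this excerpt.
    print(num_placements(3))              # 64
    print(len(list(all_placements(3))))   # 64
    print(sample_placement(3, seed=0))
```

With three layers and four choices, the count is 4^3 = 64. Even a modest depth quickly makes exhaustive enumeration impractical, so `sample_placement` stands in for whatever selection procedure the full paper actually uses.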