LocalLLaMA

Why MoE models keep converging on ~10B active parameters

Interesting pattern: despite wildly different total sizes, many recent MoE models land around 10B active params. Qwen 3.5 122B activates 10B. MiniMax M2.7 runs 230B total with 10B active via top-2 routing. Training cost scales as C ≈ 6 × N_active × T, where N_active is the number of parameters active per token and T is the number of training tokens. …
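To make the scaling concrete, here's a quick sketch of that cost formula in Python. The token count `T = 15e12` is a made-up illustrative value (the post doesn't give one), and the dense 70B comparison point is my own example, not from the post:

```python
def training_flops(n_active: float, tokens: float) -> float:
    """Approximate training compute: C ≈ 6 * N_active * T (in FLOPs).

    For an MoE model, only the active parameters per token enter the
    forward/backward cost, so n_active (not total params) is what matters.
    """
    return 6 * n_active * tokens

# Hypothetical training budget for illustration only.
T = 15e12  # 15T tokens (assumed, not from the post)

moe = training_flops(10e9, T)   # MoE with 10B active params
dense = training_flops(70e9, T) # dense 70B model for comparison

print(f"MoE (10B active): {moe:.2e} FLOPs")
print(f"dense 70B:        {dense:.2e} FLOPs")
print(f"ratio:            {dense / moe:.1f}x")
```

The point the formula makes: a 230B-total MoE with 10B active trains at the compute cost of a 10B dense model, which is why total size can grow while the active-parameter budget stays pinned.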