Dense vs. MoE gap is shrinking fast with the 3.6-27B release

27B Dense vs. 35B-A3B MoE:

- Dense still holds the crown: It wins on most tasks overall.

- The gap is closing: In 7 out of 10 benchmarks, the MoE model is quietly creeping up and closing the distance.

- Coding is getting a massive boost: MoE is making serious strides here. For example, the dense model's lead on the SWE-bench Multilingual benchmark dropped from +9.0 down to just +4.1.

- The one weird outlier: Terminal-Bench 2.0. For whatever reason, the dense model absolutely pulled ahead here, widening its lead from +1.1 to a massive +7.8.
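To make the two quoted swings concrete, here is a quick sketch that computes how much the dense lead moved on each benchmark. It uses only the four numbers from the bullets above; the names `leads` and `lead_change` are just made up for illustration, and "before/after" simply means the two figures in the order quoted.

```python
# Dense-minus-MoE lead in points as (before, after),
# taken only from the two figures quoted in the post.
leads = {
    "SWE-bench Multilingual": (9.0, 4.1),
    "Terminal-Bench 2.0": (1.1, 7.8),
}


def lead_change(before: float, after: float) -> float:
    """Change in the dense model's lead; negative means MoE gained ground."""
    return round(after - before, 1)


changes = {name: lead_change(b, a) for name, (b, a) in leads.items()}

for name, delta in changes.items():
    trend = "MoE closing the gap" if delta < 0 else "dense pulling ahead"
    print(f"{name}: {delta:+.1f} points ({trend})")
```

So the dense lead shrank by 4.9 points on SWE-bench Multilingual and grew by 6.7 points on Terminal-Bench 2.0, which is why the second one stands out as the outlier.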

TL;DR: Dense is still technically better, but MoE is catching up fast, especially for coding. If you're running on 24 GB of VRAM and want massive context windows, the trade-off for MoE is looking better than ever right now.

Thoughts?

Anyone tested the 256k context on the MoE yet?

More details in the link: https://x.com/i/status/2047004358500614152

submitted by /u/Usual-Carrot6352