We ran open-weight 27B–32B models on Terminal-Bench 2.0 (89 tasks). One interesting finding is that MoE models still have an order-of-magnitude advantage in inference speed. What's notable isn't 38.2% in absolute terms (current verified SOTA is ~80%: GPT-5.5 / Opus 4.6 / Gemini 3.1 Pro), but what 38.2% maps to in time. Anchoring on model release dates of verified leaderboard entries:
So today's best runnable-offline coding model lands roughly where the hosted frontier was in late 2025: about a 6–8 month lag. That's the first time the gap has been small enough to matter for real deployments (regulated environments, air-gapped systems, on-prem CI, batch workloads). More details on our blog: https://antigma.ai/blog/2026/04/24/offline-coding-models