Quick context: both vendors publish SWE-bench scores on their own scaffolds, and the same model can swing 22 points depending on harness design. That's more than the gap between any two frontier models.

Since January 2026, Claude plans have been restricted to Anthropic's native apps. Kilocode, Aider, any third-party tool: none of them can consume a Max subscription. Codex went the opposite direction and opened up to external integration. The result is that every "Claude crushes GPT on code" comparison is really Claude on Claude Code vs. GPT on whatever harness the tester happened to use. That's not a model comparison; it's a product comparison.

Opus 4.7 does lead GPT-5.4 on SWE-bench Pro, 64.3% vs. 57.7%, and third-party reproductions broadly confirm it. The model is probably good. But the subscription doesn't just fund the model: it locks you into the harness that makes the benchmark number possible.

I ended up going with ChatGPT Pro myself. Not because GPT-5.4 is objectively better, but because it's the only $100/month plan that actually works with my open-source toolchain 🤷