Anthropic locked Claude Code to native apps in Jan 2026. Are we still comparing models or just ecosystems?

Quick context: both vendors publish SWE-bench scores measured on their own scaffolds, and the same model can swing 22 points depending on harness design — more than the gap between any two frontier models.

Since January 2026, Claude plans have been restricted to Anthropic's native apps. Kilocode, Aider, or any other third-party tool can't consume a Max subscription. Codex went the opposite direction and opened up to external integrations.

Result: every "Claude crushes GPT on code" comparison is Claude on Claude Code vs GPT on a random harness. That's not a model comparison. That's a product comparison.

Opus 4.7 leads GPT-5.4 on SWE-bench Pro, 64.3% vs 57.7%, and third-party reproductions broadly confirm it. The model is probably good. But the subscription doesn't just fund the model — it locks you into the one harness that makes the benchmark result reproducible.

Ended up going with ChatGPT Pro myself. Not because GPT-5.4 is objectively better. Because it's the only $100/month plan that actually works with my open-source toolchain 🤷

submitted by /u/Fresh-Daikon-9408