Spent some time on the SubQ launch today, and some things don't line up.

RULER is reported at 128K, which is well below where sparse attention actually has to prove itself. Standard long-context evals should run at the lengths the marketing is advertising.

MRCR v2 at 1M: research model 83, production 65.9. The drop alone says something about how the architecture survives serving, and 65.9 is below Opus 4.6 (78.3) and GPT-5.5 (74) on the same benchmark. The homepage says "without quality loss," but those numbers don't fit that claim.

There's also a comparator-selection issue. The blog prose cites Opus 4.7 at 32.2, while the homepage table lists Opus 4.6 at 78.3. Only the favorable one ends up in the narrative.

The "52x faster than FlashAttention" figure is a kernel-level comparison, not end-to-end inference. It's a fair architecture result on its own, but most people will read it as wall-clock speed unless it's labeled as such.

When does the technical report drop? I have no idea. But my bullshit radar is high on this one.
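To see why a kernel-level number doesn't translate to wall-clock, here's a back-of-envelope Amdahl's-law sketch. The attention fraction below is an assumption I made up for illustration; nothing in the launch material states what share of decode time attention actually takes.

```python
# Amdahl's law: if only the attention kernel speeds up, the rest of
# inference (MLP layers, sampling, KV-cache I/O, etc.) is unchanged.
# attention_fraction is a hypothetical value, not from the launch post.

def end_to_end_speedup(kernel_speedup: float, attention_fraction: float) -> float:
    """Overall speedup when only the attention portion gets faster."""
    remaining = 1.0 - attention_fraction
    return 1.0 / (remaining + attention_fraction / kernel_speedup)

# If attention were 30% of inference time, a 52x kernel win would give
# roughly a 1.4x wall-clock improvement.
print(round(end_to_end_speedup(52.0, 0.30), 2))  # → 1.42
```

The point isn't the exact fraction, it's the shape of the curve: once the kernel is dozens of times faster, the un-accelerated remainder dominates, so the marketing number and the wall-clock number live in different universes.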