Could it be that this take is not too far fetched?

Sources:

- https://www.reddit.com/r/LocalLLaMA/comments/1sgd7fp/its_insane_how_lobotomized_opus_46_is_right_now/

- https://www.threads.com/@hasanahmad/post/DW2B7kRj1PB

- lots of people complaining that, a few weeks after launch, SOTA models degrade. Many speculate about cost savings, strained compute, and so on...

- we actually need a continuous benchmark for this, but I think that if such a benchmark gets too notable, AI providers (or even those that provide infrastructure for open-weight models, since quantization and routing are a thing) could ensure that the accounts running the benchmark get access to the full model.

The only two benchmarks I know of that track performance over time (which, again, become moot if the provider notices them) are:

- https://marginlab.ai/trackers/claude-code-historical-performance/

- https://aistupidlevel.info/

E: Minor clarification: I posted the graph (the one with the black background, from which one could infer that "the three biggest competitors in the market are colluding to nerf themselves at the same time") to make it more legible, even though the graph itself is a meme meant to convey the concept, not a serious chart. If I had posted only the Threads screenshot, people could rightfully have said the graph was illegible because it would be too small. Personally, I believe the chart from the source is made up to convey the point visually (sort of a meme); I don't think it is based on real data.

submitted by /u/pier4r
