Differences Between Kimi K2.5 and Kimi K2.6 on MineBench

Some Notes:

  • The one caveat is that I find Kimi's results quite inconsistent; the model clearly has a very high ceiling, but you'll see that some of its builds (in my opinion) lack quality compared to the others (though they're all a massive improvement from Kimi K
  • Total cost was $
    • I think this is by far the most cost-effective model for its performance
    • If you enjoy these posts please feel free to help fund the benchmark

Benchmark: https://minebench.ai/

Git Repository: https://github.com/Ammaar-Alam/minebench

Extra Information (if you're confused):

Essentially, it's a benchmark that tests how well a model can create a 3D Minecraft-like structure.

The models are given a palette of blocks (think of them like Legos) and a prompt of what to build; for example, the first prompt you see in the post was a fighter jet. The models then had to build a fighter jet by returning JSON that gives the coordinates (x, y, z) of each block. It's interesting to see which model creates a better 3D representation of the given prompt.
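
To make the format a bit more concrete, here is a rough sketch of what reading such a JSON reply could look like. The field names ("blocks", "x", "y", "z", "type"), the palette entries, and the grid size are all made up for illustration; the benchmark's actual schema is in the repository and may look different.

```python
import json

# Hypothetical palette; the real benchmark's block names may differ.
PALETTE = {"stone", "glass", "iron_block"}


def parse_build(raw, palette, size):
    """Parse a model's JSON reply, keeping only valid, non-duplicate blocks.

    Assumed (not confirmed) schema:
    {"blocks": [{"x": int, "y": int, "z": int, "type": str}, ...]}
    """
    data = json.loads(raw)
    seen = set()
    blocks = []
    for b in data["blocks"]:
        pos = (b["x"], b["y"], b["z"])
        # Every coordinate must be an integer inside the size^3 grid.
        in_bounds = all(isinstance(c, int) and 0 <= c < size for c in pos)
        # Skip out-of-bounds blocks, unknown block types, and repeated positions.
        if in_bounds and b["type"] in palette and pos not in seen:
            seen.add(pos)
            blocks.append((*pos, b["type"]))
    return blocks


example = json.dumps({"blocks": [
    {"x": 0, "y": 0, "z": 0, "type": "stone"},
    {"x": 1, "y": 0, "z": 0, "type": "glass"},
    {"x": 1, "y": 0, "z": 0, "type": "stone"},   # duplicate position, dropped
    {"x": 99, "y": 0, "z": 0, "type": "stone"},  # outside the grid, dropped
    {"x": 2, "y": 0, "z": 0, "type": "lava"},    # not in the palette, dropped
]})

build = parse_build(example, PALETTE, size=16)
print(build)
```

In this sketch the first block placed at a position wins and anything invalid is silently skipped; a real harness could just as reasonably reject the whole build instead.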

The smarter models tend to design much more detailed and intricate builds. The repository README might help give a better understanding.

(Disclaimer: This is a public benchmark I created, so technically self-promotion :)

submitted by /u/ENT_Alam