Hey y'all, I was inspired by this post: https://www.reddit.com/r/LocalLLaMA/comments/1tf3p6c/local_qwen_36_vs_frontier_models_on_a_coding/

I know this isn't exactly local, but I wanted to share what I tested and the results each model delivered. I ran the same single-file Canvas prompt across multiple models using my harness.

Setup:
Models included:
I used the highest thinking level available for each model. Tokens per second and generation time were not measured.

The results are here:
Gallery: https://aidengeungeun.github.io/oco-canvas-car-scene-compare/
Source: https://github.com/AidenGeunGeun/oco-canvas-car-scene-compare

We know that models are capable of this kind of work, but I was wondering how a wide variety of open-weights models compare to frontier models, especially the commonly used ones. I also tried MiMo-V2.5-pro, but it had billing issues with the OpenCode Go subscription, so I couldn't include it.

Take a look!