Qwen 3.6 for Claude Code in 1L

https://preview.redd.it/a96i13zyemvg1.png?width=374&format=png&auto=webp&s=d1850127462849eab4ff37a3e10159d092bcc994

I use a P3 Tiny Gen 2 with an RTX 2000 Ada (16 GB VRAM). It gets hot, so I modeled and printed a fan hanger to keep it cool. It's dumb, but it feels like Claude Code, just unlimited.
I did have to use the change in this PR to make llama.cpp work well with Claude Code, though: https://github.com/ggml-org/llama.cpp/pull/21793/
Model: Qwen 3.6 35B-A3B, Unsloth Q4_K_M quant. Speed: 400 t/s prompt processing, 24 t/s generation. With the change that lets prompt prefixes cache, I'm amazed at what these newfangled tools can generate. Have a great day folks, I just wanted to share my experience with someone <3
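
P.S. For a rough sense of why prefix caching matters at these speeds, here's a back-of-envelope sketch. It only uses the 400 t/s and 24 t/s figures above; the token counts for the example turn are made up for illustration, and `turn_seconds` is a hypothetical helper, not anything from llama.cpp. It also assumes a cached prefix costs nothing to reuse, which is optimistic.

```python
PROMPT_TPS = 400.0  # prompt-processing speed reported in this post (tokens/s)
GEN_TPS = 24.0      # generation speed reported in this post (tokens/s)


def turn_seconds(context_tokens, new_tokens, output_tokens, prefix_cached):
    """Estimate wall-clock seconds for one agent turn.

    If the prefix is cached, only the new tokens are processed;
    otherwise the whole context is processed again.
    """
    to_process = new_tokens if prefix_cached else context_tokens + new_tokens
    return to_process / PROMPT_TPS + output_tokens / GEN_TPS


# Hypothetical turn: 20,000 tokens of existing context, 500 new, 300 generated.
cold = turn_seconds(20_000, 500, 300, prefix_cached=False)
warm = turn_seconds(20_000, 500, 300, prefix_cached=True)
print(f"no prefix cache: {cold:.2f} s, with prefix cache: {warm:.2f} s")
```

With those made-up sizes, reprocessing the whole context dominates the turn, while with a cached prefix nearly all of the time goes to generation.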

submitted by /u/brickinthefloor