Author name: /u/Jorlen

Used over a million tokens in three separate sessions to test Qwen 3.6 35b (new Multi-token Prediction version)

/u/Jorlen / May 15, 2026

In my opinion, MTP models are a 100% game changer for local LLMs. In terms of speed, I was getting around 1.5x the tok/sec of previous tests. The project was a test: building a full pygame game iteratively, step by step; a small mystery dungeon-style game. At f…

LocalLLaMA

Linux – Why does llama.cpp ROCm consume SO much VRAM for KV cache compared to Vulkan?

/u/Jorlen / May 14, 2026

I have a Docker stack with a bunch of AI services, and the llama.cpp server is the brain. I've got a working Vulkan YAML snippet for llama.cpp, but out of curiosity I flipped it to ROCm (latest build) and did not see ANY performance improvement. In fact,…