LocalLLaMA

Testing llama.cpp MTP support on Qwen3.6 – RTX 5090

Setup:
– RTX 5090, 32 GB, Linux
– Built llama.cpp from 4f13cb7 (the official ghcr.io/ggml-org/llama.cpp:server-cuda image hasn't picked up the merge yet as of writing, so I had to docker build from source with CUDA_DOCKER_ARCH=120)
– Unsloth'…