/u/DrBearJ3w - Provide.ai

TurboQuant + MTP for ROCm (llama.cpp)

/u/DrBearJ3w / May 14, 2026

TL;DR: I got TBQ4 KV cache + MTP working on AMD ROCm for RX 7900 XTX / RDNA3 / gfx1100 in llama.cpp. Main win: 64k context fits on 24 GB VRAM and remains usable. Branch: tbq4-rdna3-experiment (https://github.com/DrBearJew/llama.cpp/tree/tbq4-rdna3-expe…
