/u/Legitimate-Dog5690

Dual GPU llama.cpp speedup

/u/Legitimate-Dog5690 / May 17, 2026

llama.cpp has had a long-standing issue with "--split-mode tensor": you get great results, but it only supports non-quantized KV caches. For this very reason, a lot of people decide to go with a healthy sized KV cache and ignore tensor p…
