Author name: /u/pmttyji

TurboQuant – Extreme KV Cache Quantization · ggml-org/llama.cpp · Discussion #20969

/u/pmttyji / April 7, 2026

14+ independent validators now across Metal, CUDA, HIP, Vulkan, and MLX: Apple Silicon, NVIDIA (4090, 5090, H100, A100, V100, 1080 Ti), and AMD (RX 9070 XT, RX 6600), from M1 to Blackwell. This is what open source research looks like. The data converges. …

LocalLLaMA

ggml: add Q1_0 1-bit quantization support (CPU) – 1-bit Bonsai models

/u/pmttyji / April 6, 2026

Bonsai's 8B model is just 1.15 GB, so CPU alone is more than enough. https://huggingface.co/collections/prism-ml/bonsai

LocalLLaMA

llama.cpp – llama-bench: add `-fitc` and `-fitt` to arguments

/u/pmttyji / April 6, 2026

I had been expecting this for some time. It is available from b8679 onwards.

LocalLLaMA

MiniMax-M2.7 … this weekend for sure

/u/pmttyji / April 6, 2026

Sorry to all OSS developers. I underestimated the workload required for open-sourcing. We still have some infrastructure adaptation work in progress. M2.7 is expected to be released this weekend. Thank you for your understanding.