/u/wombweed - Provide.ai

KV cache quantization: ignorance, or malice?

/u/wombweed / May 2, 2026

I run Qwen-3.6 27B FP8 on vLLM for long-horizon agentic coding harness workloads with a large context window and concurrent sub-agents. On two 3090s that aren’t used for anything else, it seems reasonable to expect a good balance between speed and reliabi…
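For context on why KV cache precision matters for this kind of workload: the cache grows linearly with both context length and the number of concurrent sequences, so long contexts plus parallel sub-agents eat VRAM fast. vLLM exposes the cache element type through its `kv_cache_dtype` option. The sketch below is only back-of-the-envelope arithmetic for a plain dense-attention transformer. The layer count, KV-head count, and head dimension are made-up placeholders, not Qwen-3.6's real configuration (check the model's `config.json`; hybrid or linear-attention layers would change the math), and paging/allocator overhead is ignored.

```python
from dataclasses import dataclass

GIB = 1024 ** 3


@dataclass(frozen=True)
class AttnConfig:
    """Hypothetical attention shape; substitute values from the real model config."""
    num_layers: int
    num_kv_heads: int
    head_dim: int


def kv_bytes_per_token(cfg: AttnConfig, bytes_per_elem: float) -> float:
    # Factor of 2: each layer stores one key and one value vector per KV head.
    return 2 * cfg.num_layers * cfg.num_kv_heads * cfg.head_dim * bytes_per_elem


def kv_cache_gib(cfg: AttnConfig, tokens_per_seq: int, num_seqs: int,
                 bytes_per_elem: float) -> float:
    # Total cache = per-token cost * tokens held per sequence * concurrent sequences.
    total = kv_bytes_per_token(cfg, bytes_per_elem) * tokens_per_seq * num_seqs
    return total / GIB


if __name__ == "__main__":
    cfg = AttnConfig(num_layers=64, num_kv_heads=8, head_dim=128)  # hypothetical numbers
    # e.g. four concurrent sub-agents, each holding 32k tokens of context
    for label, width in [("FP16/BF16", 2), ("FP8", 1)]:
        gib = kv_cache_gib(cfg, tokens_per_seq=32_768, num_seqs=4, bytes_per_elem=width)
        print(f"{label} KV cache: {gib:.1f} GiB")
```

With these placeholder dimensions, an FP16 cache for 4 × 32k tokens works out to 32 GiB and an FP8 cache to 16 GiB: halving the element width halves the cache, which is the whole appeal of KV cache quantization when two 24 GB cards also have to hold the weights.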
