KV cache quantization: ignorance, or malice?
I run Qwen-3.6 27B FP8 on vLLM for long-horizon workloads in an agentic coding harness, with a large context window and concurrent sub-agents. On two 3090s that aren’t used for anything else, it seems reasonable to expect a good balance between speed and reliabi…