Disclosure: I am the author of this evaluation SDK.
I released an independent TurboQuant-compatible KV backend evaluation package for compressed-KV ABI testing, smoke tests, and partial attention decode experiments.
The goal is narrow: test whether compressed KV-cache workloads can be routed through a clean low-level backend ABI for:
- compressed KV block registration
- KV dot / QK partial execution
- block-local attention partial decode
- capability probing
- fallback and correctness reporting
- minimal benchmark validation
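For readers less familiar with block-local partial decode, here is a minimal Python sketch of the general technique, not the package's implementation. Each KV block produces a partial result (running max score, softmax denominator, softmax-weighted value sum). Partials are merged by rescaling to a shared max, and the merged output is checked against a single-pass reference. The function names (`block_partial`, `merge`, `finalize`, `blockwise_attention`, `reference_attention`) are hypothetical and do not reflect the repository's actual ABI. KV compression is not modeled; the blocks hold plain float vectors.

```python
import math
from typing import List, Optional, Tuple

# Hypothetical partial result for one query over some KV blocks:
# (running max score m, sum of exp(score - m), sum of exp(score - m) * value)
Partial = Tuple[float, float, List[float]]


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def block_partial(q: List[float], keys: List[List[float]],
                  values: List[List[float]], scale: float) -> Partial:
    """QK dot products and softmax-weighted values for a single KV block."""
    scores = [dot(q, k) * scale for k in keys]
    m = max(scores)                      # subtract the block max for stability
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, denom, acc


def merge(p1: Partial, p2: Partial) -> Partial:
    """Combine two partials by rescaling both to the larger running max."""
    m1, l1, a1 = p1
    m2, l2, a2 = p2
    m = max(m1, m2)
    c1, c2 = math.exp(m1 - m), math.exp(m2 - m)
    return m, l1 * c1 + l2 * c2, [x * c1 + y * c2 for x, y in zip(a1, a2)]


def finalize(p: Partial) -> List[float]:
    """Normalize the weighted value sum by the softmax denominator."""
    _, denom, acc = p
    return [x / denom for x in acc]


def blockwise_attention(q, keys, values, scale, block_size) -> List[float]:
    partial: Optional[Partial] = None
    for start in range(0, len(keys), block_size):
        p = block_partial(q, keys[start:start + block_size],
                          values[start:start + block_size], scale)
        partial = p if partial is None else merge(partial, p)
    return finalize(partial)


def reference_attention(q, keys, values, scale) -> List[float]:
    # Treat the whole KV cache as one block: ordinary single-pass softmax attention.
    return finalize(block_partial(q, keys, values, scale))


if __name__ == "__main__":
    q = [1.0, 0.5]
    keys = [[0.1, 0.2], [0.3, -0.1], [1.0, 0.0], [-0.5, 0.4], [0.2, 0.9]]
    values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0], [-1.0, 3.0], [0.5, 0.5]]
    scale = 1.0 / math.sqrt(len(q))
    out = blockwise_attention(q, keys, values, scale, block_size=2)
    ref = reference_attention(q, keys, values, scale)
    # Correctness report: max absolute error between blockwise and reference output
    print("max abs error:", max(abs(a - b) for a, b in zip(out, ref)))
```

The max-rescaling in `merge` is what lets each block be decoded independently: a block never needs to see the other blocks' scores, only their (max, denominator, accumulator) triple.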
Repository:
https://github.com/ixu2486/tq_compat_eval
This is not a Google project, not an official TurboQuant implementation, and not a replacement for TurboQuant, llama.cpp, or existing model runtimes.
It is also not the full RetryIX runtime. The private runtime, scheduling policy, hardware-interface contracts, and internal routing logic are not included.
I would appreciate feedback from people working on KV-cache optimization, quantized inference, compressed-KV formats, long-context decoding, or backend integration.