Released a TurboQuant-compatible KV backend evaluation SDK

Disclosure: I am the author of this evaluation SDK.

I released an independent TurboQuant-compatible KV backend evaluation package for compressed-KV ABI testing, smoke tests, and partial attention decode experiments.

The goal is narrow: test whether compressed KV-cache workloads can be routed through a clean low-level backend ABI for:

- compressed KV block registration

- KV dot / QK partial execution

- block-local attention partial decode

- capability probing

- fallback and correctness reporting

- minimal benchmark validation
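To make the "block-local attention partial decode" and "correctness reporting" items concrete, here is a minimal sketch of the general technique in plain Python. It is not taken from the repository and does not reflect the SDK's actual ABI: the function names (`attention_partial`, `merge_partials`, `reference_attention`, `max_abs_error`) and the toy data are hypothetical, and the blocks are assumed to be already dequantized to floats. Each block returns a partial result (running max score, sum of exponentials, weighted value sum), the partials are merged with the standard log-sum-exp rescaling, and the merged output is compared against a single-pass reference.

```python
import math

def attention_partial(q, keys, values):
    """Attention over one KV block (hypothetical helper, not the SDK's API).

    Returns (max_score, sum_exp, weighted_value_sum), all relative to the
    block's own max score so the partial is numerically stable.
    """
    scale = 1.0 / math.sqrt(len(q))
    scores = [scale * sum(a * b for a, b in zip(q, k)) for k in keys]
    m = max(scores)
    weights = [math.exp(s - m) for s in scores]
    denom = sum(weights)
    dim = len(values[0])
    acc = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return m, denom, acc

def merge_partials(partials):
    """Combine block-local partials into the full softmax-weighted output."""
    m_all = max(m for m, _, _ in partials)
    denom = 0.0
    acc = [0.0] * len(partials[0][2])
    for m, d, a in partials:
        r = math.exp(m - m_all)  # rescale this block to the global max
        denom += r * d
        acc = [x + r * y for x, y in zip(acc, a)]
    return [x / denom for x in acc]

def reference_attention(q, keys, values):
    """Single-pass attention over all keys, used as the correctness baseline."""
    _, d, a = attention_partial(q, keys, values)
    return [x / d for x in a]

def max_abs_error(a, b):
    """Largest element-wise absolute difference between two vectors."""
    return max(abs(x - y) for x, y in zip(a, b))

# Toy data: one query, six cached tokens, split into three blocks of two.
q = [1.0, 0.0, -1.0, 0.5]
keys = [
    [0.2, 0.1, -0.3, 0.4],
    [1.5, -0.2, 0.0, 0.1],
    [-0.7, 0.9, 0.3, -0.2],
    [0.0, 0.0, 2.0, 1.0],
    [0.6, -1.1, -0.4, 0.8],
    [3.0, 0.5, -2.0, 0.0],
]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, -1.0], [0.5, 0.5], [-1.0, 3.0], [4.0, 2.0]]

block_size = 2
partials = [
    attention_partial(q, keys[i:i + block_size], values[i:i + block_size])
    for i in range(0, len(keys), block_size)
]
merged = merge_partials(partials)
error = max_abs_error(merged, reference_attention(q, keys, values))
print("max abs error vs reference:", error)
```

In a compressed-KV setting, the per-block step would read quantized data instead of floats, so the reported error would also include quantization error and would need a looser tolerance than the floating-point rounding seen here.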

Repository:

https://github.com/ixu2486/tq_compat_eval

This is not a Google project, not an official TurboQuant implementation, and not a replacement for TurboQuant, llama.cpp, or existing model runtimes.

It is also not the full RetryIX runtime. The private runtime, scheduling policy, hardware-interface contracts, and internal routing logic are not included.

I would appreciate feedback from people working on KV-cache optimization, quantized inference, compressed-KV formats, long-context decoding, or backend integration.

submitted by /u/inhogon