Implemented TurboQuant and results don't fully match the paper

I spent the last few days implementing TurboQuant (arXiv:2504.19874) from scratch.

I wanted to sanity-check something with folks here, since my numbers don't match those in the paper.

Observations:

- The MSE version performs well: compression and distortion are as expected.
- The PROD version:
  - the paper claims over 99% correlation
  - my number sits around 95.8% at 4-bit

What's more interesting: even at this ~95% correlation level, attention quality degrades significantly (only ~67% top-1 accuracy in a simple simulation).

My hypothesis:

- correlation != ranking preservation: two score vectors can agree on average while disagreeing on which key scores highest
- attention is highly sensitive to exactly that kind of ordering error
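
To make that concrete, here is a minimal sketch of the kind of simulation I mean (not my actual benchmark: the dimension, key count, and Gaussian-noise stand-in for quantization error are all placeholders). It perturbs exact attention scores just enough to bring the correlation down to ~0.96 and measures how often the true top-1 key survives:

```python
import numpy as np

# Minimal sketch, not my actual benchmark: perturb exact attention
# scores with Gaussian noise calibrated to ~0.96 correlation, then
# check how often the true top-1 key is preserved. All sizes and the
# noise model are placeholder assumptions.
rng = np.random.default_rng(0)
d, n_keys, n_trials = 256, 512, 1000

hits = 0
for _ in range(n_trials):
    q = rng.standard_normal(d)
    K = rng.standard_normal((n_keys, d))
    scores = K @ q
    # noise std of 0.29 * sigma gives corr = 1 / sqrt(1 + 0.29^2) ~ 0.96
    noisy = scores + 0.29 * scores.std() * rng.standard_normal(n_keys)
    hits += int(np.argmax(scores) == np.argmax(noisy))

print(f"top-1 agreement at ~0.96 score correlation: {hits / n_trials:.1%}")
```

With hundreds of keys, the gaps between the top scores are small, so even a ~4% decorrelation flips the argmax surprisingly often. That seems consistent with high correlation coexisting with poor top-1 accuracy.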

Other things I ran into:

- variance scaling (unit vs. 1/d) initially broke the MSE variant
- the QJL variance scaling had to be re-derived
- bit packing is required for the compression to actually materialize (a sketch of the scaling and packing is below)
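
For what it's worth, here is a hedged sketch of the scaling and packing points, using a plain Gaussian JL sketch and a uniform 4-bit quantizer as stand-ins (this is not TurboQuant's actual quantizer, and the QJL derivation is not reproduced). The 1/sqrt(k) factor is what keeps inner products unbiased, and nibble packing is what actually halves the storage:

```python
import numpy as np

# Sketch of the two fixes that mattered for me (placeholder pipeline,
# not the paper's): (1) scale a Gaussian sketch by 1/sqrt(k) so norms
# and inner products are preserved in expectation, and (2) pack 4-bit
# codes two per byte, otherwise each code still occupies a full uint8
# and there is no compression.
rng = np.random.default_rng(0)
d, k = 256, 64
x = rng.standard_normal(d)
S = rng.standard_normal((k, d)) / np.sqrt(k)   # E[||S @ x||^2] == ||x||^2
y = S @ x

# Uniform 4-bit quantization of y (placeholder, not TurboQuant's codebook).
lo, hi = y.min(), y.max()
codes = np.clip(np.round((y - lo) / (hi - lo) * 15), 0, 15).astype(np.uint8)

# Bit packing: two 4-bit codes per byte -> k/2 bytes instead of k.
packed = (codes[0::2] << 4) | codes[1::2]
assert packed.nbytes == k // 2

# Unpack and dequantize for use at attention time.
unpacked = np.empty(k, dtype=np.uint8)
unpacked[0::2], unpacked[1::2] = packed >> 4, packed & 0x0F
y_hat = unpacked / 15 * (hi - lo) + lo
print(f"reconstruction MSE: {np.mean((y - y_hat) ** 2):.4f}")
```

Dropping the 1/sqrt(k) factor inflates every inner product by a factor of k, which is exactly the kind of bug that initially killed my MSE variant.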

Not sure if:

- I'm simply missing something in the PROD scaling,
- this is expected behavior at d=256, or
- the paper's results depend on larger dimensions / a different setup.

The code is here if anyone is interested in taking a look:

https://github.com/Ashx098/Turboquant-Implementation

Would really appreciate feedback from anyone who has worked on KV cache quantization / similar techniques.

