TurboQuant on llama.cpp?

By /u/StupidScaredSquirrel / April 24, 2026

Now that the finance-bro hype has faded, is there an implementation of TurboQuant for llama.cpp anywhere? Saving even 50% of KV cache memory would be nice.
