Now that the financebro hype has faded, is there an implementation of turboquant for llama.cpp somewhere? Saving even 50% of kv cache memory would be nice.
[link] [comments]
Now that the financebro hype has faded, is there an implementation of turboquant for llama.cpp somewhere? Saving even 50% of kv cache memory would be nice.