kv-cache : support attention rotation for heterogeneous iSWA by ggerganov · Pull Request #21513 · ggml-org/llama.cpp

tl;dr: Fixes KV-cache rotation for hybrid-attention models (interleaved sliding-window + full-attention layers, i.e. iSWA) like Gemma 4

(Not actually TurboQuant, but you can call it TurboQuant if that makes you feel better)
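For context: in an iSWA model the sliding-window layers only retain the last few hundred tokens while the full-attention layers keep the whole context, so when the cache is shifted (e.g. after evicting old tokens), each layer has to re-rotate whatever *it* holds rather than a token range shared across all layers. Below is a minimal, hypothetical C++ sketch of that idea; the names (`LayerCache`, `rope_shift`, `shift_cache`) and the CPU-side layout are illustrative assumptions, not llama.cpp's actual implementation.

```cpp
#include <cmath>
#include <cstdio>
#include <vector>

// Hypothetical per-layer cache; not llama.cpp's real data structure.
struct LayerCache {
    int window = 0;             // 0 = full attention, >0 = SWA window size
    std::vector<float> k;       // cached keys, flattened [n_cached * head_dim]
    std::vector<int>   pos;     // absolute position of each cached token
};

// Re-base one cached key vector (head_dim floats) by a position delta.
// Because RoPE rotations compose, rotating a key cached at position p by
// `delta` is equivalent to having cached it at position p + delta.
static void rope_shift(float *k, int head_dim, int delta, float freq_base = 10000.0f) {
    for (int i = 0; i < head_dim; i += 2) {
        const float theta = (float) delta * std::pow(freq_base, -(float) i / (float) head_dim);
        const float c = std::cos(theta);
        const float s = std::sin(theta);
        const float x0 = k[i];
        const float x1 = k[i + 1];
        k[i]     = x0 * c - x1 * s;
        k[i + 1] = x0 * s + x1 * c;
    }
}

// Shift every layer's cache by `delta`. The "heterogeneous" part: SWA layers
// retain only their last `window` tokens, so each layer rotates its own
// cached entries independently of the other layers.
void shift_cache(std::vector<LayerCache> &layers, int head_dim, int delta) {
    for (auto &l : layers) {
        for (size_t t = 0; t < l.pos.size(); ++t) {
            rope_shift(l.k.data() + t * head_dim, head_dim, delta);
            l.pos[t] += delta;
        }
    }
}

int main() {
    const int head_dim = 4;
    std::vector<LayerCache> layers(2);
    layers[0].window = 0;   // full-attention layer: keeps the whole context
    layers[1].window = 256; // SWA layer: would keep only the last 256 tokens
    for (auto &l : layers) {
        l.k.assign(head_dim, 1.0f); // one cached token per layer, for brevity
        l.pos = {5};
    }
    shift_cache(layers, head_dim, -3); // e.g. after evicting 3 tokens from the front
    std::printf("new pos: %d\n", layers[0].pos[0]); // prints: new pos: 2
    return 0;
}
```

The real code performs the equivalent rotation with ggml operations on the compute backend; this sketch only mirrors the per-layer bookkeeping that makes the heterogeneous case work.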

submitted by /u/jacek2023
