HeatKV: Head-tuned KV-cache Compression for Visual Autoregressive Modeling
arXiv:2605.14877v1 Announce Type: new
Abstract: Visual Autoregressive (VAR) models have recently demonstrated impressive image generation quality while maintaining low latency. However, they suffer from severe KV-cache memory constraints, often requir…