cs.CV

From Per-Image Low-Rank to Encoding Mismatch: Rethinking Feature Distillation in Vision Transformers

arXiv:2511.15572v2 Announce Type: replace
Abstract: Feature-map knowledge distillation (KD) transfers internal representations well between comparably sized Vision Transformers (ViTs), but it often fails in compression. We revisit this failure and unc…