cs.AI, cs.LG

Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory

arXiv:2604.04037v2 Announce Type: replace-cross
Abstract: Knowledge distillation compresses large teachers into smaller students, but performance saturates at a loss floor that persists across training methods and objectives. We argue this floor is ge…
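To make the setup concrete, below is a minimal sketch of the classic temperature-softened distillation objective (Hinton-style KL between teacher and student distributions). This is an assumption about the standard setup the abstract refers to, not the paper's specific objective; the function names and temperature choice here are illustrative.

```python
import numpy as np

def softmax(logits, T=1.0):
    # Temperature-scaled softmax; higher T flattens the distribution.
    z = np.asarray(logits, dtype=float) / T
    z -= z.max()  # numerical stability
    e = np.exp(z)
    return e / e.sum()

def kd_loss(teacher_logits, student_logits, T=2.0):
    # Standard distillation term: KL(p_teacher || p_student) at
    # temperature T, scaled by T^2 so gradient magnitude is
    # comparable across temperatures.
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    return float(T * T * np.sum(p * (np.log(p) - np.log(q))))

# A student that exactly matches the teacher drives this term to zero;
# the abstract's claim is that a persistent loss floor prevents small
# students from reaching that limit in practice.
logits = [2.0, 0.5, -1.0]
print(kd_loss(logits, logits))           # exact match -> 0.0
print(kd_loss(logits, [1.0, 1.0, 1.0]))  # mismatch -> positive
```

The loss floor described in the abstract would appear here as a strictly positive minimum of `kd_loss` over all students below some width, regardless of training method.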