cs.AI, cs.LG

Geometric Limits of Knowledge Distillation: A Minimum-Width Theorem via Superposition Theory

arXiv:2604.04037v1 Announce Type: cross
Abstract: Knowledge distillation compresses large teachers into smaller students, but performance saturates at a loss floor that persists across training methods and objectives. We argue this floor is geometric:…
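For context on the distillation setup the abstract describes, here is a minimal sketch of the standard temperature-softened KL objective commonly used for knowledge distillation (Hinton-style). This is an illustration, not the paper's method; all function names and the choice of temperature are assumptions.

```python
# Illustrative sketch (not from the paper): a standard knowledge-distillation
# loss -- KL divergence between temperature-softened teacher and student
# output distributions.
import math

def softmax(logits, T=1.0):
    """Temperature-scaled softmax (numerically stabilized)."""
    m = max(l / T for l in logits)
    exps = [math.exp(l / T - m) for l in logits]
    s = sum(exps)
    return [e / s for e in exps]

def distillation_loss(teacher_logits, student_logits, T=2.0):
    """KL(teacher || student) on softened distributions, scaled by T^2
    (the conventional factor keeping gradient magnitude comparable across T)."""
    p = softmax(teacher_logits, T)
    q = softmax(student_logits, T)
    kl = sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)
    return T * T * kl

# A student that reproduces the teacher's logits exactly attains zero loss;
# the abstract's claim is that a too-narrow student cannot reach that floor.
print(distillation_loss([2.0, 0.5, -1.0], [2.0, 0.5, -1.0]))  # → 0.0
```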