Variance Is Not Importance: Structural Analysis of Transformer Compressibility Across Model Scales
arXiv:2604.20682v1 Announce Type: new
Abstract: We present a systematic empirical study of transformer compression through over 40 experiments on GPT-2 (124M parameters) and Mistral 7B (7.24B parameters). Our analysis covers spectral compression, bloc…
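This excerpt names spectral compression but does not define it. One common form, and the one assumed here, is to replace a weight matrix with its truncated singular value decomposition (SVD). The NumPy sketch below illustrates that assumption only; it is not the authors' pipeline. The function names `spectral_compress` and `factorized_params` are hypothetical. The "energy kept" value it reports is a variance-style measure of the kind the title suggests need not track a component's importance to the model.

```python
import numpy as np

# Assumption: "spectral compression" = truncated SVD of a single weight matrix.


def spectral_compress(weight: np.ndarray, rank: int) -> tuple[np.ndarray, float]:
    """Best rank-`rank` approximation of `weight` via truncated SVD.

    Returns the approximation and the fraction of squared singular-value
    energy (spectral "variance") that it keeps.
    """
    if not 1 <= rank <= min(weight.shape):
        raise ValueError("rank must be between 1 and min(weight.shape)")
    u, s, vt = np.linalg.svd(weight, full_matrices=False)
    # Scale the kept left singular vectors by their singular values, then project back.
    approx = (u[:, :rank] * s[:rank]) @ vt[:rank, :]
    energy = float(np.sum(s[:rank] ** 2) / np.sum(s ** 2))
    return approx, energy


def factorized_params(shape: tuple[int, int], rank: int) -> int:
    """Parameters needed to store the two factors (m x r and r x n) instead of W."""
    m, n = shape
    return rank * (m + n)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Random stand-in with a GPT-2-small MLP shape; not real trained weights.
    w = rng.standard_normal((768, 3072))
    approx, energy = spectral_compress(w, rank=64)
    print(f"energy kept: {energy:.3f}")
    print(f"params: {w.size} dense vs {factorized_params(w.shape, 64)} factorized")
```

By the Eckart-Young theorem, the relative squared Frobenius error of this approximation is exactly 1 minus the kept energy. A high energy score therefore guarantees a small reconstruction error. It says nothing directly about the effect on the model's loss. For a 768 × 3072 matrix, the factorized form costs r × 3840 parameters against 2,359,296 dense ones, so it saves parameters only below rank 614.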