Structural Sensitivity in Compressed Transformers: Relative Error Propagation and Layer Removal
arXiv:2603.20991v2 Announce Type: replace-cross
Abstract: Compressing transformer weights makes large language models cheaper to deploy. But each layer’s compression introduces an error. These errors accumulate as the signal passes through later layer…
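The accumulation effect the abstract describes can be illustrated with a toy experiment (a hypothetical sketch, not the paper's method): stack a few linear-plus-nonlinearity layers, replace each weight matrix with a uniformly quantized copy as a stand-in compression scheme, and track the relative error of the hidden state layer by layer.

```python
import numpy as np

# Hypothetical illustration (not from the paper): how per-layer
# compression error can propagate through a stack of layers.
rng = np.random.default_rng(0)

def quantize(w, bits=4):
    # Uniform symmetric quantization as a stand-in compression scheme.
    scale = np.max(np.abs(w)) / (2 ** (bits - 1) - 1)
    return np.round(w / scale) * scale

d, n_layers = 64, 8
x = rng.standard_normal(d)
weights = [rng.standard_normal((d, d)) / np.sqrt(d) for _ in range(n_layers)]

h_ref, h_cmp = x.copy(), x.copy()
for i, w in enumerate(weights, 1):
    h_ref = np.tanh(w @ h_ref)            # exact layer
    h_cmp = np.tanh(quantize(w) @ h_cmp)  # compressed layer
    rel_err = np.linalg.norm(h_cmp - h_ref) / np.linalg.norm(h_ref)
    print(f"layer {i}: relative error {rel_err:.4f}")
```

Printing the per-layer relative error shows how a small weight perturbation at each layer feeds into the input of the next, which is the propagation mechanism the paper analyzes.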