BaldWhisper: Faster Whisper with Head Shearing and Layer Merging
arXiv:2510.08599v2
Abstract: Pruning large pre-trained transformers in a data-scarce scenario is challenging, because recovering performance often requires massive amounts of retraining data. For instance, Distill-Whisper prunes Whispe…