Deterministic Differentiable Structured Pruning for Large Language Models
arXiv:2603.08065v2 Announce Type: replace
Abstract: Structured pruning reduces LLM inference cost by removing low-importance architectural components. This can be viewed as learning a multiplicative gate for each component under an ℓ0 sparsity constraint…
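The gating view in the abstract can be sketched in a toy example: each component's output is scaled by a learned deterministic gate, and a differentiable surrogate for the ℓ0 penalty pushes low-importance gates toward zero. This is a minimal illustration, not the paper's method — the sigmoid gate, the quadratic "importance" loss, and all constants here are assumptions for demonstration only.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Two hypothetical components with different importances (illustrative values).
# Each has a gate g = sigmoid(alpha) applied multiplicatively to its output.
outputs = [3.0, 0.05]   # component importances (assumed toy numbers)
alphas = [0.0, 0.0]     # learnable gate logits
lam, lr = 0.5, 0.5      # sparsity weight and step size (assumed)

for _ in range(200):
    for i, y in enumerate(outputs):
        g = sigmoid(alphas[i])
        # Toy loss per component: (1 - g)^2 * y^2 keeps useful components open,
        # while lam * g is a differentiable stand-in for the l0 count penalty.
        # d(loss)/d(alpha) = [-2 * (1 - g) * y^2 + lam] * g * (1 - g)
        grad = (-2.0 * (1.0 - g) * y * y + lam) * g * (1.0 - g)
        alphas[i] -= lr * grad

gates = [sigmoid(a) for a in alphas]
# The high-importance component's gate stays near 1; the low-importance
# component's gate is driven toward 0, i.e. it would be pruned.
```

Under these toy settings, gradient descent opens the gate on the important component and closes it on the unimportant one, which is the qualitative behavior the abstract describes for structured pruning.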