cs.CL, cs.LG

Deterministic Differentiable Structured Pruning for Large Language Models

arXiv:2603.08065v2 Announce Type: replace
Abstract: Structured pruning reduces LLM inference cost by removing low-importance architectural components. This can be viewed as learning a multiplicative gate for each component under an l0 sparsity constra…
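
Illustrative sketch (not the paper's method): the gate view described in the abstract can be made concrete with a toy example. Assume, hypothetically, that each prunable component is one hidden unit of a small ReLU MLP and that each unit has a scalar gate multiplying its activation. The l0 "norm" of the gate vector then counts the surviving units, and units whose gate is exactly zero can be physically removed without changing the output. The function names, shapes, and gate values below are assumptions chosen for illustration, and the sketch does not show how the gates are learned.

```python
import random


def relu(v):
    """Elementwise ReLU on a list of floats."""
    return [max(a, 0.0) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def gated_mlp(x, w_in, w_out, gates):
    """Two-layer MLP with one multiplicative gate per hidden unit.

    w_in has shape (hidden, d_in); w_out has shape (d_out, hidden).
    """
    hidden = relu(matvec(w_in, x))
    gated = [g * h for g, h in zip(gates, hidden)]
    return matvec(w_out, gated)


def prune(w_in, w_out, gates):
    """Drop hidden units whose gate is zero; fold kept gates into w_out."""
    keep = [i for i, g in enumerate(gates) if g != 0.0]
    w_in_pruned = [w_in[i] for i in keep]
    w_out_pruned = [[row[i] * gates[i] for i in keep] for row in w_out]
    return w_in_pruned, w_out_pruned


random.seed(0)
d_in, hidden_dim, d_out = 4, 8, 3
x = [random.gauss(0, 1) for _ in range(d_in)]
w_in = [[random.gauss(0, 1) for _ in range(d_in)] for _ in range(hidden_dim)]
w_out = [[random.gauss(0, 1) for _ in range(hidden_dim)] for _ in range(d_out)]

# Hypothetical gate values: keep units 0, 2, 5 and remove the rest.
gates = [1.0, 0.0, 1.0, 0.0, 0.0, 1.0, 0.0, 0.0]

# l0 "norm" of the gate vector: the number of nonzero gates.
l0 = sum(1 for g in gates if g != 0.0)

dense_out = gated_mlp(x, w_in, w_out, gates)
w_in_pruned, w_out_pruned = prune(w_in, w_out, gates)
pruned_out = matvec(w_out_pruned, relu(matvec(w_in_pruned, x)))
```

Here `dense_out` and `pruned_out` agree, while the pruned network stores only `l0` of the original hidden units. A sparsity constraint on the gates therefore translates directly into a smaller model at inference time.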