cs.AI, cs.LG, cs.PF

PATCH: Learnable Tile-level Hybrid Sparsity for LLMs

arXiv:2509.23410v4 Announce Type: replace-cross
Abstract: Large language models (LLMs) deliver impressive performance but incur prohibitive memory and compute costs at deployment. Model pruning is an effective way to reduce these overheads, yet existi…