SLaB: Sparse-Lowrank-Binary Decomposition for Efficient Large Language Models
arXiv:2604.04493v1 Announce Type: cross
Abstract: The rapid growth of large language models (LLMs) presents significant deployment challenges due to their massive computational and memory demands. While model compression, such as network pruning, offe…