cs.CL

Compressed-Sensing-Guided, Inference-Aware Structured Reduction for Large Language Models

arXiv:2604.14156v1 Announce Type: new
Abstract: Large language models deliver strong generative performance, but at the cost of massive parameter counts, heavy memory use, and high decoding latency. Prior work has shown that pruning and structured sparsity can pr…