Sparse or Dense? A Mechanistic Estimation of Computation Density in Transformer-based LLMs
arXiv:2601.22795v2 Announce Type: replace
Abstract: Transformer-based large language models (LLMs) comprise billions of parameters arranged in deep and wide computational graphs. Several studies on LLM efficiency optimization argue that it is …