cs.AI, cs.CL

ShishuLM: Achieving Optimal and Efficient Parameterization with Low Attention Transformer Models

arXiv:2510.13860v2 Announce Type: replace
Abstract: While the transformer architecture has achieved state-of-the-art performance on natural language processing tasks, these models impose substantial memory and computational overhead. Recent research h…