cs.AI, cs.CL

ShishuLM: Achieving Optimal and Efficient Parameterization with Low Attention Transformer Models

arXiv:2510.13860v2 Announce Type: replace
Abstract: While the transformer architecture has achieved state-of-the-art performance on natural language processing tasks, these models impose substantial memory and computational overhead. Recent research h…