Hyperloop Transformers
arXiv:2604.21254v1 Announce Type: cross
Abstract: LLM architecture research generally aims to maximize model quality subject to fixed compute/latency budgets. However, many applications of interest, such as edge and on-device deployment, are further con…