cs.LG

FlashNorm: Fast Normalization for Transformers

arXiv:2407.09577v4 Announce Type: replace
Abstract: Normalization layers are ubiquitous in large language models (LLMs) yet represent a compute bottleneck: on hardware with distinct vector and matrix execution units, the RMS calculation blocks the sub…
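
The dependency described in the abstract can be illustrated with a small sketch. In the usual ordering, a matrix multiply that consumes the normalized vector cannot start until the RMS reduction has finished. Because the RMS is a single scalar per input vector, dividing by it before or after the multiply gives the same result, so the two computations need not be serialized. The code below checks that identity numerically in plain Python. The function names, the `eps` value, and the toy data are assumptions made for illustration, and this is not the paper's implementation.

```python
import math

def rms(x, eps=1e-6):
    """Root mean square of a vector; eps is an assumed stabiliser."""
    return math.sqrt(sum(v * v for v in x) / len(x) + eps)

def matvec(x, W):
    """Row vector x (length n) times matrix W (n rows, m columns)."""
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]

def norm_then_matmul(x, W):
    # Baseline order: the multiply waits for the RMS reduction.
    r = rms(x)
    return matvec([v / r for v in x], W)

def matmul_then_norm(x, W):
    # Reordered: the multiply uses the raw input, and the scalar
    # 1/RMS (which does not depend on the multiply) is applied after.
    y = matvec(x, W)
    r = rms(x)
    return [v / r for v in y]

# Hypothetical toy input and weight matrix.
x = [0.5, -1.0, 2.0, 3.0]
W = [[0.1, -0.2], [0.3, 0.4], [-0.5, 0.6], [0.7, 0.8]]

a = norm_then_matmul(x, W)
b = matmul_then_norm(x, W)
max_diff = max(abs(p - q) for p, q in zip(a, b))
print(max_diff)  # expected to be at floating-point rounding level
```

In the reordered version the RMS and the multiply both read only the raw input, which is what would let separate vector and matrix units work on them concurrently.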