Hardware-Efficient Softmax and Layer Normalization with Guaranteed Normalization for Edge Devices
arXiv:2604.23647v1 Announce Type: cross
Abstract: In Transformer models, non-GEMM (non-General Matrix Multiplication) operations — especially Softmax and Layer Normalization (LayerNorm) — often dominate hardware cost due to their nonlinear nature. T…