LP-GEMM: Integrating Layout Propagation into GEMM Operations
arXiv:2604.04599v1 Announce Type: cross
Abstract: In scientific computing and modern machine learning (ML) workloads, sequences of dependent general matrix multiplications (GEMMs) often dominate execution time. While state-of-the-art BLAS libraries ag…