cs.LG, cs.PL

Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs

arXiv:2510.08726v2 Announce Type: replace-cross
Abstract: Operator fusion, which combines multiple deep learning operators to improve data reuse and reduce global memory transfers, has become a key optimization for deep learning. However, existing tens…