Neptune: Advanced ML Operator Fusion for Locality and Parallelism on GPUs
arXiv:2510.08726v2 Announce Type: replace-cross
Abstract: Operator fusion, which combines multiple deep learning operators to improve data reuse and reduce global memory transfers, has become a key optimization for deep learning. However, existing tens…