Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems
arXiv:2511.16964v2 Announce Type: replace-cross
Abstract: Maximizing performance on available GPU hardware is an ongoing challenge for modern AI inference systems. Traditional approaches include writing custom GPU kernels and using specialized model c…