cs.AI, cs.DC, cs.MA

Optimizing PyTorch Inference with LLM-Based Multi-Agent Systems

arXiv:2511.16964v2 Announce Type: replace-cross
Abstract: Maximizing performance on available GPU hardware is an ongoing challenge for modern AI inference systems. Traditional approaches include writing custom GPU kernels and using specialized model c…