Strait: Perceiving Priority and Interference in ML Inference Serving
arXiv:2604.28175v1 Announce Type: new
Abstract: Machine learning (ML) inference serving systems host deep neural network (DNN) models and schedule incoming inference requests across deployed GPUs. However, limited support for task prioritization and i…