Taming Asynchronous CPU-GPU Coupling for Frequency-aware Latency Estimation on Mobile Edge
arXiv:2604.15357v1 Announce Type: cross
Abstract: Precise estimation of model inference latency is crucial for time-critical mobile edge applications, enabling devices to calculate latency margins against deadlines and trade them for enhanced model pe…