Multimodal Remote Inference

arXiv:2508.07555v3

Abstract: We consider a remote inference system with multiple modalities, where a multimodal machine learning (ML) model performs real-time inference using features collected from remote sensors. When sensor observations evolve dynamically over time, fresh features are critical for inference tasks. However, timely delivery of features from all modalities is often infeasible under limited network resources. To address this challenge, we formulate a multimodal scheduling problem to minimize the ML model's inference error. We model this error as a general function of the Age of Information (AoI) vector, where AoI quantifies data freshness.

We cast the problem as a semi-Markov decision process (SMDP) and derive an equivalent reformulation with a reduced state set. We then show that the problem has fundamentally different chain structures in the two-modality and multi-modality cases. For the two-modality case, we prove that the optimal policy has an index-based threshold structure. For the general multi-modality case (i.e., with more than two modalities), we develop the optimal error-aware switching-and-transmission policy (EAST), which is computed using a multichain policy iteration (MPI) algorithm. To further reduce complexity, we also develop two low-complexity policies for special settings: the error-aware transmission policy (EAT) and the fixed threshold policy (FT).

Numerical results from three case studies show that the proposed policies outperform several simple heuristics, including round-robin, greedy, and uniform random policies. In particular, EAST reduces the inference error by up to 44.8% compared with the best baseline in each case. In the five-modality case, EAT and FT reduce computation time by 6.6× and 3000×, respectively, relative to EAST, while increasing the inference error by 20.2% and 38.6%, respectively.
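To make the setting concrete, here is a minimal toy sketch (not the paper's EAST, EAT, or FT algorithms) of two-modality AoI scheduling: exactly one modality can transmit per slot, a delivered feature resets that modality's AoI to one, and the inference error is a function of the AoI vector. The quadratic error weights and the threshold value below are hypothetical assumptions chosen for illustration only.

```python
def simulate(policy, horizon=10_000):
    """Run a scheduler for `horizon` slots; return time-average error."""
    aoi = [1, 1]  # AoI of modality 0 and modality 1
    total_error = 0.0
    for t in range(horizon):
        k = policy(aoi, t)  # choose which modality transmits this slot
        # The served modality's AoI resets to 1; the other ages by 1.
        aoi = [1 if i == k else a + 1 for i, a in enumerate(aoi)]
        # Assumed (hypothetical) error model: AoI-dependent, with
        # modality 0 contributing much more to inference error.
        total_error += 5.0 * aoi[0] ** 2 + 0.1 * aoi[1] ** 2
    return total_error / horizon

def round_robin(aoi, t):
    # Baseline: alternate modalities regardless of error impact.
    return t % 2

def threshold_policy(aoi, t, tau=3):
    # Index-style rule: serve modality 1 only once its AoI exceeds a
    # threshold; otherwise keep the error-critical modality 0 fresh.
    return 1 if aoi[1] > tau else 0
```

Under this error model, the threshold rule yields a lower time-average error than round-robin, illustrating why error-aware scheduling can beat modality-agnostic baselines; the paper's policies generalize this idea to arbitrary error functions and more modalities.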
