Nicholas E. Corrado, Josiah P. Hanna

Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies

Nicholas E. Corrado, Josiah P. Hanna / May 14, 2026

arXiv:2508.01049v2 Announce Type: replace
Abstract: Independent on-policy policy gradient algorithms are widely used for multi-agent reinforcement learning (MARL) in cooperative and no-conflict games, but they are known to converge sub-optimally when …
