Centralized Adaptive Sampling for Reliable Co-Training of Independent Multi-Agent Policies
arXiv:2508.01049v2 Announce Type: replace
Abstract: Independent on-policy policy gradient algorithms are widely used for multi-agent reinforcement learning (MARL) in cooperative and no-conflict games, but they are known to converge sub-optimally when …