cs.AI, cs.LG, cs.RO, stat.ML

TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing

arXiv:2605.11473v1 Announce Type: cross
Abstract: Soft Actor-Critic (SAC) and its variants dominate Multi-Task Reinforcement Learning (MTRL) due to their off-policy sample efficiency, while on-policy methods such as Proximal Policy Optimization (PPO) …