TOPPO: Rethinking PPO for Multi-Task Reinforcement Learning with Critic Balancing
arXiv:2605.11473v1 Announce Type: cross
Abstract: Soft Actor-Critic (SAC) and its variants dominate Multi-Task Reinforcement Learning (MTRL) due to their off-policy sample efficiency, while on-policy methods such as Proximal Policy Optimization (PPO) …