Learning to Optimize Multi-Objective Alignment Through Dynamic Reward Weighting
arXiv:2509.11452v2 Announce Type: replace-cross
Abstract: Prior work in multi-objective reinforcement learning typically uses linear reward scalarization with fixed weights, which provably fails to capture non-convex Pareto fronts and thus yields subo…