POP: Prior-Fitted First-Order Optimization Policies
arXiv:2602.15473v2 Announce Type: replace
Abstract: Gradient-based optimizers are highly sensitive to design choices in their adaptive learning rate mechanisms. To address this limitation, we introduce POP, a meta-learned Reinforcement Learning (RL) policy that predicts adaptive learning rates for gradient descent, conditioned on contextual information from the optimization trajectory. Our method introduces a novel RL reward formulation, a new function-scaling strategy for in-distribution generalization, and a prior used to sample millions of synthetic optimization problems. We evaluate POP on an established benchmark of 43 optimization functions of varying complexity, where it significantly outperforms gradient-based methods. Our evaluation demonstrates strong generalization without task-specific tuning.
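To make the setup concrete, here is a minimal sketch of the loop the abstract describes: a learned policy predicts a per-step learning rate from trajectory context, and plain gradient descent applies it. This is an illustration under stated assumptions, not the paper's implementation; PolicyStub, trajectory_features, and optimize are hypothetical names, and the feature set and policy are stand-ins.

```python
import numpy as np

class PolicyStub:
    """Stand-in for the meta-learned RL policy (hypothetical interface)."""
    def predict_lr(self, features: np.ndarray) -> float:
        # A trained policy would map trajectory features to a step size;
        # here we just return a fixed rate so the sketch runs end to end.
        return 1e-2

def trajectory_features(grads, losses, window=5):
    """Summarize recent gradient norms and losses as policy context
    (an assumed featurization, not the paper's)."""
    g = [np.linalg.norm(gi) for gi in grads[-window:]]
    l = losses[-window:]
    # Left-pad so the feature vector has a fixed length from step 0.
    g = [0.0] * (window - len(g)) + g
    l = [0.0] * (window - len(l)) + l
    return np.array(g + l)

def optimize(f, grad_f, x0, policy, steps=100):
    """Gradient descent where the policy chooses each learning rate."""
    x, grads, losses = np.asarray(x0, dtype=float), [], []
    for _ in range(steps):
        g = grad_f(x)
        grads.append(g)
        losses.append(f(x))
        lr = policy.predict_lr(trajectory_features(grads, losses))
        x = x - lr * g  # descent step with the predicted learning rate
    return x

# Usage on a toy quadratic, standing in for one sampled synthetic problem.
f = lambda x: float(x @ x)
grad_f = lambda x: 2.0 * x
x_star = optimize(f, grad_f, [3.0, -2.0], PolicyStub())
```

In the paper's framing, the policy would be trained with RL over millions of such synthetic problems sampled from the proposed prior, so that at evaluation time it adapts the learning rate without task-specific tuning.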