Projection-free Algorithms for Online Convex Optimization with Adversarial Constraints

arXiv:2501.16919v2

Abstract: We study a generalization of the Online Convex Optimization (OCO) framework with time-varying adversarial constraints. In this setting, at each round, the learner selects an action from a convex decision set $X$, after which both a convex cost function and a convex constraint function are revealed. The objective is to design a computationally efficient learning policy that simultaneously achieves low regret with respect to the cost functions and low cumulative constraint violation (CCV) over a horizon of length $T$. A major computational bottleneck in standard OCO algorithms is the projection operation onto the decision set $X$. However, for many structured decision sets, linear optimization can be performed efficiently. Motivated by this, we propose a projection-free online conditional gradient (OCG)-based algorithm that requires only a single call to a linear optimization oracle over $X$ per round. Our approach improves upon the state of the art for projection-free online learning with adversarial constraints, achieving $\tilde{O}(T^{\frac{3}{4}})$ bounds for both regret and CCV. Our algorithm is conceptually simple: it constructs a surrogate cost function as a nonnegative linear combination of the cost and constraint functions, and feeds these surrogate costs into a novel adaptive online conditional gradient subroutine introduced in this paper. We further extend our framework to the bandit setting, where we show that a new form of surrogate loss is necessary to properly handle bandit feedback - an issue overlooked in prior work. Finally, we develop an efficient Follow-the-Perturbed-Leader (FTPL)-based algorithm, particularly well-suited for online combinatorial optimization problems with discrete actions, which also achieves $O(T^{\frac{3}{4}})$ regret and CCV.
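The surrogate-cost construction described in the abstract can be sketched in a few lines. The following is a minimal illustrative sketch, not the paper's exact adaptive subroutine: it assumes the decision set $X$ is the probability simplex (where the linear optimization oracle is trivial), and the parameters `lam` (the nonnegative weight on the constraint gradient) and `eta` (the step size) are hypothetical choices for illustration only.

```python
import numpy as np

def lmo_simplex(g):
    """Linear optimization oracle over the probability simplex:
    argmin_{v in simplex} <g, v> is the vertex at the smallest coordinate of g."""
    v = np.zeros_like(g)
    v[np.argmin(g)] = 1.0
    return v

def projection_free_round(x, grad_f, grad_g_plus, lam, eta):
    """One round of a surrogate-loss conditional-gradient update (illustrative
    sketch). The surrogate gradient is a nonnegative linear combination of the
    cost gradient `grad_f` and the gradient of the clipped constraint
    `grad_g_plus`; `lam` and `eta` are hypothetical tuning parameters."""
    surrogate = grad_f + lam * grad_g_plus  # nonnegative linear combination
    v = lmo_simplex(surrogate)              # single LMO call per round
    return (1 - eta) * x + eta * v          # convex combination stays in X
```

Because each iterate is a convex combination of points in $X$, the update never leaves the decision set and never requires a projection; the only per-round cost beyond gradient evaluation is the one call to `lmo_simplex`.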
