Optimal High-Probability Regret for Online Convex Optimization with Two-Point Bandit Feedback
arXiv:2603.25029v2
Abstract: We consider the problem of Online Convex Optimization (OCO) with two-point bandit feedback in an adversarial environment.
In this setting, a player attempts to minimize a sequence of adversarially …
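
For readers unfamiliar with the setting, the following is a minimal sketch of what two-point bandit feedback allows: in each round the player may evaluate the current loss at two points and can use the difference to form a gradient estimate. The sketch uses the classical symmetric two-point estimator with projected gradient descent on a Euclidean ball. This is an illustrative assumption, not necessarily the algorithm analyzed in the paper. The function names, the ball-shaped feasible set, and the parameters `delta`, `eta`, and `radius` are all hypothetical.

```python
import math
import random


def sample_unit_sphere(d, rng):
    """Draw a direction uniformly from the unit sphere in R^d."""
    while True:
        v = [rng.gauss(0.0, 1.0) for _ in range(d)]
        norm = math.sqrt(sum(c * c for c in v))
        if norm > 0.0:
            return [c / norm for c in v]


def two_point_gradient(f, x, delta, rng):
    """Classical two-point estimate: (d / (2*delta)) * (f(x + delta*u) - f(x - delta*u)) * u."""
    d = len(x)
    u = sample_unit_sphere(d, rng)
    plus = [xi + delta * ui for xi, ui in zip(x, u)]
    minus = [xi - delta * ui for xi, ui in zip(x, u)]
    # Only two function values of the current loss are used (the bandit feedback).
    scale = d * (f(plus) - f(minus)) / (2.0 * delta)
    return [scale * ui for ui in u]


def project_to_ball(x, radius):
    """Euclidean projection onto the origin-centered ball of the given radius."""
    norm = math.sqrt(sum(c * c for c in x))
    if norm <= radius:
        return list(x)
    return [radius * c / norm for c in x]


def run_bandit_gd(losses, d, radius=1.0, delta=1e-3, eta=0.05, seed=0):
    """Projected gradient descent driven by two-point estimates (illustrative only)."""
    rng = random.Random(seed)
    x = [0.0] * d
    iterates = []
    for f in losses:  # one loss function per round; the player never sees its gradient
        iterates.append(list(x))
        g = two_point_gradient(f, x, delta, rng)
        step = [xi - eta * gi for xi, gi in zip(x, g)]
        # Project onto a slightly shrunk ball so both query points stay feasible.
        x = project_to_ball(step, radius - delta)
    return iterates
```

Projecting onto the ball of radius `radius - delta` is a common device in this literature: it guarantees that the two perturbed query points remain inside the original feasible set.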