Optimal last-iterate convergence in matrix games with bandit feedback using the log-barrier
arXiv:2604.15242v1 Announce Type: new
Abstract: We study the problem of learning minimax policies in zero-sum matrix games. Fiegel et al. (2025) recently showed that achieving last-iterate convergence in this setting is harder when the players are unc…