cs.LG, math.OC

Gradient Descent’s Last Iterate is Often (slightly) Suboptimal

arXiv:2604.13870v1 Announce Type: cross
Abstract: We consider the well-studied setting of minimizing a convex Lipschitz function using either gradient descent (GD) or its stochastic variant (SGD), and examine the convergence of the last iterate. By now, it i…
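The setting in the abstract can be made concrete with a toy example. The code below is a minimal sketch and is not the paper's method or experiment. It runs fixed-step (sub)gradient descent on the assumed test function f(x) = |x|, which is convex and 1-Lipschitz with minimizer x* = 0. It reports both the last iterate x_T and the uniform average of the query points x_0, ..., x_{T-1}. The names `gd_last_and_average` and `abs_subgrad`, the test function, and the step size 1/sqrt(T) are illustrative assumptions.

```python
import math
from typing import Callable, Tuple


def gd_last_and_average(
    subgrad: Callable[[float], float],
    x0: float,
    step_size: float,
    num_steps: int,
) -> Tuple[float, float]:
    """Fixed-step (sub)gradient descent x_{t+1} = x_t - step_size * g(x_t).

    Hypothetical helper for illustration. Returns (last iterate x_T,
    uniform average of the query points x_0, ..., x_{T-1}).
    """
    x = x0
    running_sum = 0.0
    for _ in range(num_steps):
        running_sum += x  # accumulate the point where the subgradient is queried
        x = x - step_size * subgrad(x)
    return x, running_sum / num_steps


def abs_subgrad(x: float) -> float:
    """A subgradient of the assumed test function f(x) = |x|; picks 0 at x = 0."""
    return float((x > 0) - (x < 0))


if __name__ == "__main__":
    T, x0 = 100, 0.93
    eta = 1 / math.sqrt(T)  # assumed step size, not taken from the paper
    last, avg = gd_last_and_average(abs_subgrad, x0, eta, T)
    # Classical averaged-iterate bound with D = |x0 - x*| and G = 1.
    bound = (x0**2 + eta**2 * T) / (2 * eta * T)
    print(f"f(last) = {abs(last):.4f}, f(avg) = {abs(avg):.4f}, bound = {bound:.4f}")
```

For a constant step size η, the standard guarantee f(x̄) − f* ≤ (‖x₀ − x*‖² + η²G²T)/(2ηT) applies to the average x̄ because it follows from convexity (Jensen's inequality). The last iterate x_T is not covered by that argument, so last-iterate convergence needs a separate analysis. This toy run only shows how the two quantities are computed and says nothing about the paper's results.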