A Split-Client Approach to Second-Order Optimization

arXiv:2510.15714v3 Announce Type: replace-cross Abstract: Second-order optimization methods offer superior convergence rates but are often bottlenecked by the wall-clock cost of Hessian computation and factorization. In the moderate-dimensional regime where the full Hessian fits in memory, factorization $\mathcal{O}(d^3)$ typically dominates gradient evaluation $\mathcal{O}(nd)$, creating a synchronization barrier that negates the per-iteration progress of classical second-order methods. We propose the \emph{Split-Client} framework, which decouples optimization into parallel gradient and curvature processes. Unlike Lazy Hessian approaches, whose arithmetic-complexity analysis does not charge factorization time and whose optimal reuse frequency requires tuning, our method is fully \textbf{delay-adaptive}: its wall-clock complexity scales with the \emph{average} delay $\Bar{\tau}$, and it matches the optimally-tuned Lazy rate of $\mathcal{O}(\eps^{-3/2}\sqrt{\Bar{\tau}})$ without any tuning. For persistent curvature error, we provide a noise-adaptive schedule with $\widetilde{\mathcal{O}}(T^{-3/4})$ rate (on $E[\|\nabla f\|]^{3/2}$), recovering the rate that uniform-error analyses such as Kamzolov et al (2023) achieve via inflated regularization. Under a verifiable subspace-alignment condition, an additional \emph{structured} analysis based on the secant condition of L-BFGS gives a faster $\mathcal{O}(T^{-1})$ rate, with a hybrid theorem interpolating smoothly between the two regimes. We extend the framework to Subsampled Cubic Newton with adaptive batch sizes and an aggregate sampling budget linear in $T$. Experiments on two non-convex problems show wall-clock speedups of up to $800\times$ over Vanilla and $30\times$ over Lazy in the strongly factorization-dominated regime.

Leave a Comment