Long-time dynamics and universality of nonconvex gradient descent
Abstract
This paper develops a general approach to characterize the long-time trajectory behavior of nonconvex gradient descent in generalized single-index models in the large aspect ratio regime. In this regime, we show that for each iteration the gradient descent iterate concentrates around a deterministic vector called the `Gaussian theoretical gradient descent', whose dynamics can be tracked by a state evolution system of two recursive equations for two scalars. Our concentration guarantees hold universally for a broad class of design matrices and remain valid over long time horizons until algorithmic convergence or divergence occurs. Moreover, our approach reveals that gradient descent iterates are in general approximately independent of the data and strongly incoherent with the feature vectors, a phenomenon previously known as the `implicit regularization' effect of gradient descent in specific models under Gaussian data. As an illustration of the utility of our general theory, we present two applications of different natures in the regression setting. In the first, we prove global convergence of nonconvex gradient descent with general independent initialization for a broad class of structured link functions, and establish universality of randomly initialized gradient descent in phase retrieval for large aspect ratios. In the second, we develop a data-free iterative algorithm for estimating state evolution parameters along the entire gradient descent trajectory, thereby providing a low-cost yet statistically valid tool for practical tasks such as hyperparameter tuning and runtime determination. As a by-product of our analysis, we show that in the large aspect ratio regime, the Gaussian theoretical gradient descent coincides with a recent line of dynamical mean-field theory for gradient descent over the constant-time horizon.