Full-Text Articles in Dynamical Systems
Stationary Probability Distributions Of Stochastic Gradient Descent And The Success And Failure Of The Diffusion Approximation, William Joseph Mccann
Theses
In this thesis, stochastic gradient descent (SGD), an optimization method originally popular due to its computational efficiency, is analyzed using Markov chain methods. We compute, numerically and in some cases analytically, the stationary probability distributions (invariant measures) of the SGD Markov operator over all step sizes (learning rates). These stationary distributions provide insight into how the long-time behavior of SGD samples the objective function minimum.
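As an illustration of what computing a stationary distribution of an SGD Markov operator can look like, the following is a minimal sketch on a toy problem, not the method used in the thesis. It assumes a one-dimensional objective built from two equally likely samples, f_a(x) = (x - a)^2 / 2 with a = ±1, so that one SGD step is x -> (1 - eta) x + eta a. The grid discretization (cell centers mapped to cells, often called Ulam's method), the function names, and the parameters `n_cells` and `n_iter` are all assumptions made for this example.

```python
import numpy as np

def sgd_transition_matrix(eta, n_cells=400):
    """Column-stochastic matrix approximating the toy SGD Markov operator on [-1, 1]."""
    edges = np.linspace(-1.0, 1.0, n_cells + 1)
    centers = 0.5 * (edges[:-1] + edges[1:])
    P = np.zeros((n_cells, n_cells))
    cols = np.arange(n_cells)
    for a in (-1.0, 1.0):  # the two equally likely data samples
        # Image of each cell center under one SGD step with sample a.
        images = (1.0 - eta) * centers + eta * a
        # Index of the grid cell that contains each image point.
        rows = np.clip(np.searchsorted(edges, images, side="right") - 1,
                       0, n_cells - 1)
        np.add.at(P, (rows, cols), 0.5)
    return centers, P

def stationary_distribution(P, n_iter=2000):
    """Apply the discretized operator repeatedly to a uniform starting vector."""
    p = np.full(P.shape[0], 1.0 / P.shape[0])
    for _ in range(n_iter):
        p = P @ p
    return p / p.sum()

if __name__ == "__main__":
    centers, P = sgd_transition_matrix(eta=0.5)
    p = stationary_distribution(P)
    print("stationary variance:", np.sum(p * centers**2))
```

For this toy chain with eta = 0.5, the exact stationary distribution is uniform on [-1, 1], which gives a simple check on the discretization. Other step sizes can produce much less regular distributions, and a finer grid or a more careful discretization may then be needed.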
A key focus of this thesis is a systematic study in one dimension comparing the exact SGD stationary distributions with those predicted by the Fokker-Planck diffusion approximation equations, which are commonly used in …
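To make the comparison between exact and diffusion-approximation results concrete, here is a small sketch on a toy problem chosen for this example, not taken from the thesis. It assumes the one-dimensional SGD recursion x -> (1 - eta) x + eta a with a = ±1 equally likely, whose stationary variance is eta / (2 - eta). It also assumes the standard first-order diffusion approximation dX = -X dt + sqrt(eta) dW, an Ornstein-Uhlenbeck process with stationary variance eta / 2. The thesis may use a different form of the approximation. All function names and parameters are hypothetical.

```python
import numpy as np

def exact_variance(eta):
    # Stationary variance of x_{k+1} = (1 - eta) * x_k + eta * a_k, a_k = +/-1.
    return eta / (2.0 - eta)

def diffusion_variance(eta):
    # Stationary variance of the assumed approximation dX = -X dt + sqrt(eta) dW.
    return eta / 2.0

def simulated_variance(eta, n_chains=100_000, n_steps=300, seed=0):
    # Run many independent SGD chains and measure the spread of the final iterates.
    rng = np.random.default_rng(seed)
    x = np.zeros(n_chains)
    for _ in range(n_steps):
        a = rng.choice([-1.0, 1.0], size=n_chains)
        x = (1.0 - eta) * x + eta * a
    return x.var()

if __name__ == "__main__":
    for eta in (0.05, 0.5, 1.0):
        print(eta, exact_variance(eta), diffusion_variance(eta),
              simulated_variance(eta))
```

In this toy setting the two variances agree to leading order for small eta and separate as eta grows. At eta = 1 the exact chain simply jumps between -1 and +1 (variance 1), while the diffusion approximation predicts a Gaussian with variance 0.5.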