Open Access. Powered by Scholars. Published by Universities.®

Dynamical Systems Commons

Articles 1 - 3 of 3

Full-Text Articles in Dynamical Systems

Stationary Probability Distributions Of Stochastic Gradient Descent And The Success And Failure Of The Diffusion Approximation, William Joseph Mccann May 2021

Theses

In this thesis, Stochastic Gradient Descent (SGD), an optimization method originally popular for its computational efficiency, is analyzed using Markov chain methods. We compute, numerically and in some cases analytically, the stationary probability distributions (invariant measures) of the SGD Markov operator across the full range of step sizes (learning rates). The stationary probability distributions provide insight into how the long-time behavior of SGD samples the objective function minimum.

A key focus of this thesis is a systematic study in one dimension comparing the exact SGD stationary distributions to the Fokker-Planck diffusion approximation equations, which are commonly used in …
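As a rough illustration of the comparison the thesis makes (this sketch is not from the thesis; the quadratic objective, unit-variance gradient noise, and step sizes are all illustrative assumptions), one can simulate the 1D SGD Markov chain and compare its empirical stationary variance to the diffusion prediction:

    # Sketch: empirical stationary distribution of 1D SGD on f(x) = x^2/2
    # with additive unit-variance gradient noise (illustrative assumptions).
    import numpy as np

    rng = np.random.default_rng(0)

    def sgd_samples(eta, n_steps=100_000, burn_in=10_000):
        # SGD Markov chain: x_{t+1} = x_t - eta * (x_t + xi_t), xi_t ~ N(0, 1)
        x, samples = 0.0, []
        for t in range(n_steps):
            x -= eta * (x + rng.normal())
            if t >= burn_in:
                samples.append(x)
        return np.array(samples)

    for eta in (0.01, 0.1, 0.5):
        var = sgd_samples(eta).var()
        # This linear chain is exactly solvable: stationary variance
        # eta / (2 - eta); the Fokker-Planck diffusion approximation
        # predicts eta / 2 instead.
        print(f"eta={eta}: empirical {var:.4f}, "
              f"exact {eta / (2 - eta):.4f}, diffusion {eta / 2:.4f}")

At small step sizes the exact and diffusion variances agree; at eta = 0.5 they differ noticeably, a toy version of the success and failure the title refers to.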


From Optimization To Equilibration: Understanding An Emerging Paradigm In Artificial Intelligence And Machine Learning, Ian Gemp Jul 2019

Doctoral Dissertations

Many existing machine learning (ML) algorithms cannot be viewed as gradient descent on a single objective. The solution trajectories taken by these algorithms naturally exhibit rotation, sometimes forming cycles, a behavior that is not expected with (full-batch) gradient descent. However, these algorithms can be viewed more generally as solving for the equilibrium of a game with possibly multiple competing objectives. Moreover, some recent ML models, specifically generative adversarial networks (GANs) and their variants, are now explicitly formulated as equilibrium problems. Equilibrium problems present challenges beyond those encountered in optimization, such as limit cycles and chaotic attractors, and are able to abstract …
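A minimal sketch of the rotation the abstract describes (a toy bilinear game, not code from the dissertation): simultaneous gradient play on min_x max_y f(x, y) = x*y spirals away from the equilibrium at the origin instead of converging.

    # Sketch: simultaneous gradient play on the bilinear game min_x max_y x*y.
    # Toy example of rotational dynamics; illustrative assumptions throughout.
    import math

    eta = 0.1
    x, y = 1.0, 0.0
    for t in range(200):
        gx, gy = y, x                      # df/dx = y, df/dy = x
        x, y = x - eta * gx, y + eta * gy  # x descends, y ascends, simultaneously
    # Each step multiplies the distance to the equilibrium (0, 0) by
    # sqrt(1 + eta^2) > 1, so the trajectory spirals outward.
    print(math.hypot(x, y))                # ~2.7 after 200 steps, up from 1.0

The same update applied to a single convex objective would shrink that distance, which is exactly the qualitative gap between optimization and equilibration.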


Gradient Estimation For Attractor Networks, Thomas Flynn Feb 2018

Dissertations, Theses, and Capstone Projects

It has been hypothesized that neural network models with cyclic connectivity may be more powerful than their feed-forward counterparts. This thesis investigates this hypothesis in several ways. We study the gradient estimation and optimization procedures for several variants of these networks. We show how the convergence of the gradient estimation procedures is related to the properties of the networks. Then we consider how to tune the relative rates of gradient estimation and parameter adaptation to ensure successful optimization in these models. We also derive new gradient estimators for stochastic models. First, we port the forward sensitivity analysis method to the …
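As a hedged sketch of forward sensitivity analysis in this setting (a scalar toy attractor network; the recurrence, loss, and parameter values are illustrative assumptions, not the thesis's models): propagate the state and its parameter derivative together until both settle at the fixed point, then read off the gradient there.

    # Sketch: forward sensitivity analysis for a scalar attractor network
    # h_{t+1} = tanh(w * h_t + u). The sensitivity s_t = dh_t/dw is iterated
    # alongside the state; both converge when the fixed point is contracting.
    import math

    def fixed_point_loss_grad(w, u, target, n_iter=500):
        h, s = 0.0, 0.0                            # state and dh/dw
        for _ in range(n_iter):
            a = w * h + u
            d = 1.0 - math.tanh(a) ** 2            # tanh'(a)
            h, s = math.tanh(a), d * (h + w * s)   # chain rule through one step
        loss = 0.5 * (h - target) ** 2
        return loss, (h - target) * s              # dL/dw at the attractor

    w, u, target = 0.5, 0.3, 0.9
    _, grad = fixed_point_loss_grad(w, u, target)

    # Finite-difference check of the sensitivity-based gradient.
    eps = 1e-6
    lp, _ = fixed_point_loss_grad(w + eps, u, target)
    lm, _ = fixed_point_loss_grad(w - eps, u, target)
    print(grad, (lp - lm) / (2 * eps))             # the two should agree

Convergence of the sensitivity iteration hinges on the contraction of the fixed-point map (here just |w * tanh'(a)| < 1), which is the kind of network property the abstract ties the estimator's convergence to.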