Open Access. Powered by Scholars. Published by Universities.®


Claremont Colleges

CMC Faculty Publications and Research

Stochastic gradient descent


Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

Batched Stochastic Gradient Descent With Weighted Sampling, Deanna Needell, Rachel Ward Aug 2016


CMC Faculty Publications and Research

We analyze a batched variant of Stochastic Gradient Descent (SGD) with a weighted sampling distribution for smooth and non-smooth objective functions. We show that by distributing the batches computationally, a significant speedup in the convergence rate is provably possible compared to either batched sampling or weighted sampling alone. We propose several computationally efficient schemes to approximate the optimal weights, and compute the proposed sampling distributions explicitly for the least squares and hinge loss problems. We show both analytically and experimentally that substantial gains can be obtained …
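The batched weighted-sampling idea can be sketched for the least-squares case: partition the rows into batches, sample each batch with probability proportional to a smoothness proxy (its squared spectral norm), and correct the gradient estimate by the inverse sampling probability. This is a minimal illustrative sketch, not the paper's exact weighting scheme; the step-size choice here is an assumption for the demo.

```python
import numpy as np

def batched_weighted_sgd(A, b, batch_size=4, n_iters=3000, seed=0):
    """Illustrative sketch: batched SGD for 0.5*||Ax - b||^2 with batches
    sampled proportionally to their squared spectral norms (a smoothness
    proxy). Not the paper's exact scheme; weights/step size are assumptions."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    # Fixed contiguous batches of rows.
    batches = [np.arange(i, min(i + batch_size, n)) for i in range(0, n, batch_size)]
    # Weight each batch by ||A_B||_2^2, the Lipschitz constant of its gradient.
    weights = np.array([np.linalg.norm(A[B], ord=2) ** 2 for B in batches])
    probs = weights / weights.sum()
    x = np.zeros(d)
    for _ in range(n_iters):
        k = rng.choice(len(batches), p=probs)
        B = batches[k]
        # Unbiased estimate of the full gradient: rescale by 1/p_k.
        grad = A[B].T @ (A[B] @ x - b[B]) / probs[k]
        # Step 1/sum(weights); combined with the rescaling, this takes a
        # step of length 1/L_k on the sampled batch (a Kaczmarz-like step).
        x -= grad / weights.sum()
    return x
```

On a consistent system the iterates converge linearly to the solution, since each batch update is a non-expansive gradient step toward the batch's solution set.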


Stochastic Gradient Descent, Weighted Sampling, And The Randomized Kaczmarz Algorithm, Deanna Needell, Nathan Srebro, Rachel Ward Mar 2014


CMC Faculty Publications and Research

We obtain an improved finite-sample guarantee on the linear convergence of stochastic gradient descent for smooth and strongly convex objectives, improving from a quadratic dependence on the conditioning (L/µ)² (where L is a bound on the smoothness and µ on the strong convexity) to a linear dependence on L/µ. Furthermore, we show how reweighting the sampling distribution (i.e. importance sampling) is necessary in order to further improve convergence, and obtain a linear dependence in the average smoothness, dominating previous results. We also discuss importance sampling for SGD more broadly and show how it can improve convergence also in other …
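The randomized Kaczmarz algorithm named in the title is the canonical example of this reweighting: for a linear system Ax = b, sample row i with probability proportional to ||a_i||², then project the iterate onto that row's hyperplane. A minimal sketch (the sampling rule matches the standard randomized Kaczmarz method; the iteration count is chosen here just for the demo):

```python
import numpy as np

def randomized_kaczmarz(A, b, n_iters=2000, seed=0):
    """Sketch of randomized Kaczmarz: pick row i with probability
    ||a_i||^2 / ||A||_F^2, then project x onto {z : a_i^T z = b_i}."""
    rng = np.random.default_rng(seed)
    n, d = A.shape
    row_sq = np.einsum('ij,ij->i', A, A)   # squared row norms ||a_i||^2
    probs = row_sq / row_sq.sum()          # importance-sampling distribution
    x = np.zeros(d)
    for _ in range(n_iters):
        i = rng.choice(n, p=probs)
        # Orthogonal projection onto the hyperplane of the sampled row.
        x += (b[i] - A[i] @ x) / row_sq[i] * A[i]
    return x
```

Viewed as SGD on the sum of per-row quadratics, this is exactly weighted sampling with an importance-sampling correction absorbed into the step size, which is how the paper connects the two methods.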