Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

PDF

New Jersey Institute of Technology

2021

Machine learning

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

On Performance Optimization And Prediction Of Parallel Computing Frameworks In Big Data Systems, Haifa Alquwaiee Dec 2021

Dissertations

A wide spectrum of big data applications in science, engineering, and industry generate large datasets, which must be managed and processed in a timely and reliable manner for knowledge discovery. These tasks are now commonly executed in big data computing systems, exemplified by Hadoop, that are built on parallel processing and distributed storage and management. For example, many companies and research institutions have developed and deployed big data systems on top of NoSQL databases such as HBase and MongoDB, and parallel computing frameworks such as MapReduce and Spark, to ensure timely data analyses and efficient result delivery for decision making and business …


Stationary Probability Distributions Of Stochastic Gradient Descent And The Success And Failure Of The Diffusion Approximation, William Joseph Mccann May 2021

Theses

In this thesis, Stochastic Gradient Descent (SGD), an optimization method originally popular due to its computational efficiency, is analyzed using Markov chain methods. We compute, numerically and in some cases analytically, the stationary probability distributions (invariant measures) of the SGD Markov operator over all step sizes (learning rates). The stationary probability distributions provide insight into how the long-time behavior of SGD samples the objective function minimum.
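As a rough illustration of what a stationary distribution of SGD looks like, the sketch below iterates one-dimensional SGD on a toy objective and compares the empirical variance of the iterates with a closed-form value. The objective f(x) = mean over i of 0.5*(x - a_i)^2 with two data points a = -1 and a = +1, and all function names, are assumptions made for this example; they are not taken from the thesis.

```python
import random


def sgd_samples(eta, n_steps, burn_in=1000, seed=0):
    """Iterate 1D SGD on a toy objective (an assumption of this sketch):
    f(x) = mean_i 0.5 * (x - a_i)**2 with data points a = (-1, +1).
    Each step picks one data point uniformly at random, so the iterates
    form a Markov chain x_{k+1} = x_k - eta * (x_k - a_i).
    """
    rng = random.Random(seed)
    data = (-1.0, 1.0)
    x = 0.0
    samples = []
    for k in range(burn_in + n_steps):
        a = rng.choice(data)      # random mini-batch of size one
        x = x - eta * (x - a)     # gradient of 0.5 * (x - a)**2 is (x - a)
        if k >= burn_in:          # discard the transient before recording
            samples.append(x)
    return samples


def mean_and_variance(samples):
    """Empirical mean and (population) variance of the recorded iterates."""
    n = len(samples)
    mean = sum(samples) / n
    var = sum((s - mean) ** 2 for s in samples) / n
    return mean, var


def predicted_variance(eta):
    """Stationary variance for this toy chain, valid for 0 < eta < 2.
    It follows from Var = (1 - eta)**2 * Var + eta**2, since the data
    points have mean 0 and variance 1.
    """
    return eta / (2.0 - eta)


if __name__ == "__main__":
    eta = 0.5
    mean, var = mean_and_variance(sgd_samples(eta, 200_000))
    print("empirical mean:", mean)
    print("empirical variance:", var)
    print("predicted variance:", predicted_variance(eta))
```

For this toy chain the stationary variance grows with the step size, so larger learning rates spread the iterates more widely around the minimum at x = 0. With eta = 0.5 the predicted variance is 1/3.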

A key focus of this thesis is to provide a systematic study in one dimension comparing the exact SGD stationary distributions to the Fokker-Planck diffusion approximation equations, which are commonly used in …