Physical Sciences and Mathematics | Open Access Articles

Scaling Bayesian Network Parameter Learning With Expectation Maximization Using MapReduce, Erik B. Reed, Ole J. Mengshoel

Ole J Mengshoel

Bayesian network (BN) parameter learning from incomplete data can be computationally expensive. Applying the EM algorithm to learn BN parameters is, unfortunately, susceptible to local optima and prone to premature convergence. We develop and experiment with two methods for improving EM parameter learning by using MapReduce: Age-Layered Expectation Maximization (ALEM) and Multiple Expectation Maximization (MEM). Leveraging MapReduce for distributed machine learning, these algorithms (i) operate on a (potentially large) population of BNs and (ii) partition the data set, as is traditionally done in MapReduce machine learning. For example, we achieved gains using the Hadoop implementation …

MapReduce For Bayesian Network Parameter Learning Using The EM Algorithm, Aniruddha Basak, Irina Brinster, Ole J. Mengshoel

This work applies the distributed computing framework MapReduce to Bayesian network parameter learning from incomplete data. We formulate the classical Expectation Maximization (EM) algorithm within the MapReduce framework. We analyze, both analytically and experimentally, the speed-up that can be obtained by means of MapReduce. We present details of the MapReduce formulation of EM, report speed-ups versus the sequential case, and carefully compare various Hadoop cluster configurations in experiments with Bayesian networks of different sizes and structures.
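
As a rough illustration of how EM can be phrased in map and reduce steps, the sketch below runs on a toy two-node network H -> X in plain Python, with no Hadoop involved. It is not the authors' formulation: the network, the starting parameters, and the names `e_step_map`, `reduce_counts`, `m_step`, and `em` are all assumptions made for this example. Each map call computes expected counts for one data partition, the reduce step sums them, and the M-step normalizes the totals.

```python
from collections import Counter
from functools import reduce

# Toy network H -> X, both binary. H may be missing (None) in a record.

def e_step_map(partition, p_h, p_x_given_h):
    """Map: expected (h, x) counts for one data partition."""
    counts = Counter()
    for h, x in partition:
        if h is not None:
            counts[(h, x)] += 1.0  # fully observed record
        else:
            # Posterior P(H = k | X = x) under the current parameters.
            joint = [p_h[k] * p_x_given_h[k][x] for k in (0, 1)]
            z = sum(joint)
            for k in (0, 1):
                counts[(k, x)] += joint[k] / z
    return counts

def reduce_counts(a, b):
    """Reduce: add the expected counts of two partitions."""
    return a + b

def m_step(counts):
    """Re-estimate P(H) and P(X | H) from the summed expected counts."""
    total = sum(counts.values())
    h_tot = [counts[(k, 0)] + counts[(k, 1)] for k in (0, 1)]
    p_h = [h_tot[k] / total for k in (0, 1)]
    p_x_given_h = [[counts[(k, x)] / h_tot[k] for x in (0, 1)] for k in (0, 1)]
    return p_h, p_x_given_h

def em(partitions, iterations=20):
    # Arbitrary, slightly asymmetric starting point (an assumption).
    p_h = [0.5, 0.5]
    p_x_given_h = [[0.6, 0.4], [0.4, 0.6]]
    for _ in range(iterations):
        mapped = [e_step_map(p, p_h, p_x_given_h) for p in partitions]
        total = reduce(reduce_counts, mapped)
        p_h, p_x_given_h = m_step(total)
    return p_h, p_x_given_h

# Two partitions of (h, x) records; None marks an unobserved H.
partitions = [
    [(0, 0), (0, 0), (1, 1), (None, 1)],
    [(1, 1), (0, 1), (None, 0), (1, 0)],
]
p_h, p_x_given_h = em(partitions)
```

Because the expected counts are additive, the result does not depend on how the records are split across partitions, which is what makes the E-step a natural fit for a map phase.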

Accelerating Bayesian Network Parameter Learning Using Hadoop And MapReduce, Aniruddha Basak, Irina Brinster, Xianheng Ma, Ole J. Mengshoel

Learning the conditional probability tables of large Bayesian networks (BNs) with hidden nodes using the Expectation Maximization algorithm is computationally intensive. There are at least two bottlenecks: the potentially huge data set size and the demand for computation and memory resources. This work applies the distributed computing framework MapReduce to Bayesian parameter learning from complete and incomplete data. We formulate both traditional parameter learning (complete data) and the classical Expectation Maximization algorithm (incomplete data) within the MapReduce framework. We analyze, both analytically and experimentally, the speed-up that can be obtained by means of MapReduce. We present the details of our …
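
For the complete-data case, parameter learning reduces to counting, which maps onto MapReduce in much the same way as word count. The following is a minimal sketch in plain Python under stated assumptions: the three-node structure in `PARENTS`, the sample records, and the function names are hypothetical and are not taken from the paper.

```python
from collections import defaultdict

# Hypothetical structure: each variable maps to the tuple of its parents.
PARENTS = {"Rain": (), "Sprinkler": ("Rain",), "Wet": ("Rain", "Sprinkler")}

def map_record(record):
    """Map: emit ((variable, parent values, value), 1) for every node."""
    for var, parents in PARENTS.items():
        yield (var, tuple(record[p] for p in parents), record[var]), 1

def reduce_counts(pairs):
    """Reduce: sum the emitted ones per key."""
    counts = defaultdict(int)
    for key, n in pairs:
        counts[key] += n
    return counts

def normalize(counts):
    """Turn counts into maximum-likelihood conditional probabilities."""
    totals = defaultdict(int)
    for (var, pa, _), n in counts.items():
        totals[(var, pa)] += n
    return {key: n / totals[(key[0], key[1])] for key, n in counts.items()}

def learn_cpts(records):
    pairs = (pair for r in records for pair in map_record(r))
    return normalize(reduce_counts(pairs))

records = [
    {"Rain": 1, "Sprinkler": 0, "Wet": 1},
    {"Rain": 0, "Sprinkler": 1, "Wet": 1},
    {"Rain": 0, "Sprinkler": 0, "Wet": 0},
    {"Rain": 0, "Sprinkler": 1, "Wet": 1},
]
cpts = learn_cpts(records)
```

Parent configurations that never occur in the data get no entry here; a practical version would likely add smoothing, which this sketch leaves out.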
