Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Missouri University of Science and Technology

2012

Hadoop scheduling

Full-Text Articles in Physical Sciences and Mathematics

Network-Aware Scheduling of MapReduce Framework on Distributed Clusters over High Speed Networks, Praveenkumar Kondikoppa, Chui Hui Chiu, Cheng Cui, Lin Xue, Seung Jong Park, Oct 2012

Computer Science Faculty Research & Creative Works

Google's MapReduce has gained significant popularity as a platform for large-scale distributed data processing. Hadoop [1] is an open-source implementation of the MapReduce [11] framework. It was originally developed to operate in a single-cluster environment and could not be leveraged for distributed data processing across federated clusters. When multiple federated clusters are connected by high-speed networks, computing resources are provisioned from any cluster in the federation. Placing each map task close to its data split is critical to Hadoop's performance. In this work, we add network awareness to Hadoop's scheduling of map tasks over federated clusters. We …
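
The abstract is cut off before it describes the scheduler, so the following is only a minimal sketch of the general idea of placing a map task near its data split, not the authors' algorithm. The `Node` type, the `transfer_cost` and `pick_node` functions, and the cost values (0 for node-local, 1 for same cluster, a caller-supplied figure for inter-cluster links) are all hypothetical and chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """A worker node, identified by name and the cluster it belongs to."""
    name: str
    cluster: str


def transfer_cost(node, replica, inter_cluster_cost):
    """Illustrative cost of reading a split replica from `replica` on `node`.

    Assumed cost model (not from the paper): 0 if the replica is on the
    same node, 1 if it is in the same cluster, otherwise the supplied
    cost of the link between the two clusters.
    """
    if node.name == replica.name:
        return 0.0
    if node.cluster == replica.cluster:
        return 1.0
    return inter_cluster_cost[frozenset((node.cluster, replica.cluster))]


def pick_node(free_nodes, replicas, inter_cluster_cost):
    """Choose the free node with the cheapest access to any replica."""
    return min(
        free_nodes,
        key=lambda n: min(
            transfer_cost(n, r, inter_cluster_cost) for r in replicas
        ),
    )


# Hypothetical federation: three clusters with unequal link costs.
link_cost = {
    frozenset(("A", "B")): 5.0,
    frozenset(("A", "C")): 20.0,
    frozenset(("B", "C")): 8.0,
}
replicas = [Node("a2", "A")]          # the split is stored on node a2
free = [Node("b1", "B"), Node("c1", "C")]
chosen = pick_node(free, replicas, link_cost)
print(chosen.name)
```

With only remote nodes free, the sketch prefers the cluster with the cheaper link to the data. A real scheduler would also need to account for link load and task queueing, which this example leaves out.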