Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


LSU Doctoral Dissertations

Computer Sciences

MapReduce


Articles 1 - 3 of 3

Full-Text Articles in Physical Sciences and Mathematics

Performance Improvement Of Distributed Computing Framework And Scientific Big Data Analysis, Praveenkumar Kondikoppa Jan 2014


LSU Doctoral Dissertations

Analyzing Big data to gain better insights has been a focus of researchers in the recent past. Traditional desktop computers or database management systems may not be suitable for efficient and timely analysis because such analysis requires massive parallel processing. Distributed computing frameworks are being explored as a viable solution. For example, Google proposed MapReduce, which is becoming a de facto computing architecture for Big data solutions. However, scheduling in MapReduce is coarse grained and remains a challenge for improvement. For the MapReduce scheduler configured over distributed clusters, we identify two issues: data locality disruption and …
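
To make "data locality" concrete, here is a minimal sketch of a locality-aware assignment step, assuming a simplified model: pending map tasks sit in a FIFO queue, and the scheduler knows which nodes hold each task's input block. The names `assign_task` and `block_locations` are hypothetical. This is neither the dissertation's scheduler nor Hadoop's scheduler API. When a node becomes free, the sketch prefers a task whose input is stored on that node and otherwise falls back to a remote (non-local) task.

```python
from collections import deque


def assign_task(free_node, pending, block_locations):
    """Pick a pending map task for free_node, preferring data-local tasks.

    pending: deque of task ids in FIFO order (hypothetical model).
    block_locations: dict mapping task id -> set of nodes storing its input block.
    Returns (task_id, is_local), or None if nothing is pending.
    """
    # First pass: a task whose input block already lives on this node.
    for task in pending:
        if free_node in block_locations.get(task, set()):
            pending.remove(task)  # safe: we return right after mutating
            return task, True
    # Fallback: run the head-of-queue task remotely; its input must be
    # fetched over the network, which is the cost locality tries to avoid.
    if pending:
        return pending.popleft(), False
    return None


pending = deque(["t1", "t2", "t3"])
locations = {"t1": {"nodeA"}, "t2": {"nodeB"}, "t3": {"nodeB", "nodeC"}}
print(assign_task("nodeB", pending, locations))  # ('t2', True)
print(assign_task("nodeD", pending, locations))  # ('t1', False)
```

A scheduler that always takes the head of the queue would have sent `t1` to `nodeB`, even though `t1`'s block is only on `nodeA`. The first pass avoids that kind of locality loss.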


A Hybrid Framework Of Iterative MapReduce And MPI For Molecular Dynamics Applications, Shuju Bai Jan 2013


LSU Doctoral Dissertations

Developing platforms for large-scale data processing has been of great interest to scientists. Hadoop is a widely used computational platform. It provides fault-tolerant distributed data storage through HDFS (Hadoop Distributed File System), and it performs fault-tolerant distributed data processing in parallel through the MapReduce framework. Actual computations quite often require multiple MapReduce cycles, which calls for chained MapReduce jobs. However, Hadoop's design handles problems with iterative structures poorly. In many iterative problems, some invariant data is required by every MapReduce cycle. The same data is uploaded to Hadoop file system in …
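
The iterative pattern described above can be sketched in memory as follows. This is a minimal illustration under stated assumptions: `map_reduce` is a hypothetical single-process stand-in for one MapReduce cycle (map, shuffle by key, reduce). The driver runs 1-D k-means, chosen here only as a familiar iterative algorithm and not taken from the dissertation. The invariant input (`points`) is loaded once and reused by every cycle, and only the small centroid list changes between cycles. A chain of separate Hadoop jobs would instead re-read the invariant data from the file system on each cycle. This is not Hadoop's API and not the dissertation's hybrid MapReduce/MPI framework.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """One in-memory MapReduce cycle: map, group values by key (shuffle), reduce."""
    groups = defaultdict(list)
    for rec in records:
        for key, value in mapper(rec):
            groups[key].append(value)
    return {key: reducer(key, values) for key, values in groups.items()}


def kmeans_1d(points, centroids, iterations):
    """Iterative driver: `points` is invariant and stays in memory across cycles;
    only the per-iteration state (`centroids`) is recomputed."""
    for _ in range(iterations):
        def mapper(p):
            # Emit (index of nearest centroid, (point, count)).
            nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
            yield nearest, (p, 1)

        def reducer(_key, values):
            total = sum(v for v, _ in values)
            count = sum(c for _, c in values)
            return total / count

        new = map_reduce(points, mapper, reducer)
        # Keep the old centroid if no point was assigned to it this cycle.
        centroids = [new.get(i, c) for i, c in enumerate(centroids)]
    return centroids


points = [1.0, 2.0, 3.0, 10.0, 11.0, 12.0]  # invariant data, loaded once
print(kmeans_1d(points, [0.0, 5.0], 3))  # [2.0, 11.0]
```

Hoisting the invariant data out of the loop is the design point. Each cycle pays only for the data that actually changes.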


On-The-Fly Tracing For Data-Centric Computing : Parallelization, Workflow And Applications, Lei Jiang Jan 2013


LSU Doctoral Dissertations

As data-centric computing becomes the trend in science and engineering, more and more hardware systems, as well as middleware frameworks, are emerging to handle the intensive computations associated with big data. At the programming level, it is crucial to have corresponding programming paradigms for dealing with big data. Although MapReduce is now a well-known programming model for data-centric computing, in which parallelization is completely replaced by partitioning the computing task through data, not all programs, particularly those using statistical computing and data mining algorithms with interdependence, can be refactored in this way. On the other hand, many traditional automatic parallelization …
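
The contrast drawn above, between work that splits cleanly by partitioning the data and work with interdependence, can be shown with a small sketch. All functions here are hypothetical illustrations. A simple integer recurrence stands in for "algorithms with interdependence," and the dissertation's on-the-fly tracing approach is not shown. A sum of squares gives the same answer however the input is partitioned. The recurrence `s[i] = 2 * s[i-1] + data[i]` carries state from one element to the next, so partitioning the input naively changes the result. Some recurrences can be reformulated (for example, as a parallel prefix scan), but not by plain data partitioning.

```python
def partition(data, n):
    """Split `data` into at most n contiguous chunks."""
    if not data:
        return []
    size = -(-len(data) // n)  # ceiling division
    return [data[i:i + size] for i in range(0, len(data), size)]


def sum_of_squares_partitioned(data, n):
    # Independent per-element work: each chunk is "mapped" on its own and the
    # partial sums are "reduced" with +, so the split does not change the result.
    return sum(sum(x * x for x in chunk) for chunk in partition(data, n))


def recurrence(data):
    # Loop-carried dependency: s[i] = 2 * s[i-1] + data[i], starting from 0.
    out, prev = [], 0
    for x in data:
        prev = 2 * prev + x
        out.append(prev)
    return out


def recurrence_partitioned_naively(data, n):
    # Wrong on purpose: each chunk restarts the recurrence at 0 because it
    # cannot see the last value computed for the previous chunk.
    return [v for chunk in partition(data, n) for v in recurrence(chunk)]


data = [1, 2, 3, 4]
print(sum_of_squares_partitioned(data, 2) == sum(x * x for x in data))  # True
print(recurrence(data))                         # [1, 4, 11, 26]
print(recurrence_partitioned_naively(data, 2))  # [1, 4, 3, 10]
```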