Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Theses/Dissertations

Computer Sciences

Louisiana State University

MapReduce

Articles 1 - 4 of 4

Full-Text Articles in Entire DC Network

Performance Improvement Of Distributed Computing Framework And Scientific Big Data Analysis, Praveenkumar Kondikoppa Jan 2014

LSU Doctoral Dissertations

Analyzing big data to gain better insights has been a focus of researchers in the recent past. Traditional desktop computers and database management systems may not be suitable for efficient and timely analysis, because such analysis requires massively parallel processing. Distributed computing frameworks are being explored as a viable solution. For example, Google proposed MapReduce, which is becoming a de facto computing architecture for big data solutions. However, scheduling in MapReduce is coarse-grained and remains a challenge. For the MapReduce scheduler configured over distributed clusters, we identify two issues: data locality disruption and …


On-The-Fly Tracing For Data-Centric Computing: Parallelization, Workflow And Applications, Lei Jiang Jan 2013

LSU Doctoral Dissertations

As data-centric computing becomes the trend in science and engineering, more and more hardware systems, as well as middleware frameworks, are emerging to handle the intensive computations associated with big data. At the programming level, it is crucial to have corresponding programming paradigms for dealing with big data. Although MapReduce is now a well-known programming model for data-centric computing, in which parallelization is completely replaced by partitioning the computing task through data, not all programs can be refactored in such a fashion, particularly those using statistical computing and data mining algorithms with interdependence. On the other hand, many traditional automatic parallelization …
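The idea of partitioning a computing task through its data can be illustrated with a minimal, in-memory sketch. This is not code from the dissertation and does not use Hadoop; the names `map_reduce`, `word_mapper`, and `sum_reducer` are hypothetical, and word counting is chosen only because each record can be processed independently of the others.

```python
from collections import defaultdict

def map_reduce(partitions, mapper, reducer):
    """Toy single-process MapReduce (illustrative assumption, not a real framework)."""
    intermediate = defaultdict(list)
    # Map phase: every record in every partition is handled independently,
    # so the partitions could in principle be processed on different nodes.
    for partition in partitions:
        for record in partition:
            for key, value in mapper(record):
                # Shuffle: group intermediate values by key.
                intermediate[key].append(value)
    # Reduce phase: combine all values that share a key.
    return {key: reducer(key, values) for key, values in intermediate.items()}

def word_mapper(line):
    # Emit (word, 1) for each word in a line of text.
    for word in line.split():
        yield word.lower(), 1

def sum_reducer(key, values):
    # Sum the counts collected for one word.
    return sum(values)

# Two hypothetical data partitions, each a list of text records.
partitions = [["big data big"], ["data centric computing"]]
counts = map_reduce(partitions, word_mapper, sum_reducer)
print(counts)
```

An algorithm whose records depend on one another's intermediate results does not fit this shape directly, which is the limitation the abstract points to.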


A Hybrid Framework Of Iterative MapReduce And MPI For Molecular Dynamics Applications, Shuju Bai Jan 2013

LSU Doctoral Dissertations

Developing platforms for large-scale data processing has been of great interest to scientists. Hadoop is a widely used computational platform that provides fault-tolerant distributed data storage through HDFS (Hadoop Distributed File System) and fault-tolerant, parallel distributed data processing through the MapReduce framework. Actual computations quite often require multiple MapReduce cycles, which in turn require chained MapReduce jobs. However, Hadoop's design is poor at addressing problems with iterative structures. In many iterative problems, some invariant data is required by every MapReduce cycle. The same data is uploaded to the Hadoop file system in …
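The cost of invariant data in chained MapReduce cycles can be sketched with a small in-memory example. This is an illustrative assumption, not the hybrid framework described in the dissertation: the functions `run_cycle` and `iterate`, the 1-D clustering task, and the `cache_invariant` flag are all hypothetical. The point set plays the role of the invariant data, and the centroids are the state that changes between cycles.

```python
from collections import defaultdict

def run_cycle(points, centroids):
    """One MapReduce-style cycle of 1-D clustering (toy example)."""
    groups = defaultdict(list)
    # Map: assign each point to the index of its nearest centroid.
    for p in points:
        nearest = min(range(len(centroids)), key=lambda i: abs(p - centroids[i]))
        groups[nearest].append(p)
    # Reduce: each new centroid is the mean of its assigned points;
    # a centroid with no points keeps its previous value.
    return [
        sum(groups[i]) / len(groups[i]) if groups[i] else centroids[i]
        for i in range(len(centroids))
    ]

def iterate(load_points, centroids, cycles, cache_invariant):
    """Chain several cycles and count how often the invariant data is loaded."""
    loads = 0
    points = None
    for _ in range(cycles):
        if points is None or not cache_invariant:
            points = load_points()  # stands in for re-reading invariant input
            loads += 1
        centroids = run_cycle(points, centroids)
    return centroids, loads

def load_points():
    # Hypothetical invariant dataset.
    return [1.0, 2.0, 9.0, 10.0]

cached_result, cached_loads = iterate(load_points, [0.0, 5.0], 3, cache_invariant=True)
naive_result, naive_loads = iterate(load_points, [0.0, 5.0], 3, cache_invariant=False)
print(cached_result, cached_loads, naive_loads)
```

Both runs reach the same centroids; they differ only in how many times the unchanged input is loaded, which is the overhead that grows with the number of chained jobs.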


An Extensible And Scalable Pilot-MapReduce Framework For Data Intensive Applications On Distributed Cyberinfrastructure, Pradeep Kumar Mantha Jan 2012

LSU Master's Theses

The volume and complexity of data that must be analyzed in scientific applications are increasing exponentially. Often, this data is distributed; thus, the ability to analyze data by localizing it will yield limited returns. Therefore, efficient processing of large distributed datasets is required, ideally without introducing fundamentally new programming models or methods. For example, extending MapReduce, a proven and effective programming model for processing large datasets, to work more effectively on distributed data and on different infrastructure (such as non-Hadoop, general-purpose clusters) is desirable. We posit that this can be achieved with an effective and efficient runtime environment …