Open Access. Powered by Scholars. Published by Universities.®

Louisiana State University · 2012 · Computer Sciences · Distributed Computing

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

Semantically-Aware Data Discovery And Placement In Collaborative Computing Environments, Xinqi Wang Jan 2012

LSU Doctoral Dissertations

As the size of scientific datasets and the demand for interdisciplinary collaboration grow in modern science, it becomes imperative that better ways of discovering and placing datasets generated across multiple disciplines be developed to facilitate interdisciplinary scientific research.

To discover relevant data in large-scale interdisciplinary datasets, the development and integration of cross-domain metadata is critical, because metadata serves as the key guideline for organizing data. To develop and integrate cross-domain metadata management systems in an interdisciplinary collaborative computing environment, three key issues need to be addressed: the development of a cross-domain metadata schema; the implementation of a metadata management system …
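The abstract is truncated, but as a rough illustration of what a cross-domain metadata schema might involve, the following is a minimal sketch in Python: two domains describe their datasets with different field names, and a small mapping table normalizes both into one shared schema. All field names, records, and mappings here are hypothetical and are not taken from the dissertation.

```python
# Hypothetical domain-specific metadata records.
coastal_record = {
    "dataset_title": "Hurricane storm-surge simulation, 2011",
    "pi_name": "J. Doe",
    "grid_resolution_m": 50,
    "file_format": "NetCDF",
}

bio_record = {
    "study_name": "Gene-expression time series, strain XYZ",
    "investigator": "A. Roe",
    "assay_platform": "RNA-seq",
    "file_format": "FASTQ",
}

# Per-domain mapping from local field names to a shared, cross-domain schema.
FIELD_MAPPINGS = {
    "coastal": {"dataset_title": "title", "pi_name": "creator", "file_format": "format"},
    "bio": {"study_name": "title", "investigator": "creator", "file_format": "format"},
}


def to_shared_schema(domain: str, record: dict) -> dict:
    """Translate a domain-specific record into the shared schema.

    Fields without a mapping are kept under 'domain_extensions' so that
    no domain-specific information is lost during integration.
    """
    mapping = FIELD_MAPPINGS[domain]
    shared = {"source_domain": domain, "domain_extensions": {}}
    for field, value in record.items():
        if field in mapping:
            shared[mapping[field]] = value
        else:
            shared["domain_extensions"][field] = value
    return shared


if __name__ == "__main__":
    for domain, record in [("coastal", coastal_record), ("bio", bio_record)]:
        print(to_shared_schema(domain, record))
```

The point of the sketch is only that records from different disciplines can be searched uniformly once they share common fields such as title, creator, and format, while domain-specific detail is preserved as extensions.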


An Extensible And Scalable Pilot-Mapreduce Framework For Data Intensive Applications On Distributed Cyberinfrastructure, Pradeep Kumar Mantha Jan 2012

LSU Master's Theses

The volume and complexity of data that must be analyzed in scientific applications is increasing exponentially. Often, this data is distributed; thus, the ability to analyze data by localizing it will yield limited returns. Therefore, efficient processing of large distributed datasets is required, ideally without introducing fundamentally new programming models or methods. For example, extending MapReduce, a programming model proven effective for processing large datasets, to work more effectively on distributed data and on different infrastructure (such as non-Hadoop, general-purpose clusters) is desirable. We posit that this can be achieved with an effective and efficient runtime environment …
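For readers unfamiliar with the programming model the abstract refers to, here is a minimal, framework-agnostic sketch of MapReduce in Python: map tasks run in parallel over data partitions, intermediate pairs are grouped by key, and reduce tasks aggregate each group. This is not the Pilot-MapReduce API from the thesis; it uses only the Python standard library to illustrate the model itself, which is decoupled from any particular (e.g. Hadoop) infrastructure.

```python
from collections import defaultdict
from multiprocessing import Pool


def map_task(partition: str) -> list:
    """Emit (word, 1) pairs for one text partition."""
    return [(word.lower(), 1) for word in partition.split()]


def reduce_task(item) -> tuple:
    """Sum the counts for a single key."""
    key, counts = item
    return key, sum(counts)


def map_reduce(partitions, num_workers: int = 4) -> dict:
    with Pool(num_workers) as pool:
        # Map phase: each partition is processed independently, so the
        # partitions could in principle reside on different resources.
        mapped = pool.map(map_task, partitions)

        # Shuffle phase: group intermediate values by key.
        groups = defaultdict(list)
        for pairs in mapped:
            for key, value in pairs:
                groups[key].append(value)

        # Reduce phase: aggregate each key's values in parallel.
        reduced = pool.map(reduce_task, groups.items())
    return dict(reduced)


if __name__ == "__main__":
    data = ["large distributed datasets", "distributed data analysis",
            "large datasets require efficient processing"]
    print(map_reduce(data, num_workers=2))
```

A pilot-based framework would replace the local worker pool with pilot jobs acquired across distributed resources, but the map/shuffle/reduce structure shown above stays the same, which is why the abstract argues no fundamentally new programming model is needed.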