Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 2 of 2
Full-Text Articles in Physical Sciences and Mathematics
Semantically-Aware Data Discovery And Placement In Collaborative Computing Environments, Xinqi Wang
LSU Doctoral Dissertations
As the size of scientific datasets and the demand for interdisciplinary collaboration grow in modern science, it becomes imperative that better ways of discovering and placing datasets generated across multiple disciplines be developed to facilitate interdisciplinary scientific research.
To discover relevant data in large-scale interdisciplinary datasets, the development and integration of cross-domain metadata is critical, as metadata serves as the key guideline for organizing data. To develop and integrate cross-domain metadata management systems in an interdisciplinary collaborative computing environment, three key issues need to be addressed: the development of a cross-domain metadata schema; the implementation of a metadata management system …
An Extensible And Scalable Pilot-Mapreduce Framework For Data Intensive Applications On Distributed Cyberinfrastructure, Pradeep Kumar Mantha
LSU Master's Theses
The volume and complexity of data that must be analyzed in scientific applications are increasing exponentially. Often, this data is distributed; thus, the ability to analyze data by localizing it will yield limited returns. Therefore, efficient processing of large distributed datasets is required, whilst ideally not introducing fundamentally new programming models or methods. For example, extending MapReduce, a proven, effective programming model for processing large datasets, to work more effectively on distributed data and on different infrastructure (such as non-Hadoop, general-purpose clusters) is desirable. We posit that this can be achieved with an effective and efficient runtime environment …