Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 4 of 4

Full-Text Articles in Physical Sciences and Mathematics

Exploiting Redundancy To Boost Performance In A RAID-10 Style Cluster-Based File System, Yifeng Zhu, Hong Jiang, Xiao Qin, Dan Feng, David Swanson, Sep 2013

Yifeng Zhu

While aggregating the throughput of existing disks on cluster nodes is a cost-effective way to alleviate the I/O bottleneck in cluster computing, this approach can suffer performance degradation due to contention for shared resources on the same node between storage data processing and user task computation. This paper proposes to judiciously utilize the storage redundancy, in the form of mirroring, that exists in a RAID-10 style file system to alleviate this performance degradation. More specifically, a heuristic scheduling algorithm is developed, motivated by observations of a simple cluster configuration, to spatially schedule write operations on the nodes with less …


AMP: An Affinity-Based Metadata Prefetching Scheme In Large-Scale Distributed Storage Systems, Lin Li, Xuemin Li, Hong Jiang, Yifeng Zhu, Sep 2013

Yifeng Zhu

Prefetching is an effective technique for improving file access performance, as it can reduce access latency in I/O systems. In distributed storage systems, prefetching of metadata files is critical to overall system performance. In this paper, an Affinity-based Metadata Prefetching (AMP) scheme is proposed for metadata servers in large-scale distributed storage systems to provide aggressive metadata prefetching. By mining useful information about metadata accesses from past history, AMP can discover metadata file affinities accurately and intelligently for prefetching. Compared with LRU and some of the latest file prefetching algorithms such as NEXUS and C-miner, trace-driven simulations show that AMP can …


HBA: Distributed Metadata Management For Large Cluster-Based Storage Systems, Yifeng Zhu, Hong Jiang, Jun Wang, Feng Xian, Sep 2013

Yifeng Zhu

An efficient and distributed scheme for file mapping or file lookup is critical in decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely, Bloom filter arrays with different levels of accuracy, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, whereas the other array, with higher accuracy, caches partial distribution information and exploits the …
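As a rough illustration of the idea of mapping filenames to metadata servers with Bloom filter arrays, the following Python sketch keeps one Bloom filter per server and consults two arrays of different sizes. It is not the HBA implementation from the paper: the class and function names, the SHA-256-based hashing, the filter sizes, and in particular the lookup order (accurate array first, coarse array second, and a unique hit taken as the answer) are all assumptions made for this example.

```python
import hashlib


class BloomFilter:
    """A plain Bloom filter over strings; the bit array is kept in a Python int."""

    def __init__(self, num_bits, num_hashes):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, key):
        # Derive the k bit positions by salting one hash function with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{key}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def __contains__(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class BloomFilterArray:
    """One filter per metadata server; a lookup returns the candidate servers."""

    def __init__(self, server_ids, num_bits, num_hashes):
        self.filters = {sid: BloomFilter(num_bits, num_hashes) for sid in server_ids}

    def record(self, filename, server_id):
        self.filters[server_id].add(filename)

    def lookup(self, filename):
        return [sid for sid, bf in self.filters.items() if filename in bf]


def locate(filename, accurate_array, coarse_array):
    """Hypothetical two-level lookup (assumed order: accurate, then coarse).

    A unique hit is treated as the answer. None means neither array gave a
    unique candidate, so the caller needs some other fallback.
    """
    for array in (accurate_array, coarse_array):
        hits = array.lookup(filename)
        if len(hits) == 1:
            return hits[0]
    return None
```

A short usage example, with made-up server names and paths:

```python
servers = ["mds0", "mds1", "mds2"]
coarse = BloomFilterArray(servers, num_bits=1024, num_hashes=3)    # global view, fewer bits
accurate = BloomFilterArray(servers, num_bits=4096, num_hashes=7)  # partial view, more bits
coarse.record("/home/a.txt", "mds1")
print(locate("/home/a.txt", accurate, coarse))
```

Because Bloom filters can report false positives but never false negatives, a lookup may return several candidate servers; the sketch simply gives up in that case rather than guessing.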


SmartStore: A New Metadata Organization Paradigm With Semantic-Awareness For Next-Generation File Systems, Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian, Sep 2013

Yifeng Zhu

Existing storage systems using hierarchical directory trees do not meet the scalability and functionality requirements of exponentially growing datasets and increasingly complex queries in exabyte-level systems with billions of files. This paper proposes a semantic-aware organization, called SmartStore, which exploits the metadata semantics of files to judiciously aggregate correlated files into semantic-aware groups by using information retrieval tools. Its decentralized design improves system scalability and reduces query latency both for complex queries (range and top-k queries), which is conducive to constructing semantic-aware caching, and for conventional filename-based queries. SmartStore limits the search scope of a complex query to a single or a minimal number of semantically related groups …