Physical Sciences and Mathematics | Open Access Articles

CEFT: A Cost-Effective, Fault-Tolerant Parallel Virtual File System, Yifeng Zhu, Hong Jiang Sep 2013

Yifeng Zhu

The vulnerability of computer nodes to component failures is a critical issue for cluster-based file systems. This paper studies the development and deployment of mirroring in cluster-based parallel virtual file systems to provide fault tolerance, and analyzes the tradeoffs between performance and reliability in the mirroring scheme. It presents the design and implementation of CEFT, a scalable RAID-10 style file system based on PVFS, and proposes four novel mirroring protocols, distinguished by whether the mirroring operations are server-driven or client-driven and whether they are asynchronous or synchronous. The comparisons of their write performance, measured in a real cluster, …
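The RAID-10 layout named above combines striping (RAID-0) with mirroring (RAID-1). As a rough illustration only, the Python sketch below maps a logical file offset to a primary server and its mirror, assuming round-robin striping over `n` primary servers and a fixed pairing of primary server `i` with mirror server `n + i`. The function `place`, the `Placement` record, and the pairing rule are hypothetical; they are not taken from CEFT, and the sketch says nothing about which of the four mirroring protocols carries the write.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Placement:
    primary: int       # server id in the primary group (0 .. n-1)
    mirror: int        # server id in the mirror group (n .. 2n-1), assumed pairing
    local_offset: int  # byte offset inside each server's local stripe file


def place(file_offset: int, stripe_size: int, n: int) -> Placement:
    """Map a logical file offset to a mirrored pair of servers (hypothetical layout).

    Stripes are assigned round-robin to the n primary servers (RAID-0), and
    primary server i is paired with mirror server n + i (RAID-1).
    """
    if file_offset < 0 or stripe_size <= 0 or n <= 0:
        raise ValueError("invalid arguments")
    stripe = file_offset // stripe_size  # index of the stripe unit in the file
    server = stripe % n                  # round-robin choice of primary server
    row = stripe // n                    # full rounds of stripes before this one
    local = row * stripe_size + file_offset % stripe_size
    return Placement(primary=server, mirror=n + server, local_offset=local)


if __name__ == "__main__":
    # 64 KiB stripes over 4 primary servers and 4 mirror servers.
    print(place(300000, 65536, 4))  # Placement(primary=0, mirror=4, local_offset=103392)
```

Both copies of a stripe share the same local offset here, which keeps the mirror a byte-for-byte image of its primary; a real system could choose a different pairing or layout.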

A Novel Weighted-Graph-Based Grouping Algorithm For Metadata Prefetching, Peng Gu, Jun Wang, Yifeng Zhu, Hong Jiang, Pengju Shang Sep 2013

Although data prefetching algorithms have been extensively studied for years, there has been no counterpart research on metadata access performance. Existing data prefetching algorithms, which either lack emphasis on group prefetching or bear a high level of computational complexity, do not work well for metadata prefetching. Therefore, an efficient, accurate, and distributed metadata-oriented prefetching scheme is critical to improving overall performance in large distributed storage systems. In this paper, we present a novel weighted-graph-based prefetching technique, built on both direct and indirect successor relationships, to reap performance benefits from prefetching specifically for clustered metadata servers, an arrangement envisioned …
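To make the idea of direct and indirect successors concrete, here is a minimal sketch under stated assumptions: every time item `b` follows item `a` within a lookahead `window`, the edge `a -> b` gains `1/d`, where `d` is the distance between them in the trace (so `d == 1` is a direct successor). The prefetch group for an item is then its `k` heaviest successors. The window, the `1/d` decay, and the names `build_graph` and `prefetch_group` are illustrative choices, not the weighting or grouping rule from the paper.

```python
from collections import defaultdict
from typing import DefaultDict, Hashable, List, Sequence

Graph = DefaultDict[Hashable, DefaultDict[Hashable, float]]


def build_graph(trace: Sequence[Hashable], window: int = 3) -> Graph:
    """Accumulate weighted successor edges from an access trace.

    Each time b follows a at distance d (1 <= d <= window), edge a -> b gains 1/d:
    d == 1 is a direct successor, d > 1 an indirect one with less weight (assumed decay).
    """
    graph: Graph = defaultdict(lambda: defaultdict(float))
    for i, src in enumerate(trace):
        for d in range(1, window + 1):
            if i + d >= len(trace):
                break
            dst = trace[i + d]
            if dst != src:  # skip self-loops (an item following itself)
                graph[src][dst] += 1.0 / d
    return graph


def prefetch_group(graph: Graph, item: Hashable, k: int = 2) -> List[Hashable]:
    """Return the k heaviest successors of item, breaking ties by name."""
    successors = graph.get(item, {})
    ranked = sorted(successors.items(), key=lambda kv: (-kv[1], str(kv[0])))
    return [node for node, _ in ranked[:k]]


if __name__ == "__main__":
    g = build_graph(["a", "b", "c", "a", "b", "d"], window=2)
    print(prefetch_group(g, "a"))  # ['b', 'c']
```

On this trace the edge `a -> b` reaches weight 2.0 because `b` directly follows `a` twice, so `b` heads the prefetch group for `a`.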

RACE: A Robust Adaptive Caching Strategy For Buffer Cache, Yifeng Zhu, Hong Jiang Sep 2013

While many block replacement algorithms for buffer caches have been proposed to address the well-known drawbacks of the LRU algorithm, they are not robust and cannot maintain a consistent performance improvement across all workloads. This paper proposes a novel and simple replacement scheme, called RACE (Robust Adaptive buffer Cache management schemE), which differentiates the locality of I/O streams by actively detecting access patterns inherently exhibited in two correlated spaces: the discrete block space of program contexts from which I/O requests are issued, and the continuous block space within files to which I/O requests are addressed. This scheme combines global I/O …
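RACE's full scheme is not reproduced here. As a deliberately simplified sketch of only the file-space half of the idea, the detector below labels each block access by comparing it with earlier accesses to the same file. The three labels (sequential, looping, other), the rules for assigning them, and the class name `FilePatternDetector` are assumptions made for illustration; the program-context space and the way RACE combines the two spaces are omitted.

```python
from typing import Dict, Set

SEQUENTIAL, LOOPING, OTHER = "sequential", "looping", "other"


class FilePatternDetector:
    """Label block accesses per file using simplified, assumed rules.

    LOOPING    - the block was already referenced earlier in the same file;
    SEQUENTIAL - the block immediately follows the file's previous access;
    OTHER      - anything else, including the first access to a file.
    """

    def __init__(self) -> None:
        self.last: Dict[str, int] = {}       # last block accessed per file
        self.seen: Dict[str, Set[int]] = {}  # every block referenced per file

    def observe(self, file_id: str, block: int) -> str:
        seen = self.seen.setdefault(file_id, set())
        prev = self.last.get(file_id)
        if block in seen:
            kind = LOOPING
        elif prev is not None and block == prev + 1:
            kind = SEQUENTIAL
        else:
            kind = OTHER
        seen.add(block)
        self.last[file_id] = block
        return kind


if __name__ == "__main__":
    det = FilePatternDetector()
    trace = [("f", 0), ("f", 1), ("f", 2), ("f", 0), ("f", 1), ("g", 7)]
    print([det.observe(f, b) for f, b in trace])
    # ['other', 'sequential', 'sequential', 'looping', 'looping', 'other']
```

The per-file `seen` sets grow without bound; a practical detector would need to bound or age this history.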

Rapport: Semantic-Sensitive Namespace Management In Large-Scale File Systems, Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng Sep 2013

The explosive growth in the volume and complexity of data exacerbates the key challenge of managing data effectively and efficiently in a way that fundamentally improves the ease and efficacy of its use. Existing large-scale file systems rely on hierarchically structured namespaces, which lead to severe performance bottlenecks and make it impossible to support real-time queries on multi-dimensional attributes. This paper proposes a novel semantic-sensitive scheme, called Rapport, to provide dynamic and adaptive namespace management and to support complex queries. The basic idea is to build the namespace of files by utilizing their semantic correlation, and to exploit the dynamic evolution of attributes to support namespace management. …

Exploiting Redundancy To Boost Performance In A RAID-10 Style Cluster-Based File System, Yifeng Zhu, Hong Jiang, Xiao Qin, Dan Feng, David Swanson Sep 2013

While aggregating the throughput of existing disks on cluster nodes is a cost-effective approach to alleviating the I/O bottleneck in cluster computing, this approach suffers from potential performance degradation due to contention for shared resources on the same node between storage data processing and user task computation. This paper proposes to judiciously utilize the storage redundancy, in the form of mirroring, that exists in a RAID-10 style file system to alleviate this performance degradation. More specifically, a heuristic scheduling algorithm, motivated by observations of a simple cluster configuration, is developed to spatially schedule write operations on the nodes with less …

SmartStore: A New Metadata Organization Paradigm With Metadata Semantic-Awareness For Next-Generation File Systems, Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian Sep 2013

Existing data storage systems based on hierarchical directory trees do not meet the scalability and functionality requirements of exponentially growing datasets and increasingly complex metadata queries in large-scale file systems with billions of files and Exabytes of data. This paper proposes a novel decentralized semantic-aware metadata organization, called SmartStore, which exploits the metadata semantics of files to judiciously aggregate correlated files into semantic-aware groups by using information retrieval tools. The decentralized design of SmartStore can improve system scalability and reduce query latency both for complex queries (including range and top-k queries), which helps in constructing semantic-aware caching, and for conventional filename-based point …

Scalable And Adaptive Metadata Management In Ultra Large-Scale File Systems, Yu Hua, Yifeng Zhu, Hong Jiang Sep 2013

This paper presents a scalable and adaptive decentralized metadata lookup scheme for ultra large-scale file systems (≥ Petabytes or even Exabytes). Our scheme logically organizes metadata servers (MDSs) into a multi-layered query hierarchy and exploits grouped Bloom filters to efficiently route metadata requests to the desired MDSs through the hierarchy. This metadata lookup scheme can be executed at network or memory speed, without being bounded by the performance of slow disks. An effective workload balancing algorithm is also developed for server reconfigurations. The scheme is evaluated through extensive trace-driven simulations and a prototype implementation in Linux. Experimental results …

AMP: An Affinity-Based Metadata Prefetching Scheme In Large-Scale Distributed Storage Systems, Lin Li, Xuemin Li, Hong Jiang, Yifeng Zhu Sep 2013

Prefetching is an effective technique for improving file access performance, as it can reduce access latency in I/O systems. In distributed storage systems, prefetching for metadata files is critical to overall system performance. In this paper, an Affinity-based Metadata Prefetching (AMP) scheme is proposed for metadata servers in large-scale distributed storage systems to provide aggressive metadata prefetching. By mining useful information about metadata accesses from past history, AMP can discover metadata file affinities accurately and intelligently for prefetching. Trace-driven simulations show that, compared with LRU and some of the latest file prefetching algorithms such as NEXUS and C-miner, AMP can …

HBA: Distributed Metadata Management For Large Cluster-Based Storage Systems, Yifeng Zhu, Hong Jiang, Jun Wang, Feng Xian Sep 2013

An efficient and distributed scheme for file mapping or file lookup is critical to decentralizing metadata management within a group of metadata servers. This paper presents a novel technique called Hierarchical Bloom Filter Arrays (HBA) to map filenames to the metadata servers holding their metadata. Two levels of probabilistic arrays, namely Bloom filter arrays with different levels of accuracy, are used on each metadata server. One array, with lower accuracy and representing the distribution of the entire metadata, trades accuracy for significantly reduced memory overhead, whereas the other array, with higher accuracy, caches partial distribution information and exploits the …
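The two-level lookup can be sketched as follows, under several assumptions: each level holds one Bloom filter per metadata server (MDS); the low-accuracy level uses few bits and records every file homed on that MDS; the high-accuracy level uses many more bits and, purely for illustration, records recently resolved files. A lookup consults the high-accuracy level first and accepts an answer only if exactly one MDS matches, then falls back to the low-accuracy level. The class names, bit counts, hashing scheme (SHA-256 with double hashing), and the `None` fallback are hypothetical, not HBA's actual parameters or protocol.

```python
import hashlib
from typing import Dict, Iterable, Iterator, Optional


class BloomFilter:
    """Plain Bloom filter: no false negatives, a tunable false-positive rate."""

    def __init__(self, num_bits: int, num_hashes: int) -> None:
        self.m = num_bits
        self.k = num_hashes
        self.bits = bytearray((num_bits + 7) // 8)

    def _positions(self, key: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit halves of one digest.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, key: str) -> None:
        for p in self._positions(key):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, key: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(key))


class TwoLevelLookup:
    """Hypothetical two-level filter array kept on one metadata server."""

    def __init__(self, mds_ids: Iterable[str], low_bits: int = 256,
                 high_bits: int = 4096, num_hashes: int = 4) -> None:
        ids = list(mds_ids)
        # Low accuracy: small filters summarising every file homed on each MDS.
        self.low: Dict[str, BloomFilter] = {m: BloomFilter(low_bits, num_hashes) for m in ids}
        # High accuracy: larger filters over a subset (here: recently resolved files).
        self.high: Dict[str, BloomFilter] = {m: BloomFilter(high_bits, num_hashes) for m in ids}

    def record_home(self, filename: str, mds: str) -> None:
        self.low[mds].add(filename)

    def record_recent(self, filename: str, mds: str) -> None:
        self.high[mds].add(filename)

    def lookup(self, filename: str) -> Optional[str]:
        """Return the unique matching MDS, checking the more accurate level first."""
        for level in (self.high, self.low):
            hits = [m for m, bf in level.items() if filename in bf]
            if len(hits) == 1:
                return hits[0]
        return None  # no unique match: the caller needs some fallback (not modelled)
```

Because standard Bloom filters cannot delete entries, a real cache at the high-accuracy level would need counting Bloom filters or periodic rebuilding. A unique match can also be a false positive, so the selected MDS would still need to confirm that it actually holds the file's metadata.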

SmartStore: A New Metadata Organization Paradigm With Semantic-Awareness For Next-Generation File Systems, Yu Hua, Hong Jiang, Yifeng Zhu, Dan Feng, Lei Tian Sep 2013

Existing storage systems using hierarchical directory trees do not meet the scalability and functionality requirements of exponentially growing datasets and increasingly complex queries in Exabyte-level systems with billions of files. This paper proposes a semantic-aware organization, called SmartStore, which exploits the metadata semantics of files to judiciously aggregate correlated files into semantic-aware groups by using information retrieval tools. Its decentralized design improves system scalability and reduces query latency for complex queries (range and top-k queries), which is conducive to constructing semantic-aware caching, and for conventional filename-based queries. SmartStore limits the search scope of a complex query to a single or a minimal number of semantically related groups …
