Open Access. Powered by Scholars. Published by Universities.®

Data Storage Systems Commons

1082 Full-Text Articles, 2757 Authors, 207240 Downloads, 74 Institutions

All Articles in Data Storage Systems

1082 full-text articles. Page 1 of 51.

Variance-Optimal Offline And Streaming Stratified Random Sampling, Trong Duc Nguyen, Ming-Hung Shih, AT&T Labs–Research, Srikanta Tirthapura, Bojian Xu 2018 Iowa State University

Variance-Optimal Offline And Streaming Stratified Random Sampling, Trong Duc Nguyen, Ming-Hung Shih, AT&T Labs–Research, Srikanta Tirthapura, Bojian Xu

Electrical and Computer Engineering Publications

Stratified random sampling (SRS) is a fundamental sampling technique that provides accurate estimates for aggregate queries using a small size sample, and has been used widely for approximate query processing. A key question in SRS is how to partition a target sample size among different strata. While Neyman's allocation provides a solution that minimizes the variance of an estimate using this sample, it works under the assumption that each stratum is abundant, i.e. has a large number of data points to choose from. This assumption may not hold in general: one or more strata may be bounded, and ...


SourceVote: Fusing Multi-Valued Data Via Inter-Source Agreements, Xiu Susie FANG, Quan Z. SHENG, Xianzhi WANG, Mahmoud BARHAMGI, Lina YAO, Anne H.H. NGU 2017 Singapore Management University

SourceVote: Fusing Multi-Valued Data Via Inter-Source Agreements, Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Mahmoud Barhamgi, Lina Yao, Anne H.H. Ngu

Research Collection School Of Information Systems

Data fusion is a fundamental research problem of identifying true values of data items of interest from conflicting multi-sourced data. Although considerable research efforts have been conducted on this topic, existing approaches generally assume every data item has exactly one true value, which fails to reflect the real world where data items with multiple true values widely exist. In this paper, we propose a novel approach, SourceVote, to estimate value veracity for multi-valued data items. SourceVote models the endorsement relations among sources by quantifying their two-sided inter-source agreements. In particular, two graphs are constructed to model inter-source relations. Then two aspects of source reliability are derived from these ...


Building A Data “Deep State” @GVSU, Matt Schultz 2017 Grand Valley State University

Building A Data “Deep State” @GVSU, Matt Schultz

Matt Schultz

GVSU Libraries is advancing an agenda to evolve its data management support from that of ad-hoc faculty consultations to enacting a suite of new collaborative and dependable library services. This presentation will share details and lessons-learned from experimentations that range from liaison training to repository software developments.


Scalable Data Structure To Compress Next-Generation Sequencing Files And Its Application To Compressive Genomics, Sandino Vargas-Perez, Fahad Saeed 2017 WMU

Scalable Data Structure To Compress Next-Generation Sequencing Files And Its Application To Compressive Genomics, Sandino Vargas-Perez, Fahad Saeed

Parallel Computing and Data Science Lab Technical Reports

It is now possible to compress and decompress large-scale Next-Generation Sequencing files taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure which lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state using fine-grained decompression in a technique that is identified as in compresso ...


Understanding The Determinants Affecting The Continuance Intention To Use Cloud Computing, Shailja Tripathi Dr. 2017 IFHE University, IBS Hyderabad

Understanding The Determinants Affecting The Continuance Intention To Use Cloud Computing, Shailja Tripathi Dr.

Journal of International Technology and Information Management

Cloud computing has been progressively implemented in organizations. The purpose of the paper is to understand the fundamental factors influencing senior managers’ continuance intention to use cloud computing in organizations. A conceptual framework was developed by using the Technology Acceptance Model (TAM) as a base theoretical model. A questionnaire was used to collect data from several companies in the IT, manufacturing, finance, pharmaceutical and retail sectors in India. The data analysis was done using the structural equation modeling technique. Perceived usefulness and perceived ubiquity are identified as important factors that affect continuance intention to use cloud computing. In ...


Table Of Contents JITIM Vol 26 Issue 3, 2017, 2017 California State University, San Bernardino

Table Of Contents JITIM Vol 26 Issue 3, 2017

Journal of International Technology and Information Management

Table of Contents


Accelerating Dynamic Graph Analytics On GPUs, Mo SHAN, Yuchen LI, Bingsheng HE, Kian-Lee TAN 2017 Singapore Management University

Accelerating Dynamic Graph Analytics On GPUs, Mo Shan, Yuchen Li, Bingsheng He, Kian-Lee Tan

Research Collection School Of Information Systems

As graph analytics often involves compute-intensive operations, GPUs have been extensively used to accelerate the processing. However, in many applications such as social networks, cyber security, and fraud detection, their representative graphs evolve frequently and one has to perform a rebuild of the graph structure on GPUs to incorporate the updates. Hence, rebuilding the graphs becomes the bottleneck of processing high-speed graph streams. In this paper, we propose a GPU-based dynamic graph storage scheme to support existing graph algorithms easily. Furthermore, we propose parallel update algorithms to support efficient stream updates so that the maintained graph is immediately available for high-speed analytic processing on GPUs. Our extensive experiments with three ...


Analyzing The Performance Of NoSQL Vs. SQL Databases For Spatial And Aggregate Queries, Sarthak Agarwal, KS Rajan 2017 International Institute of Information Technology Hyderabad Gachibowli, Hyderabad, India

Analyzing The Performance Of NoSQL Vs. SQL Databases For Spatial And Aggregate Queries, Sarthak Agarwal, KS Rajan

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

Relational databases have been around for a long time and spatial databases have exploited this feature for close to two decades. The recent past has seen the development of NoSQL non-relational databases, which are now being adopted for spatial object storage and handling, too. While SQL databases face scalability and agility challenges and fail to take advantage of the cheap memory and processing power available these days, NoSQL databases can handle the rise in data storage and the frequency at which data is accessed and processed, which are essential features needed in geospatial scenarios, which do not deal with ...


Feature Extraction And Parallel Visualization For Large-Scale Scientific Data, Lina Yu 2017 University of Nebraska - Lincoln

Feature Extraction And Parallel Visualization For Large-Scale Scientific Data, Lina Yu

Computer Science and Engineering: Theses, Dissertations, and Student Research

Advanced computing and sensing technologies enable scientists to study natural and physical phenomena with unprecedented precision, resulting in an explosive growth of data. The unprecedented amounts of data generated from large scientific simulations impose a grand challenge in data analytics and visualization due to the fact that data are too massive for transferring, storing, and processing.

This dissertation makes the first contribution to the design of novel transfer functions and application-aware data replacement policy to facilitate feature classification on highly parallel distributed systems. We design novel transfer functions that advance the classification of continuously changed volume data by combining the ...


Personalized Microtopic Recommendation On Microblogs, Yang LI, Jing JIANG, Ting LIU, Minghui QIU, Xiaofei SUN 2017 Singapore Management University

Personalized Microtopic Recommendation On Microblogs, Yang Li, Jing Jiang, Ting Liu, Minghui Qiu, Xiaofei Sun

Research Collection School Of Information Systems

Microblogging services such as Sina Weibo and Twitter allow users to create tags explicitly indicated by the # symbol. In Sina Weibo, these tags are called microtopics, and in Twitter, they are called hashtags. In Sina Weibo, each microtopic has a designated page and can be directly visited or commented on. Recommending these microtopics to users based on their interests can help users efficiently acquire information. However, it is non-trivial to recommend microtopics to users to satisfy their information needs. In this article, we investigate the task of personalized microtopic recommendation, which exhibits two challenges. First, users usually do not give ...


Do Your Friends Make You Buy This Brand?: Modeling Social Recommendation With Topics And Brands, Minh Duc LUU, Ee-peng LIM 2017 Singapore Management University

Do Your Friends Make You Buy This Brand?: Modeling Social Recommendation With Topics And Brands, Minh Duc Luu, Ee-Peng Lim

Research Collection School Of Information Systems

Consumer behavior and marketing research have shown that brand has significant influence on product reviews and product purchase decisions. However, there is very little work on incorporating brand related factors into product recommender systems. Meanwhile, the similarity in brand preference between a user and other socially connected users also affects her adoption decisions. To integrate seamlessly the individual and social brand related factors into the recommendation process, we propose a novel model called Social Brand–Item–Topic (SocBIT). As the original SocBIT model does not enforce non-negativity, which poses some difficulty in result interpretation, we also propose a non-negative version ...


Attribute-Based Keyword Search Over Hierarchical Data In Cloud Computing, Yinbin MIAO, Jianfeng MA, Ximeng LIU, Xinghua LI, Qi JIANG, Junwei ZHANG 2017 Singapore Management University

Attribute-Based Keyword Search Over Hierarchical Data In Cloud Computing, Yinbin Miao, Jianfeng Ma, Ximeng Liu, Xinghua Li, Qi Jiang, Junwei Zhang

Research Collection School Of Information Systems

Searchable encryption (SE) is a promising technology that allows users to perform search queries over encrypted data. However, most existing SE schemes cannot deal with shared records that have hierarchical structures. In this paper, we devise a basic cryptographic primitive, the attribute-based keyword search over hierarchical data (ABKS-HD) scheme, by using the ciphertext-policy attribute-based encryption (CP-ABE) technique, but this basic scheme cannot satisfy all the desirable requirements of cloud systems. The facts that the single keyword search will yield many irrelevant search results and the revoked users can access the unauthorized data with the old ...


CLSTERS: A General System For Reducing Errors Of Trajectories Under Challenging Localization Situations, Hao WU, Weiwei SUN, Baihua ZHENG, Li YANG, Wei ZHOU 2017 Singapore Management University

CLSTERS: A General System For Reducing Errors Of Trajectories Under Challenging Localization Situations, Hao Wu, Weiwei Sun, Baihua Zheng, Li Yang, Wei Zhou

Research Collection School Of Information Systems

Trajectory data generated by outdoor activities have great potential for location based services. However, depending on the localization technique used, certain trajectory data could contain large errors. For example, the error of trajectories generated by cellular-based localization techniques is around 100 m, which is ten times larger than that of GPS-based trajectories. Hence, enhancing the utility of those large-error trajectories becomes a challenge. In this paper we show how to improve the quality of trajectory data having large errors. Some existing works reduce the error through hardware which requires information such as the time of arrival (TOA), received signal strength indication ...


BIM+Blockchain: A Solution To The Trust Problem In Collaboration?, Malachy Mathews, Dan Robles, Brian Bowe 2017 Dublin Institute of Technology

BIM+Blockchain: A Solution To The Trust Problem In Collaboration?, Malachy Mathews, Dan Robles, Brian Bowe

Conference papers

This paper provides an overview of historic and current organizational limitations emerging in the Architecture, Engineering, Construction, Building Owner / Operations (AECOO) Industry. It then provides an overview of new technologies that attempt to mitigate these limitations. However, these technologies, taken together, appear to be converging and creating entirely new organizational structures in the AEC industries. This may be characterized by the emergence of what is called the Network Effect and its related calculus. This paper culminates with an introduction to Blockchain Technology (BT) and its integration with the emergence of groundbreaking technologies such as Internet of Things (IoT ...


The Practicality Of Cloud Computing, Xiaohua (Cindy) Li 2017 Sacred Heart University

The Practicality Of Cloud Computing, Xiaohua (Cindy) Li

Cindy Li

Since its inception, cloud computing has become the current paradigm. Organizations of different sizes and types have embraced the concept because of both its technological and economic advantages. Sacred Heart University Library has recently published its newly designed website on the cloud. For a small academic library, what does it mean to put their online data on the cloud? This paper will analyze and discuss the advantages of cloud computing, and some potential obstacles created by it through the author’s observations. This paper hopes the uniqueness of the case will contribute to the improvement of cloud computing experience of ...


Resource Estimation For Large Scale, Real-Time Image Analysis On Live Video Cameras Worldwide, Caleb Tung, Yung-Hsiang Lu, Anup Mohan 2017 Purdue University

Resource Estimation For Large Scale, Real-Time Image Analysis On Live Video Cameras Worldwide, Caleb Tung, Yung-Hsiang Lu, Anup Mohan

The Summer Undergraduate Research Fellowship (SURF) Symposium

Thousands of public cameras live-stream an abundance of data to the Internet every day. If analyzed in real-time by computer programs, these cameras could provide unprecedented utility as a global sensory tool. For example, if cameras capture the scene of a fire, a system running image analysis software on their footage in real-time could be programmed to react appropriately (perhaps call firefighters). No such technology has been deployed at large scale because the sheer computing resources needed have yet to be determined. In order to help us build computer systems powerful enough to achieve such lifesaving feats, we developed a ...


A Probabilistic Software Framework For Scalable Data Storage And Integrity Check, Sisi Xiong 2017 University of Tennessee, Knoxville

A Probabilistic Software Framework For Scalable Data Storage And Integrity Check, Sisi Xiong

Doctoral Dissertations

Data has overwhelmed the digital world in terms of volume, variety and velocity. Data-intensive applications are facing unprecedented challenges. On the other hand, computation resources, such as memory, are in short supply compared to the scale of the data. However, in certain applications, it is a must to process a large amount of data in a time-efficient manner. Probabilistic approaches are compromises between these three perspectives: large amount of data, limited computation resources and high time efficiency, in the sense that although those approaches cannot guarantee 100% correctness, their error rates are predictable and adjustable depending on available computation resources and time constraints ...


Sparse Online Learning Of Image Similarity, Xingyu GAO, Steven C. H. HOI, Yongdong ZHANG, Jianshe ZHOU, Ji WAN, Zhenyu CHEN, Jintao LI, Jianke ZHU 2017 Singapore Management University

Sparse Online Learning Of Image Similarity, Xingyu Gao, Steven C. H. Hoi, Yongdong Zhang, Jianshe Zhou, Ji Wan, Zhenyu Chen, Jintao Li, Jianke Zhu

Research Collection School Of Information Systems

Learning image similarity plays a critical role in real-world multimedia information retrieval applications, especially in Content-Based Image Retrieval (CBIR) tasks, in which an accurate retrieval of visually similar objects largely relies on an effective image similarity function. Crafting a good similarity function is very challenging because visual contents of images are often represented as feature vectors in high-dimensional spaces, for example, via bag-of-words (BoW) representations, and traditional rigid similarity functions, for example, cosine similarity, are often suboptimal for CBIR tasks. In this article, we address this fundamental problem, that is, learning to optimize image similarity with sparse and high-dimensional representations ...


Well-Tuned Algorithms For The Team Orienteering Problem With Time Windows, Aldy GUNAWAN, Hoong Chuin LAU, Pieter VANSTEENWEGEN, Kun LU 2017 Singapore Management University

Well-Tuned Algorithms For The Team Orienteering Problem With Time Windows, Aldy Gunawan, Hoong Chuin Lau, Pieter Vansteenwegen, Kun Lu

Research Collection School Of Information Systems

The Team Orienteering Problem with Time Windows (TOPTW) is the extension of the Orienteering Problem (OP) where each node is limited by a predefined time window during which the service has to start. The objective of the TOPTW is to maximize the total collected score by visiting a set of nodes with a limited number of paths. We propose two algorithms, Iterated Local Search and a hybridization of Simulated Annealing and Iterated Local Search (SAILS), to solve the TOPTW. As indicated in multiple research works on algorithms for the OP and its variants, determining appropriate parameter values in a statistical ...


Digital Commons powered by bepress