Open Access. Powered by Scholars. Published by Universities.®

Data Storage Systems Commons

1,047 Full-Text Articles | 2,710 Authors | 200,418 Downloads | 71 Institutions

All Articles in Data Storage Systems

1047 full-text articles. Page 1 of 49.

Scalable Data Structure To Compress Next-Generation Sequencing Files And Its Application To Compressive Genomics, Sandino Vargas-Perez, Fahad Saeed 2017 WMU

Parallel Computing and Data Science Lab Technical Reports

It is now possible to compress and decompress large-scale Next-Generation Sequencing files taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure which lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state using fine-grained decompression in a technique that is identified as in compresso ...


Analyzing The Performance Of NoSQL Vs. SQL Databases For Spatial And Aggregate Queries, Sarthak Agarwal, KS Rajan 2017 International Institute of Information Technology Hyderabad Gachibowli, Hyderabad, India

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

Relational databases have been around for a long time and spatial databases have exploited this feature for close to two decades. The recent past has seen the development of NoSQL non-relational databases, which are now being adopted for spatial object storage and handling, too. While SQL databases face scalability and agility challenges and fail to take advantage of the cheap memory and processing power available these days, NoSQL databases can handle the rise in data storage and the frequency at which it is accessed and processed - which are essential features needed in geospatial scenarios, which do not deal with ...


Feature Extraction And Parallel Visualization For Large-Scale Scientific Data, Lina Yu 2017 University of Nebraska - Lincoln

Computer Science and Engineering: Theses, Dissertations, and Student Research

Advanced computing and sensing technologies enable scientists to study natural and physical phenomena with unprecedented precision, resulting in an explosive growth of data. The unprecedented amounts of data generated from large scientific simulations impose a grand challenge in data analytics and visualization due to the fact that data are too massive for transferring, storing, and processing.

This dissertation makes the first contribution to the design of novel transfer functions and application-aware data replacement policy to facilitate feature classification on highly parallel distributed systems. We design novel transfer functions that advance the classification of continuously changed volume data by combining the ...


Personalized Microtopic Recommendation On Microblogs, Yang LI, Jing JIANG, Ting LIU, Minghui QIU, Xiaofei SUN 2017 Singapore Management University

Research Collection School Of Information Systems

Microblogging services such as Sina Weibo and Twitter allow users to create tags explicitly indicated by the # symbol. In Sina Weibo, these tags are called microtopics, and in Twitter, they are called hashtags. In Sina Weibo, each microtopic has a designated page and can be directly visited or commented on. Recommending these microtopics to users based on their interests can help users efficiently acquire information. However, it is non-trivial to recommend microtopics to users to satisfy their information needs. In this article, we investigate the task of personalized microtopic recommendation, which exhibits two challenges. First, users usually do not give ...


Do Your Friends Make You Buy This Brand?: Modeling Social Recommendation With Topics And Brands, Minh Duc LUU, Ee-peng LIM 2017 Singapore Management University

Research Collection School Of Information Systems

Consumer behavior and marketing research have shown that brand has significant influence on product reviews and product purchase decisions. However, there is very little work on incorporating brand related factors into product recommender systems. Meanwhile, the similarity in brand preference between a user and other socially connected users also affects her adoption decisions. To integrate seamlessly the individual and social brand related factors into the recommendation process, we propose a novel model called Social Brand–Item–Topic (SocBIT). As the original SocBIT model does not enforce non-negativity, which poses some difficulty in result interpretation, we also propose a non-negative version ...


An Efficient Privacy-Preserving Outsourced Computation Over Public Data, Ximeng LIU, Baodong QIN, Robert DENG, Yingjiu LI 2017 Singapore Management University

Research Collection School Of Information Systems

In this paper, we propose a new efficient privacy-preserving outsourced computation framework over public data, called EPOC. EPOC allows a user to outsource the computation of a function over multi-dimensional public data to the cloud while protecting the privacy of the function and its output. Specifically, we introduce three types of EPOC in order to trade off different levels of privacy protection and performance. We present a new cryptosystem called Switchable Homomorphic Encryption with Partially Decryption (SHED) as the core cryptographic primitive for EPOC. We introduce two coding techniques, called message pre-coding technique and message extending and coding technique respectively, for ...


Attribute-Based Keyword Search Over Hierarchical Data In Cloud Computing, Yinbin MIAO, Jianfeng MA, Ximeng LIU, Xinghua LI, Qi JIANG, Junwei ZHANG 2017 Singapore Management University

Research Collection School Of Information Systems

Searchable encryption (SE) has been a promising technology which allows users to perform search queries over encrypted data. However, most existing SE schemes cannot deal with shared records that have hierarchical structures. In this paper, we devise a basic cryptographic primitive called the attribute-based keyword search over hierarchical data (ABKS-HD) scheme by using the ciphertext-policy attribute-based encryption (CP-ABE) technique, but this basic scheme cannot satisfy all the desirable requirements of cloud systems. The facts that the single keyword search will yield many irrelevant search results and the revoked users can access the unauthorized data with the old ...


BIM+Blockchain: A Solution To The Trust Problem In Collaboration?, Malachy Mathews, Dan Robles, Brian Bowe 2017 Dublin Institute of Technology

Conference papers

This paper provides an overview of historic and current organizational limitations emerging in the Architecture, Engineering, Construction, Building Owner / Operations (AECOO) Industry. It then provides an overview of new technologies that attempt to mitigate these limitations. However, these technologies, taken together, appear to be converging and creating entirely new organizational structures in the AEC industries. This may be characterized by the emergence of what is called the Network Effect and its related calculus. This paper culminates with an introduction to Blockchain Technology (BT) and its integration with the emergence of groundbreaking technologies such as Internet of Things (IoT ...


The Practicality Of Cloud Computing, Xiaohua (Cindy) Li 2017 Sacred Heart University

Cindy Li

Since its inception, cloud computing has become the current paradigm. Organizations of different sizes and types have embraced the concept because of both its technological and economic advantages. Sacred Heart University Library has recently published its newly designed website on the cloud. For a small academic library, what does it mean to put their online data on the cloud? This paper will analyze and discuss the advantages of cloud computing, and some potential obstacles created by it through the author’s observations. This paper hopes the uniqueness of the case will contribute to the improvement of cloud computing experience of ...


Resource Estimation For Large Scale, Real-Time Image Analysis On Live Video Cameras Worldwide, Caleb Tung, Yung-Hsiang Lu, Anup Mohan 2017 Purdue University

The Summer Undergraduate Research Fellowship (SURF) Symposium

Thousands of public cameras live-stream an abundance of data to the Internet every day. If analyzed in real-time by computer programs, these cameras could provide unprecedented utility as a global sensory tool. For example, if cameras capture the scene of a fire, a system running image analysis software on their footage in real-time could be programmed to react appropriately (perhaps call firefighters). No such technology has been deployed at large scale because the sheer computing resources needed have yet to be determined. In order to help us build computer systems powerful enough to achieve such lifesaving feats, we developed a ...


Pivot-Based Metric Indexing, Lu CHEN, Yunjun GAO, Baihua ZHENG, Christian S. JENSEN, Hanyu YANG, Keyu YANG 2017 Singapore Management University

Research Collection School Of Information Systems

The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed, which reduces the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offer only narrower coverage, and they use different pivot selection strategies ...
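
The pruning and validation idea mentioned in this abstract can be illustrated with a minimal sketch. It is not any of the index structures studied in the paper: the `PivotTable` class, the `euclidean` helper, and the choice of pivots are hypothetical names and assumptions made for illustration. The sketch precomputes object-to-pivot distances and then uses the triangle inequality to skip most direct distance computations in a range query.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, ...]


def euclidean(a: Point, b: Point) -> float:
    """Euclidean distance, used here as one example of a metric."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


class PivotTable:
    """Hypothetical minimal pivot table: stores d(o, p) for every object o and pivot p."""

    def __init__(self, data: Sequence[Point], pivots: Sequence[Point],
                 dist: Callable[[Point, Point], float]) -> None:
        self.data = list(data)
        self.pivots = list(pivots)
        self.dist = dist
        # Precomputed object-to-pivot distances (done once, at build time).
        self.table = [[dist(o, p) for p in self.pivots] for o in self.data]

    def range_query(self, q: Point, r: float) -> Tuple[List[Point], int]:
        """Return objects within distance r of q and the number of direct distance computations."""
        q_to_p = [self.dist(q, p) for p in self.pivots]
        results: List[Point] = []
        computed = 0
        for o, row in zip(self.data, self.table):
            # Pruning: |d(q,p) - d(o,p)| > r implies d(q,o) > r.
            if any(abs(dq - do) > r for dq, do in zip(q_to_p, row)):
                continue
            # Validation: d(q,p) + d(o,p) <= r implies d(q,o) <= r.
            if any(dq + do <= r for dq, do in zip(q_to_p, row)):
                results.append(o)
                continue
            # Neither bound decides, so fall back to the real distance.
            computed += 1
            if self.dist(q, o) <= r:
                results.append(o)
        return results, computed


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 10.0)]
index = PivotTable(data, pivots=[(0.0, 0.0)], dist=euclidean)
found, n_computed = index.range_query((0.5, 0.0), 1.0)
```

In this toy run only one of the four objects needs a direct distance computation; the other three are decided by the pivot bounds alone.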


Semantic Visualization For Short Texts With Word Embeddings, Van Minh Tuan LE, Hady Wirawan LAUW 2017 Singapore Management University

Research Collection School Of Information Systems

Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a ...


Basket-Sensitive Personalized Item Recommendation, Duc Trong LE, Hady Wirawan LAUW, Yuan FANG 2017 Singapore Management University

Research Collection School Of Information Systems

Personalized item recommendation is useful in narrowing down the list of options provided to a user. In this paper, we address the problem scenario where the user is currently holding a basket of items, and the task is to recommend an item to be added to the basket. Here, we assume that items currently in a basket share some association based on an underlying latent need, e.g., ingredients to prepare some dish, spare parts of some device. Thus, it is important that a recommended item is relevant not only to the user, but also to the existing items in ...


Geometric Approaches For Top-K Queries [Tutorial], Kyriakos MOURATIDIS 2017 Singapore Management University

Research Collection School Of Information Systems

Top-k processing is a well-studied problem with numerous applications that is becoming increasingly relevant with the growing availability of recommendation systems and decision-making software. The objective of this tutorial is twofold. First, we will delve into the geometric aspects of top-k processing. Second, we will cover complementary features to top-k queries, with strong practical relevance and important applications, that have a computational geometric nature. The tutorial will close with insights into the effect of dimensionality on the meaningfulness of top-k queries, and interesting similarities to nearest neighbor search.


Sparse Online Learning Of Image Similarity, Xingyu GAO, Steven C. H. HOI, Yongdong ZHANG, Jianshe ZHOU, Ji WAN, Zhenyu CHEN, Jintao LI, Jianke ZHU 2017 Singapore Management University

Research Collection School Of Information Systems

Learning image similarity plays a critical role in real-world multimedia information retrieval applications, especially in Content-Based Image Retrieval (CBIR) tasks, in which an accurate retrieval of visually similar objects largely relies on an effective image similarity function. Crafting a good similarity function is very challenging because visual contents of images are often represented as feature vectors in high-dimensional spaces, for example, via bag-of-words (BoW) representations, and traditional rigid similarity functions, for example, cosine similarity, are often suboptimal for CBIR tasks. In this article, we address this fundamental problem, that is, learning to optimize image similarity with sparse and high-dimensional representations ...
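
For readers unfamiliar with the rigid baseline named in this abstract, the following is a minimal sketch of cosine similarity over sparse bag-of-words vectors. It is not the learned similarity function the article proposes; the dictionary-based representation and the visual-word names (`"w17"` and so on) are assumptions made for illustration.

```python
import math
from typing import Dict


def cosine_similarity(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity between two sparse vectors stored as {feature: weight}."""
    # Iterate over the smaller vector; only shared features contribute to the dot product.
    small, large = (a, b) if len(a) <= len(b) else (b, a)
    dot = sum(w * large.get(f, 0.0) for f, w in small.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for empty vectors
    return dot / (norm_a * norm_b)


# Hypothetical visual-word counts for two images.
image_1 = {"w17": 1.0, "w42": 1.0}
image_2 = {"w17": 1.0}
score = cosine_similarity(image_1, image_2)
```

The function is fixed in advance and treats every feature the same way, which is the sense in which such measures are called rigid.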


Well-Tuned Algorithms For The Team Orienteering Problem With Time Windows, Aldy GUNAWAN, Hoong Chuin LAU, Pieter VANSTEENWEGEN, Kun LU 2017 Singapore Management University

Research Collection School Of Information Systems

The Team Orienteering Problem with Time Windows (TOPTW) is the extension of the Orienteering Problem (OP) where each node is limited by a predefined time window during which the service has to start. The objective of the TOPTW is to maximize the total collected score by visiting a set of nodes with a limited number of paths. We propose two algorithms, Iterated Local Search and a hybridization of Simulated Annealing and Iterated Local Search (SAILS), to solve the TOPTW. As indicated in multiple research works on algorithms for the OP and its variants, determining appropriate parameter values in a statistical ...


Indexing Metric Uncertain Data For Range Queries And Range Joins, Lu CHEN, Yunjun GAO, Aoxiao ZHONG, Christian S. JENSEN, Gang CHEN, Baihua ZHENG 2017 Singapore Management University

Research Collection School Of Information Systems

Range queries and range joins in metric spaces have applications in many areas, including GIS, computational biology, and data integration, where metric uncertain data exist in different forms, resulting from circumstances such as equipment limitations, high-throughput sequencing technologies, and privacy preservation. We represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed in order to support probabilistic range queries and range joins for a wide range of uncertain data types and similarity metrics. Both index structures use a ...


Time-Aware Conversion Prediction, Wendi JI, Xiao Ling WANG, Feida ZHU 2017 Singapore Management University

Research Collection School Of Information Systems

The importance of product recommendation has been well recognized as a central task in business intelligence for e-commerce websites. Interestingly, what is less well recognized is the fact that different products take different time periods for conversion. The “conversion” here actually refers to a more general set of pre-defined actions, including for example purchases or registrations in recommendation and advertising systems. The mismatch between the product’s actual conversion period and the application’s target conversion period has been the subtle culprit compromising many existing recommendation algorithms. The challenging question: what products should be recommended for a given time ...


On Efficiently Finding Reverse K-Nearest Neighbors Over Uncertain Graphs, Yunjun GAO, Xiaoye MIAO, Gang CHEN, Baihua ZHENG, Deng CAI, Huiyong CUI 2017 Singapore Management University

Research Collection School Of Information Systems

Reverse k-nearest neighbor (RkNN) query on graphs returns the data objects that take a specified query object q as one of their k-nearest neighbors. It has significant influence in many real-life applications including resource allocation and profile-based marketing. However, to the best of our knowledge, there is little previous work on RkNN search over uncertain graph data, even though many complex networks such as traffic networks and protein–protein interaction networks are often modeled as uncertain graphs. In this paper, we systematically study the problem of reverse k-nearest neighbor search on uncertain graphs (UG-RkNN search for short), where graph edges contain ...
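
The RkNN definition in the first sentence of this abstract can be made concrete with a naive baseline on an ordinary (deterministic) weighted graph. This sketch does not model edge uncertainty and is not the paper's UG-RkNN method. The adjacency-dictionary format, the function names, and the rule that ties are resolved in favour of q are assumptions made for illustration.

```python
import heapq
from typing import Dict, Hashable, List

Graph = Dict[Hashable, Dict[Hashable, float]]


def shortest_paths(graph: Graph, source: Hashable) -> Dict[Hashable, float]:
    """Dijkstra's algorithm; returns distances to all nodes reachable from source."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def rknn(graph: Graph, objects: List[Hashable], q: Hashable, k: int) -> List[Hashable]:
    """Return the data objects that have q among their k nearest neighbors."""
    result = []
    for o in objects:
        if o == q:
            continue
        dist = shortest_paths(graph, o)
        if q not in dist:
            continue  # q is unreachable from o
        # Count the other data objects that are strictly closer to o than q is.
        closer = sum(
            1 for other in objects
            if other not in (o, q) and dist.get(other, float("inf")) < dist[q]
        )
        if closer < k:
            result.append(o)
    return result


# Undirected path graph a - b - c - d with unit edge weights.
path_graph = {
    "a": {"b": 1.0},
    "b": {"a": 1.0, "c": 1.0},
    "c": {"b": 1.0, "d": 1.0},
    "d": {"c": 1.0},
}
answer_k1 = rknn(path_graph, ["a", "c", "d"], "b", 1)
answer_k2 = rknn(path_graph, ["a", "c", "d"], "b", 2)
```

This baseline runs one shortest-path search per data object, which is the kind of cost that dedicated RkNN algorithms aim to avoid.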


Smartphone Sensing Meets Transport Data: A Collaborative Framework For Transportation Service Analytics, Yu LU, Archan MISRA, Wen SUN, Huayu WU 2017 Singapore Management University

Research Collection School Of Information Systems

We advocate for and introduce TRANSense, a framework for urban transportation service analytics that combines participatory smartphone sensing data with city-scale transportation-related transactional data (taxis, trains etc.). Our work is driven by the observed limitations of using each data type in isolation: (a) commonly-used anonymous city-scale datasets (such as taxi bookings and GPS trajectories) provide insights into the aggregate behavior of transport infrastructure, but fail to reveal individual-specific transport experiences (e.g., wait times in taxi queues); while (b) mobile sensing data can capture individual-specific commuting-related activities, but suffers from accuracy and energy overhead challenges due to usage artefacts and ...

