Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Data Storage Systems

2017

Articles 1 - 30 of 67

Full-Text Articles in Engineering

Breadcrumbs: Privacy As A Privilege, Prachi Bhardwaj Dec 2017

Capstones

Breadcrumbs: Privacy as a Privilege Abstract

By: Prachi Bhardwaj

In 2017, the world saw more data breaches than in any prior year. The count surpassed the previous all-time high, set in 2016, which was itself 40 percent higher than the year before.

That’s because consumer data is incredibly valuable today. In the last three decades, data has gone from being stored physically to being stored almost entirely digitally, which means consumer data is more accessible and more applicable to business strategies. As a result, companies are gathering data in ways previously unknown to the average consumer, and hackers are …


Proactive Sequential Resource (Re)Distribution For Improving Efficiency In Urban Environments, Supriyo Ghosh Dec 2017

Dissertations and Theses Collection (Open Access)

Due to increasing population and a lack of coordination, there is a mismatch between supply and demand of common resources (e.g., shared bikes, ambulances, taxis) in urban environments, which has degraded a wide variety of quality-of-life metrics such as the success rate in issuing shared bikes, response times for emergency needs, and waiting times in queues. Thus, in my thesis, I propose efficient algorithms that optimise these quality-of-life metrics by proactively redistributing resources through intelligent operational (day-to-day) and strategic (long-term) decisions in the context of urban transportation and health & safety. For urban transportation, Bike Sharing …


Process Models Discovery And Traces Classification: A Fuzzy-BPMN Mining Approach, Kingsley Okoye Dr, Usman Naeem Dr, Syed Islam Dr, Abdel-Rahman H. Tawil Dr, Elyes Lamine Dr Dec 2017

Journal of International Technology and Information Management

The discovery of useful or worthwhile process models must be performed with due regard to the transformation that needs to be achieved. The blend of data representations (i.e., data mining) and process modelling methods, often allied to the field of Process Mining (PM), has proven effective for the process analysis of the event logs readily available in many organisations' information systems. Moreover, process discovery has lately been seen as the most important and most visible intellectual challenge in process mining. The method involves automatic construction of process models from event logs about any domain …
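Process discovery starts from the ordering of activities recorded in an event log. A minimal sketch, assuming each trace is simply an ordered list of activity names (the log below is hypothetical, not data from the paper), is to count the directly-follows relation that many discovery algorithms take as input. This is a generic building block, not the authors' fuzzy-BPMN mining approach.

```python
from collections import Counter
from typing import Iterable, Sequence


def directly_follows(log: Iterable[Sequence[str]]) -> Counter:
    """Count how often activity a is immediately followed by activity b across all traces."""
    counts: Counter = Counter()
    for trace in log:
        # Pair each event with its successor within the same trace (case).
        for a, b in zip(trace, trace[1:]):
            counts[(a, b)] += 1
    return counts


# Hypothetical event log: one ordered activity sequence per case.
log = [
    ["register", "check", "approve", "pay"],
    ["register", "check", "reject"],
    ["register", "check", "approve", "pay"],
]
dfg = directly_follows(log)
print(dfg[("register", "check")])  # 3
print(dfg[("check", "approve")])   # 2
```

The weighted pairs form a directly-follows graph; a discovery algorithm would then, for example, filter or abstract infrequent edges and translate the remaining structure into a model notation such as BPMN.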


Data Protection In Nigeria: Addressing The Multifarious Challenges Of A Deficient Legal System, Roland Akindele Dec 2017

Journal of International Technology and Information Management

This paper provides an overview of the current state of privacy and data protection policies and regulations in Nigeria. The paper contends that the extant legal regime in Nigeria is patently inadequate to protect individuals effectively against abuse resulting from the processing of their personal data. This view is based on a critical analysis of the current legal regime in Nigeria vis-à-vis a review of some vital data privacy issues. The paper makes some recommendations for reform of the law.


Privacy Risks And Security Threats In mHealth Apps, Brinda Hansraj Sampat, Bala Prabhakar Dec 2017

Journal of International Technology and Information Management

mHealth (mobile health) applications (apps) have transformed the doctor-patient relationship. They help users with varied functionalities such as monitoring their health, understanding specific health conditions, consulting doctors online and achieving fitness goals. Whilst these apps offer equitable and convenient access to healthcare, a great deal of personal and sensitive user data is collected, stored and shared to deliver these functionalities. Little is known about how these apps address privacy and security concerns. Based on a literature review, this paper identifies the privacy risks and security features for evaluating thirty apps in the Medical category across two app distribution …


Table Of Contents JITIM Vol 26 Issue 4, 2017 Dec 2017

Journal of International Technology and Information Management

Table of Contents


A Study Of Application-Awareness In Software-Defined Data Center Networks, Chui-Hui Chiu Nov 2017

LSU Doctoral Dissertations

A data center (DC) has been fundamental infrastructure for academia and industry for many years. Applications in DCs have diverse communication requirements, which place huge demands on data center network (DCN) control frameworks (CFs) that coordinate communication traffic. Satisfying all of these demands simultaneously is difficult and inefficient with existing traditional network devices and protocols. Recently, agile Software-Defined Networking (SDN) has been introduced to DCNs to speed up the development of DCNCFs. Application-awareness preserves application semantics, including the collective goals of communications. Previous works have illustrated that application-aware DCNCFs can allocate network resources much more efficiently by explicitly …


SourceVote: Fusing Multi-Valued Data Via Inter-Source Agreements, Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Mahmoud Barhamgi, Lina Yao, Anne H.H. Ngu Nov 2017

Research Collection School Of Computing and Information Systems

Data fusion is a fundamental research problem of identifying true values of data items of interest from conflicting multi-sourced data. Although considerable research efforts have been conducted on this topic, existing approaches generally assume every data item has exactly one true value, which fails to reflect the real world where data items with multiple true values widely exist. In this paper, we propose a novel approach, SourceVote, to estimate value veracity for multi-valued data items. SourceVote models the endorsement relations among sources by quantifying their two-sided inter-source agreements. In particular, two graphs are constructed to model inter-source relations. Then two aspects of source reliability are derived from these graphs and …
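The abstract does not spell out how inter-source agreement is quantified. As an illustration only, the sketch below assumes a symmetric Jaccard overlap between the value sets two sources claim for the same item and builds a weighted source-agreement graph from it. SourceVote's actual two-sided measure and its two graphs are not reproduced here, and the claims data is hypothetical.

```python
from itertools import combinations


def jaccard(a: set, b: set) -> float:
    """Overlap of two claimed value sets: 1.0 if identical, 0.0 if disjoint."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def agreement_graph(claims: dict[str, dict[str, set]]) -> dict[tuple[str, str], float]:
    """One weighted edge per source pair: mean Jaccard agreement over the items both describe."""
    edges = {}
    for s1, s2 in combinations(sorted(claims), 2):
        shared = claims[s1].keys() & claims[s2].keys()
        if shared:
            total = sum(jaccard(claims[s1][item], claims[s2][item]) for item in shared)
            edges[(s1, s2)] = total / len(shared)
    return edges


# Hypothetical multi-valued claims: each source lists the authors it reports for a book.
claims = {
    "A": {"book1": {"Smith", "Lee"}},
    "B": {"book1": {"Smith"}},
    "C": {"book1": {"Smith", "Lee"}},
}
print(agreement_graph(claims))
# {('A', 'B'): 0.5, ('A', 'C'): 1.0, ('B', 'C'): 0.5}
```

Because a set-valued overlap gives partial credit, source B is not treated as simply wrong for reporting one of two true authors, which is the multi-valued situation the abstract says single-truth approaches fail to capture.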


Selective Value Coupling Learning For Detecting Outliers In High-Dimensional Categorical Data, Guansong Pang, Hongzuo Xu, Cao Longbing, Wentao Zhao Nov 2017

Research Collection School Of Computing and Information Systems

This paper introduces a novel framework, SelectVC, and its instance, POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on the full data space or on feature subspaces that are identified independently of subsequent outlier scoring. As a result, they are significantly challenged by the overwhelming number of irrelevant features in high-dimensional data, due to the noise those features introduce and the huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective …
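For contrast with the selective approach, a simple full-space baseline for categorical outlier scoring is Attribute Value Frequency (AVF), which rates each object by how common its values are across all features. This is not SelectVC or POP; it is a sketch, on hypothetical toy data, of the kind of full-space method the abstract argues is hurt by irrelevant features.

```python
from collections import Counter


def avf_scores(rows: list[tuple[str, ...]]) -> list[float]:
    """Attribute Value Frequency: mean frequency of each row's values (lower = more outlying)."""
    n_features = len(rows[0])
    # Frequency of every value within its own feature (column), over the full data space.
    freq = [Counter(row[j] for row in rows) for j in range(n_features)]
    return [sum(freq[j][row[j]] for j in range(n_features)) / n_features for row in rows]


rows = [
    ("red", "small"),
    ("red", "small"),
    ("red", "small"),
    ("blue", "large"),
]
print(avf_scores(rows))  # [3.0, 3.0, 3.0, 1.0]
```

Every feature contributes equally to the average, so in high-dimensional data many noisy, irrelevant columns can dilute the signal carried by the few outlying values.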


Building A Data “Deep State” @GVSU, Matt Schultz Oct 2017

Matt Schultz

GVSU Libraries is advancing an agenda to evolve its data management support from ad-hoc faculty consultations to a suite of new collaborative and dependable library services. This presentation will share details and lessons learned from experiments ranging from liaison training to repository software development.


Scalable Data Structure To Compress Next-Generation Sequencing Files And Its Application To Compressive Genomics, Sandino Vargas-Perez, Fahad Saeed Oct 2017

Parallel Computing and Data Science Lab Technical Reports

It is now possible to compress and decompress large-scale Next-Generation Sequencing files taking advantage of high-performance computing techniques. To this end, we have recently introduced a scalable hybrid parallel algorithm, called phyNGSC, which allows fast compression as well as decompression of big FASTQ datasets using distributed and shared memory programming models via MPI and OpenMP. In this paper we present the design and implementation of a novel parallel data structure which lessens the dependency on decompression and facilitates the handling of DNA sequences in their compressed state using fine-grained decompression in a technique that is identified as in …
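The abstract mentions fine-grained decompression for handling DNA sequences while they remain compressed. A common way to get that kind of random access is to compress records in independent fixed-size blocks, so a lookup only decodes one block. The sketch below assumes exactly that, using `zlib`; phyNGSC's actual data structure, its parallel MPI/OpenMP layout and any FASTQ-specific encoding are not modeled, and the reads are hypothetical.

```python
import zlib


def compress_blocks(reads: list[bytes], block_size: int) -> list[bytes]:
    """Compress reads in independent fixed-size blocks so each block can be decoded alone."""
    blocks = []
    for start in range(0, len(reads), block_size):
        chunk = b"\n".join(reads[start:start + block_size])
        blocks.append(zlib.compress(chunk))
    return blocks


def get_read(blocks: list[bytes], block_size: int, i: int) -> bytes:
    """Random access to read i: decompress only the one block that contains it."""
    block = zlib.decompress(blocks[i // block_size])
    return block.split(b"\n")[i % block_size]


# Hypothetical sequence lines (no newline characters inside a read).
reads = [b"ACGTACGT", b"TTGACCAA", b"GGGCCCAA", b"ATATATAT", b"CCGGTTAA"]
blocks = compress_blocks(reads, block_size=2)
print(len(blocks))                          # 3
print(get_read(blocks, block_size=2, i=3))  # b'ATATATAT'
```

The block size is the main design knob: smaller blocks mean less work per lookup but a worse compression ratio, since each block's compressor cannot reuse context from its neighbours.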


Semantic Reasoning In Zero Example Video Event Retrieval, M. H. T. De Boer, Yi-Jie Lu, Hao Zhang, Klamer Schutte, Chong-Wah Ngo, Wessel Kraaij Oct 2017

Research Collection School Of Computing and Information Systems

Searching in digital video data for high-level events, such as a parade or a car accident, is challenging when the query is textual and lacks visual example images or videos. Current research in deep neural networks is highly beneficial for the retrieval of high-level events using visual examples, but without examples it is still hard to (1) determine which concepts are useful to pre-train (Vocabulary challenge) and (2) which pre-trained concept detectors are relevant for a certain unseen high-level event (Concept Selection challenge). In our article, we present our Semantic Event Retrieval System, which (1) shows the importance of high-level concepts …


Understanding The Determinants Affecting The Continuance Intention To Use Cloud Computing, Shailja Tripathi Dr. Oct 2017

Journal of International Technology and Information Management

Cloud computing has been progressively implemented in organizations. The purpose of the paper is to understand the fundamental factors influencing senior managers’ continuance intention to use cloud computing in organizations. A conceptual framework was developed using the Technology Acceptance Model (TAM) as the base theoretical model. A questionnaire was used to collect data from several companies in the IT, manufacturing, finance, pharmaceutical and retail sectors in India. The data were analyzed using structural equation modeling. Perceived usefulness and perceived ubiquity are identified as important factors that affect continuance intention to use cloud computing. In addition, …


Table Of Contents JITIM Vol 26 Issue 3, 2017 Oct 2017

Journal of International Technology and Information Management

Table of Contents


Cross-Modal Recipe Retrieval With Rich Food Attributes, Jingjing Chen, Chong-Wah Ngo, Tat-Seng Chua Oct 2017

Research Collection School Of Computing and Information Systems

Food is rich in visible (e.g., colour, shape) and procedural (e.g., cutting, cooking) attributes. Proper leveraging of these attributes, particularly the interplay among ingredients, cutting and cooking methods, for health-related applications has not been previously explored. This paper investigates cross-modal retrieval of recipes, specifically to retrieve a text-based recipe given a food picture as query. As similar ingredient composition can end up with wildly different dishes depending on the cooking and cutting procedures, the difficulty of retrieval originates from fine-grained recognition of rich attributes from pictures. With a multi-task deep learning model, this paper provides insights on the feasibility of …


Analyzing The Performance Of NoSQL Vs. SQL Databases For Spatial And Aggregate Queries, Sarthak Agarwal, Ks Rajan Sep 2017

Free and Open Source Software for Geospatial (FOSS4G) Conference Proceedings

Relational databases have been around for a long time, and spatial databases have exploited them for close to two decades. The recent past has seen the development of NoSQL non-relational databases, which are now being adopted for spatial object storage and handling too. While SQL databases face scalability and agility challenges and fail to take advantage of the cheap memory and processing power available today, NoSQL databases can handle the growth in data storage and in the frequency at which data are accessed and processed, which are essential features in geospatial scenarios, which do not deal …


Personalized Microtopic Recommendation On Microblogs, Yang Li, Jing Jiang, Ting Liu, Minghui Qiu, Xiaofei Sun Sep 2017

Research Collection School Of Computing and Information Systems

Microblogging services such as Sina Weibo and Twitter allow users to create tags explicitly indicated by the # symbol. In Sina Weibo, these tags are called microtopics, and in Twitter, they are called hashtags. In Sina Weibo, each microtopic has a designated page and can be directly visited or commented on. Recommending these microtopics to users based on their interests can help users efficiently acquire information. However, it is non-trivial to recommend microtopics to users to satisfy their information needs. In this article, we investigate the task of personalized microtopic recommendation, which exhibits two challenges. First, users usually do not …


Feature Extraction And Parallel Visualization For Large-Scale Scientific Data, Lina Yu Sep 2017

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Advanced computing and sensing technologies enable scientists to study natural and physical phenomena with unprecedented precision, resulting in an explosive growth of data. The unprecedented amounts of data generated from large scientific simulations impose a grand challenge in data analytics and visualization due to the fact that data are too massive for transferring, storing, and processing.

This dissertation makes the first contribution to the design of novel transfer functions and an application-aware data replacement policy to facilitate feature classification on highly parallel distributed systems. We design novel transfer functions that advance the classification of continuously changing volume data by combining the …


BIM+Blockchain: A Solution To The Trust Problem In Collaboration?, Malachy Mathews, Dan Robles, Brian Bowe Aug 2017

Conference papers

This paper provides an overview of historic and current organizational limitations emerging in the Architecture, Engineering, Construction, Building Owner / Operations (AECOO) industry. It then provides an overview of new technologies that attempt to mitigate these limitations. However, these technologies, taken together, appear to be converging and creating entirely new organizational structures in the AEC industries. This may be characterized by the emergence of what is called the Network Effect and its related calculus. This paper culminates with an introduction to Blockchain Technology (BT) and its integration with the emergence of groundbreaking technologies such as Internet of Things (IoT), Artificial …


Resource Estimation For Large Scale, Real-Time Image Analysis On Live Video Cameras Worldwide, Caleb Tung, Yung-Hsiang Lu, Anup Mohan Aug 2017

The Summer Undergraduate Research Fellowship (SURF) Symposium

Thousands of public cameras live-stream an abundance of data to the Internet every day. If analyzed in real-time by computer programs, these cameras could provide unprecedented utility as a global sensory tool. For example, if cameras capture the scene of a fire, a system running image analysis software on their footage in real-time could be programmed to react appropriately (perhaps call firefighters). No such technology has been deployed at large scale because the sheer computing resources needed have yet to be determined. In order to help us build computer systems powerful enough to achieve such lifesaving feats, we developed a …


A Probabilistic Software Framework For Scalable Data Storage And Integrity Check, Sisi Xiong Aug 2017

Doctoral Dissertations

Data has overwhelmed the digital world in terms of volume, variety and velocity. Data-intensive applications are facing unprecedented challenges. On the other hand, computation resources, such as memory, are scarce compared with the scale of data. However, certain applications must process large amounts of data in a time-efficient manner. Probabilistic approaches are compromises among these three perspectives: large amounts of data, limited computation resources and high time efficiency, in the sense that although those approaches cannot guarantee 100% correctness, their error rates are predictable and adjustable depending on available computation resources and time constraints. …
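The trade-off described above, a bounded but tunable error in exchange for memory and speed, is what a Bloom filter offers for set membership. The excerpt does not say which probabilistic structures the dissertation uses, so the sketch below is a generic illustration: `m_bits` and `k_hashes` are the knobs that trade memory for a false-positive rate predicted by the standard approximation (1 - e^(-kn/m))^k, and the SHA-256 double-hashing scheme is an assumed implementation choice.

```python
import hashlib
import math


class BloomFilter:
    """Set-membership sketch: never misses an inserted item, may report false positives."""

    def __init__(self, m_bits: int, k_hashes: int) -> None:
        self.m = m_bits
        self.k = k_hashes
        self.bits = bytearray((m_bits + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode()).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # force an odd step
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def __contains__(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def expected_fp_rate(m_bits: int, k_hashes: int, n_items: int) -> float:
    """Standard approximation of the false-positive rate: (1 - e^(-kn/m))^k."""
    return (1 - math.exp(-k_hashes * n_items / m_bits)) ** k_hashes


bf = BloomFilter(m_bits=10_000, k_hashes=7)
for i in range(1_000):
    bf.add(f"key-{i}")
print("key-42" in bf)                      # True: inserted items are never missed
print(expected_fp_rate(10_000, 7, 1_000))  # predicted false-positive rate
```

With 1,000 items in 10,000 bits and seven hashes the predicted rate is under 1%; giving the filter more bits at the same item count lowers it further, which is the "adjustable depending on available computation resources" property in concrete form.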


Learning Homophily Couplings From Non-IID Data For Joint Feature Selection And Noise-Resilient Outlier Detection, Guansong Pang, Longbing Cao, Ling Chen, Huan Liu Aug 2017

Research Collection School Of Computing and Information Systems

This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors …


Smartphone Sensing Meets Transport Data: A Collaborative Framework For Transportation Service Analytics, Yu Lu, Archan Misra, Wen Sun, Huayu Wu Aug 2017

Research Collection School Of Computing and Information Systems

We advocate for and introduce TRANSense, a framework for urban transportation service analytics that combines participatory smartphone sensing data with city-scale transportation-related transactional data (taxis, trains etc.). Our work is driven by the observed limitations of using each data type in isolation: (a) commonly-used anonymous city-scale datasets (such as taxi bookings and GPS trajectories) provide insights into the aggregate behavior of transport infrastructure, but fail to reveal individual-specific transport experiences (e.g., wait times in taxi queues); while (b) mobile sensing data can capture individual-specific commuting-related activities, but suffers from accuracy and energy overhead challenges due to usage artefacts and lack …


Integrity Coded Databases: Ensuring Correctness And Freshness Of Outsourced Databases, Ujwal Karki Aug 2017

Boise State University Theses and Dissertations

In recent years, cloud storage has become an inexpensive and convenient option for individuals and businesses to store and retrieve information. The cloud releases the data owner from the financial burden of hiring professionals to create, update and maintain local databases. Advancements in networking and the growing need for computing resources for various applications have increased the demand for cloud computing. Its positive aspects make the cloud an attractive option for data storage, but this service comes at a cost: it requires the data owner to relinquish control of their information to the cloud service provider. …
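The excerpt stops before describing how the integrity codes are built. One generic way to obtain both properties named in the title, sketched here as an assumption rather than the thesis's scheme, is for the data owner to attach a keyed MAC (HMAC-SHA256) to every row and to include a version number in it: tampered values fail verification (correctness), and an out-of-date row replayed by the server fails when the owner checks it against the version it expects (freshness). The function names, table, key and values are hypothetical.

```python
import hashlib
import hmac


def row_tag(key: bytes, table: str, row_id: int, version: int, values: tuple) -> str:
    """Keyed MAC binding a row's values to its table, primary key and version number."""
    message = repr((table, row_id, version, values)).encode()
    return hmac.new(key, message, hashlib.sha256).hexdigest()


def verify_row(key: bytes, table: str, row_id: int, expected_version: int,
               values: tuple, tag: str) -> bool:
    """Check a row returned by the server against the version the owner expects."""
    expected = row_tag(key, table, row_id, expected_version, values)
    # compare_digest compares in constant time to avoid timing side channels.
    return hmac.compare_digest(expected, tag)


key = b"owner-secret-key"  # hypothetical key held only by the data owner
tag_v1 = row_tag(key, "accounts", 42, 1, ("alice", 100))
tag_v2 = row_tag(key, "accounts", 42, 2, ("alice", 80))

print(verify_row(key, "accounts", 42, 2, ("alice", 80), tag_v2))   # True: correct and fresh
print(verify_row(key, "accounts", 42, 2, ("alice", 999), tag_v2))  # False: tampered values
print(verify_row(key, "accounts", 42, 2, ("alice", 100), tag_v1))  # False: stale row replayed
```

The freshness check in this sketch only works if the owner knows the current version of each row it queries; how much such state must be kept locally is a design decision any real scheme has to make.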


Indexing Metric Uncertain Data For Range Queries And Range Joins, Lu Chen, Yunjun Gao, Aoxiao Zhong, Christian S. Jensen, Gang Chen, Baihua Zheng Aug 2017

Research Collection School Of Computing and Information Systems

Range queries and range joins in metric spaces have applications in many areas, including GIS, computational biology, and data integration, where metric uncertain data exist in different forms, resulting from circumstances such as equipment limitations, high-throughput sequencing technologies, and privacy preservation. We represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed in order to support probabilistic range queries and range joins for a wide range of uncertain data types and similarity metrics. Both index structures use a small set …


Pivot-Based Metric Indexing, Lu Chen, Yunjun Gao, Baihua Zheng, Christian S. Jensen, Hanyu Yang, Keyu Yang Aug 2017

Research Collection School Of Computing and Information Systems

The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed, which reduces the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offer only narrower coverage, and they use different pivot selection strategies …
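The pruning and validation idea can be made concrete. For any pivot p, the triangle inequality gives |d(q,p) - d(o,p)| ≤ d(q,o) ≤ d(q,p) + d(o,p), so precomputed object-to-pivot distances yield a lower bound (to prune) and an upper bound (to validate) without computing d(q,o). The sketch below uses a flat pivot table over 2-D points with Euclidean distance (`math.dist`); the data, the pivot choice and the table layout are illustrative assumptions, not any particular index evaluated in the paper.

```python
import math


def build_pivot_table(data, pivots, dist):
    """Precompute d(o, p) for every object o and pivot p (done once, at index time)."""
    return [[dist(o, p) for p in pivots] for o in data]


def range_query(q, r, data, pivots, table, dist):
    """Objects within distance r of q, using pivots to avoid most d(q, o) computations."""
    q_to_p = [dist(q, p) for p in pivots]
    result, computed = [], 0
    for o, o_to_p in zip(data, table):
        # Pruning: |d(q,p) - d(o,p)| <= d(q,o), so a lower bound above r rules o out.
        if any(abs(qp - op) > r for qp, op in zip(q_to_p, o_to_p)):
            continue
        # Validation: d(q,o) <= d(q,p) + d(o,p), so an upper bound within r confirms o.
        if any(qp + op <= r for qp, op in zip(q_to_p, o_to_p)):
            result.append(o)
            continue
        computed += 1  # neither bound decides: compute the real distance
        if dist(q, o) <= r:
            result.append(o)
    return result, computed


data = [(0.0, 0.0), (1.0, 0.0), (5.0, 5.0), (10.0, 10.0), (0.5, 0.5)]
pivots = [(0.0, 0.0), (10.0, 10.0)]  # hypothetical pivot choice
table = build_pivot_table(data, pivots, math.dist)
hits, computed = range_query((0.2, 0.1), 1.0, data, pivots, table, math.dist)
print(hits)      # [(0.0, 0.0), (1.0, 0.0), (0.5, 0.5)]
print(computed)  # 1
```

Only one of the five objects needed a real distance computation; with an expensive metric such as edit distance, that saving is what pivot selection strategies try to maximize.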


Geometric Approaches For Top-K Queries [Tutorial], Kyriakos Mouratidis Aug 2017

Research Collection School Of Computing and Information Systems

Top-k processing is a well-studied problem with numerous applications that is becoming increasingly relevant with the growing availability of recommendation systems and decision-making software. The objective of this tutorial is twofold. First, we will delve into the geometric aspects of top-k processing. Second, we will cover complementary features to top-k queries, with strong practical relevance and important applications, that have a computational geometric nature. The tutorial will close with insights into the effect of dimensionality on the meaningfulness of top-k queries, and interesting similarities to nearest neighbor search.
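The geometric view of top-k processing is easiest to see under the common linear scoring model, assumed here: each item is a point in attribute space, the user's preferences are a weight vector w, and the top-k result consists of the first k points touched by a hyperplane orthogonal to w as it sweeps from high to low score. The brute-force sketch below, with hypothetical items and weights, shows how changing w (rotating that hyperplane) changes the answer; it does not attempt the tutorial's geometric algorithms, which avoid scoring every item.

```python
import heapq


def top_k(items, weights, k):
    """The k items with the highest linear score sum(w_i * a_i)."""
    def score(item):
        _, attrs = item
        return sum(w * a for w, a in zip(weights, attrs))
    return heapq.nlargest(k, items, key=score)


# Hypothetical items with two normalized attributes (e.g., rating, value for money).
items = [
    ("hotel A", (0.9, 0.2)),
    ("hotel B", (0.6, 0.7)),
    ("hotel C", (0.3, 0.95)),
]
print([name for name, _ in top_k(items, (0.5, 0.5), 2)])  # ['hotel B', 'hotel C']
print([name for name, _ in top_k(items, (0.9, 0.1), 1)])  # ['hotel A']
```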


Semantic Visualization For Short Texts With Word Embeddings, Van Minh Tuan Le, Hady W. Lauw Aug 2017

Research Collection School Of Computing and Information Systems

Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a …


Time-Aware Conversion Prediction, Wendi Ji, Xiaoling Wang, Feida Zhu Aug 2017

Research Collection School Of Computing and Information Systems

The importance of product recommendation has been well recognized as a central task in business intelligence for e-commerce websites. Interestingly, what has received less attention is the fact that different products take different time periods for conversion. “Conversion” here actually refers to a more general set of pre-defined actions, including, for example, purchases or registrations in recommendation and advertising systems. The mismatch between a product’s actual conversion period and the application’s target conversion period has been the subtle culprit compromising many existing recommendation algorithms. The challenging question: what products should be recommended for a given time period to maximize …