Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 14 of 14

Full-Text Articles in Databases and Information Systems

Data Provenance Via Differential Auditing, Xin Mu, Ming Pang, Feida Zhu Nov 2023

Data Provenance Via Differential Auditing, Xin Mu, Ming Pang, Feida Zhu

Research Collection School Of Computing and Information Systems

With the rising awareness of data assets, data governance, which is to understand where data comes from, how it is collected, and how it is used, has been assuming evergrowing importance. One critical component of data governance gaining increasing attention is auditing machine learning models to determine if specific data has been used for training. Existing auditing techniques, like shadow auditing methods, have shown feasibility under specific conditions such as having access to label information and knowledge of training protocols. However, these conditions are often not met in most real-world applications. In this paper, we introduce a practical framework for …


Hrgcn: Heterogeneous Graph-Level Anomaly Detection With Hierarchical Relation-Augmented Graph Neural Networks, Jiaxi Li, Guansong Pang, Ling Chen, Mohammad-Reza Namazi-Rad Oct 2023

Hrgcn: Heterogeneous Graph-Level Anomaly Detection With Hierarchical Relation-Augmented Graph Neural Networks, Jiaxi Li, Guansong Pang, Ling Chen, Mohammad-Reza Namazi-Rad

Research Collection School Of Computing and Information Systems

This work considers the problem of heterogeneous graph-level anomaly detection. Heterogeneous graphs are commonly used to represent behaviours between different types of entities in complex industrial systems for capturing as much information about the system operations as possible. Detecting anomalous heterogeneous graphs from a large set of system behaviour graphs is crucial for many real-world applications like online web/mobile service and cloud access control. To address the problem, we propose HRGCN, an unsupervised deep heterogeneous graph neural network, to model complex heterogeneous relations between different entities in the system for effectively identifying these anomalous behaviour graphs. HRGCN trains a hierarchical …


Reinforced Adaptation Network For Partial Domain Adaptation, Keyu Wu, Min Wu, Zhenghua Chen, Ruibing Jin, Wei Cui, Zhiguang Cao, Xiaoli Li May 2023

Reinforced Adaptation Network For Partial Domain Adaptation, Keyu Wu, Min Wu, Zhenghua Chen, Ruibing Jin, Wei Cui, Zhiguang Cao, Xiaoli Li

Research Collection School Of Computing and Information Systems

Domain adaptation enables generalized learning in new environments by transferring knowledge from label-rich source domains to label-scarce target domains. As a more realistic extension, partial domain adaptation (PDA) relaxes the assumption of fully shared label space, and instead deals with the scenario where the target label space is a subset of the source label space. In this paper, we propose a Reinforced Adaptation Network (RAN) to address the challenging PDA problem. Specifically, a deep reinforcement learning model is proposed to learn source data selection policies. Meanwhile, a domain adaptation model is presented to simultaneously determine rewards and learn domain-invariant feature …


Dual-View Preference Learning For Adaptive Recommendation, Zhongzhou Liu, Yuan Fang, Min Wu Jan 2023

Dual-View Preference Learning For Adaptive Recommendation, Zhongzhou Liu, Yuan Fang, Min Wu

Research Collection School Of Computing and Information Systems

While recommendation systems have been widely deployed, most existing approaches only capture user preferences in the , i.e., the user's general interest across all kinds of items. However, in real-world scenarios, user preferences could vary with items of different natures, which we call the . Both views are crucial for fully personalized recommendation, where an underpinning macro-view governs a multitude of finer-grained preferences in the micro-view. To model the dual views, in this paper, we propose a novel model called Dual-View Adaptive Recommendation (DVAR). In DVAR, we formulate the micro-view based on item categories, and further integrate it with the …


Cross-Modal Food Retrieval: Learning A Joint Embedding Of Food Images And Recipes With Semantic Consistency And Attention Mechanism, Hao Wang, Doyen Sahoo, Chenghao Liu, Ke Shu, Palakorn Achananuparp, Ee-Peng Lim, Steven C. H. Hoi Jan 2022

Cross-Modal Food Retrieval: Learning A Joint Embedding Of Food Images And Recipes With Semantic Consistency And Attention Mechanism, Hao Wang, Doyen Sahoo, Chenghao Liu, Ke Shu, Palakorn Achananuparp, Ee-Peng Lim, Steven C. H. Hoi

Research Collection School Of Computing and Information Systems

Food retrieval is an important task to perform analysis of food-related information, where we are interested in retrieving relevant information about the queried food item such as ingredients, cooking instructions, etc. In this paper, we investigate cross-modal retrieval between food images and cooking recipes. The goal is to learn an embedding of images and recipes in a common feature space, such that the corresponding image-recipe embeddings lie close to one another. Two major challenges in addressing this problem are 1) large intra-variance and small inter-variance across cross-modal food data; and 2) difficulties in obtaining discriminative recipe representations. To address these …


Clinical Big Data And Deep Learning: Applications, Challenges, And Future Outlooks, Ying Yu, Liangliang Liu, Yaohang Li, Jianxin Wang Jan 2019

Clinical Big Data And Deep Learning: Applications, Challenges, And Future Outlooks, Ying Yu, Liangliang Liu, Yaohang Li, Jianxin Wang

Computer Science Faculty Publications

The explosion of digital healthcare data has led to a surge of data-driven medical research based on machine learning. In recent years, as a powerful technique for big data, deep learning has gained a central position in machine learning circles for its great advantages in feature representation and pattern recognition. This article presents a comprehensive overview of studies that employ deep learning methods to deal with clinical data. Firstly, based on the analysis of the characteristics of clinical data, various types of clinical data (e.g., medical images, clinical notes, lab results, vital signs and demographic informatics) are discussed and details …


Evaluation Criteria For Selecting Nosql Databases In A Single Box Environment, Ryan D. Engle, Brent T. Langhals, Michael R. Grimaila, Douglas D. Hodson Aug 2018

Evaluation Criteria For Selecting Nosql Databases In A Single Box Environment, Ryan D. Engle, Brent T. Langhals, Michael R. Grimaila, Douglas D. Hodson

Faculty Publications

In recent years, NoSQL database systems have become increasingly popular, especially for big data, commercial applications. These systems were designed to overcome the scaling and flexibility limitations plaguing traditional relational database management systems (RDBMSs). Given NoSQL database systems have been typically implemented in large-scale distributed environments serving large numbers of simultaneous users across potentially thousands of geographically separated devices, little consideration has been given to evaluating their value within single-box environments. It is postulated some of the inherent traits of each NoSQL database type may be useful, perhaps even preferable, regardless of scale. Thus, this paper proposes criteria conceived to …


Efficient Representative Subset Selection Over Sliding Windows, Yanhao Wang, Yuchen Li, Kian-Lee Tan Jul 2018

Efficient Representative Subset Selection Over Sliding Windows, Yanhao Wang, Yuchen Li, Kian-Lee Tan

Research Collection School Of Computing and Information Systems

Representative subset selection (RSS) is an important tool for users to draw insights from massive datasets. Existing literature models RSS as submodular maximization to capture the "diminishing returns" property of representativeness, but often only has a single constraint, which limits its applications to many real-world problems. To capture the recency issue and support various constraints, we formulate dynamic RSS as maximizing submodular functions subject to general d -knapsack constraints (SMDK) over sliding windows. We propose a KnapWindow framework (KW) for SMDK. KW utilizes KnapStream (KS) for SMDK in append-only streams as a subroutine. It maintains a sequence of checkpoints and …


Towards Practical Privacy-Preserving Analytics For Iot And Cloud Based Healthcare Systems, Sagar Sharma, Keke Chen, Amit P. Sheth Mar 2018

Towards Practical Privacy-Preserving Analytics For Iot And Cloud Based Healthcare Systems, Sagar Sharma, Keke Chen, Amit P. Sheth

Kno.e.sis Publications

Modern healthcare systems now rely on advanced computing methods and technologies, such as IoT devices and clouds, to collect and analyze personal health data at unprecedented scale and depth. Patients, doctors, healthcare providers, and researchers depend on analytical models derived from such data sources to remotely monitor patients, early-diagnose diseases, and find personalized treatments and medications. However, without appropriate privacy protection, conducting data analytics becomes a source of privacy nightmare. In this paper, we present the research challenges in developing practical privacy-preserving analytics in healthcare information systems. The study is based on kHealth - a personalized digital healthcare information system …


Btci: A New Framework For Identifying Congestion Cascades Using Bus Trajectory Data, Meng-Fen Chiang, Ee Peng Lim, Wang-Chien Lee, Agus Trisnajaya Kwee Dec 2017

Btci: A New Framework For Identifying Congestion Cascades Using Bus Trajectory Data, Meng-Fen Chiang, Ee Peng Lim, Wang-Chien Lee, Agus Trisnajaya Kwee

Research Collection School Of Computing and Information Systems

The knowledge of traffic health status is essential to the general public and urban traffic management. To identify congestion cascades, an important phenomenon of traffic health, we propose a Bus Trajectory based Congestion Identification (BTCI) framework that explores the anomalous traffic health status and structure properties of congestion cascades using bus trajectory data. BTCI consists of two main steps, congested segment extraction and congestion cascades identification. The former constructs path speed models from historical vehicle transitions and design a non-parametric Kernel Density Estimation (KDE) function to derive a measure of congestion score. The latter aggregates congested segments (i.e., those with …


The Importance Of Being Isolated: An Empirical Study On Chromium Reviews, Subhajit Datta, Devarshi Bhatt, Manish Jain, Proshanta Sarkar, Santonu Sarkar Oct 2015

The Importance Of Being Isolated: An Empirical Study On Chromium Reviews, Subhajit Datta, Devarshi Bhatt, Manish Jain, Proshanta Sarkar, Santonu Sarkar

Research Collection School Of Computing and Information Systems

As large scale software development has become more collaborative, and software teams more globally distributed, several studies have explored how developer interaction influences software development outcomes. The emphasis so far has been largely on outcomes like defect count, the time to close modification requests etc. In the paper, we examine data from the Chromium project to understand how different aspects of developer discussion relate to the closure time of reviews. On the basis of analyzing reviews discussed by 2000+ developers, our results indicate that quicker closure of reviews owned by a developer relates to higher reception of information and insights …


Modeling Temporal Adoptions Using Dynamic Matrix Factorization, Freddy Chong-Tat Chua, Richard Jayadi Oentaryo, Ee Peng Lim Dec 2013

Modeling Temporal Adoptions Using Dynamic Matrix Factorization, Freddy Chong-Tat Chua, Richard Jayadi Oentaryo, Ee Peng Lim

Research Collection School Of Computing and Information Systems

The problem of recommending items to users is relevant to many applications and the problem has often been solved using methods developed from Collaborative Filtering (CF). Collaborative Filtering model-based methods such as Matrix Factorization have been shown to produce good results for static rating-type data, but have not been applied to time-stamped item adoption data. In this paper, we adopted a Dynamic Matrix Factorization (DMF) technique to derive different temporal factorization models that can predict missing adoptions at different time steps in the users' adoption history. This DMF technique is an extension of the Non-negative Matrix Factorization (NMF) based on …


Predictive Handling Of Asynchronous Concept Drifts In Distributed Environments, Hock Hee Ang, Vivek Gopalkrishnan, Indre Zliobaite, Mykola Pechenizkiy, Steven C. H. Hoi Oct 2013

Predictive Handling Of Asynchronous Concept Drifts In Distributed Environments, Hock Hee Ang, Vivek Gopalkrishnan, Indre Zliobaite, Mykola Pechenizkiy, Steven C. H. Hoi

Research Collection School Of Computing and Information Systems

In a distributed computing environment, peers collaboratively learn to classify concepts of interest from each other. When external changes happen and their concepts drift, the peers should adapt to avoid increase in misclassification errors. The problem of adaptation becomes more difficult when the changes are asynchronous, i.e., when peers experience drifts at different times. We address this problem by developing an ensemble approach, PINE, that combines reactive adaptation via drift detection, and proactive handling of upcoming changes via early warning and adaptation across the peers. With empirical study on simulated and real-world data sets, we show that PINE handles asynchronous …


Bias And Controversy: Beyond The Statistical Deviation, Hady W. Lauw, Ee Peng Lim, Ke Wang Aug 2006

Bias And Controversy: Beyond The Statistical Deviation, Hady W. Lauw, Ee Peng Lim, Ke Wang

Research Collection School Of Computing and Information Systems

In this paper, we investigate how deviation in evaluation activities may reveal bias on the part of reviewers and controversy on the part of evaluated objects. We focus on a 'data-centric approach' where the evaluation data is assumed to represent the ground truth'. The standard statistical approaches take evaluation and deviation at face value. We argue that attention should be paid to the subjectivity of evaluation, judging the evaluation score not just on 'what is being said' (deviation), but also on 'who says it' (reviewer) as well as on 'whom it is said about' (object). Furthermore, we observe that bias …