Databases and Information Systems

2021

Articles 31 - 43 of 43

Full-Text Articles in Artificial Intelligence and Robotics

Homophily Outlier Detection In Non-Iid Categorical Data, Guansong Pang, Longbing Cao, Ling Chen Apr 2021


Research Collection School Of Computing and Information Systems

Most existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications where the outlierness of different entities is dependent on each other and/or taken from different probability distributions (non-IID). This may lead to a failure to detect important outliers that are too subtle to be identified without considering the non-IID nature. The issue is further intensified in more challenging contexts, e.g., high-dimensional data with many noisy features. This work introduces a novel outlier …


Time Period-Based Top-K Semantic Trajectory Pattern Query, Munkh-Erdene Yadamjav, Farhana Murtaza Choudhury, Zhifeng Bao, Baihua Zheng Apr 2021


Research Collection School Of Computing and Information Systems

The sequences of user check-ins form semantic trajectories that represent the movement of users through time, along with the types of POIs visited. Extracting patterns in semantic trajectories can be widely used in applications such as route planning and trip recommendation. Existing studies focus on the entire time duration of the data, which may miss some temporally significant patterns. In addition, they require thresholds to define the interestingness of the patterns. Motivated by the above, we study a new problem of finding top-k semantic trajectory patterns w.r.t. a given time period and categories by considering the spatial closeness of POIs. …


Is The Ground Truth Really Accurate? Dataset Purification For Automated Program Repair, Deheng Yang, Yan Lei, Xiaoguang Mao, David Lo, Huan Xie, Meng Yan Mar 2021


Research Collection School Of Computing and Information Systems

Datasets of real-world bugs shipped with human-written patches are intensively used in the evaluation of existing automated program repair (APR) techniques, wherein the human-written patches always serve as the ground truth, for manual or automated assessment approaches, to evaluate the correctness of test-suite adequate patches. An inaccurate human-written patch tangled with other code changes will pose threats to the reliability of the assessment results. Therefore, the construction of such datasets always requires much manual effort on isolating real bug fixes from bug fixing commits. However, the manual work is time-consuming and prone to mistakes, and little has been known on …


How Do Users Answer Matlab Questions On Q&A Sites? A Case Study On Stack Overflow And Mathworks, Mahshid Naghashzadeh, Amir Hagshenas, Ashkan Sami, David Lo Mar 2021


Research Collection School Of Computing and Information Systems

MATLAB is an engineering programming language with various toolboxes that has a dedicated Question and Answer (Q&A) platform on the MathWorks website, which is similar to Stack Overflow (SO). Moreover, some MATLAB users ask their questions on SO. This paper aims to compare these two Q&A platforms to see what kinds of questions are asked and how developers answer these questions on each platform. The results of our analysis of 80,382 MATLAB questions on SO and 266,367 questions on MathWorks show that MATLAB questions on topics ranging from the MATLAB software installation to questions related to programming received high votes …


Learning To Assess The Quality Of Stroke Rehabilitation Exercises, Min Hun Lee, Daniel P. Siewiorek, Asim Smailagic, Alexandre Bernardino, Sergi Bermúdez I Badia Mar 2021


Research Collection School Of Computing and Information Systems

Due to the limited number of therapists, task-oriented exercises are often prescribed for post-stroke survivors as in-home rehabilitation. During in-home rehabilitation, a patient may become unmotivated or confused about complying with prescriptions without the feedback of a therapist. To address this challenge, this paper proposes an automated method that can achieve not only qualitative, but also quantitative assessment of stroke rehabilitation exercises. Specifically, we explored a threshold model that utilizes the outputs of binary classifiers to quantify the correctness of a movement into a performance score. We collected movements of 11 healthy subjects and 15 post-stroke survivors using a Kinect sensor …


A New Feature Selection Method Based On Class Association Rule, Sami A. Al-Dhaheri Feb 2021


Dissertations, Theses, and Capstone Projects

Feature selection is a key process for supervised learning algorithms. It involves discarding irrelevant attributes from the training dataset from which the models are derived. One of the vital feature selection approaches is Filtering, which often uses mathematical models to compute the relevance of each feature in the training dataset and then sorts the features in descending order based on their computed scores. However, most Filtering methods face several challenges, including, but not limited to, considering only feature-class correlation when defining a feature’s relevance and not recommending which subset of features to retain. Leaving this decision to the end-user may …
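
The generic Filtering step described in this abstract (score each feature, then sort by score) can be sketched as follows. This is only an illustration under stated assumptions, not the method proposed in the dissertation: the relevance measure (absolute Pearson correlation with the class label), the function names, and the toy data are all hypothetical choices made for the example.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation of two equal-length numeric lists (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a constant column carries no information about the class
    return cov / sqrt(var_x * var_y)


def rank_features(columns, labels):
    """Score every feature against the class label and sort in descending order.

    Assumption: relevance = |Pearson correlation|; real filters may use
    other measures such as information gain or chi-square.
    """
    scores = {name: abs(pearson(values, labels)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical toy dataset: three binary features and a binary class label.
columns = {
    "f_signal": [0, 0, 1, 1, 0, 1],    # identical to the label
    "f_noise": [1, 0, 1, 0, 1, 0],     # weakly related to the label
    "f_constant": [1, 1, 1, 1, 1, 1],  # never varies
}
labels = [0, 0, 1, 1, 0, 1]

ranking = rank_features(columns, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Note that the sketch only produces a ranking. As the abstract points out, deciding how many of the top-ranked features to retain is still left to the user.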


Accelerating Large-Scale Heterogeneous Interaction Graph Embedding Learning Via Importance Sampling, Yugang Ji, Mingyang Yin, Hongxia Yang, Jingren Zhou, Vincent W. Zheng, Chuan Shi, Yuan Fang Feb 2021


Research Collection School Of Computing and Information Systems

In real-world problems, heterogeneous entities are often related to each other through multiple interactions, forming a Heterogeneous Interaction Graph (HIG in short). While modeling HIGs to deal with fundamental tasks, graph neural networks present an attractive opportunity that can make full use of the heterogeneity and rich semantic information by aggregating and propagating information from different types of neighborhoods. However, learning on such complex graphs, often with millions or billions of nodes, edges, and various attributes, could suffer from expensive time cost and high memory consumption. In this paper, we attempt to accelerate representation learning on large-scale HIGs by adopting …


Relative And Absolute Location Embedding For Few-Shot Node Classification On Graph, Zemin Liu, Yuan Fang, Chenghao Liu, Steven C. H. Hoi Feb 2021


Research Collection School Of Computing and Information Systems

Node classification is an important problem on graphs. While recent advances in graph neural networks achieve promising performance, they require abundant labeled nodes for training. However, in many practical scenarios there often exist novel classes in which only one or a few labeled nodes are available as supervision, known as few-shot node classification. Although meta-learning has been widely used in vision and language domains to address few-shot learning, its adoption on graphs has been limited. In particular, graph nodes in a few-shot task are not independent and relate to each other. To deal with this, we propose a novel model …


The Introduction Of Big Data In Cloud Computing, Austin Gruenberg Jan 2021


Student Academic Conference

One of the fastest-growing technologies that many people are unaware of is the world of cloud computing. Having started in 2006, it is a relatively new technological advancement in the computer industry. The major branch of cloud computing that I decided to focus on was big data. I decided to research this topic to better understand what its current uses are, to see what the future holds for Big Data and cloud computing and because it is a growing, significant piece of technology being used in our society today. Big data and cloud computing are very important industries and have …


Single And Differential Morph Attack Detection, Baaria Chaudhary Jan 2021


Graduate Theses, Dissertations, and Problem Reports

Face recognition systems operate on the assumption that a person's face serves as the unique link to their identity. In this thesis, we explore the problem of morph attacks, which have become a viable threat to face verification scenarios precisely because of their inherent ability to break this unique link. A morph attack occurs when two people who share similar facial features morph their faces together such that the resulting face image is recognized as either of the two contributing individuals. Morphs inherit enough visual features from both individuals that both humans and automatic algorithms confuse them. The contributions of this …

"Who Can Help Me?": Knowledge Infused Matching Of Support Seekers And Support Providers During Covid-19 On Reddit, Manas Gaur, Kaushik Roy, Aditya Sharma, Biplav Srivastava, Amit Sheth Jan 2021

Publications

During the ongoing COVID-19 crisis, subreddits on Reddit, such as r/Coronavirus, saw a rapid growth in users' requests for help (support seekers - SSs) including individuals with varying professions and experiences with diverse perspectives on care (support providers - SPs). Currently, knowledgeable human moderators match an SS with a user with relevant experience, i.e., an SP, on these subreddits. This unscalable process defers timely care. We present a medical knowledge-infused approach to efficient matching of SSs and SPs validated by experts for the users affected by anxiety and depression, in the context of COVID-19. After matching, each SP to …


A Continual Deepfake Detection Benchmark: Dataset, Methods, And Essentials, Chuqiao Li, Zhiwu Huang, Danda Pani Paudel, Yabin Wang, Mohamad Shahbazi, Xiaopeng Hong, Van Gool Luc Jan 2021


Research Collection School Of Computing and Information Systems

A number of benchmarks and techniques have emerged for the detection of deepfakes. However, very few works study the detection of incrementally appearing deepfakes in real-world scenarios. To simulate the wild scenes, this paper suggests a continual deepfake detection benchmark (CDDB) over a new collection of deepfakes from both known and unknown generative models. The suggested CDDB designs multiple evaluations on the detection over easy, hard, and long sequences of deepfake tasks, with a set of appropriate measures. In addition, we exploit multiple approaches to adapt multiclass incremental learning methods, commonly used in continual visual recognition, …


The Value Of Humanization In Customer Service, Yang Gao, Huaxia Rui, Shujing Sun Jan 2021


Research Collection School Of Computing and Information Systems

As algorithm-based agents become increasingly capable of handling customer service queries, customers are often uncertain whether they are served by humans or algorithms, and managers are left to question the value of human agents once the technology matures. The current paper studies this question by quantifying the impact of customers' enhanced perception of being served by human agents on customer service interactions. Our identification strategy hinges on the abrupt implementation by Southwest Airlines of a signature policy, which requires the inclusion of an agent's first name in responses on Twitter, thereby making the agent more humanized in the eyes of …