Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Open Access. Powered by Scholars. Published by Universities.®

Theory and Algorithms

2018

Institution
Keyword
Publication
Publication Type

Articles 1 - 23 of 23

Full-Text Articles in Databases and Information Systems

Unsupervised User Identity Linkage Via Factoid Embedding, Wei Xie, Xin Mu, Roy Ka Wei Lee, Feida Zhu, Ee-Peng Lim Nov 2018

Unsupervised User Identity Linkage Via Factoid Embedding, Wei Xie, Xin Mu, Roy Ka Wei Lee, Feida Zhu, Ee-Peng Lim

Research Collection School Of Computing and Information Systems

User identity linkage (UIL), the problem of matching user account across multiple online social networks (OSNs), is widely studied and important to many real-world applications. Most existing UIL solutions adopt a supervised or semisupervised approach which generally suffer from scarcity of labeled data. In this paper, we propose Factoid Embedding, a novel framework that adopts an unsupervised approach. It is designed to cope with different profile attributes, content types and network links of different OSNs. The key idea is that each piece of information about a user identity describes the real identity owner, and thus distinguishes the owner from other …


Influence Maximization On Social Graphs: A Survey, Yuchen Li, Ju Fan, Yanhao Wang, Kian-Lee Tan Oct 2018

Influence Maximization On Social Graphs: A Survey, Yuchen Li, Ju Fan, Yanhao Wang, Kian-Lee Tan

Research Collection School Of Computing and Information Systems

Influence Maximization (IM), which selects a set of k users (called seed set) from a social network to maximize the expected number of influenced users (called influence spread), is a key algorithmic problem in social influence analysis. Due to its immense application potential and enormous technical challenges, IM has been extensively studied in the past decade. In this paper, we survey and synthesize a wide spectrum of existing studies on IM from an algorithmic perspective, with a special focus on the following key aspects (1) a review of well-accepted diffusion models that capture information diffusion process and build the foundation …


An Outlier Detection Algorithm Based On Cross-Correlation Analysis For Time Series Dataset, Hui Lu, Yaxian Liu, Zongming Fei, Chongchong Guan Sep 2018

An Outlier Detection Algorithm Based On Cross-Correlation Analysis For Time Series Dataset, Hui Lu, Yaxian Liu, Zongming Fei, Chongchong Guan

Computer Science Faculty Publications

Outlier detection is a very essential problem in a variety of application areas. Many detection methods are deficient for high-dimensional time series data sets containing both isolated and assembled outliers. In this paper, we propose an Outlier Detection method based on Cross-correlation Analysis (ODCA). ODCA consists of three key parts. They are data preprocessing, outlier analysis, and outlier rank. First, we investigate a linear interpolation method to convert assembled outliers into isolated ones. Second, a detection mechanism based on the cross-correlation analysis is proposed for translating the high-dimensional data sets into 1-D cross-correlation function, according to which the isolated outlier …


Question-Guided Hybrid Convolution For Visual Question Answering, Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven C. H. Hoi, Xiaogang Wang Sep 2018

Question-Guided Hybrid Convolution For Visual Question Answering, Peng Gao, Pan Lu, Hongsheng Li, Shuang Li, Yikang Li, Steven C. H. Hoi, Xiaogang Wang

Research Collection School Of Computing and Information Systems

In this paper, we propose a novel Question-Guided Hybrid Convolution (QGHC) network for Visual Question Answering (VQA). Most state-of-the-art VQA methods fuse the high-level textual and visual features from the neural network and abandon the visual spatial information when learning multi-modal features.To address these problems, question-guided kernels generated from the input question are designed to convolute with visual features for capturing the textual and visual relationship in the early stage. The question-guided convolution can tightly couple the textual and visual information but also introduce more parameters when learning kernels. We apply the group convolution, which consists of question-independent kernels and …


Expected Length Of The Longest Chain In Linear Hashing, Pongthip Srivarangkul, Hemanta K. Maji Aug 2018

Expected Length Of The Longest Chain In Linear Hashing, Pongthip Srivarangkul, Hemanta K. Maji

The Summer Undergraduate Research Fellowship (SURF) Symposium

Hash table with chaining is a data structure that chains objects with identical hash values together with an entry or a memory address. It works by calculating a hash value from an input then placing the input in the hash table entry. When we place two inputs in the same entry, they chain together in a linear linked list. We are interested in the expected length of the longest chain in linear hashing and methods to reduce the length because the worst-case look-up time is directly proportional to it.

The linear hash function used to calculate hash value is defined …


Exact Processing Of Uncertain Top-K Queries In Multi-Criteria Settings, Kyriakos Mouratidis, Bo Tang Aug 2018

Exact Processing Of Uncertain Top-K Queries In Multi-Criteria Settings, Kyriakos Mouratidis, Bo Tang

Research Collection School Of Computing and Information Systems

Traditional rank-aware processing assumes a dataset that contains available options to cover a specific need (e.g., restaurants, hotels, etc) and users who browse that dataset via top-k queries with linear scoring functions, i.e., by ranking the options according to the weighted sum of their attributes, for a set of given weights. In practice, however, user preferences (weights) may only be estimated with bounded accuracy, or may be inherently uncertain due to the inability of a human user to specify exact weight values with absolute accuracy. Motivated by this, we introduce the uncertain top-k query (UTK). Given uncertain preferences, that is, …


Embedding Wordnet Knowledge For Textual Entailment, Yunshi Lan, Jing Jiang Aug 2018

Embedding Wordnet Knowledge For Textual Entailment, Yunshi Lan, Jing Jiang

Research Collection School Of Computing and Information Systems

In this paper, we study how we can improve a deep learning approach to textual entailment by incorporating lexical entailment relations from WordNet. Our idea is to embed the lexical entailment knowledge contained in WordNet in specially-learned word vectors, which we call “entailment vectors.” We present a standard neural network model and a novel set-theoretic model to learn these entailment vectors from word pairs with known lexical entailment relations derived from WordNet. We further incorporate these entailment vectors into a decomposable attention model for textual entailment and evaluate the model on the SICK and the SNLI dataset. We find that …


Machine Learning To Predict College Course Success, Anthony R.Y. Dalton, Justin Beer, Sriharshasai Kommanapalli, James S. Lanich Ph.D. Jul 2018

Machine Learning To Predict College Course Success, Anthony R.Y. Dalton, Justin Beer, Sriharshasai Kommanapalli, James S. Lanich Ph.D.

SMU Data Science Review

In this paper, we present an analysis of the predictive ability of machine learning on the success of students in college courses in a California Community College. The California Legislature passed assembly bill 705 in order to place students in non-remedial coursework, based on high school transcripts, to increase college completion. We utilize machine learning methods on de-identified student high school transcript data to create predictive algorithms on whether or not the student will be successful in college-level English and Mathematics coursework. To satisfy the bill’s requirements, we first use exploratory data analysis on applicable transcript variables. Then we use …


Distributed K-Nearest Neighbor Queries In Metric Spaces, Xin Ding, Yuanliang Zhang, Lu Chen, Yunjun Gao, Baihua Zheng Jul 2018

Distributed K-Nearest Neighbor Queries In Metric Spaces, Xin Ding, Yuanliang Zhang, Lu Chen, Yunjun Gao, Baihua Zheng

Research Collection School Of Computing and Information Systems

Metric k nearest neighbor (MkNN) queries have applications in many areas such as multimedia retrieval, computational biology, and location-based services. With the growing volumes of data, a distributed method is required. In this paper, we propose an Asynchronous Metric Distributed System (AMDS), which uniformly partitions the data with the pivot-mapping technique to ensure the load balancing, and employs publish/subscribe communication model to asynchronously process large scale of queries. The employment of asynchronous processing model also improves robustness and efficiency of AMDS. In addition, we develop an efficient estimation based MkNN method using AMDS to improve the query efficiency. Extensive experiments …


Efficient Representative Subset Selection Over Sliding Windows, Yanhao Wang, Yuchen Li, Kian-Lee Tan Jul 2018

Efficient Representative Subset Selection Over Sliding Windows, Yanhao Wang, Yuchen Li, Kian-Lee Tan

Research Collection School Of Computing and Information Systems

Representative subset selection (RSS) is an important tool for users to draw insights from massive datasets. Existing literature models RSS as submodular maximization to capture the "diminishing returns" property of representativeness, but often only has a single constraint, which limits its applications to many real-world problems. To capture the recency issue and support various constraints, we formulate dynamic RSS as maximizing submodular functions subject to general d -knapsack constraints (SMDK) over sliding windows. We propose a KnapWindow framework (KW) for SMDK. KW utilizes KnapStream (KS) for SMDK in append-only streams as a subroutine. It maintains a sequence of checkpoints and …


Online Active Learning With Expert Advice, Shuji Hao, Peiying Hu, Peilin Zhao, Steven C. H. Hoi, Chunyan Miao Jul 2018

Online Active Learning With Expert Advice, Shuji Hao, Peiying Hu, Peilin Zhao, Steven C. H. Hoi, Chunyan Miao

Research Collection School Of Computing and Information Systems

In literature, learning with expert advice methods usually assume that a learner always obtain the true label of every incoming training instance at the end of each trial. However, in many real-world applications, acquiring the true labels of all instances can be both costly and time consuming, especially for large-scale problems. For example, in the social media, data stream usually comes in a high speed and volume, and it is nearly impossible and highly costly to label all of the instances. In this article, we address this problem with active learning with expert advice, where the ground truth of an …


A Bayesian Latent Variable Model Of User Preferences With Item Context, Aghiles Salah, Hady W. Lauw Jul 2018

A Bayesian Latent Variable Model Of User Preferences With Item Context, Aghiles Salah, Hady W. Lauw

Research Collection School Of Computing and Information Systems

Personalized recommendation has proven to be very promising in modeling the preference of users over items. However, most existing work in this context focuses primarily on modeling user-item interactions, which tend to be very sparse. We propose to further leverage the item-item relationships that may reflect various aspects of items that guide users’ choices. Intuitively, items that occur within the same “context” (e.g., browsed in the same session, purchased in the same basket) are likely related in some latent aspect. Therefore, accounting for the item’s context would complement the sparse user-item interactions by extending a user’s preference to other items …


The Effect Of Endgame Tablebases On Modern Chess Engines, Christopher D. Peterson Jun 2018

The Effect Of Endgame Tablebases On Modern Chess Engines, Christopher D. Peterson

Computer Engineering

Modern chess engines have the ability to augment their evaluation by using massive tables containing billions of positions and their memorized solutions. This report examines the importance of these tables to better understand the circumstances under which they should be used. The analysis conducted in this paper empirically examines differences in size and speed of memorized positions and their impacts on engine strength. Using this technique, situations where memorized tables improve play (and situations where they do not) are discovered.


Efficient Reduced Bias Genetic Algorithm For Generic Community Detection Objectives, Aditya Karnam Gururaj Rao Apr 2018

Efficient Reduced Bias Genetic Algorithm For Generic Community Detection Objectives, Aditya Karnam Gururaj Rao

Theses

The problem of community structure identification has been an extensively investigated area for biology, physics, social sciences, and computer science in recent years for studying the properties of networks representing complex relationships. Most traditional methods, such as K-means and hierarchical clustering, are based on the assumption that communities have spherical configurations. Lately, Genetic Algorithms (GA) are being utilized for efficient community detection without imposing sphericity. GAs are machine learning methods which mimic natural selection and scale with the complexity of the network. However, traditional GA approaches employ a representation method that dramatically increases the solution space to be searched by …


Modular Scheduling System For Westside School District, Tyler Bienhoff Apr 2018

Modular Scheduling System For Westside School District, Tyler Bienhoff

Honors Theses

Westside School district offers a modular scheduling system for their high school that is more similar to a college schedule than the typical high school system. Due to the complexity of their master schedule each semester, there are no commercially available products that can assist in creating a schedule. Hence, this thesis discusses a scheduling algorithm and management system that was built specifically for Westside High School with the potential to be expanded for use by other interested schools. The first part of the paper is focused on gathering input from students and faculty for which courses and how many …


User-Centric Privacy Preservation In Mobile And Location-Aware Applications, Mingming Guo Apr 2018

User-Centric Privacy Preservation In Mobile And Location-Aware Applications, Mingming Guo

FIU Electronic Theses and Dissertations

The mobile and wireless community has brought a significant growth of location-aware devices including smart phones, connected vehicles and IoT devices. The combination of location-aware sensing, data processing and wireless communication in these devices leads to the rapid development of mobile and location-aware applications. Meanwhile, user privacy is becoming an indispensable concern. These mobile and location-aware applications, which collect data from mobile sensors carried by users or vehicles, return valuable data collection services (e.g., health condition monitoring, traffic monitoring, and natural disaster forecasting) in real time. The sequential spatial-temporal data queries sent by users provide their location trajectory information. The …


Distributed Multi-Task Classification: A Decentralized Online Learning Approach, Chi Zhang, Peilin Zhao, Shuji Hao, Yeng Chai Soh, Bu Sung Lee, Chunyan Miao, Steven C. H. Hoi Apr 2018

Distributed Multi-Task Classification: A Decentralized Online Learning Approach, Chi Zhang, Peilin Zhao, Shuji Hao, Yeng Chai Soh, Bu Sung Lee, Chunyan Miao, Steven C. H. Hoi

Research Collection School Of Computing and Information Systems

Although dispersing one single task to distributed learning nodes has been intensively studied by the previous research, multi-task learning on distributed networks is still an area that has not been fully exploited, especially under decentralized settings. The challenge lies in the fact that different tasks may have different optimal learning weights while communication through the distributed network forces all tasks to converge to an unique classifier. In this paper, we present a novel algorithm to overcome this challenge and enable learning multiple tasks simultaneously on a decentralized distributed network. Specifically, the learning framework can be separated into two phases: (i) …


A Sliding-Window Framework For Representative Subset Selection, Yanhao Wang, Yuchen Li, Kian-Lee Tan Apr 2018

A Sliding-Window Framework For Representative Subset Selection, Yanhao Wang, Yuchen Li, Kian-Lee Tan

Research Collection School Of Computing and Information Systems

Representative subset selection (RSS) is an important tool for users to draw insights from massive datasets. A common approach is to model RSS as the submodular maximization problem because the utility of extracted representatives often satisfies the "diminishing returns" property. To capture the data recency issue and support different types of constraints in real-world problems, we formulate RSS as maximizing a submodular function subject to a d-knapsack constraint (SMDK) over sliding windows. Then, we propose a novel KnapWindow framework for SMDK. Theoretically, KnapWindow is 1-ε/1+d - approximate for SMDK and achieves sublinear complexity. Finally, we evaluate the efficiency and effectiveness …


Cora: Commingled Remains And Analytics – An Open Community Ecosystem, Nicole Mcelroy, Ryan Ernst Mar 2018

Cora: Commingled Remains And Analytics – An Open Community Ecosystem, Nicole Mcelroy, Ryan Ernst

UNO Student Research and Creative Activity Fair

Anthropologists at organizations such as the DPAA (Defense POW/MIA Accounting Agency) have the tough job of sorting through commingled remains of fallen soldiers. Under the direction of Professor Pawaskar at the College of IS&T, Ryan Ernst and I are currently developing a web application for the DPAA that will help them inventory the bones and record all the appropriate associations. After the inventory web application is built we will begin the analysis process using graph theory and other mathematical algorithms. This will ultimately help organizations like the DPAA get closer to the end goal of identifying fallen soldiers from commingled …


Sparse Passive-Aggressive Learning For Bounded Online Kernel Methods, Jing Lu, Doyen Sahoo, Peilin Zhao, Steven C. H. Hoi Feb 2018

Sparse Passive-Aggressive Learning For Bounded Online Kernel Methods, Jing Lu, Doyen Sahoo, Peilin Zhao, Steven C. H. Hoi

Research Collection School Of Computing and Information Systems

One critical deficiency of traditional online kernel learning methods is their unbounded and growing number of support vectors in the online learning process, making them inefficient and non-scalable for large-scale applications. Recent studies on scalable online kernel learning have attempted to overcome this shortcoming, e.g., by imposing a constant budget on the number of support vectors. Although they attempt to bound the number of support vectors at each online learning iteration, most of them fail to bound the number of support vectors for the final output hypothesis, which is often obtained by averaging the series of hypotheses over all the …


R3: Reinforced Ranker-Reader For Open-Domain Question Answering, Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, Jing Jiang Feb 2018

R3: Reinforced Ranker-Reader For Open-Domain Question Answering, Shuohang Wang, Mo Yu, Xiaoxiao Guo, Zhiguo Wang, Tim Klinger, Wei Zhang, Shiyu Chang, Gerald Tesauro, Bowen Zhou, Jing Jiang

Research Collection School Of Computing and Information Systems

In recent years researchers have achieved considerable success applyingneural network methods to question answering (QA). These approaches haveachieved state of the art results in simplified closed-domain settings such asthe SQuAD (Rajpurkar et al., 2016) dataset, which provides a pre-selectedpassage, from which the answer to a given question may be extracted. Morerecently, researchers have begun to tackle open-domain QA, in which the modelis given a question and access to a large corpus (e.g., wikipedia) instead of apre-selected passage (Chen et al., 2017a). This setting is more complex as itrequires large-scale search for relevant passages by an information retrievalcomponent, combined with a …


Evaluating A Cluster Of Low-Power Arm64 Single-Board Computers With Mapreduce, Daniel Mcdermott Jan 2018

Evaluating A Cluster Of Low-Power Arm64 Single-Board Computers With Mapreduce, Daniel Mcdermott

EWU Masters Thesis Collection

With the meteoric rise of enormous data collection in science, industry, and the cloud, methods for processing massive datasets have become more crucial than ever. MapReduce is a restricted programing model for expressing parallel computations as simple serial functions, and an execution framework for distributing those computations over large datasets residing on clusters of commodity hardware. MapReduce abstracts away the challenging low-level synchronization and scalability details which parallel and distributed computing often necessitate, reducing the concept burden on programmers and scientists who require data processing at-scale. Typically, MapReduce clusters are implemented using inexpensive commodity hardware, emphasizing quantity over quality due …


Code: Coherence Based Decision Boundaries For Feature Correspondence, Wen-Yan Lin, Fan Wang, Ming-Ming Cheng, Sai-Kit Yeung, Philip H. S. Torr, Jiangbo Lu Jan 2018

Code: Coherence Based Decision Boundaries For Feature Correspondence, Wen-Yan Lin, Fan Wang, Ming-Ming Cheng, Sai-Kit Yeung, Philip H. S. Torr, Jiangbo Lu

Research Collection School Of Computing and Information Systems

A key challenge in feature correspondence is the difficulty in differentiating true and false matches at a local descriptor level. This forces adoption of strict similarity thresholds that discard many true matches. However, if analyzed at a global level, false matches are usually randomly scattered while true matches tend to be coherent (clustered around a few dominant motions), thus creating a coherence based separability constraint. This paper proposes a non-linear regression technique that can discover such a coherence based separability constraint from highly noisy matches and embed it into a correspondence likelihood model. Once computed, the model can filter the …