Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Theory and Algorithms

2021

Articles 1 - 22 of 22

Full-Text Articles in Databases and Information Systems

Fair And Diverse Group Formation Based On Multidimensional Features, Mohammed Saad A Alqahtani Dec 2021

Graduate Theses and Dissertations

The goal of group formation is to build a team to accomplish a specific task. Algorithms are being developed to improve both the effectiveness of the teams so formed and the efficiency of the group selection process. However, there is concern that team formation algorithms could be biased against minorities, either because of the algorithms themselves or because of the data on which they are trained. Hence, it is essential to build fair team formation systems that incorporate demographic information into the process of building the group. Although there has been extensive work on modeling individuals’ expertise for expert recommendation and/or team formation, there has been …


Context-Aware Graph Convolutional Network For Dynamic Origin-Destination Prediction, Juan Nathaniel, Baihua Zheng Dec 2021

Research Collection School Of Computing and Information Systems

Robust Origin-Destination (OD) prediction is key to urban mobility. A good forecasting model can reduce operational risks and improve service availability, among many other benefits. Here, we examine the use of a Graph Convolutional Network (GCN) and its hybrid Markov-Chain (GCN-MC) variant to perform context-aware OD prediction on a large-scale public transportation dataset from Singapore. Compared with the baseline Markov-Chain algorithm and GCN, the proposed hybrid GCN-MC model improves prediction accuracy by 37% and 12%, respectively. Lastly, adding temporal and historical contextual information further improves the performance of the proposed hybrid model by 4–12%.
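A Markov-Chain OD baseline, of the general kind the abstract compares against, estimates the probability of each destination given an origin from historical trips. It then spreads the predicted departures at each origin over those destinations. The sketch below is a generic first-order version under that assumption and is not the authors' model. The names `fit_markov_od` and `predict_od` and the toy trips are hypothetical.

```python
from collections import Counter, defaultdict


def fit_markov_od(trips):
    """Estimate P(destination | origin) from observed (origin, destination) trips."""
    counts = defaultdict(Counter)
    for origin, dest in trips:
        counts[origin][dest] += 1
    probs = {}
    for origin, dests in counts.items():
        total = sum(dests.values())
        probs[origin] = {dest: n / total for dest, n in dests.items()}
    return probs


def predict_od(probs, departures):
    """Spread the predicted departures at each origin over its destinations."""
    flows = {}
    for origin, n_departures in departures.items():
        for dest, p in probs.get(origin, {}).items():
            flows[(origin, dest)] = n_departures * p
    return flows


# Hypothetical toy data: two of three trips from A went to B.
trips = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "A")]
probs = fit_markov_od(trips)
flows = predict_od(probs, {"A": 30, "B": 10})  # flows[("A", "B")] is about 20
```

A GCN-based model replaces the per-origin frequency table with learned representations that share information across neighboring stations.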


Transfer-Learned Pruned Deep Convolutional Neural Networks For Efficient Plant Classification In Resource-Constrained Environments, Martinson Ofori Nov 2021

Masters Theses & Doctoral Dissertations

Traditional means of on-farm weed control mostly rely on manual labor. This process is time-consuming, costly, and contributes to major yield losses. Further, the conventional application of chemical weed control can be economically and environmentally inefficient. Site-specific weed management (SSWM) counteracts this by reducing the amount of chemical application with localized spraying of weed species. To solve this using computer vision, precision agriculture researchers have used remote sensing weed maps, but this has been largely ineffective for early season weed control due to problems such as solar reflectance and cloud cover in satellite imagery. With the current advances in artificial …


Information Extraction And Classification On Journal Papers, Lei Yu Nov 2021

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

The importance of journals for disseminating the results of scientific research has increased considerably. In the digital era, the Portable Document Format (PDF) has become the established format for electronic journal articles. This structured format, combined with regular and wide dissemination, spreads scientific advances easily and quickly. However, the rapidly increasing number of published scientific articles demands more time and effort for systematic literature reviews, searches, and screening. Comprehending and extracting useful information from these digital documents is also challenging because of the complex structure of PDF files.

To help a soil science team from the United States …


Expediting The Accuracy-Improving Process Of SVMs For Class Imbalance Learning, Bin Cao, Yuqi Liu, Chenyu Hou, Jing Fan, Baihua Zheng, Jianwei Jin Nov 2021

Research Collection School Of Computing and Information Systems

To improve the classification performance of support vector machines (SVMs) on imbalanced datasets, cost-sensitive learning methods have been proposed, e.g., DEC (Different Error Costs) and FSVM-CIL (Fuzzy SVM for Class Imbalance Learning). They relocate the hyperplane by adjusting the costs associated with misclassifying samples. However, the error costs are determined either empirically or by an exhaustive search of the parameter space. Neither strategy can guarantee effectiveness and efficiency simultaneously. In this paper, we propose ATEC, a solution that can efficiently find a preferable hyperplane by automatically tuning the error cost for between-class samples. ATEC distinguishes itself from all …
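The idea behind DEC-style cost-sensitive SVMs is to weight the hinge loss of each sample by a class-specific error cost in the soft-margin objective. This makes mistakes on the minority class more expensive and so shifts the hyperplane. The sketch below only evaluates that objective. The cost ratio `cost_pos / cost_neg = n_neg / n_pos` is a common heuristic assumed here for illustration. It is neither ATEC's tuning procedure nor necessarily the setting DEC's authors used. `dec_costs` and `cost_sensitive_objective` are hypothetical names.

```python
def dec_costs(labels, C=1.0):
    """Return (cost_pos, cost_neg) for labels in {+1, -1}.

    Assumed heuristic: cost_pos / cost_neg = n_neg / n_pos, so errors on the
    minority (+1) class are penalized more heavily.
    """
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = sum(1 for y in labels if y == -1)
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return C * n_neg / n_pos, C


def cost_sensitive_objective(w, b, X, y, cost_pos, cost_neg):
    """Soft-margin linear SVM primal objective with a separate error cost per class."""
    regularizer = 0.5 * sum(wi * wi for wi in w)
    loss = 0.0
    for xi, yi in zip(X, y):
        margin = yi * (sum(wj * xj for wj, xj in zip(w, xi)) + b)
        hinge = max(0.0, 1.0 - margin)  # zero when the sample is beyond the margin
        loss += (cost_pos if yi == 1 else cost_neg) * hinge
    return regularizer + loss


# Hypothetical imbalanced toy set: one positive, three negatives.
X = [[1.0, 0.0], [0.0, 1.0], [0.0, 2.0], [1.0, 1.0]]
y = [1, -1, -1, -1]
cost_pos, cost_neg = dec_costs(y)  # (3.0, 1.0)
```

Minimizing this objective over `w` and `b` is a standard SVM training problem. The point here is only where the two costs enter.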


On A Multistage Discrete Stochastic Optimization Problem With Stochastic Constraints And Nested Sampling, Thuy Anh Ta, Tien Mai, Fabian Bastin, Pierre L'Ecuyer Nov 2021

Research Collection School Of Computing and Information Systems

We consider a multistage stochastic discrete program in which constraints at any stage might involve expectations that cannot be computed easily and are approximated by simulation. We study a sample average approximation (SAA) approach that uses nested sampling, in which at each stage, a number of scenarios are examined and a number of simulation replications are performed for each scenario to estimate the next-stage constraints. This approach provides an approximate solution to the multistage problem. To establish the consistency of the SAA approach, we first consider a two-stage problem and show that in the second-stage problem, given a scenario, the …
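The basic building block of SAA is to replace an expectation constraint `E[g(x, xi)] <= 0` with the average of `g` over simulated draws of `xi`, then optimize over the discrete choices. The sketch below shows this for a single stage only. Nested sampling, as described above, would repeat the estimate inside each sampled scenario at every stage. The function name, the toy constraint, and the reuse of the same draws for all candidates (common random numbers) are assumptions for illustration.

```python
import random


def solve_saa(candidates, cost, g, sampler, n, seed=0):
    """Pick the cheapest candidate x whose sample-average estimate of
    E[g(x, xi)] satisfies the constraint E[g(x, xi)] <= 0.

    The same n draws of xi are reused for every candidate (common random
    numbers), so candidates are compared on identical scenarios.
    """
    rng = random.Random(seed)
    draws = [sampler(rng) for _ in range(n)]
    feasible = [
        x for x in candidates
        if sum(g(x, xi) for xi in draws) / n <= 0.0
    ]
    return min(feasible, key=cost) if feasible else None


# Hypothetical toy instance: xi ~ Uniform(0, 1) and the constraint E[xi - x] <= 0,
# i.e. x >= 0.5; among the candidates, the cheapest feasible one is 0.6.
best = solve_saa(
    candidates=[0.2, 0.4, 0.6, 0.8],
    cost=lambda x: x,
    g=lambda x, xi: xi - x,
    sampler=lambda rng: rng.random(),
    n=10_000,
)
```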


Quantum Computing For Supply Chain Finance, Paul R. Griffin, Ritesh Sampat Sep 2021

Research Collection School Of Computing and Information Systems

Applying quantum computing to real-world problems to assess its potential efficacy is a daunting task for non-specialists. This paper shows an implementation of two quantum optimization algorithms applied to trade finance portfolios and compares their selections with those chosen by experienced underwriters and by a classical optimizer. The method maps the financial risks and returns of a trade finance portfolio to the optimization function of a quantum algorithm developed in a Qiskit tutorial. The results show that while no advantage is seen from using the quantum algorithms, the performance of the quantum algorithms …


Unified And Incremental SimRank: Index-Free Approximation With Scheduled Principle, Fanwei Zhu, Yuan Fang, Kai Zhang, Kevin C.-C. Chang, Hongtai Cao, Zhen Jiang, Minghui Wu Sep 2021

Research Collection School Of Computing and Information Systems

SimRank is a popular link-based similarity measure on graphs. It enables a variety of applications with different modes of querying (e.g., single-pair, single-source, and all-pair modes). In this paper, we propose UISim, a unified and incremental framework for all SimRank modes based on a scheduled approximation principle. UISim processes queries by incrementally exploring the entire computation space in priority order, and thus allows a flexible tradeoff between time and accuracy. Moreover, it creates and shares common “building blocks” for online computation without relying on indexes, and is thus efficient at handling both static and dynamic graphs. Our experiments …
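For reference, the standard SimRank definition sets s(a, a) = 1. For a != b, s(a, b) is C / (|I(a)| |I(b)|) times the sum of s(i, j) over all in-neighbors i of a and j of b, and it is 0 if either node has no in-neighbors. The sketch below is the naive all-pairs fixed-point iteration of that definition. It is a baseline for intuition only, not UISim's scheduled approximation. The decay `C = 0.8` and the iteration count are assumed defaults.

```python
def simrank(in_neighbors, C=0.8, iterations=10):
    """Naive all-pairs SimRank by fixed-point iteration.

    in_neighbors maps each node to the list of nodes with an edge into it.
    C is the decay factor in (0, 1).
    """
    nodes = list(in_neighbors)
    sim = {(a, b): 1.0 if a == b else 0.0 for a in nodes for b in nodes}
    for _ in range(iterations):
        new_sim = {}
        for a in nodes:
            for b in nodes:
                ia, ib = in_neighbors[a], in_neighbors[b]
                if a == b:
                    new_sim[(a, b)] = 1.0
                elif not ia or not ib:
                    new_sim[(a, b)] = 0.0  # no in-neighbors: similarity is 0
                else:
                    total = sum(sim[(i, j)] for i in ia for j in ib)
                    new_sim[(a, b)] = C * total / (len(ia) * len(ib))
        sim = new_sim
    return sim


# Hypothetical graph with edges u -> a and u -> b: a and b share their only in-neighbor.
scores = simrank({"u": [], "a": ["u"], "b": ["u"]})
```

Each iteration touches every node pair and every pair of their in-neighbors. That cost is what index-based and approximate methods such as the one above aim to avoid.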


Multilateration Index., Chip Lynch Aug 2021

Electronic Theses and Dissertations

We present an alternative method for pre-processing and storing point data, particularly geospatial points, by storing multilateration distances to fixed points rather than coordinates such as latitude and longitude. We explore the use of this data to improve query performance for some distance-related queries such as nearest neighbor and query-within-radius (i.e., “find all points in a set P within distance d of query point q”). Further, we discuss the problem of “Network Adequacy” common to medical and communications businesses, to analyze questions such as “are at least 90% of patients living within 50 miles of a covered emergency …
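One reason stored distances to fixed reference points help with query-within-radius is the triangle inequality. For any reference point r, |dist(p, r) - dist(q, r)| <= dist(p, q). So if that gap exceeds d for some r, point p can be discarded without computing dist(p, q). The sketch below assumes planar Euclidean distance (`math.dist`) instead of great-circle distance. The pruning argument holds for any metric. The class name and its methods are hypothetical. A real index would presumably sort or bucket the distance columns rather than scanning every row as done here.

```python
import math


class MultilaterationIndex:
    """Stores each point's distances to fixed reference points (hypothetical sketch)."""

    def __init__(self, points, refs):
        self.points = list(points)
        self.refs = list(refs)
        # One row of reference distances per point, computed once up front.
        self.rows = [tuple(math.dist(p, r) for r in self.refs) for p in self.points]

    def within_radius(self, q, d):
        """Return all stored points within distance d of query point q."""
        q_row = [math.dist(q, r) for r in self.refs]
        result = []
        for p, row in zip(self.points, self.rows):
            # Triangle inequality: |dist(p, r) - dist(q, r)| <= dist(p, q),
            # so a gap larger than d at any reference point rules p out.
            if any(abs(pr - qr) > d for pr, qr in zip(row, q_row)):
                continue
            if math.dist(p, q) <= d:  # exact check for the survivors
                result.append(p)
        return result


index = MultilaterationIndex([(0, 0), (1, 1), (5, 5), (10, 0)], refs=[(0, 0), (10, 10)])
nearby = index.within_radius((0, 0), 2)  # [(0, 0), (1, 1)]
```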


Design And Development Of Techniques To Ensure Integrity In Fog Computing Based Databases, Abdulwahab Fahad S. Alazeb Jul 2021

Graduate Theses and Dissertations

The advancement of information technology in the coming years will bring significant changes to the way sensitive data is processed. At the same time, the volume of generated data is growing rapidly worldwide. Technologies such as cloud computing, fog computing, and the Internet of Things (IoT) will give business service providers and consumers opportunities to obtain effective and efficient services and to enhance their experiences. Increased availability and higher-quality services via real-time data processing augment the potential for technology to add value to everyday experiences, improving quality of life and convenience. As promising as these technological innovations are, they are prone …


Promoting Diversity In Academic Research Communities Through Multivariate Expert Recommendation, Omar Salman Jul 2021

Graduate Theses and Dissertations

Expert recommendation is the process of identifying individuals who have the appropriate knowledge and skills to achieve a specific task. It has been widely used in academic settings, mainly for hiring, paper-reviewer assignment, and assembling conference program committees. In this research, we highlight the problem of diversity and fair representation of underrepresented groups in expertise recommendation, factors that current expertise recommendation systems rarely consider. We introduce a novel way to model experts in academia by considering demographic attributes in addition to skills. We use the h-index to quantify a researcher’s skills, and we identify five …
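The h-index mentioned above is the largest h such that the researcher has at least h papers with at least h citations each. A minimal sketch of that computation follows, taking a plain list of per-paper citation counts as input. How the thesis obtains those counts is not shown here.

```python
def h_index(citations):
    """Largest h such that at least h papers have at least h citations each."""
    h = 0
    # After sorting in descending order, the i-th paper (1-based) supports h = i
    # as long as it still has at least i citations.
    for i, c in enumerate(sorted(citations, reverse=True), start=1):
        if c >= i:
            h = i
        else:
            break
    return h


example = h_index([3, 0, 6, 1, 5])  # 3: three papers have at least 3 citations
```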


Counting And Sampling Small Structures In Graph And Hypergraph Data Streams, Themistoklis Haris Jun 2021

Dartmouth College Undergraduate Theses

In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.

First, we propose an algorithm that (ε, δ)-estimates the number of simplices using O(m^(1+1/k) / T) bits of space. In addition, we prove that no constant-pass streaming algorithm can (ε, δ)-approximate the number of simplices using less than O(m^(1+1/k) / T) bits of space. Thus …
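The sketch below pins down the quantity being estimated. It assumes the usual definition: a simplex in a k-uniform hypergraph is a set of k + 1 vertices all of whose k-subsets are hyperedges, so for k = 2 it is a triangle. It is an exact, in-memory counter that could serve as ground truth on small inputs. It is not a streaming algorithm and does not reflect the space bounds above. `count_simplices` is a hypothetical name.

```python
from itertools import combinations


def count_simplices(hyperedges, k):
    """Exact (non-streaming) count of simplices in a k-uniform hypergraph."""
    edges = {frozenset(e) for e in hyperedges}
    if any(len(e) != k for e in edges):
        raise ValueError("every hyperedge must have exactly k vertices")
    vertices = set().union(*edges) if edges else set()
    found = set()
    for e in edges:
        # Every simplex contains some hyperedge e plus one extra vertex v.
        for v in vertices - e:
            candidate = e | {v}
            if candidate in found:
                continue
            if all(frozenset(s) in edges for s in combinations(candidate, k)):
                found.add(candidate)
    return len(found)


triangles = count_simplices([(1, 2), (2, 3), (1, 3)], k=2)  # one triangle
```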


Self-Adaptive Graph Traversal On GPUs, Mo Sha, Yuchen Li, Kian-Lee Tan Jun 2021

Research Collection School Of Computing and Information Systems

The massive computing power of GPUs offers unprecedented opportunities for large-scale graph analysis. Existing studies have proposed various preprocessing approaches that convert input graphs into dedicated structures for GPU-based optimizations. However, these dedicated approaches incur significant preprocessing costs and offer weak programmability for building general graph applications. In this paper, we introduce SAGE, a self-adaptive graph traversal approach for GPUs that requires no preprocessing and operates directly on ubiquitous graph representations. We propose Tiled Partitioning and Resident Tile Stealing to fully exploit the computing power of GPUs in a runtime, self-adaptive manner. We also propose Sampling-based Reordering to further …


Machine Learning Approaches To Dribble Hand-Off Action Classification With SportVU NBA Player Coordinate Data, Dembe Stephanos May 2021

Electronic Theses and Dissertations

Recently, strategies of National Basketball Association teams have evolved with the skillsets of players and the emergence of advanced analytics. One of the most effective actions in dynamic offensive strategies in basketball is the dribble hand-off (DHO). This thesis proposes an architecture for a classification pipeline for detecting DHOs in an accurate and automated manner. This pipeline consists of a combination of player tracking data and event labels, a rule set to identify candidate actions, manually reviewing game recordings to label the candidates, and embedding player trajectories into hexbin cell paths before passing the completed training set to the classification …


A Deep Analysis And Algorithmic Approach To Solving Complex Fitness Issues In Collegiate Student Athletes, Holly N. Puckett Apr 2021

Honors College Theses

Sports are not simply a source of entertainment. For many, they create a sense of community, support, and trust among fans and athletes alike. To sustain the sense of community that sports provide, athletes must be properly cared for so that they can perform at the highest level possible. Thus, their fitness and health must be monitored continuously. At the professional level, athletes can expect individualized daily attention because of abundant funding and resources. However, in college communities, the number of student athletes per athletic trainer increases due to both …


SQL Injection & Web Application Security: A Python-Based Network Traffic Detection Model, Nyki Anderson Apr 2021

Cybersecurity Undergraduate Research Showcase

The Internet of Things (IoT) presents a great many challenges in cybersecurity as the world grows increasingly digitally dependent. Personally identifiable information (PII) (i.e., names, addresses, emails, credit card numbers) is stored in databases behind websites the world over. According to the Open Worldwide Application Security Project (OWASP), the greatest threat to privacy is SQL injection attacks (SQLIA) [1]. In these attacks, hackers enter malicious statements into forms, search bars, and other browser input fields to trick the web application server into divulging database assets. A proposed technique against such exploitation is convolutional neural network …
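The attack described above works when an application splices form input directly into SQL text. The sketch below uses Python's standard `sqlite3` module with an in-memory database to show the mechanism, and contrasts it with the standard mitigation of parameterized queries. It illustrates the attack only and is not the detection model studied here. The table, rows, and function names are hypothetical.

```python
import sqlite3


def make_db():
    """In-memory database with a hypothetical users table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, email TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "alice@example.com"), ("bob", "bob@example.com")],
    )
    return conn


def lookup_unsafe(conn, name):
    # Vulnerable: the input becomes part of the SQL statement itself.
    query = f"SELECT email FROM users WHERE name = '{name}'"
    return conn.execute(query).fetchall()


def lookup_safe(conn, name):
    # Parameterized: the driver passes the input as data, never as SQL.
    return conn.execute("SELECT email FROM users WHERE name = ?", (name,)).fetchall()


conn = make_db()
payload = "x' OR '1'='1"  # turns the WHERE clause into a condition that is always true
leaked = lookup_unsafe(conn, payload)  # returns every user's email
blocked = lookup_safe(conn, payload)   # no user has that literal name
```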


A Fully Dynamic Algorithm For K-Regret Minimizing Sets, Yanhao Wang, Yuchen Li, Raymond Chi-Wing Wong, Kian-Lee Tan Apr 2021

Research Collection School Of Computing and Information Systems

Selecting a small set of representatives from a large database is important in many applications such as multi-criteria decision making, web search, and recommendation. The k-regret minimizing set (k-RMS) problem was recently proposed for representative tuple discovery. Specifically, for a large database P of tuples with multiple numerical attributes, the k-RMS problem returns a size-r subset Q of P such that, for any possible ranking function, the score of the top-ranked tuple in Q is not much worse than the score of the kth-ranked tuple in P. Although the k-RMS problem has been extensively studied in the literature, existing methods …
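The k-RMS criterion above can be made concrete with a regret ratio. For a ranking function f, it is max(0, f(k-th best in P) - f(best in Q)) / f(k-th best in P), and the k-regret ratio of Q is its maximum over all f. The sketch below assumes linear ranking functions with non-negative weights. It estimates the maximum only over a finite sample of weight vectors, so it can underestimate the true value. All names are hypothetical, and this evaluates a candidate Q rather than solving k-RMS.

```python
import random


def score(weights, tup):
    """Linear ranking function: weighted sum of the tuple's attributes."""
    return sum(w * t for w, t in zip(weights, tup))


def k_regret_ratio(P, Q, k, utilities):
    """Largest regret ratio of Q against the k-th best tuple of P, over the
    given weight vectors (an estimate of the maximum over all functions)."""
    worst = 0.0
    for weights in utilities:
        kth_in_P = sorted((score(weights, t) for t in P), reverse=True)[k - 1]
        best_in_Q = max(score(weights, t) for t in Q)
        if kth_in_P > 0:
            worst = max(worst, max(0.0, kth_in_P - best_in_Q) / kth_in_P)
    return worst


def sample_utilities(dim, m, seed=0):
    """Draw m random non-negative weight vectors (an assumed sampling scheme)."""
    rng = random.Random(seed)
    return [[rng.random() for _ in range(dim)] for _ in range(m)]


P = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5)]
# The two extreme tuples already contain the top-1 tuple for every non-negative weighting.
ratio = k_regret_ratio(P, [(1.0, 0.0), (0.0, 1.0)], 1, sample_utilities(2, 100))
```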


Improving Multi-Hop Knowledge Base Question Answering By Learning Intermediate Supervision Signals, Gaole He, Yunshi Lan, Jing Jiang, Wayne Xin Zhao, Ji Rong Wen Mar 2021

Research Collection School Of Computing and Information Systems

Multi-hop Knowledge Base Question Answering (KBQA) aims to find the answer entities that are multiple hops away in the Knowledge Base (KB) from the entities in the question. A major challenge is the lack of supervision signals at intermediate steps. Therefore, multi-hop KBQA algorithms can only receive the feedback from the final answer, which makes the learning unstable or ineffective. To address this challenge, we propose a novel teacher-student approach for the multi-hop KBQA task. In our approach, the student network aims to find the correct answer to the query, while the teacher network tries to learn intermediate supervision signals …


Unsupervised Data Mining Technique For Clustering Library In Indonesia, Robbi Rahim, Joseph Teguh Santoso, Sri Jumini, Gita Widi Bhawika, Daniel Susilo, Danny Wibowo Feb 2021

Library Philosophy and Practice (e-journal)

Organizing school libraries not only preserves library materials but also helps students and teachers complete tasks in the teaching process, supporting the national development goal of improving community welfare by producing high-quality, competitive human resources. The purpose of this study is to apply an unsupervised learning technique to cluster-map the number of libraries at each education level in Indonesia. The data were obtained from the Ministry of Education and Culture and processed by the Central Statistics Agency (abbreviated as BPS), url: bps.go.id/. The data consisted of 34 records where the attribute …


Visual Analysis Of Discrimination In Machine Learning, Qianwen Wang, Zhenghua Xu, Zhutian Chen, Yong Wang, Shixia Liu, Huamin Qu Feb 2021

Research Collection School Of Computing and Information Systems

The growing use of automated decision-making in critical applications, such as crime prediction and college admission, has raised questions about fairness in machine learning. How can we decide whether different treatments are reasonable or discriminatory? In this paper, we investigate discrimination in machine learning from a visual analytics perspective and propose an interactive visualization tool, DiscriLens, to support a more comprehensive analysis. To reveal detailed information on algorithmic discrimination, DiscriLens identifies a collection of potentially discriminatory itemsets based on causal modeling and classification rules mining. By combining an extended Euler diagram with a matrix-based visualization, we develop a novel set …


Quantifying The Impact Of Non-Stationarity In Reinforcement Learning-Based Traffic Signal Control, Lucas N. Alegre, Ana L.C. Bazzan, Bruno C. Da Silva Jan 2021

Computer Science Department Faculty Publication Series

In reinforcement learning (RL), dealing with non-stationarity is a challenging issue. However, some domains such as traffic optimization are inherently non-stationary. Causes for and effects of this are manifold. In particular, when dealing with traffic signal controls, addressing non-stationarity is key since traffic conditions change over time and as a function of traffic control decisions taken in other parts of a network. In this paper we analyze the effects that different sources of non-stationarity have in a network of traffic signals, in which each signal is modeled as a learning agent. More precisely, we study both the effects of changing …


Deep Unsupervised Anomaly Detection, Tangqing Li, Zheng Wang, Siying Liu, Wen-Yan Lin Jan 2021

Research Collection School Of Computing and Information Systems

This paper proposes a novel method to detect anomalies in large datasets in a fully unsupervised setting. The key idea behind our algorithm is to learn the representation underlying normal data. To this end, we leverage a recent clustering technique suited to high-dimensional data. This hypothesis provides a reliable starting point for normal data selection. We train an autoencoder on the normal data subset, and iterate between hypothesizing a normal candidate subset based on clustering and representation learning. The reconstruction error of the learned autoencoder serves as a scoring function to assess the normality of the data. Experimental results …