Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Open Access. Powered by Scholars. Published by Universities.®
- Discipline
Articles 1 - 5 of 5
Full-Text Articles in Physical Sciences and Mathematics
Context-Aware Statistical Debugging: From Bug Predictors To Faulty Control Flow Paths, Lingxiao Jiang, Zhendong Su
Context-Aware Statistical Debugging: From Bug Predictors To Faulty Control Flow Paths, Lingxiao Jiang, Zhendong Su
Research Collection School Of Computing and Information Systems
Effective bug localization is important for realizing automated debugging. One attractive approach is to apply statistical techniques on a collection of evaluation profiles of program properties to help localize bugs. Previous research has proposed various specialized techniques to isolate certain program predicates as bug predictors. However, because many bugs may not be directly associated with these predicates, these techniques are often ineffective in localizing bugs. Relevant control flow paths that may contain bug locations are more informative than stand-alone predicates for discovering and understanding bugs. In this paper, we propose an approach to automatically generate such faulty control flow paths …
A Data-Dependent Distance Measure For Transductive Instance-Based Learning, Jared Lundell, Dan A. Ventura
A Data-Dependent Distance Measure For Transductive Instance-Based Learning, Jared Lundell, Dan A. Ventura
Faculty Publications
We consider learning in a transductive setting using instance-based learning (k-NN) and present a method for constructing a data-dependent distance “metric” using both labeled training data as well as available unlabeled data (that is to be classified by the model). This new data-driven measure of distance is empirically studied in the context of various instance-based models and is shown to reduce error (compared to traditional models) under certain learning conditions. Generalizations and improvements are suggested.
Predicting Coronary Artery Disease With Medical Profile And Gene Polymorphisms Data, Qiongyu Chen, Guoliang Li, Tze-Yun Leong, Chew-Kiat Heng
Predicting Coronary Artery Disease With Medical Profile And Gene Polymorphisms Data, Qiongyu Chen, Guoliang Li, Tze-Yun Leong, Chew-Kiat Heng
Research Collection School Of Computing and Information Systems
Coronary artery disease (CAD) is a main cause of death in the world. Finding cost-effective methods to predict CAD is a major challenge in public health. In this paper, we investigate the combined effects of genetic polymorphisms and non-genetic factors on predicting the risk of CAD by applying well known classification methods, such as Bayesian networks, naïve Bayes, support vector machine, k-nearest neighbor, neural networks and decision trees. Our experiments show that all these classifiers are comparable in terms of accuracy, while Bayesian networks have the additional advantage of being able to provide insights into the relationships among the variables. …
Active Learning For Part-Of-Speech Tagging: Accelerating Corpus Annotation, George Busby, Marc Carmen, James Carroll, Robbie Haertel, Deryle W. Lonsdale, Peter Mcclanahan, Eric K. Ringger, Kevin Seppi
Active Learning For Part-Of-Speech Tagging: Accelerating Corpus Annotation, George Busby, Marc Carmen, James Carroll, Robbie Haertel, Deryle W. Lonsdale, Peter Mcclanahan, Eric K. Ringger, Kevin Seppi
Faculty Publications
In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (QBU) and Query by Committee (QBC) and report on experiments with several baselines and …
Learning To Classify E-Mail, Irena Koprinska, Josiah Poon, James Clark, Jason Yuk Hin Chan
Learning To Classify E-Mail, Irena Koprinska, Josiah Poon, James Clark, Jason Yuk Hin Chan
Research Collection School Of Computing and Information Systems
In this paper we study supervised and semi-supervised classification of e-mails. We consider two tasks: filing e-mails into folders and spam e-mail filtering. Firstly, in a supervised learning setting, we investigate the use of random forest for automatic e-mail filing into folders and spam e-mail filtering. We show that random forest is a good choice for these tasks as it runs fast on large and high dimensional databases, is easy to tune and is highly accurate, outperforming popular algorithms such as decision trees, support vector machines and naive Bayes. We introduce a new accurate feature selector with linear time complexity. …