Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 18 of 18

Full-Text Articles in Physical Sciences and Mathematics

Expediting The Accuracy-Improving Process Of Svms For Class Imbalance Learning, Bin Cao, Yuqi Liu, Chenyu Hou, Jing Fan, Baihua Zheng, Jianwei Jin Nov 2021

Research Collection School Of Computing and Information Systems

To improve the classification performance of support vector machines (SVMs) on imbalanced datasets, cost-sensitive learning methods have been proposed, e.g., DEC (Different Error Costs) and FSVM-CIL (Fuzzy SVM for Class Imbalance Learning). They relocate the hyperplane by adjusting the costs associated with misclassifying samples. However, the error costs are determined either empirically or by an exhaustive search of the parameter space, and neither strategy can guarantee effectiveness and efficiency simultaneously. In this paper, we propose ATEC, a solution that can efficiently find a preferable hyperplane by automatically tuning the error cost for between-class samples. ATEC distinguishes itself from all …
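As a reading aid, the idea of relocating the hyperplane through class-dependent error costs can be written as the usual soft-margin SVM objective with two penalty terms. This is the standard textbook form of a different-error-costs SVM, not the ATEC method itself; the symbols $C^{+}$, $C^{-}$, $\xi_i$, $w$, and $b$ are notation assumed here and do not appear in the abstract.

```latex
\begin{aligned}
\min_{w,\,b,\,\xi}\quad & \tfrac{1}{2}\lVert w \rVert^{2}
  + C^{+} \sum_{i:\,y_i = +1} \xi_i
  + C^{-} \sum_{i:\,y_i = -1} \xi_i \\
\text{s.t.}\quad & y_i \left( w^{\top} x_i + b \right) \ge 1 - \xi_i,
  \qquad \xi_i \ge 0, \qquad i = 1, \dots, n.
\end{aligned}
```

Setting $C^{+} > C^{-}$ when the positive class is the minority penalizes its misclassification more heavily, which pushes the separating hyperplane away from the minority samples.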


Delineating Knowledge Domains In Scientific Domains In Scientific Literature Using Machine Learning (Ml), Abhay Maurya, Smarajit Paul Choudhury Mr., Kshitij Jaiswal Mr. Jan 2021

Library Philosophy and Practice (e-journal)

Recent years have witnessed an upsurge in the number of published documents. Organizations are showing an increased interest in text classification for effective use of the information. Manual procedures for text classification can be fruitful for a handful of documents, but they lose credibility as the number of documents increases, besides being laborious and time-consuming. Text mining techniques facilitate assigning text strings to categories, rendering the process of classification fast, accurate, and hence reliable. This paper classifies chemistry documents using machine learning and statistical methods. The procedure of text classification has been described in chronological order like …


Co2vec: Embeddings Of Co-Ordered Networks Based On Mutual Reinforcement, Meng-Fen Chiang, Ee-Peng Lim, Wang-Chien Lee, Philips Kokoh Prasetyo Oct 2020

Research Collection School Of Computing and Information Systems

We study the problem of representation learning for multiple types of entities in a co-ordered network where order relations exist among entities of the same type, and association relations exist across entities of different types. The key challenge in learning co-ordered network embedding is to preserve order relations among entities of the same type while leveraging the general consistency in order relations between different entity types. In this paper, we propose an embedding model, CO2Vec, that addresses this challenge using mutually reinforced order dependencies. Specifically, CO2Vec explores indirect order dependencies as supplementary evidence to enhance order representation learning across …


Chaff From The Wheat: Characterizing And Determining Valid Bug Reports, Yuanrui Fan, Xin Xia, David Lo, Ahmed E. Hassan May 2020

Research Collection School Of Computing and Information Systems

Developers use bug reports to triage and fix bugs. When triaging a bug report, developers must decide whether the bug report is valid (i.e., a real bug). A large number of bug reports are submitted every day, and many of them end up being invalid. Manually determining whether a bug report is valid is a difficult and tedious task. Thus, an approach that can automatically analyze the validity of a bug report and determine whether a report is valid can help developers prioritize their triaging tasks and avoid wasting time and effort on invalid bug reports. In this study, motivated by the …


Large Scale Kernel Methods For Online Auc Maximization, Yi Ding, Chenghao Liu, Peilin Zhao, Steven C. H. Hoi Nov 2017

Research Collection School Of Computing and Information Systems

Learning to optimize AUC performance for classifying label-imbalanced data in online scenarios has been extensively studied in recent years. Most of the existing work has attempted to address the problem directly in the original feature space, which may not be suitable for non-linearly separable datasets. To solve this issue, some kernel-based learning methods have been proposed for non-linearly separable datasets. However, such kernel approaches have been shown to be inefficient and to scale poorly on large-scale datasets in practice. Taking this cue, in this work, we explore the use of scalable kernel-based learning techniques as surrogates to existing approaches: …


Real-Time Prediction Of Length Of Stay Using Passive Wi-Fi Sensing, Truc Viet Le, Baoyang Song, Laura Wynter May 2017

Research Collection School Of Computing and Information Systems

The proliferation of wireless technologies in today's everyday life is one of the key drivers of the Internet of Things (IoT). In addition to being an enabler of connectivity, the vast penetration of wireless devices today gives rise to a secondary functionality as a means of tracking and localization of the devices themselves. Indeed, in order to discover and automatically connect to known Wi-Fi networks, mobile devices have to scan and broadcast the so-called probe requests on all available channels, which can be captured and analyzed in a non-intrusive manner. Thus, one of the key applications of this feature is …


Real-Time Prediction Of Length Of Stay Using Passive Wi-Fi Sensing, Truc Viet Le, Baoyang Song, Laura Wynter May 2017

Research Collection School Of Computing and Information Systems

The proliferation of wireless technologies in today's everyday life is one of the key drivers of the Internet of Things (IoT). In addition to being an enabler of connectivity, the vast penetration of wireless devices today gives rise to a secondary functionality as a means of tracking and localization of the devices themselves. Indeed, in order to discover and automatically connect to known Wi-Fi networks, mobile devices have to scan and broadcast the so-called probe requests on all available channels, which can be captured and analyzed in a non-intrusive manner. Thus, one of the key applications of this feature is …


Svmaud: Using Textual Information To Predict The Audience Level Of Written Works Using Support Vector Machines, Todd Will Jan 2014

Dissertations

Information retrieval systems should seek to match resources with the reading ability of the individual user; similarly, an author must choose vocabulary and sentence structures appropriate for his or her audience. Traditional readability formulas, including the popular Flesch-Kincaid Reading Age and the Dale-Chall Reading Ease Score, rely on numerical representations of text characteristics, including syllable counts and sentence lengths, to suggest audience level of resources. However, the author’s chosen vocabulary, sentence structure, and even the page formatting can alter the predicted audience level by several levels, especially in the case of digital library resources. For these reasons, the performance of …


Multiview Semi-Supervised Learning With Consensus, Guangxia Li, Kuiyu Chang, Steven C. H. Hoi Nov 2012

Research Collection School Of Computing and Information Systems

Obtaining high-quality and up-to-date labeled data can be difficult in many real-world machine learning applications. Semi-supervised learning aims to improve the performance of a classifier trained with a limited number of labeled examples by utilizing the unlabeled ones. This paper demonstrates a way to improve the transductive SVM, which is an existing semi-supervised learning algorithm, by employing a multiview learning paradigm. Multiview learning is based on the fact that for some problems, there may exist multiple perspectives, so-called views, of each data sample. For example, in text classification, the typical view contains a large number of raw content features such …


Double Updating Online Learning, Peilin Zhao, Steven C. H. Hoi, Rong Jin May 2011

Research Collection School Of Computing and Information Systems

In most kernel-based online learning algorithms, when an incoming instance is misclassified, it is added to the pool of support vectors and assigned a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient, since when a new support vector is added, we generally expect the weights of the other existing support vectors to be updated in order to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. …
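The baseline behavior described in this abstract can be illustrated with a minimal kernel perceptron sketch. This is not DUOL: it shows only the conventional scheme in which a misclassified instance becomes a support vector whose weight is never revised afterwards. The class name `KernelPerceptron`, the RBF kernel, the `gamma` value, and the toy XOR stream are all assumptions made for illustration.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature vectors."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)


class KernelPerceptron:
    """Baseline online kernel learner: a weight is fixed once it is assigned."""

    def __init__(self, kernel=rbf_kernel):
        self.kernel = kernel
        self.support_vectors = []  # stored (misclassified) instances
        self.weights = []          # one weight per stored instance

    def score(self, x):
        """Weighted sum of kernel values against all support vectors."""
        return sum(w * self.kernel(sv, x)
                   for sv, w in zip(self.support_vectors, self.weights))

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Process one labeled instance (y in {-1, +1}); return True on a mistake."""
        mistake = y * self.score(x) <= 0
        if mistake:
            # Single update: add the instance with weight y and leave
            # every existing weight untouched.
            self.support_vectors.append(x)
            self.weights.append(float(y))
        return mistake


# Hypothetical toy stream: the XOR pattern, which is not linearly separable.
stream = [((0.0, 0.0), -1), ((1.0, 1.0), -1), ((0.0, 1.0), 1), ((1.0, 0.0), 1)]
model = KernelPerceptron()
for _ in range(5):  # a few passes over the stream
    for x, y in stream:
        model.update(x, y)
```

A double-updating scheme, as motivated in the abstract, would additionally revise the weight of an existing support vector whenever a new one is added; that step is deliberately left out of this sketch.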


Semisupervised Svm Batch Mode Active Learning With Applications To Image Retrieval, Steven C. H. Hoi, Rong Jin, Jianke Zhu, Michael R. Lyu May 2009

Research Collection School Of Computing and Information Systems

Active learning has been shown to be a key technique for improving content-based image retrieval (CBIR) performance. Among various methods, support vector machine (SVM) active learning is popular for its application to relevance feedback in CBIR. However, regular SVM active learning has two main drawbacks when used for relevance feedback. First, SVM often suffers when learning with a small number of labeled examples, which is the case in relevance feedback. Second, SVM active learning usually does not take into account the redundancy among examples, and therefore could select multiple examples in relevance feedback that are similar (or even identical) to …


A Multimodal And Multilevel Ranking Scheme For Large-Scale Video Retrieval, Steven C. H. Hoi, Michael R. Lyu Jun 2008

Research Collection School Of Computing and Information Systems

A critical issue in large-scale multimedia retrieval is how to develop an effective framework for ranking the search results. This problem is particularly challenging for content-based video retrieval due to issues such as short text queries, insufficient sample learning, fusion of multimodal contents, and large-scale learning with huge media data. In this paper, we propose a novel multimodal and multilevel (MMML) ranking framework to attack the challenging ranking problem of content-based video retrieval. We represent the video retrieval task by graphs and suggest a graph-based semi-supervised ranking (SSR) scheme, which can learn with small samples effectively and integrate …


Semi-Supervised Svm Batch Mode Active Learning For Image Retrieval, Steven Hoi, Rong Jin, Jianke Zhu, Michael R. Lyu Jun 2008

Research Collection School Of Computing and Information Systems

Active learning has been shown to be a key technique for improving content-based image retrieval (CBIR) performance. Among various methods, support vector machine (SVM) active learning is popular for its application to relevance feedback in CBIR. However, regular SVM active learning has two main drawbacks when used for relevance feedback. First, SVM often suffers when learning with a small number of labeled examples, which is the case in relevance feedback. Second, SVM active learning usually does not take into account the redundancy among examples, and therefore could select multiple examples in relevance feedback that are similar (or even identical) to …


A Multi-Scale Tikhonov Regularization Scheme For Implicit Surface Modeling, Jianke Zhu, Steven C. H. Hoi, Michael R. Lyu Jun 2007

Research Collection School Of Computing and Information Systems

Kernel machines have recently been considered as a promising solution for implicit surface modelling. A key challenge of machine learning solutions is how to fit implicit shape models from large-scale sets of point cloud samples efficiently. In this paper, we propose a fast solution for approximating implicit surfaces based on a multi-scale Tikhonov regularization scheme. The optimization of our scheme is formulated into a sparse linear equation system, which can be efficiently solved by factorization methods. Different from traditional approaches, our scheme does not employ auxiliary off-surface points, which not only saves the computational cost but also avoids the problem …


A Semi-Supervised Active Learning Framework For Image Retrieval, Steven Hoi, Michael R. Lyu Jun 2005

Research Collection School Of Computing and Information Systems

Although recent studies have shown that unlabeled data are beneficial to boosting the image retrieval performance, very few approaches for image retrieval can learn with labeled and unlabeled data effectively. This paper proposes a novel semi-supervised active learning framework comprising a fusion of semi-supervised learning and support vector machines. We provide theoretical analysis of the active learning framework and present a simple yet effective active learning algorithm for image retrieval. Experiments are conducted on real-world color images to compare with traditional methods. The promising experimental results show that our proposed scheme significantly outperforms the previous approaches.


Integrating User Feedback Log Into Relevance Feedback By Coupled Svm For Content-Based Image Retrieval, Steven C. H. Hoi, Michael R. Lyu, Rong Jin Apr 2005

Research Collection School Of Computing and Information Systems

Relevance feedback has been shown to be an important tool for boosting retrieval performance in content-based image retrieval. In the past decade, various algorithms have been proposed to formulate relevance feedback in content-based image retrieval. Traditional relevance feedback techniques mainly carry out the learning tasks by focusing on low-level visual features of image content, with little consideration of log information of user feedback. However, from a long-term learning perspective, the user feedback log is one of the most important resources to bridge the semantic gap problem in image retrieval. In this paper we propose a novel technique to integrate the log …


The Edam Project: Mining Atmospheric Aerosol Datasets, Raghu Ramakrishnan, James J. Schauer, Lei Chen, Zheng Huang, Martin M. Shafer, Deborah S. Gross, David R. Musicant Jan 2005

Faculty Work

Data mining has been a very active area of research in the database, machine learning, and mathematical programming communities in recent years. EDAM (Exploratory Data Analysis and Management) is a joint project between researchers in Atmospheric Chemistry and Computer Science at Carleton College and the University of Wisconsin-Madison that aims to develop data mining techniques for advancing the state of the art in analyzing atmospheric aerosol datasets. There is a great need to better understand the sources, dynamics, and compositions of atmospheric aerosols. The traditional approach for particle measurement, which is the collection of bulk samples of particulates on filters, …


Biased Support Vector Machine For Relevance Feedback In Image Retrieval, Steven Hoi, Chi-Hang Chan, Kaizhu Huang, Michael R. Lyu, Irwin King Jul 2004

Research Collection School Of Computing and Information Systems

Recently, support vector machines (SVMs) have been applied to relevance feedback tasks in content-based image retrieval. Typical SVM approaches treat relevance feedback as a strict binary classification problem. However, these approaches do not consider an important issue of relevance feedback, i.e., the unbalanced dataset problem, in which the negative instances largely outnumber the positive instances. To solve this problem, we propose a novel technique to formulate relevance feedback based on a modified SVM called biased support vector machine (Biased SVM or BSVM). Mathematical formulation and explanations are provided to show its advantages. Experiments are conducted to evaluate …