Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 5 of 5

Full-Text Articles in Computer Sciences

Stability And Classification Performance Of Feature Selection Techniques, Huanjing Wang, Taghi Khoshgoftaar, Qianhui Liang Dec 2011

Computer Science Faculty Publications

Feature selection techniques can be evaluated based on either model performance or the stability (robustness) of the technique. The ideal situation is to choose a feature selection technique that is robust to change, while also ensuring that models built with the selected features perform well. One domain where feature selection is especially important is software defect prediction, where large numbers of metrics collected from previous software projects are used to help engineers focus their efforts on the most faulty modules. This study presents a comprehensive empirical examination of seven filter-based feature ranking techniques (rankers) applied to …
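
To make the notion of ranker stability concrete, here is a minimal sketch in Python. It is not the study's method: the ranker (absolute Pearson correlation with the label), the perturbation (random subsampling without replacement), the stability measure (mean pairwise Jaccard overlap of the top-k feature sets), and all function names are assumptions chosen for illustration, since the truncated abstract does not state which rankers or measures were used.

```python
from itertools import combinations

import numpy as np


def top_k_by_correlation(X, y, k):
    """Assumed filter ranker: score each feature by |Pearson correlation| with y."""
    Xc = X - X.mean(axis=0)
    yc = y - y.mean()
    denom = np.sqrt((Xc ** 2).sum(axis=0) * (yc ** 2).sum())
    denom[denom == 0] = np.inf  # constant features (or labels) get a score of 0
    scores = np.abs(Xc.T @ yc) / denom
    return set(np.argsort(-scores)[:k].tolist())


def jaccard(a, b):
    """Overlap between two feature subsets, in [0, 1]."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


def stability(X, y, k, n_runs=10, keep=0.9, seed=0):
    """Mean pairwise Jaccard overlap of top-k sets across random subsamples.

    Requires n_runs >= 2 so that at least one pair of subsets exists.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    subsets = []
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(keep * n), replace=False)
        subsets.append(top_k_by_correlation(X[idx], y[idx], k))
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

A value near 1 means the ranker selects nearly the same features however the data is perturbed; a value near 0 means the selection is highly sensitive to the sample.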


Collaborative Online Learning Of User Generated Content, Guangxia Li, Kuiyu Chang, Steven C. H. Hoi, Wenting Liu, Ramesh Jain Oct 2011

Research Collection School Of Computing and Information Systems

We study the problem of online classification of user generated content, with the goal of efficiently learning to categorize content generated by individual users. This problem is challenging for several reasons. First, the huge amount of user generated content demands a highly efficient and scalable classification solution. Second, the categories are typically highly imbalanced, i.e., the samples from a particular useful class could be few and far between compared to those of some others (majority class). In some applications like spam detection, identification of the minority class often has significantly greater value than that of the majority class. Last …


Classification For Mass Spectra And Comprehensive Two-Dimensional Chromatograms, Xue Tian Aug 2011

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Mass spectra contain characteristic information regarding the molecular structure and properties of compounds. The mass spectra of compounds from the same chemically related group are similar. Classification is one of the fundamental methodologies for analyzing mass spectral data. The primary goals of classification are to automatically group compounds based on their mass spectra, to find correlations between the properties of compounds and their mass spectra, and to provide a positive identification of unknown compounds.

This dissertation presents a new algorithm for the classification of mass spectra, the most similar neighbor with a probability-based spectrum similarity measure (MSN-PSSM). Experimental results demonstrate …


Centinela: A Human Activity Recognition System Based On Acceleration And Vital Sign Data, Óscar D. Lara, Alfredo J. Perez, Miguel A. Labrador, José D. Posada Jul 2011

Computer Science Faculty Publications

This paper presents Centinela, a system that combines acceleration data with vital signs to achieve highly accurate activity recognition. Centinela recognizes five activities: walking, running, sitting, ascending, and descending. The system includes a portable and unobtrusive real-time data collection platform, which only requires a single sensing device and a mobile phone. To extract features, both statistical and structural detectors are applied, and two new features are proposed to discriminate among activities during periods of vital sign stabilization. After evaluating eight different classifiers and three different time window sizes, our results show that Centinela achieves up to 95.7% overall accuracy, which …


Double Updating Online Learning, Peilin Zhao, Steven C. H. Hoi, Rong Jin May 2011

Research Collection School Of Computing and Information Systems

In most kernel-based online learning algorithms, when an incoming instance is misclassified, it is added to the pool of support vectors and assigned a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient: when a new support vector is added, we generally expect the weights of the other existing support vectors to be updated in order to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. …
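
The baseline behavior described above (a misclassified instance joins the support vector pool with a weight that is never revised) can be sketched with a plain kernel perceptron. This is an illustration of that baseline only, not of DUOL, whose update rule is not given in the truncated abstract; the RBF kernel, the `gamma` parameter, and the class and function names are assumptions.

```python
import math


def rbf_kernel(x, z, gamma=1.0):
    """Gaussian (RBF) kernel; the kernel choice and gamma are assumptions."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


class KernelPerceptron:
    """Single-update baseline: each support vector keeps its initial weight."""

    def __init__(self, kernel=rbf_kernel):
        self.kernel = kernel
        self.support_vectors = []  # list of (instance, weight) pairs

    def score(self, x):
        return sum(w * self.kernel(sv, x) for sv, w in self.support_vectors)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Process one labeled instance (y in {-1, +1}); return True on a mistake."""
        if self.predict(x) != y:
            # Only the new support vector receives a weight; the weights of
            # existing support vectors are left untouched.
            self.support_vectors.append((x, float(y)))
            return True
        return False
```

Because `update` only ever appends, the weights already in the pool cannot adapt to later support vectors, which is the limitation the abstract points to.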