Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 9 of 9

Full-Text Articles in Computer Sciences

Stability And Classification Performance Of Feature Selection Techniques, Huanjing Wang, Taghi Khoshgoftaar, Qianhui Liang Dec 2011

Computer Science Faculty Publications

Feature selection techniques can be evaluated based on either model performance or the stability (robustness) of the technique. The ideal situation is to choose a feature selection technique that is robust to change, while also ensuring that models built with the selected features perform well. One domain where feature selection is especially important is software defect prediction, where large numbers of metrics collected from previous software projects are used to help engineers focus their efforts on the most faulty modules. This study presents a comprehensive empirical examination of seven filter-based feature ranking techniques (rankers) applied to …


Using Semantic Templates To Study Vulnerabilities Recorded In Large Software Repositories, Yan Wu Oct 2011

Student Work

Software vulnerabilities allow an attacker to reduce a system's Confidentiality, Availability, and Integrity by exposing information, executing malicious code, and undermining system functionalities that contribute to the overall system purpose and need. With new vulnerabilities discovered every day in a variety of applications and user environments, a systematic study of their characteristics is a subject of immediate need for the following reasons:

  • The high rate at which information about past and new vulnerabilities accumulates makes it difficult to absorb and comprehend.
  • Rather than learning from past mistakes, similar types of vulnerabilities are observed repeatedly.
  • As the scale and complexity of …


Collaborative Online Learning Of User Generated Content, Guangxia Li, Kuiyu Chang, Steven C. H. Hoi, Wenting Liu, Ramesh Jain Oct 2011

Research Collection School Of Computing and Information Systems

We study the problem of online classification of user generated content, with the goal of efficiently learning to categorize content generated by individual users. This problem is challenging for several reasons. First, the huge amount of user generated content demands a highly efficient and scalable classification solution. Second, the categories are typically highly imbalanced, i.e., samples from a particular useful class can be few and far between compared with those of other (majority) classes. In some applications like spam detection, identification of the minority class often has significantly greater value than that of the majority class. Last …


Classification For Mass Spectra And Comprehensive Two-Dimensional Chromatograms, Xue Tian Aug 2011

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

Mass spectra contain characteristic information regarding the molecular structure and properties of compounds. The mass spectra of compounds from the same chemically related group are similar. Classification is one of the fundamental methodologies for analyzing mass spectral data. The primary goals of classification are to automatically group compounds based on their mass spectra, to find correlation between the properties of compounds and their mass spectra, and to provide a positive identification of unknown compounds.

This dissertation presents a new algorithm for the classification of mass spectra, the most similar neighbor with a probability-based spectrum similarity measure (MSN-PSSM). Experimental results demonstrate …


Centinela: A Human Activity Recognition System Based On Acceleration And Vital Sign Data, Óscar D. Lara, Alfredo J. Perez, Miguel A. Labrador, José D. Posada Jul 2011

Computer Science Faculty Publications

This paper presents Centinela, a system that combines acceleration data with vital signs to achieve highly accurate activity recognition. Centinela recognizes five activities: walking, running, sitting, ascending, and descending. The system includes a portable and unobtrusive real-time data collection platform, which only requires a single sensing device and a mobile phone. To extract features, both statistical and structural detectors are applied, and two new features are proposed to discriminate among activities during periods of vital sign stabilization. After evaluating eight different classifiers and three different time window sizes, our results show that Centinela achieves up to 95.7% overall accuracy, which …


Double Updating Online Learning, Peilin Zhao, Steven C. H. Hoi, Rong Jin May 2011

Research Collection School Of Computing and Information Systems

In most kernel-based online learning algorithms, when an incoming instance is misclassified, it is added to the pool of support vectors and assigned a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient: when a new support vector is added, we generally expect the weights of the existing support vectors to be updated to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. …
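The conventional behavior this abstract describes can be illustrated with a kernel perceptron, a standard example of such an algorithm. The sketch below is not DUOL and is not taken from the paper; the class name `KernelPerceptron`, the RBF kernel, and the `gamma` default are assumptions chosen for illustration. It shows only the baseline: a misclassified instance joins the support pool with a fixed weight, and earlier weights are never revisited.

```python
import math


def rbf_kernel(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length feature tuples."""
    sq_dist = sum((x - y) ** 2 for x, y in zip(a, b))
    return math.exp(-gamma * sq_dist)


class KernelPerceptron:
    """Baseline kernel online learner (illustrative, not DUOL)."""

    def __init__(self, kernel=rbf_kernel):
        self.kernel = kernel
        self.support_vectors = []  # instances that were misclassified
        self.weights = []          # one fixed weight per support vector

    def score(self, x):
        # Weighted sum of kernel similarities to the stored support vectors.
        return sum(w * self.kernel(sv, x)
                   for sv, w in zip(self.support_vectors, self.weights))

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Process one labelled instance (y in {-1, +1}); return True on a mistake."""
        mistake = self.predict(x) != y
        if mistake:
            # Single update: only the new support vector receives a weight.
            # Weights assigned earlier are left untouched, which is the
            # limitation the abstract points out.
            self.support_vectors.append(x)
            self.weights.append(float(y))
        return mistake
```

A short usage example: after `learner = KernelPerceptron()`, calling `learner.update((0.0, 0.0), -1)` and then `learner.update((3.0, 3.0), 1)` stores both points as support vectors, and the first weight stays at its initial value after the second update.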


Processing And Classification Of Physiological Signals Using Wavelet Transform And Machine Learning Algorithms, Abed Al-Raoof Bsoul Apr 2011

Theses and Dissertations

Over the last century, physiological signals have been broadly analyzed and processed not only to assess the function of the human physiology, but also to better diagnose illnesses or injuries and provide treatment options for patients. In particular, electrocardiogram (ECG), blood pressure (BP), and impedance are among the most important biomedical signals processed and analyzed. The majority of studies that utilize these signals attempt to diagnose important irregularities such as arrhythmia or blood loss by processing one of these signals. However, the relationship between them has not yet been fully studied using computational methods. Therefore, a system that extracts and combines …


Algorithms For Training Large-Scale Linear Programming Support Vector Regression And Classification, Pablo Rivas Perea Jan 2011

Open Access Theses & Dissertations

The main contribution of this dissertation is the development of a method to train a Support Vector Regression (SVR) model for the large-scale case where the number of training samples exceeds the available computational resources. The proposed scheme consists of posing the SVR problem entirely as a Linear Programming (LP) problem and developing a sequential optimization method based on variables decomposition, constraints decomposition, and the use of primal-dual interior point methods. Experimental results demonstrate that the proposed approach has comparable performance with other SV-based classifiers. Particularly, experiments demonstrate that as the problem size increases, the sparser the solution …


Measuring Traffic Flow And Classifying Vehicle Types: A Surveillance Video Based Approach, Erhan İnce Jan 2011

Turkish Journal of Electrical Engineering and Computer Sciences

The paper presents a vehicle counting method based on invariant moments and shadow aware foreground masks. Estimation of the background and the segmentation of foreground regions can be done using either the Mixture of Gaussians model (MoG) or an improved version of the Group Based Histogram (GBH) technique. The work demonstrates that, even though the improved GBH method delivers performance just as good as MoG, considering computational efficiency, MoG is more suitable. Shadow aware binary masks for each frame are formed by using background subtraction and shadow removal in the Hue Saturation and Value (HSV) domain. To determine new vehicles …