Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 4 of 4

Full-Text Articles in Computer Sciences

Measuring Stability Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi Khoshgoftaar Nov 2011

Measuring Stability Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi Khoshgoftaar

Computer Science Faculty Publications

Feature selection has been applied in many domains, such as text mining and software engineering. Ideally a feature selection technique should produce consistent out- puts regardless of minor variations in the input data. Re- searchers have recently begun to examine the stability (robustness) of feature selection techniques. The stability of a feature selection method is defined as the degree of agreement between its outputs to randomly-selected subsets of the same input data. This study evaluated the stability of 11 threshold-based feature ranking techniques (rankers) when applied to 16 real-world software measurement datasets of different sizes. Experimental results demonstrate that AUC …


Measuring Robustness Of Feature Selection Techniques On Software Engineering Datasets, Huanjing Wang, Taghi Khoshgoftaar, Randall Wald Aug 2011

Measuring Robustness Of Feature Selection Techniques On Software Engineering Datasets, Huanjing Wang, Taghi Khoshgoftaar, Randall Wald

Computer Science Faculty Publications

Feature Selection is a process which identifies irrelevant and redundant features from a high-dimensional dataset (that is, a dataset with many features), and removes these before further analysis is performed. Recently, the robustness (e.g., stability) of feature selection techniques has been studied, to examine the sensitivity of these techniques to changes in their input data. In this study, we investigate the robustness of six commonly used feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on 16 datasets from three real-world software projects. The …


A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse Aug 2010

A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse

Computer Science Faculty Publications

Abstract Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques, comparing the performance of these techniques by building classification models using five commonly used classifiers. In order to evaluate the effectiveness of different feature selection techniques, the models are evaluated using eight different performance metrics separately since a given performance metric usually captures only one aspect of the classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The …


High-Dimensional Software Engineering Data And Feature Selection, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Nov 2009

High-Dimensional Software Engineering Data And Feature Selection, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao

Computer Science Faculty Publications

Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is very keen on learning which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set of software metrics that have the same predictive capability as a larger set of metrics – we strive to answer that question in this paper. We present a comprehensive comparison between seven commonly-used filter-based feature ranking techniques (FRT) …