Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Computer Science Faculty Publications

2011

Software metrics

Articles 1 - 2 of 2

Full-Text Articles in Computer Engineering

Measuring Stability Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi Khoshgoftaar Nov 2011

Feature selection has been applied in many domains, such as text mining and software engineering. Ideally a feature selection technique should produce consistent outputs regardless of minor variations in the input data. Researchers have recently begun to examine the stability (robustness) of feature selection techniques. The stability of a feature selection method is defined as the degree of agreement between its outputs to randomly-selected subsets of the same input data. This study evaluated the stability of 11 threshold-based feature ranking techniques (rankers) when applied to 16 real-world software measurement datasets of different sizes. Experimental results demonstrate that AUC …
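The stability definition in this abstract (agreement between a ranker's outputs on randomly selected subsets of the same data) can be illustrated with a minimal Python sketch. Everything concrete here is an assumption made for illustration and is not taken from the paper: the toy ranker `rank_features` (absolute difference of class means), the use of mean pairwise Jaccard overlap as the agreement measure, and the parameters `k`, `n_subsamples`, and `keep_fraction`. The abstract does not say which agreement measure or subsampling scheme the authors used.

```python
import random
from itertools import combinations


def rank_features(rows, labels):
    """Toy ranker (hypothetical, not one of the paper's 11 rankers):
    score each feature by the absolute difference of its class means."""
    n_features = len(rows[0])
    scores = []
    for j in range(n_features):
        pos = [r[j] for r, y in zip(rows, labels) if y == 1]
        neg = [r[j] for r, y in zip(rows, labels) if y == 0]
        if not pos or not neg:
            scores.append(0.0)  # a class is missing, so no signal to score
            continue
        scores.append(abs(sum(pos) / len(pos) - sum(neg) / len(neg)))
    # Feature indices, highest score first.
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def jaccard(a, b):
    """Overlap of two non-empty feature subsets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b)


def stability(rows, labels, k, n_subsamples=10, keep_fraction=0.9, seed=0):
    """Mean pairwise Jaccard overlap of the top-k features chosen on
    random subsamples of the same dataset (assumed agreement measure).
    Requires k >= 1 and n_subsamples >= 2."""
    rng = random.Random(seed)
    n = len(rows)
    m = max(2, int(n * keep_fraction))
    subsets = []
    for _ in range(n_subsamples):
        idx = rng.sample(range(n), m)  # subsample without replacement
        ranking = rank_features([rows[i] for i in idx],
                                [labels[i] for i in idx])
        subsets.append(ranking[:k])
    pairs = list(combinations(subsets, 2))
    return sum(jaccard(a, b) for a, b in pairs) / len(pairs)
```

Under these assumptions, a score of 1.0 means every subsample produced the same top-k subset, and values near 0 mean the selected subsets barely overlap. Lowering `keep_fraction` simulates a larger perturbation of the input data.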


Measuring Robustness Of Feature Selection Techniques On Software Engineering Datasets, Huanjing Wang, Taghi Khoshgoftaar, Randall Wald Aug 2011

Feature Selection is a process which identifies irrelevant and redundant features from a high-dimensional dataset (that is, a dataset with many features), and removes these before further analysis is performed. Recently, the robustness (e.g., stability) of feature selection techniques has been studied, to examine the sensitivity of these techniques to changes in their input data. In this study, we investigate the robustness of six commonly used feature selection techniques as the magnitude of change to the datasets and the size of the selected feature subsets are varied. All experiments were conducted on 16 datasets from three real-world software projects. The …