Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 2 of 2

Full-Text Articles in Computer Sciences

Comparative Analysis Of Combinations Of Dimension Reduction And Data Mining Techniques For Malware Detection, Proceso L. Fernandez Jr, Jeffrey C. Yiu, Paul Albert R. Arana Oct 2010

Comparative Analysis Of Combinations Of Dimension Reduction And Data Mining Techniques For Malware Detection, Proceso L. Fernandez Jr, Jeffrey C. Yiu, Paul Albert R. Arana

Department of Information Systems & Computer Science Faculty Publications

Many malware detectors utilize data mining techniques as primary tools for pattern recognition. As the number of new and evolving malware continues to rise, there is an increasing need for faster and more accurate detectors. However, for a given malware detector, detection speed and accuracy are usually inversely related. This study explores several configurations of classification combined with feature selection. An optimization function involving accuracy and processing time is used to evaluate each configuration. A real data set provided by Trend Micro Philippines is used for the study. Among 18 di↵erent configurations studied, it is shown that J4.8 without feature …


A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse Aug 2010

A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse

Computer Science Faculty Publications

Abstract Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques, comparing the performance of these techniques by building classification models using five commonly used classifiers. In order to evaluate the effectiveness of different feature selection techniques, the models are evaluated using eight different performance metrics separately since a given performance metric usually captures only one aspect of the classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The …