Full-Text Articles in Computer Sciences
Comparative Analysis Of Combinations Of Dimension Reduction And Data Mining Techniques For Malware Detection, Proceso L. Fernandez Jr, Jeffrey C. Yiu, Paul Albert R. Arana
Department of Information Systems & Computer Science Faculty Publications
Many malware detectors utilize data mining techniques as primary tools for pattern recognition. As the number of new and evolving malware continues to rise, there is an increasing need for faster and more accurate detectors. However, for a given malware detector, detection speed and accuracy are usually inversely related. This study explores several configurations of classification combined with feature selection. An optimization function involving accuracy and processing time is used to evaluate each configuration. A real data set provided by Trend Micro Philippines is used for the study. Among 18 different configurations studied, it is shown that J4.8 without feature …
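The abstract does not state the form of its optimization function, so the following is only a minimal sketch of how accuracy and processing time might be folded into one score for ranking configurations. The weighted-sum form, the `alpha` weight, the `ConfigResult` type, and the sample numbers are all assumptions made for illustration and are not taken from the study.

```python
from dataclasses import dataclass


@dataclass
class ConfigResult:
    name: str        # hypothetical label for a classifier + feature-selection pairing
    accuracy: float  # fraction of samples classified correctly, in [0, 1]
    seconds: float   # processing time for the configuration


def score(result, slowest, alpha=0.5):
    """Assumed scoring rule: weighted sum of accuracy and a speed term in [0, 1]."""
    # The slowest configuration gets a speed term of 0; faster ones approach 1.
    speed = 1.0 - result.seconds / slowest if slowest > 0 else 0.0
    return alpha * result.accuracy + (1.0 - alpha) * speed


def rank(results, alpha=0.5):
    """Return the configurations ordered from best to worst score."""
    slowest = max(r.seconds for r in results)
    return sorted(results, key=lambda r: score(r, slowest, alpha), reverse=True)


# Made-up numbers for illustration only; these are not results from the study.
demo = [
    ConfigResult("config_a", accuracy=0.95, seconds=10.0),
    ConfigResult("config_b", accuracy=0.90, seconds=2.0),
    ConfigResult("config_c", accuracy=0.70, seconds=1.0),
]
ranked = rank(demo)
```

Raising `alpha` toward 1.0 makes the ranking depend on accuracy alone, while lowering it favors faster configurations, which is one way to expose the speed and accuracy trade-off the abstract describes.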
A Comparative Study Of Threshold-Based Feature Selection Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Jason Van Hulse
Computer Science Faculty Publications
Given high-dimensional software measurement data, researchers and practitioners often use feature (metric) selection techniques to improve the performance of software quality classification models. This paper presents our newly proposed threshold-based feature selection techniques, comparing the performance of these techniques by building classification models using five commonly used classifiers. In order to evaluate the effectiveness of different feature selection techniques, the models are evaluated using eight different performance metrics separately since a given performance metric usually captures only one aspect of the classification performance. All experiments are conducted on three Eclipse data sets with different levels of class imbalance. The …
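The abstract does not spell out how the threshold-based techniques work, so the sketch below shows only one generic reading of the idea: each feature is scaled to [0, 1], treated as a one-attribute classifier, and scored by the best value a metric reaches over a sweep of thresholds. The F-measure, the number of threshold steps, and the function and feature names are assumptions chosen for illustration, not the paper's definitions.

```python
def f_measure(tp, fp, fn):
    """F1 score from confusion counts; 0.0 when nothing is predicted or present."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def threshold_score(values, labels, steps=20):
    """Best F-measure a single feature achieves over a sweep of thresholds."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant feature cannot separate the classes
    scaled = [(v - lo) / (hi - lo) for v in values]
    best = 0.0
    for i in range(steps + 1):
        t = i / steps
        # Try both directions, since a low value may indicate the positive class.
        for flip in (False, True):
            tp = fp = fn = 0
            for x, y in zip(scaled, labels):
                predicted = (x <= t) if flip else (x > t)
                if predicted and y:
                    tp += 1
                elif predicted:
                    fp += 1
                elif y:
                    fn += 1
            best = max(best, f_measure(tp, fp, fn))
    return best


def rank_features(columns, labels):
    """Order feature names from highest to lowest threshold score."""
    scores = {name: threshold_score(vals, labels) for name, vals in columns.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Toy data for illustration only; 1 marks the positive class.
labels = [1, 1, 1, 0, 0, 0]
columns = {
    "metric_a": [9, 8, 7, 1, 2, 3],  # separates the classes cleanly
    "metric_b": [5, 1, 4, 5, 1, 4],  # unrelated to the labels
}
ranking = rank_features(columns, labels)
```

Swapping `f_measure` for another metric changes which features rank highest, which is consistent with the abstract's remark that a single performance metric captures only one aspect of classification performance.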