Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons


Articles 1 - 6 of 6

Full-Text Articles in Physical Sciences and Mathematics

Comparative Analysis Of Combinations Of Dimension Reduction And Data Mining Techniques For Malware Detection, Proceso L. Fernandez Jr, Jeffrey C. Yiu, Paul Albert R. Arana Oct 2010


Department of Information Systems & Computer Science Faculty Publications

Many malware detectors utilize data mining techniques as primary tools for pattern recognition. As the number of new and evolving malware continues to rise, there is an increasing need for faster and more accurate detectors. However, for a given malware detector, detection speed and accuracy are usually inversely related. This study explores several configurations of classification combined with feature selection. An optimization function involving accuracy and processing time is used to evaluate each configuration. A real data set provided by Trend Micro Philippines is used for the study. Among 18 different configurations studied, it is shown that J4.8 without feature …


Pattern Space Maintenance For Data Updates And Interactive Mining, Mengling Feng, Guozhu Dong, Jinyan Li, Yap-Peng Tan, Limsoon Wong Aug 2010


Kno.e.sis Publications

This article addresses the incremental and decremental maintenance of the frequent pattern space. We conduct an in-depth investigation into how the frequent pattern space evolves under both incremental and decremental updates. Based on this evolution analysis, a new data structure, the Generator-Enumeration Tree (GE-tree), is developed to facilitate the maintenance of the frequent pattern space. Building on the GE-tree, we propose two novel algorithms, Pattern Space Maintainer+ (PSM+) and Pattern Space Maintainer− (PSM−), for the incremental and decremental maintenance of frequent patterns. Experimental results demonstrate that the proposed algorithms, on average, outperform the representative state-of-the-art …
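To make "incremental and decremental maintenance" concrete, here is a naive Python sketch. It is not the GE-tree, PSM+, or PSM− approach from the article. It simply keeps a support count for every itemset that occurs in the data and adjusts those counts when a transaction is inserted or deleted. The class name `NaivePatternSpace` and the absolute `min_support` threshold are hypothetical choices made for illustration. Because the sketch enumerates every subset of each transaction, its cost is exponential in transaction length. That blow-up is the kind of cost a dedicated structure like the GE-tree is designed to avoid.

```python
from collections import Counter
from itertools import combinations


class NaivePatternSpace:
    """Hypothetical baseline: support counts for every itemset seen so far.

    Enumerates all subsets of each transaction, so it is exponential in
    transaction length. Illustration only, NOT the GE-tree/PSM method.
    """

    def __init__(self, min_support):
        self.min_support = min_support  # absolute support threshold (assumed)
        self.counts = Counter()

    def _subsets(self, transaction):
        items = sorted(set(transaction))
        for size in range(1, len(items) + 1):
            for combo in combinations(items, size):
                yield frozenset(combo)

    def insert(self, transaction):
        # Incremental update: every subset gains one supporting transaction.
        for itemset in self._subsets(transaction):
            self.counts[itemset] += 1

    def delete(self, transaction):
        # Decremental update: assumes the transaction was inserted earlier.
        for itemset in self._subsets(transaction):
            self.counts[itemset] -= 1
            if self.counts[itemset] == 0:
                del self.counts[itemset]

    def frequent(self):
        # The frequent pattern space: itemsets meeting the support threshold.
        return {s: c for s, c in self.counts.items() if c >= self.min_support}


ps = NaivePatternSpace(min_support=2)
for t in [{"a", "b"}, {"a", "c"}, {"a", "b", "c"}]:
    ps.insert(t)
print(sorted(tuple(sorted(s)) for s in ps.frequent()))
# [('a',), ('a', 'b'), ('a', 'c'), ('b',), ('c',)]

ps.delete({"a", "b", "c"})
print(sorted(tuple(sorted(s)) for s in ps.frequent()))
# [('a',)]
```

Deleting a single transaction can shrink the frequent pattern space sharply, as the second print shows. This is why decremental updates are treated as a separate problem.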


Visual Discovery In Multivariate Binary Data, Boris Kovalerchuk, Florian Delizy, Logan Riggs, Evgenii Vityaev Jan 2010


Computer Science Faculty Scholarship

This paper presents the concept of Monotone Boolean Function Visual Analytics (MBFVA) and its application to the medical domain. The medical application is concerned with discovering breast cancer diagnostic rules (i) interactively with a radiologist, (ii) analytically with data mining algorithms, and (iii) visually. The coordinated visualization of these rules makes it possible to reconcile them and to derive rules that are meaningful to experts in the field and confirmed by the database. This paper shows how to represent and visualize binary multivariate data in 2-D and 3-D. This representation preserves the structural relations that …
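The sketch below shows what "monotone" means for binary diagnostic data. It is not the MBFVA method itself. A Boolean function f is monotone if x ≤ y componentwise implies f(x) ≤ f(y). The code therefore lists every pair of labeled binary vectors that breaks this rule. It also groups the vectors of {0,1}^n by their number of ones, which gives the levels of the Boolean cube's Hasse diagram, a common layout for such data. The function names and the three-feature example data are hypothetical.

```python
from itertools import product


def dominates(y, x):
    """True if y >= x componentwise (the partial order of the Boolean cube)."""
    return all(yi >= xi for xi, yi in zip(x, y))


def monotonicity_violations(labeled):
    """Return pairs (x, y) with x <= y but f(x) = 1 and f(y) = 0.

    `labeled` maps binary tuples to 0/1 labels, i.e. a partially defined
    Boolean function. A monotone function admits no such pair.
    """
    violations = []
    for x, fx in labeled.items():
        for y, fy in labeled.items():
            if x != y and fx == 1 and fy == 0 and dominates(y, x):
                violations.append((x, y))
    return violations


def cube_levels(n):
    """Group all n-bit vectors by Hamming weight (rows of the Hasse diagram)."""
    levels = [[] for _ in range(n + 1)]
    for v in product((0, 1), repeat=n):
        levels[sum(v)].append(v)
    return levels


# Hypothetical cases: three binary features, label 1 = "suspicious".
data = {(0, 0, 1): 0, (0, 1, 1): 1, (1, 1, 1): 0}
print(monotonicity_violations(data))         # [((0, 1, 1), (1, 1, 1))]
print([len(lv) for lv in cube_levels(3)])    # [1, 3, 3, 1]
```

A flagged pair marks cases where the data contradicts a monotone rule. Such pairs are natural points at which to ask the expert for clarification.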


A Novel Subspace Outlier Detection Approach In High Dimensional Data Sets, Jinsong Leng Jan 2010


Research outputs pre 2011

Many real applications require detecting outliers in high-dimensional data sets. The major difficulty in mining outliers lies in the fact that outliers are often embedded in subspaces. In general, no efficient methods are available for subspace-based outlier detection. Most existing subspace-based outlier detection methods identify outliers by searching for abnormally sparse density units in subspaces. In this paper, we present a novel approach for finding outliers in 'interesting' subspaces. The interesting subspaces are strongly correlated with 'good' clusters. This approach aims to group the meaningful subspaces and then identify outliers in the projected subspaces. In doing …
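For context, here is a minimal Python sketch of the *existing* density-unit idea that the abstract contrasts with. It is not Leng's cluster-based approach. The sketch projects the points onto a chosen subspace and splits each selected attribute into equi-width bins. It then flags points that fall in grid cells holding very few points. The function name `sparse_cell_outliers` and the `bins` and `max_count` parameters are hypothetical, and their values would need tuning on real data.

```python
from collections import Counter


def sparse_cell_outliers(points, dims, bins=4, max_count=1):
    """Flag points lying in sparsely populated grid cells of a subspace.

    points: list of equal-length numeric tuples.
    dims: indices of the attributes forming the subspace, e.g. (0, 1).
    bins: equi-width intervals per attribute (assumed parameter).
    max_count: cells with at most this many points count as sparse.
    Returns the indices of flagged points.
    """
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]

    def cell(p):
        idx = []
        for d, lo, hi in zip(dims, lows, highs):
            width = (hi - lo) / bins or 1.0  # guard against constant attributes
            # Clamp so the maximum value lands in the last bin.
            idx.append(min(int((p[d] - lo) / width), bins - 1))
        return tuple(idx)

    cells = [cell(p) for p in points]
    counts = Counter(cells)
    return [i for i, c in enumerate(cells) if counts[c] <= max_count]


# Hypothetical 3-D data; only attributes 0 and 1 form the subspace examined.
pts = [(0.0, 0.0, 5), (0.1, 0.1, 7), (0.2, 0.0, 1),
       (1.0, 1.0, 3), (0.9, 0.95, 2), (1.0, 0.9, 9),
       (0.0, 1.0, 4)]
print(sparse_cell_outliers(pts, dims=(0, 1)))  # [6]
```

The third attribute plays no role here, so a point can look normal in the full space yet stand out in a projection. This is why the choice of subspace matters.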


A Wrapper-Based Feature Selection For Analysis Of Large Data Sets, Jinsong Leng, Craig Valli, Leisa Armstrong Jan 2010


Research outputs pre 2011

Knowledge discovery from large data sets using classic data mining techniques has proved difficult because such data sets are large in both the number of dimensions and the number of samples. In real applications, data sets often contain many noisy, redundant, and irrelevant features, which degrade classification accuracy and increase complexity exponentially. Because of this inherent nature, analyzing the quality of data sets is difficult, and few approaches to this issue can be found in the literature. This paper presents a novel method to investigate the quality and structure of data sets, i.e., how to analyze whether there …
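A wrapper method judges each feature subset by training and testing an actual classifier on it, rather than by a classifier-independent statistic. The Python sketch below shows the general idea with greedy forward selection. It is not the specific method of Leng, Valli, and Armstrong. The classifier used here, leave-one-out 1-nearest-neighbour, is an assumed stand-in chosen so the example needs no libraries. The function names and toy data are hypothetical.

```python
def loo_1nn_accuracy(X, y, features):
    """Leave-one-out accuracy of 1-nearest-neighbour using only `features`.

    Assumes at least two samples.
    """
    if not features:
        return 0.0
    correct = 0
    for i, xi in enumerate(X):
        best_j, best_d = None, float("inf")
        for j, xj in enumerate(X):
            if i == j:
                continue  # leave the test sample out
            d = sum((xi[f] - xj[f]) ** 2 for f in features)
            if d < best_d:
                best_j, best_d = j, d
        correct += y[best_j] == y[i]
    return correct / len(X)


def forward_selection(X, y, evaluate=loo_1nn_accuracy):
    """Greedy wrapper: add the feature that most improves the classifier's
    score; stop as soon as no remaining feature improves it."""
    selected, best_score = [], 0.0
    remaining = set(range(len(X[0])))
    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        scores = {f: evaluate(X, y, selected + [f]) for f in sorted(remaining)}
        f_best = max(scores, key=scores.get)
        if scores[f_best] <= best_score:
            break
        selected.append(f_best)
        best_score = scores[f_best]
        remaining.remove(f_best)
    return selected, best_score


# Hypothetical data: feature 0 separates the classes, feature 1 is noise.
X = [(0.0, 5.0), (0.1, 1.0), (0.2, 9.0), (1.0, 4.0), (1.1, 8.0), (0.9, 2.0)]
y = [0, 0, 0, 1, 1, 1]
print(forward_selection(X, y))  # ([0], 1.0)
```

In this toy example the noisy feature lowers accuracy when added, so the wrapper discards it. The cost is many classifier runs per step, which is the main drawback of wrappers on large data sets.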


Application Of A Data Mining Framework For The Identification Of Agricultural Production Areas In WA, Yunous Vagh, Leisa Armstrong, Dean Diepeveen Jan 2010


Research outputs pre 2011

This paper proposes a data mining framework for the identification of agricultural production areas in WA. The data mining (DM) framework was developed with the aim of improving the analysis of agricultural datasets compared with currently used statistical methods. The DM framework is a synthesis of different technologies brought together to enhance the interrogation of these datasets. The DM framework is based on the data, information, knowledge and wisdom continuum as a horizontal axis, with DM and online analytical processing (OLAP) forming the vertical axis. In addition, the DM framework incorporates aspects of data warehousing phases, …