Open Access. Powered by Scholars. Published by Universities.®

OS and Networks Commons

Articles 1 - 10 of 10

Full-Text Articles in OS and Networks

Salience-Aware Adaptive Resonance Theory For Large-Scale Sparse Data Clustering, Lei Meng, Ah-Hwee Tan, Chunyan Miao Dec 2019

Research Collection School Of Computing and Information Systems

Sparse data is known to pose challenges to cluster analysis, as the similarity between data tends to be ill-posed in the high-dimensional Hilbert space. Solutions in the literature typically extend either k-means or spectral clustering with additional steps on representation learning and/or feature weighting. However, adding these usually introduces new parameters and increases computational cost, thus inevitably lowering the robustness of these algorithms when handling massive ill-represented data. To alleviate these issues, this paper presents a class of self-organizing neural networks, called the salience-aware adaptive resonance theory (SA-ART) model. SA-ART extends Fuzzy ART with measures for cluster-wise salient feature modeling. …


A Novel Approach For Classifying Gene Expression Data Using Topic Modeling, Soon Jye Kho, Himi Yalamanchili, Michael L. Raymer, Amit Sheth Jan 2017

Kno.e.sis Publications

Understanding the role of differential gene expression in cancer etiology and cellular process is a complex problem that continues to pose a challenge due to the sheer number of genes and inter-related biological processes involved. In this paper, we employ an unsupervised topic model, Latent Dirichlet Allocation (LDA), to mitigate overfitting of high-dimensional gene expression data and to facilitate understanding of the associated pathways. LDA has recently been applied for clustering and exploring genomic data but not for classification and prediction. Here, we propose to use LDA in clustering as well as in classification of cancer and healthy tissues using lung cancer …


Optimizing Main Memory Usage In Modern Computing Systems To Improve Overall System Performance, Daniel Jose Campello Jun 2016

FIU Electronic Theses and Dissertations

Operating systems use fast, CPU-addressable main memory to maintain an application’s temporary data as anonymous data and to cache copies of persistent data stored in slower block-based storage devices. However, the use of this faster memory comes at a high cost. Therefore, several techniques have been proposed in the literature to use main memory more efficiently. In this dissertation, we introduce three distinct approaches to improve overall system performance by optimizing main memory usage.

First, DRAM and host-side caching of file system data are used for speeding up virtual machine performance in today’s virtualized data centers. The clustering of VM …


Adaptive Scaling Of Cluster Boundaries For Large-Scale Social Media Data Clustering, Lei Meng, Ah-Hwee Tan, Donald C. Wunsch Dec 2015

Research Collection School Of Computing and Information Systems

The large scale and complex nature of social media data raise the need to scale clustering techniques to big data and make them capable of automatically identifying data clusters with few empirical settings. In this paper, we present our investigation and three algorithms based on the fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter, i.e., the vigilance parameter, to identify data clusters, and are robust to modest parameter settings. The contribution of this paper lies in two aspects. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the …
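
For readers unfamiliar with the mechanism this abstract refers to, the following is a minimal sketch of textbook Fuzzy ART with complement coding and a vigilance test. It is not any of the paper's three algorithms. The function names, the parameter names alpha (choice parameter) and beta (learning rate), and all default values are assumptions chosen for illustration; inputs are assumed to be scaled to [0, 1].

```python
from typing import List, Sequence, Tuple


def complement_code(x: Sequence[float]) -> List[float]:
    """Append 1 - x_k for every feature, so the coded vector has constant L1 norm."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a: Sequence[float], b: Sequence[float]) -> List[float]:
    """Element-wise minimum (the fuzzy AND operator)."""
    return [min(u, v) for u, v in zip(a, b)]


def fuzzy_art(
    data: Sequence[Sequence[float]],
    rho: float = 0.75,    # vigilance threshold (assumed default)
    alpha: float = 0.001, # choice parameter (assumed default)
    beta: float = 1.0,    # learning rate; 1.0 is "fast learning" (assumed default)
) -> Tuple[List[int], List[List[float]]]:
    """Single-pass Fuzzy ART: returns a cluster label per input and the cluster weights."""
    weights: List[List[float]] = []
    labels: List[int] = []
    for x in data:
        i = complement_code(x)
        norm_i = sum(i)

        # Rank existing clusters by the choice function |I ^ w| / (alpha + |w|).
        def choice(j: int) -> float:
            return sum(fuzzy_and(i, weights[j])) / (alpha + sum(weights[j]))

        winner = None
        for j in sorted(range(len(weights)), key=choice, reverse=True):
            # Vigilance test: accept the cluster only if |I ^ w| / |I| >= rho.
            if sum(fuzzy_and(i, weights[j])) / norm_i >= rho:
                winner = j
                break

        if winner is None:
            # No cluster passes the vigilance test: create a new one.
            weights.append(i)
            labels.append(len(weights) - 1)
        else:
            # Move the winning weight vector toward I ^ w.
            w = weights[winner]
            weights[winner] = [
                beta * m + (1.0 - beta) * wk for m, wk in zip(fuzzy_and(i, w), w)
            ]
            labels.append(winner)
    return labels, weights
```

In this sketch the vigilance threshold rho is the only parameter that controls how many clusters are formed: a higher value demands a closer match before an input joins an existing cluster, so more clusters are created.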


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jan 2015

Zhongmei Yao

Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …


An Automated Approach For Finding Variable-Constant Pairing Bugs, Julia Lawall, David Lo Jun 2014

David LO

Named constants are used heavily in operating systems code, both as internal flags and in interactions with devices. Decision making within an operating system thus critically depends on the correct usage of these values. Nevertheless, compilers for the languages typically used in implementing operating systems provide little support for checking the usage of named constants. This affects correctness, when a constant is used in a context where its value is meaningless, and software maintenance, when a constant has the right value for its usage context but the wrong name. We propose a hybrid program-analysis and data-mining based approach to identify …


Simulation Of Circuit Creation In Tor: Preliminary Results, William Boyd, Norman Danner, Danny Krizanc Jul 2013

Norman Danner

We describe a methodology for simulating Tor relay up/down behavior over time and give some preliminary results.


Multi-Order Neurons For Evolutionary Higher Order Clustering And Growth, Kiruthika Ramanathan, Sheng Uei Guan Dec 2007

Research Collection School Of Computing and Information Systems

This letter proposes to use multiorder neurons for clustering irregularly shaped data arrangements. Multiorder neurons are an evolutionary extension of the use of higher-order neurons in clustering. Higher-order neurons parametrically model complex neuron shapes by replacing the classic synaptic weight by higher-order tensors. The multiorder neuron goes one step further and eliminates two problems associated with higher-order neurons. First, it uses evolutionary algorithms to select the best neuron order for a given problem. Second, it obtains more information about the underlying data distribution by identifying the correct order for a given cluster of patterns. Empirically we observed that when the …


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jun 2005

Computer Science Faculty Publications

Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …


Modified ART 2A Growing Network Capable Of Generating A Fixed Number Of Nodes, Ji He, Ah-Hwee Tan, Chew-Lim Tan May 2004

Research Collection School Of Computing and Information Systems

This paper introduces the Adaptive Resonance Theory under Constraint (ART-C 2A) learning paradigm based on ART 2A, which is capable of generating a user-defined number of recognition nodes through online estimation of an appropriate vigilance threshold. Empirical experiments quantitatively compare the cluster validity and the learning efficiency of ART-C 2A with those of ART 2A, as well as three closely related clustering methods, namely online K-Means, batch K-Means, and SOM. Besides retaining the online cluster creation capability of ART 2A, ART-C 2A gives an alternative clustering solution, which allows direct control over the number of output …