Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

2015

Clustering

Articles 1 - 18 of 18

Full-Text Articles in Computer Sciences

Mapping Nominal Values To Numbers For Effective Visualization, Geraldine Rosario, Elke Rundensteiner, David Brown, Matthew Ward Dec 2015

David C. Brown

Data sets with a large number of nominal variables, some with high cardinality, are becoming increasingly common and need to be explored. Unfortunately, most existing visual exploration displays are designed to handle numeric variables only. When importing data sets with nominal values into such visualization tools, most solutions to date are rather simplistic. Often, techniques that map nominal values to numbers do not assign order or spacing among the values in a manner that conveys semantic relationships. Moreover, displays designed for nominal variables usually cannot handle high cardinality variables well. This paper addresses the problem of how to display nominal …


Neuron Clustering For Mitigating Catastrophic Forgetting In Supervised And Reinforcement Learning, Benjamin Frederick Goodrich Dec 2015

Doctoral Dissertations

Neural networks have had many great successes in recent years, particularly with the advent of deep learning and many novel training techniques. One issue that has affected neural networks and prevented them from performing well in more realistic online environments is that of catastrophic forgetting. Catastrophic forgetting affects supervised learning systems when input samples are temporally correlated or are non-stationary. However, most real-world problems are non-stationary in nature, resulting in prolonged periods of time separating inputs drawn from different regions of the input space.

Reinforcement learning represents a worst-case scenario when it comes to precipitating catastrophic forgetting in neural networks. …


Adaptive Scaling Of Cluster Boundaries For Large-Scale Social Media Data Clustering, Lei Meng, Ah-Hwee Tan, Donald C. Wunsch Dec 2015

Research Collection School Of Computing and Information Systems

The large scale and complex nature of social media data raise the need to scale clustering techniques to big data and make them capable of automatically identifying data clusters with few empirical settings. In this paper, we present our investigation and three algorithms based on the fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter, the vigilance parameter, to identify data clusters, and are robust to modest parameter settings. The contribution of this paper lies in two aspects. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the …
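As a rough illustration of the complement coding and vigilance parameter mentioned in this abstract, the sketch below follows the standard Fuzzy ART formulation, not the three algorithms the paper proposes. The function names are hypothetical, and features are assumed to be pre-scaled to [0, 1].

```python
def complement_code(x):
    """Return [x, 1 - x] for a feature vector with components in [0, 1]."""
    if any(v < 0.0 or v > 1.0 for v in x):
        raise ValueError("features must be scaled to [0, 1] first")
    return list(x) + [1.0 - v for v in x]


def match(coded_input, weight):
    """Standard Fuzzy ART match value: |I AND w| / |I|, with AND as element-wise min."""
    overlap = sum(min(a, b) for a, b in zip(coded_input, weight))
    return overlap / sum(coded_input)


def passes_vigilance(coded_input, weight, rho):
    """A cluster may absorb the input only if the match reaches the vigilance rho."""
    return match(coded_input, weight) >= rho


# Hypothetical three-feature input; the coded vector has six components.
coded = complement_code([0.2, 0.9, 0.5])
```

After complement coding, the components of every input sum to the number of original features, which is why the technique is often described as a normalization step.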


Cvic: Cluster Validation Using Instance-Based Confidences, Dean M. Lebaron Nov 2015

Theses and Dissertations

As unlabeled data becomes increasingly available, the need for robust data mining techniques increases as well. Clustering is a common data mining tool which seeks to find related, independent patterns in data called clusters. The cluster validation problem addresses the question of how well a given clustering fits the data set. We present CVIC (cluster validation using instance-based confidences) which assigns confidence scores to each individual instance, as opposed to more traditional methods which focus on the clusters themselves. CVIC trains supervised learners to recreate the clustering, and instances are scored based on output from the learners which corresponds to …


Analyzing And Organizing The Sonic Space Of Vocal Imitation, Davide Andrea Mauro Phd, D. Rocchesso Oct 2015

Computer Sciences and Electrical Engineering Faculty Research

The sonic space that can be spanned with the voice is vast and complex and, therefore, difficult to organize and explore. In order to devise tools that facilitate sound design by vocal sketching, we attempt to organize a database of short excerpts of vocal imitations. By clustering the sound samples on a space whose dimensionality has been reduced to the two principal components, it is experimentally checked how meaningful the resulting clusters are for humans. Eventually, a representative of each cluster, chosen to be close to its centroid, may serve as a landmark in the exploration of the …
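The idea of choosing a cluster representative close to the centroid can be sketched as follows. This is only a minimal illustration assuming each sample is already a point in the reduced two-dimensional space; the function name and the sample coordinates are hypothetical and not taken from the paper.

```python
from math import dist


def nearest_to_centroid(points):
    """Return the index of the cluster member closest to the cluster centroid."""
    n = len(points)
    # Centroid: the per-dimension mean of all members.
    centroid = tuple(sum(coords) / n for coords in zip(*points))
    return min(range(n), key=lambda i: dist(points[i], centroid))


# Hypothetical cluster of four samples in the two-dimensional space.
cluster = [(0.0, 0.0), (2.0, 0.0), (1.0, 0.1), (1.0, 3.0)]
representative = nearest_to_centroid(cluster)
```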


Analyzing Educational Comments For Topics And Sentiments: A Text Analytics Approach, Gokran Ila Nitin, Swapna Gottipati, Venky Shankararaman Oct 2015

Research Collection School Of Computing and Information Systems

Universities collect qualitative and quantitative feedback from students upon course completion in order to improve course quality and students’ learning experience. Combining program-wide and module-specific questions, universities collect feedback from students on three main aspects of a course, namely teaching style, content, and learning experience. The feedback is collected through both qualitative comments and quantitative scores. Current methods for analyzing student course evaluations are manual, focus mainly on quantitative feedback, and fall short of an in-depth exploration of qualitative feedback. In this paper, we develop a student feedback mining system (SFMS) which applies text analytics and opinion mining approach …


Bioinformatics Approaches To Single-Cell Analysis In Developmental Biology, Dicle Yalcin, Zeynep M. Hakguder, Hasan H. Otu Sep 2015

Department of Electrical and Computer Engineering: Faculty Publications

Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to a robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single-cells and amplification of related macromolecules, which have enabled the use of imaging …


Evaluation And Improvement Of Procurement Process With Data Analytics, Melvin H. C. Tan, Wee Leong Lee Sep 2015

Research Collection School Of Computing and Information Systems

Analytics can be applied in procurement to benefit organizations beyond just prevention and detection of fraud. This study aims to demonstrate how advanced data mining techniques such as text mining and cluster analysis can be used to improve visibility of procurement patterns and provide decision-makers with insight to develop more efficient sourcing strategies, in terms of cost and effort. A case study of an organization’s effort to improve its procurement process is presented in this paper. The findings from this study suggest that opportunities exist for organizations to aggregate common goods and services among the purchases made under and across …


Clustering Data Of Mixed Categorical And Numerical Type With Unsupervised Feature Learning, Dao Lam, Mingzhen Wei, Donald C. Wunsch Sep 2015

Geosciences and Geological and Petroleum Engineering Faculty Research & Creative Works

Mixed-type categorical and numerical data are a challenge in many applications. This general area of mixed-type data is among the frontier areas, where computational intelligence approaches are often brittle compared with the capabilities of living creatures. In this paper, unsupervised feature learning (UFL) is applied to the mixed-type data to achieve a sparse representation, which makes it easier for clustering algorithms to separate the data. Unlike other UFL methods that work with homogeneous data, such as image and video data, the presented UFL works with the mixed-type data using fuzzy adaptive resonance theory (ART). UFL with fuzzy ART (UFLA) obtains …


Online Multimodal Co-Indexing And Retrieval Of Weakly Labeled Web Image Collections, Lei Meng, Ah-Hwee Tan, Cyril Leung, Liqiang Nie, Tan-Seng Chua, Chunyan Miao Jun 2015

Research Collection School Of Computing and Information Systems

Weak supervisory information of web images, such as captions, tags, and descriptions, makes it possible to better understand images at the semantic level. In this paper, we propose a novel online multimodal co-indexing algorithm based on Adaptive Resonance Theory, named OMC-ART, for the automatic co-indexing and retrieval of images using their multimodal information. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART is able to perform online learning of sequential data. Second, OMC-ART builds a two-layer indexing structure, in which the first layer co-indexes the images by the key visual and textual features based on the generalized distributions …


Efficient Variations Of The Quality Threshold Clustering Algorithm, Frank Loforte Jr. May 2015

CCE Theses and Dissertations

Clustering gene expression data such that the diameters of the clusters formed are no greater than a specified threshold prompted the development of the Quality Threshold Clustering (QTC) algorithm. It iteratively forms clusters of non-increasing size until all points are clustered; the largest cluster is always selected first. The QTC algorithm applies in many other domains that require a similar quality guarantee based on cluster diameter. The worst-case complexity of the original QTC algorithm is O(n^5). Since practical applications often involve large datasets, researchers called for more efficient versions of the QTC algorithm.

This dissertation aimed to develop …
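The behavior described in this abstract (grow a candidate cluster around every remaining point, keep the largest, remove its points, repeat) can be sketched as below. This is a simplified, unoptimized rendering of the original QTC idea under the assumption of Euclidean distance; it is not one of the efficient variations the dissertation develops, and the function name and sample points are hypothetical.

```python
from math import dist


def qt_cluster(points, max_diameter):
    """Greedy quality-threshold clustering; returns lists of point indices."""
    remaining = list(range(len(points)))
    clusters = []
    while remaining:
        best = []
        for seed in remaining:
            candidate = [seed]
            # far[j]: largest distance from outside point j to any candidate member,
            # i.e. the diameter the candidate would have along j if j joined.
            far = {j: dist(points[seed], points[j]) for j in remaining if j != seed}
            while far:
                j = min(far, key=far.get)
                if far[j] > max_diameter:
                    break  # every other point would exceed the diameter threshold
                candidate.append(j)
                del far[j]
                for k in far:
                    far[k] = max(far[k], dist(points[j], points[k]))
            if len(candidate) > len(best):
                best = candidate
        # The largest candidate becomes a cluster; its points leave the pool.
        clusters.append(sorted(best))
        chosen = set(best)
        remaining = [i for i in remaining if i not in chosen]
    return clusters


# Hypothetical data: two well-separated groups.
sample = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11)]
clusters = qt_cluster(sample, max_diameter=2.0)
```

Because every remaining point seeds a candidate in every round, the cost grows quickly with the number of points, which is the inefficiency the abstract refers to.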


Efficient Estimation Of Cluster Population, Sanjeev K C May 2015

UNLV Theses, Dissertations, Professional Papers, and Capstones

Partitioning a given set of points into clusters is a well-known problem in pattern recognition, data mining, and knowledge discovery. One of the well-known methods for identifying clusters in Euclidean space is the k-means algorithm. Using the k-means clustering algorithm requires knowing the value of k (the number of clusters) in advance. We propose to develop algorithms for good estimation of k for points distributed in two dimensions. The techniques we pursue include a bucketing method, g-hop neighbors, and Voronoi diagrams. We also present experimental results for examining the performances of the bucketing method …


Active Semi-Supervised Defect Categorization, Ferdian Thung, Xuan-Bach D. Le, David Lo May 2015

Research Collection School Of Computing and Information Systems

Defects are an inseparable part of software development and evolution. To better comprehend problems affecting a software system, developers often store historical defects, and these defects can be categorized into families. IBM proposes Orthogonal Defect Categorization (ODC), which includes various classifications of defects based on a number of orthogonal dimensions (e.g., symptoms and semantics of defects, root causes of defects, etc.). To help developers categorize defects, several approaches that employ machine learning have been proposed in the literature. Unfortunately, these approaches often require developers to manually label a large number of defect examples. In practice, manually labelling a large number of …


News Feeds Clustering Research Study, Haytham Abuel-Futuh Apr 2015

CCE Theses and Dissertations

With over 0.25 billion web pages hosted on the World Wide Web, it is virtually impossible to navigate through the Internet unaided. Many applications try to help users achieve this task. For example, search engines build indexes to make the entire World Wide Web searchable, and news curators allow users to browse topics of interest on different structured sites. One problem that arises for these applications and others with similar goals is identifying documents with similar contents. This helps the applications show users documents with unique contents as well as group various similar documents under similar topics. There has been a …


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jan 2015

Zhongmei Yao

Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …


Fuzzy Adaptive Resonance Theory: Applications And Extensions, Clayton Parker Smith Jan 2015

Masters Theses

Adaptive Resonance Theory, ART, is a powerful clustering tool for learning arbitrary patterns in a self-organizing manner. In this research, two papers are presented that examine the extensibility and applications of ART. The first paper examines a means to boost ART performance by assigning each cluster a vigilance value, instead of a single value for the whole ART module. A Particle Swarm Optimization technique is used to search for desirable vigilance values. In the second paper, it is shown how ART, and clustering in general, can be a useful tool in preprocessing time series data. Clustering quantization attempts to meaningfully …


Computational Intelligence Based Complex Adaptive System-Of-Systems Architecture Evolution Strategy, Siddharth Agarwal Jan 2015

Doctoral Dissertations

The dynamic planning for a system-of-systems (SoS) is a challenging endeavor. Large scale organizations and operations constantly face challenges to incorporate new systems and upgrade existing systems over a period of time under threats, constrained budget and uncertainty. It is therefore necessary for the program managers to be able to look at the future scenarios and critically assess the impact of technology and stakeholder changes. Managers and engineers are always looking for options that signify affordable acquisition selections and lessen the cycle time for early acquisition and new technology addition. This research helps in analyzing sequential decisions in an evolving …


Comparison Of Clustering Techniques For Traffic Accident Detection, Nejdet Doğru, Abdülhami̇t Subaşi Jan 2015

Turkish Journal of Electrical Engineering and Computer Sciences

Transportation infrastructure in intelligent transportation systems (ITSs) is complemented with information and communication technologies to achieve better passenger safety and reduced transportation time, fuel consumption, and vehicle wear and tear. This paper shows how data mining techniques are used in ITSs for accident detection and prevention on motorways. In traffic, vehicles show similar behavior to that of vehicles in closed neighborhoods. Vehicles that show different behaviors than neighbor vehicles in cases like accidents, inappropriate lane changes, and speeding can be considered as anomalies and detected. In this paper, a traffic accident is simulated and the effectiveness of different clustering techniques …