Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

2018

Clustering

Articles 1 - 14 of 14

Full-Text Articles in Engineering

Applications Of Node-Based Resilience Graph Theoretic Framework To Clustering Autism Spectrum Disorders Phenotypes, John Matta, Junya Zhao, Gunes Ercal, Tayo Obafemi-Ajayi Dec 2018

Electrical and Computer Engineering Faculty Research & Creative Works

With the growing ubiquity of data in network form, clustering in the context of a network, represented as a graph, has become increasingly important. Clustering is a very useful exploratory machine learning tool that helps us make better sense of heterogeneous data by grouping data with similar attributes according to some criterion. This paper investigates the application of a novel graph theoretic clustering method, Node-Based Resilience clustering (NBR-Clust), to address the heterogeneity of Autism Spectrum Disorder (ASD) and identify meaningful subgroups. The hypothesis is that analysis of these subgroups would reveal relevant biomarkers that would provide a better …


Node-Based Resilience Measure Clustering With Applications To Noisy And Overlapping Communities In Complex Networks, John Matta, Tayo Obafemi-Ajayi, Jeffrey Borwey, Koushik Sinha, Donald C. Wunsch, Gunes Ercal Aug 2018

Electrical and Computer Engineering Faculty Research & Creative Works

This paper examines a schema for graph-theoretic clustering using node-based resilience measures. Node-based resilience measures optimize an objective based on a critical set of nodes whose removal causes some severity of disconnection in the network. Beyond presenting a general framework for the usage of node-based resilience measures for variations of clustering problems, we experimentally validate the usefulness of such methods in accomplishing the following: (i) clustering a graph in one step without knowing the number of clusters a priori; (ii) removing noise from noisy data; and (iii) detecting overlapping communities. We demonstrate that this clustering schema can be applied successfully …
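As a rough illustration of the idea in this abstract (remove a critical set of nodes, then read the clusters off the remaining components), here is a minimal brute-force sketch. It assumes graph integrity, min over S of |S| plus the size of the largest remaining component, as the resilience measure; the excerpt does not say which measures the paper uses, and the function names and toy graph are hypothetical. Exhaustive search is only feasible for very small graphs.

```python
from itertools import combinations

def components(adj, removed):
    """Connected components of the graph after deleting the nodes in `removed`."""
    seen, comps = set(removed), []
    for start in adj:
        if start in seen:
            continue
        stack, comp = [start], set()
        seen.add(start)
        while stack:
            u = stack.pop()
            comp.add(u)
            for v in adj[u]:
                if v not in seen:
                    seen.add(v)
                    stack.append(v)
        comps.append(comp)
    return comps

def integrity_clusters(adj):
    """Exhaustively find S minimizing |S| + size of the largest component of G - S.

    Assumption: integrity stands in for a generic node-based resilience measure.
    Returns (score, critical_set, components_after_removal).
    """
    nodes = sorted(adj)
    best = None
    for k in range(len(nodes)):  # always leave at least one node in the graph
        for S in combinations(nodes, k):
            comps = components(adj, S)
            score = k + max(len(c) for c in comps)
            if best is None or score < best[0]:
                best = (score, set(S), comps)
    return best

# Hypothetical toy graph: two triangles joined through node 3.
toy = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2, 4],
       4: [3, 5, 6], 5: [4, 6], 6: [4, 5]}
score, critical, clusters = integrity_clusters(toy)
```

On this toy graph the search removes the single bridging node and leaves the two triangles as clusters, without the number of clusters being specified in advance.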


On Wlan Fingerprint Indoor Positioning Systems Clustering, And Classification For Enhanced Performance, Haider G. Al Glehawi Aug 2018

Masters Theses

The most economical and affordable indoor positioning systems (IPS) are those that use existing infrastructure, such as the widely deployed Wireless Local Area Network (WLAN). The Received Signal Strength (RSS) fingerprinting-based system is one of the most promising and powerful techniques so far to be used for indoor positioning. However, there are two challenges in using RSS-based IPS: the first is the variation of RSS due to indoor multipath propagation, and the second is the high number of Access Points (APs) that are deployed in the region of interest. The first issue leads to degradation in the performance of RSS-based IPS, while …


Analysis Of Grapevine Gene Expression Data Using Node-Based Resilience Clustering, Jeffrey Dale, John Matta, Susanne Howard, Gunes Ercal, Wenping Qiu, Tayo Obafemi-Ajayi Jul 2018

Electrical and Computer Engineering Faculty Research & Creative Works

Powdery mildew is the most economically important disease of cultivated grapevines worldwide. In the agricultural community, there is a great need for better understanding of the complex genetic basis of powdery mildew (PM) resistance by delineating possible gene biomarkers associated with the plants' defense mechanisms. Machine learning techniques can be applied to the analysis of gene expression data to aid knowledge discovery of disease-fighting genes. In this work, we apply a data-driven computational model, utilizing a graph-based clustering algorithm, Node-Based Resilience Clustering (NBRClust), to analyze grapevine gene expression data to identify possible gene biomarkers associated with powdery mildew disease …


Clustering Method Based On Graph Data Model And Reliability Detection, Yanyun Cheng, Huisong Bian, Changsheng Bian Jun 2018

Journal of System Simulation

Abstract: For data in feature space, traditional clustering algorithms can perform cluster analysis directly. However, clustering results for high-dimensional data cannot be visualized intuitively and effectively in a 2D plane. Graph data can clearly reflect the similarity relationships between objects. Based on the distances between data objects, the feature-space data are iteratively modeled as graph data. Modularity-based cluster analysis is then carried out on the resulting graph. This achieves a two-dimensional visualization of non-spherically distributed data clusters and of the clustering result. The concept of credibility of the clustering result is proposed, and a method is proposed, …


Trust Based Time-Varying Network Topology For Distributed Co-Operative Control Of Multi-Class Multi-Agent Systems, Ankur Vipulkumar Dalal May 2018

Mechanical and Aerospace Engineering Theses

With increased levels of autonomy in most engineering fields and booms in areas such as swarms, platoons and the Internet of Things (IoT), communication and information flow have become a highly researched field. With advancements in autonomous vehicles (AVs) and drones in armed warfare, more and more focus is being placed on intercommunication between these vehicles and their surroundings as well as intra-communication among the fleets/swarms themselves. It is easier to deal with individual agents, whereas it is quite challenging to deal with multi-agent systems, especially with highly dynamic agents. In this thesis, we propose a general protocol for …


K-Means: A Revisit, Wan-Lei Zhao, Cheng-Hao Deng, Chong-Wah Ngo May 2018

Research Collection School Of Computing and Information Systems

Due to its simplicity and versatility, k-means has remained popular since it was proposed three decades ago. The performance of k-means has been enhanced from different perspectives over the years. Unfortunately, a good trade-off between quality and efficiency is rarely reached. In this paper, a novel k-means variant is presented. Unlike most k-means variants, the clustering procedure is driven by an explicit objective function, which is feasible for the whole l2-space. The classic chicken-and-egg loop in k-means has been simplified to a pure stochastic optimization procedure. The procedure of k-means becomes simpler and converges to a considerably better local …
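For reference, the classic alternating loop that this abstract contrasts itself with (assign each point to its nearest center, then recompute each center as the mean of its points) can be sketched as follows. This is the standard Lloyd-style k-means, not the variant proposed in the paper; the function name, the initial centers, and the sample points are illustrative assumptions.

```python
def kmeans(points, centers, max_iter=100):
    """Classic k-means: alternate assignment and mean-update steps until stable.

    `points` and `centers` are sequences of equal-length numeric tuples.
    Initial centers are supplied by the caller (an assumption of this sketch).
    """
    centers = list(centers)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center by squared Euclidean distance.
        labels = [
            min(range(len(centers)),
                key=lambda j: sum((a - b) ** 2 for a, b in zip(x, centers[j])))
            for x in points
        ]
        # Update step: each center becomes the mean of its assigned points.
        new_centers = []
        for j in range(len(centers)):
            members = [x for x, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(centers[j])  # keep an empty cluster's center in place
        if new_centers == centers:
            break  # no center moved, so the loop has converged
        centers = new_centers
    return centers, labels

sample_points = [(0, 0), (0, 2), (10, 0), (10, 2)]
final_centers, final_labels = kmeans(sample_points, [(0, 0), (10, 0)])
```

Each step depends on the output of the other, which is the circular dependency the abstract describes.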


Interactive Clinical Event Pattern Mining And Visualization Using Insurance Claims Data, Zhenhui Piao Jan 2018

Theses and Dissertations--Computer Science

With exponential growth on a daily basis, there is potentially valuable information hidden in complex electronic medical records (EMR) systems. In this thesis, several efficient data mining algorithms were explored to discover hidden knowledge in insurance claims data. The first aim was to cluster three levels of information overload (IO) groups among chronic rheumatic disease (CRD) patient groups based on their clinical events extracted from insurance claims data. The second aim was to discover hidden patterns using three renowned pattern mining algorithms: Apriori, frequent pattern growth (FP-Growth), and sequential pattern discovery using equivalence classes (SPADE). The SPADE algorithm was found to be the …
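Of the three algorithms named in this abstract, Apriori is the simplest to sketch: it counts itemsets level by level and only extends itemsets that are already frequent. The following is a minimal illustration under stated assumptions; the event names and the support threshold are made up for the example and do not come from the thesis.

```python
from itertools import combinations

def apriori(transactions, min_support):
    """Return {itemset: count} for itemsets found in at least `min_support` transactions."""
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = [frozenset([i]) for i in items]  # level-1 candidates
    frequent = {}
    k = 1
    while current:
        # Count how many transactions contain each candidate.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into size-k candidates.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent.
        current = [c for c in candidates
                   if all(frozenset(s) in level for s in combinations(c, k - 1))]
    return frequent

# Hypothetical clinical events per patient (illustrative only).
claims = [{"visit", "lab", "rx"}, {"visit", "lab"}, {"visit", "rx"},
          {"lab", "rx"}, {"visit", "lab", "rx"}]
patterns = apriori(claims, min_support=3)
```

With this data every single event and every pair is frequent, while the full triple appears in only two transactions and is dropped.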


Retail Data Analytics Using Graph Database, Rashmi Priya Jan 2018

Theses and Dissertations--Computer Science

Big data is an area focused on storing, processing and visualizing huge amounts of data. Today data is growing faster than ever before. We need to find the right tools and applications and build an environment that can help us obtain valuable insights from the data. Retail is one of the domains that collects huge amounts of transaction data every day. Retailers need to understand their customers' purchasing patterns and behavior in order to make better business decisions.

Market basket analysis is a field in data mining that is focused on discovering patterns in retail transaction data. Our goal is …


A Two-Level Clustering Strategy For Energy Performance Evaluation Of University Buildings, Kehua Li, Zhenjun Ma, Duane A. Robinson, Jun Ma Jan 2018

Faculty of Engineering and Information Sciences - Papers: Part B

This paper presents a clustering strategy to evaluate the energy performance and identify typical daily load profiles of buildings. The cluster analysis included intra-building clustering and inter-building clustering. The intra-building clustering used Gaussian mixture model clustering to identify the typical daily load profiles of each individual building. The inter-building clustering used hierarchical clustering to further identify the typical daily load profiles of a stock of buildings based on the typical daily load profiles identified for each individual building. The performance of this strategy was tested and evaluated using the two-year hourly electricity consumption data collected from 40 buildings on a …


Scalable Heuristic For Locating Distribution Centers On Real Road Networks, Saeed Ghanbartehrani, Jose David Porter Jan 2018

15th IMHRC Proceedings (Savannah, Georgia. USA – 2018)

The median problem is a type of network location problem that aims at finding the node with the minimum total demand-weighted distance to a set of demand points in a weighted graph. In this research, an algorithm for solving the median problem on real road networks is proposed. The proposed algorithm, referred to as the Multi-Threaded Dijkstra’s (MTD) algorithm, is used to locate Walmart distribution centers on the 28-million node road network of the United States with the objective of minimizing the total demand weighted transportation cost. The resulting optimal location configuration of Walmart distribution centers improves the total …
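The median problem defined in this abstract can be illustrated on a toy graph: run Dijkstra's algorithm from every candidate node and keep the node with the smallest demand-weighted sum of distances. This is a plain single-threaded sketch, not the paper's MTD algorithm; the graph, the demand values, and the function names are assumptions made for the example.

```python
import heapq

def dijkstra(graph, source):
    """Shortest-path distances from `source`; `graph` maps node -> [(neighbor, weight)]."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue  # stale heap entry
        for v, w in graph[u]:
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def one_median(graph, demand):
    """Node minimizing sum(demand[p] * distance(node, p)) over all demand points p."""
    best_node, best_cost = None, float("inf")
    for candidate in graph:
        dist = dijkstra(graph, candidate)
        cost = sum(q * dist.get(p, float("inf")) for p, q in demand.items())
        if cost < best_cost:
            best_node, best_cost = candidate, cost
    return best_node, best_cost

# Hypothetical undirected road segment A-B-C-D with edge lengths 2, 1, 4.
roads = {
    "A": [("B", 2.0)],
    "B": [("A", 2.0), ("C", 1.0)],
    "C": [("B", 1.0), ("D", 4.0)],
    "D": [("C", 4.0)],
}
demand = {"A": 3, "B": 2, "D": 3}
median = one_median(roads, demand)
```

Because each candidate's Dijkstra run is independent of the others, this outer loop is a natural place for parallelism on large networks.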


Machine Learning Techniques Implementation In Power Optimization, Data Processing, And Bio-Medical Applications, Khalid Khairullah Mezied Al-Jabery Jan 2018

Doctoral Dissertations

"The rapid progress and development in machine-learning algorithms has become a key factor in determining the future of humanity. These algorithms and techniques have been utilized to solve a wide spectrum of problems ranging from data mining and knowledge discovery to unsupervised learning and optimization. This dissertation consists of two study areas. The first area investigates the use of reinforcement learning and adaptive critic design algorithms in the field of power grid control. The second area in this dissertation, consisting of three papers, focuses on developing and applying clustering algorithms on biomedical data. The first paper presents a novel modelling approach for …


Offline And Online Density Estimation For Large High-Dimensional Data, Aref Majdara Jan 2018

Dissertations, Master's Theses and Master's Reports

Density estimation has wide applications in machine learning and data analysis techniques including clustering, classification, multimodality analysis, bump hunting and anomaly detection. In high-dimensional space, the sparsity of data in local neighborhoods makes many parametric and nonparametric density estimation methods inefficient.

This work presents the development of computationally efficient algorithms for high-dimensional density estimation, based on Bayesian sequential partitioning (BSP). Copula transform is used to separate the estimation of marginal and joint densities, with the purpose of reducing the computational complexity and estimation error. Using this separation, a parallel implementation of the density estimation algorithm on a 4-core CPU is …


Composite Vector Quantization For Optimizing Antenna Locations, Zekeri̇ya Uykan, Riku Jantti Jan 2018

Turkish Journal of Electrical Engineering and Computer Sciences

In this paper, we study the location optimization problem of remote antenna units (RAUs) in generalized distributed antenna systems (GDASs). We propose a composite vector quantization (CVQ) algorithm that consists of unsupervised and supervised terms for RAU location optimization. We show that the CVQ can be used i) to minimize an upper bound to the cell-averaged SNR error for a desired/demanded location-specific SNR function, and ii) to maximize the cell-averaged effective SNR. The CVQ-DAS includes the standard VQ, and thus the well-known squared distance criterion (SDC) as a special case. Computer simulations confirm the findings and suggest that the proposed …