Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Articles 1 - 18 of 18

Full-Text Articles in Engineering

Meta-iCVI: Ensemble Validity Metrics For Concise Labeling Of Correct, Under- Or Over-Partitioning In Streaming Clustering, Niklas M. Melton, Sasha A. Petrenko, Donald C. Wunsch Jan 2024

Electrical and Computer Engineering Faculty Research & Creative Works

Understanding the performance and validity of clustering algorithms is both challenging and crucial, particularly when clustering must be done online. Until recently, most validation methods have relied on batch calculation and have required considerable human expertise in their interpretation. Improving real-time performance and interpretability of cluster validation, therefore, continues to be an important theme in unsupervised learning. Building upon previous work on incremental cluster validity indices (iCVIs), this paper introduces the Meta-iCVI as a tool for explainable and concise labeling of partition quality in online clustering. Leveraging a time-series classifier and data-fusion techniques, the Meta-iCVI combines the outputs …
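Incremental CVIs avoid batch recomputation by updating per-cluster statistics as each sample arrives. The sketch below is an illustrative assumption, not the Meta-iCVI itself (which, per the abstract, fuses iCVI outputs with a time-series classifier): it shows a Welford-style recursive update of one cluster's mean and compactness (the sum of squared distances to the mean), a typical building block of such indices. `IncrementalCluster` is a hypothetical name.

```python
from typing import List


class IncrementalCluster:
    """Running statistics of one cluster, updated one sample at a time (hypothetical helper).

    Tracks the count n, the mean vector, and the compactness
    CP = sum_i ||x_i - mean||^2 without storing past samples.
    """

    def __init__(self, dim: int) -> None:
        self.n = 0
        self.mean: List[float] = [0.0] * dim
        self.cp = 0.0

    def update(self, x: List[float]) -> None:
        self.n += 1
        # Deviation of the new sample from the *previous* mean.
        delta = [xi - mi for xi, mi in zip(x, self.mean)]
        # Recursive mean update: mean_n = mean_{n-1} + delta / n.
        self.mean = [mi + d / self.n for mi, d in zip(self.mean, delta)]
        # Welford update of the sum of squared deviations: CP += ||delta||^2 * (n - 1) / n.
        self.cp += sum(d * d for d in delta) * (self.n - 1) / self.n


cluster = IncrementalCluster(dim=2)
for point in [[0.0, 0.0], [2.0, 0.0], [0.0, 2.0], [2.0, 2.0]]:
    cluster.update(point)
# cluster.cp now equals the batch compactness of the four points (8.0, up to rounding).
```

Because only n, the mean, and CP are stored, each update costs time proportional to the dimension, regardless of how many samples the cluster has absorbed.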


Analyzing Ground Motion Records With CVI Fuzzy ART, Dustin Tanksley, Xinzhe Yuan, Genda Chen, Donald C. Wunsch Jan 2023

Civil, Architectural and Environmental Engineering Faculty Research & Creative Works

This paper explores using Cluster Validity Indices Fuzzy Adaptive Resonance Theory (CVI Fuzzy ART) to cluster ground motion records (GMRs). Clustering the features extracted from a supervised network trained to predict structural damage results in less overfitting from the trained network. Using Cluster Validity Indices (CVIs) to evaluate the clustering gives feedback on how well the data are being classified, allowing further separation of the data. By using CVI Fuzzy ART in combination with features extracted from a trained Convolutional Neural Network (CNN), we were able to form additional clusters in the data. Within the primary clusters, accuracy was …


K-Means Clustering Using Gravity Distance, Ajinkya Vishwas Indulkar Apr 2022

Masters Theses & Specialist Projects

Clustering is an important topic in data modeling. K-means clustering is a well-known partitional clustering algorithm that separates a dataset into groups sharing similar properties. Clustering an unbalanced dataset, in which some groups have many more data points than others, is a challenging problem in data modeling. When K-means with Euclidean distance is applied to such data, the algorithm fails to form good clusters: during clustering, standard K-means tends to split the data evenly into smaller clusters.

We propose a new K-means clustering algorithm to overcome the disadvantage by introducing a different …
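For reference, the baseline that the thesis modifies is standard K-means with Euclidean distance. The sketch below shows only that baseline (Lloyd's algorithm); the proposed gravity distance is not described in the excerpt and is not reproduced. `kmeans` is a hypothetical helper name, and random initialization from the data is an assumption.

```python
import math
import random
from typing import List, Sequence, Tuple


def kmeans(points: Sequence[Sequence[float]], k: int, iters: int = 100,
           seed: int = 0) -> Tuple[List[List[float]], List[int]]:
    """Standard (Lloyd's) K-means with Euclidean distance: the baseline, not the gravity variant."""
    rng = random.Random(seed)
    # Initialise centers with k data points chosen at random.
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels: List[int] = []
    for _ in range(iters):
        # Assignment step: each point joins its nearest center (Euclidean distance).
        new_labels = [min(range(k), key=lambda j: math.dist(p, centers[j])) for p in points]
        if new_labels == labels:
            break  # assignments are stable, so the algorithm has converged
        labels = new_labels
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the previous center if a cluster becomes empty
                centers[j] = [sum(coord) / len(members) for coord in zip(*members)]
    return centers, labels
```

The `math.dist` call in the assignment step is the point where an alternative distance, such as the thesis's gravity distance, would be substituted.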


Applications Of Unsupervised Machine Learning In Autism Spectrum Disorder Research: A Review, Chelsea Parlett-Pelleriti, Elizabeth Stevens, Dennis R. Dixon, Erik J. Linstead Jan 2022

Engineering Faculty Articles and Research

Large amounts of autism spectrum disorder (ASD) data are created through hospitals, therapy centers, and mobile applications; however, much of this rich data does not have pre-existing classes or labels. Large amounts of data, both genetic and behavioral, that are collected as part of scientific studies or as part of treatment can provide deeper, more nuanced insight into both the diagnosis and treatment of ASD. This paper reviews 43 papers using unsupervised machine learning in ASD, including k-means clustering, hierarchical clustering, model-based clustering, and self-organizing maps. The aim of this review is to provide a survey of the current uses of …


A Quantitative Validation Of Multi-Modal Image Fusion And Segmentation For Object Detection And Tracking, Nicholas Lahaye, Michael J. Garay, Brian D. Bue, Hesham El-Askary, Erik Linstead Jun 2021

Mathematics, Physics, and Computer Science Faculty Articles and Research

In previous works, we have shown the efficacy of using Deep Belief Networks, paired with clustering, to identify distinct classes of objects within remotely sensed data via cluster analysis and qualitative analysis of the output data in comparison with reference data. In this paper, we quantitatively validate the methodology against datasets currently being generated and used within the remote sensing community, as well as show the capabilities and benefits of the data fusion methodologies used. The experiments take the output of our unsupervised fusion and segmentation methodology and map it to various labeled datasets at different levels of global …


Can Generative Adversarial Networks Help Us Fight Financial Fraud?, Sean Mciver Jan 2021

Dissertations

Transactional fraud datasets exhibit extreme class imbalance, and learners cannot make accurate generalizations without sufficient data. Researchers can account for imbalance at the data level, the algorithmic level, or both; this paper focuses on techniques at the data level. We evaluate the evidence for the optimal technique and potential enhancements. Global fraud losses totalled more than 80% of the UK’s GDP in 2019, so improving preprocessing is inherently valuable in fighting these losses. The synthetic minority oversampling technique (SMOTE) and its extensions are currently the most common preprocessing strategies. SMOTE oversamples the minority classes by randomly generating a point between …
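As a point of reference, standard SMOTE picks a minority sample, picks one of its k nearest minority neighbours, and places a synthetic sample at a uniformly random position on the segment joining them. The sketch below shows only that standard interpolation, not the dissertation's evaluated enhancements; `smote` is a hypothetical name and k = 5 is a common but assumed default.

```python
import math
import random
from typing import List, Sequence


def smote(minority: Sequence[Sequence[float]], n_new: int, k: int = 5,
          seed: int = 0) -> List[List[float]]:
    """Standard SMOTE interpolation (illustrative sketch; needs >= 2 minority samples)."""
    rng = random.Random(seed)
    synthetic: List[List[float]] = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        x = minority[i]
        # Indices of the k nearest minority neighbours of x, excluding x itself.
        neighbours = sorted((j for j in range(len(minority)) if j != i),
                            key=lambda j: math.dist(minority[j], x))[:k]
        nn = minority[rng.choice(neighbours)]
        gap = rng.random()  # uniform in [0, 1): position along the segment x -> nn
        synthetic.append([xi + gap * (ni - xi) for xi, ni in zip(x, nn)])
    return synthetic
```

Because every synthetic point lies on a segment between two real minority samples, oversampling stays inside the region the minority class already occupies rather than duplicating points exactly.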


An Explainable And Statistically Validated Ensemble Clustering Model Applied To The Identification Of Traumatic Brain Injury Subgroups, Dacosta Yeboah, Louis Steinmeister, Daniel B. Hier, Bassam Hadi, Donald C. Wunsch, Gayla R. Olbricht, Tayo Obafemi-Ajayi Sep 2020

Electrical and Computer Engineering Faculty Research & Creative Works

We present a framework for an explainable and statistically validated ensemble clustering model applied to Traumatic Brain Injury (TBI). The objective of our analysis is to identify patient injury severity subgroups and key phenotypes that delineate these subgroups using varied clinical and computed tomography data. Explainable and statistically validated models are essential because a data-driven identification of subgroups is an inherently multidisciplinary undertaking. In our case, this procedure yielded six distinct patient subgroups with respect to mechanism of injury, severity of presentation, anatomy, psychometric, and functional outcome. This framework for ensemble cluster analysis fully integrates statistical methods at several stages of …


CURE: Flexible Categorical Data Representation By Hierarchical Coupling Learning, Songlei Jian, Guansong Pang, Longbing Cao, Kai Lu, Hang Gao May 2019

Research Collection School Of Computing and Information Systems

The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrastive learning tasks. CURE first learns the value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. With two complementary value coupling functions, CURE is instantiated into …


Shape Analysis Of Traffic Flow Curves Using A Hybrid Computational Analysis, Wasim Irshad Kayani, Shikhar P. Acharya, Ivan G. Guardiola, Donald C. Wunsch, B. Schumacher, Isaac Wagner-Muns Nov 2016

Engineering Management and Systems Engineering Faculty Research & Creative Works

This paper highlights and validates the use of shape analysis with Mathematical Morphology (MM) tools as a means to develop meaningful clustering of historical data. Clustering can yield more appropriate groupings, which can lead to better parameterization or estimation of models and, in turn, to more effective prediction models. To demonstrate this, a Back-Propagation Neural Network is used to validate the classification achieved with the MM tools. Specifically, the Granulometric Size Distribution (GSD) is used to cluster daily traffic flow patterns based solely on their …
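The abstract does not detail how the GSD is computed. As an assumption, a standard one-dimensional granulometry treats a daily flow curve as a signal, opens it with flat windows of growing half-width r, and records how much area each size removes; the resulting vector could then serve as a clustering feature. All function names below are hypothetical.

```python
from typing import List


def erode(signal: List[float], r: int) -> List[float]:
    """Flat grayscale erosion: running minimum over a window of half-width r."""
    return [min(signal[max(0, i - r):i + r + 1]) for i in range(len(signal))]


def dilate(signal: List[float], r: int) -> List[float]:
    """Flat grayscale dilation: running maximum over a window of half-width r."""
    return [max(signal[max(0, i - r):i + r + 1]) for i in range(len(signal))]


def opening(signal: List[float], r: int) -> List[float]:
    """Erosion then dilation: removes peaks narrower than the window."""
    return dilate(erode(signal, r), r)


def size_distribution(signal: List[float], max_r: int) -> List[float]:
    """Area removed when the opening size grows from r to r + 1 (a 1-D pattern spectrum)."""
    areas = [sum(opening(signal, r)) for r in range(max_r + 1)]
    return [areas[r] - areas[r + 1] for r in range(max_r)]


curve = [0, 0, 5, 0, 0, 0, 0, 3, 3, 3, 3, 0]
print(size_distribution(curve, 2))  # [5, 12]
```

In this example the one-sample spike (area 5) disappears at the first opening size, and the four-sample plateau (area 12) at the second, so curves with similar peak shapes produce similar distributions.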


Optimizing Main Memory Usage In Modern Computing Systems To Improve Overall System Performance, Daniel Jose Campello Jun 2016

FIU Electronic Theses and Dissertations

Operating systems use fast, CPU-addressable main memory to maintain an application’s temporary data as anonymous data and to cache copies of persistent data stored on slower block-based storage devices. However, this faster memory comes at a high cost, so the literature has proposed several techniques for using main memory more efficiently. In this dissertation we introduce three distinct approaches that improve overall system performance by optimizing main memory usage.

First, DRAM and host-side caching of file system data are used for speeding up virtual machine performance in today’s virtualized data centers. The clustering of VM …


Adaptive Scaling Of Cluster Boundaries For Large-Scale Social Media Data Clustering, Lei Meng, Ah-Hwee Tan, Donald C. Wunsch Dec 2015

Research Collection School Of Computing and Information Systems

The large scale and complex nature of social media data raises the need to scale clustering techniques to big data and to make them capable of automatically identifying data clusters with few empirical settings. In this paper, we present our investigation and three algorithms based on fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter (the vigilance parameter) to identify data clusters, and are robust to modest parameter settings. The contribution of this paper lies in two aspects. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the …
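Complement coding and the vigilance parameter belong to standard Fuzzy ART. For orientation, the sketch below implements textbook Fuzzy ART; it is not any of the paper's three algorithms, whose details are not in the excerpt. The defaults alpha = 0.001 and beta = 1.0 (fast learning) are illustrative assumptions that leave vigilance rho as the parameter to tune, and all names are hypothetical.

```python
from typing import List


def complement_code(a: List[float]) -> List[float]:
    """Map a in [0, 1]^d to [a, 1 - a]; every coded vector then has L1 norm d."""
    return list(a) + [1.0 - v for v in a]


def fuzzy_and(x: List[float], y: List[float]) -> List[float]:
    """Component-wise minimum (the fuzzy AND operator)."""
    return [min(xi, yi) for xi, yi in zip(x, y)]


def l1(x: List[float]) -> float:
    return sum(abs(v) for v in x)


class FuzzyART:
    """Textbook Fuzzy ART with complement coding (illustrative sketch)."""

    def __init__(self, rho: float = 0.75, alpha: float = 0.001, beta: float = 1.0) -> None:
        self.rho = rho      # vigilance: minimum match required to join a category
        self.alpha = alpha  # choice parameter
        self.beta = beta    # learning rate (1.0 = fast learning)
        self.weights: List[List[float]] = []

    def learn(self, a: List[float]) -> int:
        """Present one sample in [0, 1]^d; return the index of the category it joins."""
        x = complement_code(a)
        # Try categories in order of the choice function T_j = |x ^ w_j| / (alpha + |w_j|).
        order = sorted(
            range(len(self.weights)),
            key=lambda j: l1(fuzzy_and(x, self.weights[j])) / (self.alpha + l1(self.weights[j])),
            reverse=True,
        )
        for j in order:
            w = self.weights[j]
            match = fuzzy_and(x, w)
            # Vigilance test: |x ^ w_j| / |x| >= rho.
            if l1(match) / l1(x) >= self.rho:
                # Resonance: w_j <- beta * (x ^ w_j) + (1 - beta) * w_j.
                self.weights[j] = [self.beta * m + (1.0 - self.beta) * wi for m, wi in zip(match, w)]
                return j
        # No category passed the vigilance test: create a new one equal to the input.
        self.weights.append(x)
        return len(self.weights) - 1
```

Because |x| is the same for every complement-coded input, the vigilance test compares matches on a common scale; raising rho yields more, tighter categories. Each sample is compared with every existing category, so a single pass grows linearly with the number of samples for a bounded number of categories.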


Bioinformatics Approaches To Single-Cell Analysis In Developmental Biology, Dicle Yalcin, Zeynep M. Hakguder, Hasan H. Otu Sep 2015

Department of Electrical and Computer Engineering: Faculty Publications

Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology, as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to robust, massively parallel, and multi-dimensional capturing, sorting, and lysis of single cells and amplification of related macromolecules, which have enabled the use of imaging …


Clustering Data Of Mixed Categorical And Numerical Type With Unsupervised Feature Learning, Dao Lam, Mingzhen Wei, Donald C. Wunsch Sep 2015

Geosciences and Geological and Petroleum Engineering Faculty Research & Creative Works

Mixed-type categorical and numerical data are a challenge in many applications. This general area of mixed-type data is among the frontier areas, where computational intelligence approaches are often brittle compared with the capabilities of living creatures. In this paper, unsupervised feature learning (UFL) is applied to the mixed-type data to achieve a sparse representation, which makes it easier for clustering algorithms to separate the data. Unlike other UFL methods that work with homogeneous data, such as image and video data, the presented UFL works with the mixed-type data using fuzzy adaptive resonance theory (ART). UFL with fuzzy ART (UFLA) obtains …


Hidden Markov Model With Information Criteria Clustering And Extreme Learning Machine Regression For Wind Forecasting, Dao Lam, Shuhui Li, Donald C. Wunsch Jan 2014

Electrical and Computer Engineering Faculty Research & Creative Works

This paper proposes a procedural pipeline for wind forecasting based on clustering and regression. First, the data are clustered into groups sharing similar dynamic properties. Then, data in the same cluster are used to train the neural network that predicts wind speed. For clustering, a hidden Markov model (HMM) and the modified Bayesian information criteria (BIC) are incorporated in a new method of clustering time series data. To forecast wind, a new method for wind time series data forecasting is developed based on the extreme learning machine (ELM). The clustering results improve the accuracy of the proposed method of wind …


Seasonal Adaptation Of Vegetation Color In Satellite Images For Flight Simulations, Yuzhong Shen, Jiang Li, Vamsi Mantena, Srinivas Jakkula Jan 2009

Electrical & Computer Engineering Faculty Publications

Automatic vegetation identification plays an important role in many applications, including remote sensing and high-performance flight simulations. This paper proposes a novel method that identifies vegetative areas in satellite images and then alters vegetation color to simulate seasonal changes based on training image pairs. The proposed method first generates a vegetation map for pixels corresponding to vegetative areas, using ISODATA clustering and vegetation classification. The ISODATA algorithm determines the number of clusters automatically. We then apply morphological operations to the clustered images to smooth the boundaries between clusters and to fill holes inside clusters. Six features are then computed …
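The abstract does not specify the structuring element or the order of operations. As an assumption, each cluster in the labeled image can be treated as a binary mask and cleaned with 3x3 openings (to smooth boundaries and drop specks) and closings (to fill small holes), as sketched below with hypothetical function names; a production pipeline would more likely use an image-processing library.

```python
from typing import Callable, Iterable, List

Mask = List[List[int]]


def _window_op(mask: Mask, op: Callable[[Iterable[int]], int]) -> Mask:
    """Apply op (min or max) over each pixel's 3x3 neighbourhood, clipped at the borders."""
    rows, cols = len(mask), len(mask[0])
    return [
        [
            op(mask[rr][cc]
               for rr in range(max(0, r - 1), min(rows, r + 2))
               for cc in range(max(0, c - 1), min(cols, c + 2)))
            for c in range(cols)
        ]
        for r in range(rows)
    ]


def erode(mask: Mask) -> Mask:
    return _window_op(mask, min)


def dilate(mask: Mask) -> Mask:
    return _window_op(mask, max)


def opening(mask: Mask) -> Mask:
    """Erosion then dilation: removes specks and smooths ragged boundaries."""
    return dilate(erode(mask))


def closing(mask: Mask) -> Mask:
    """Dilation then erosion: fills small holes and narrow gaps."""
    return erode(dilate(mask))
```

A one-pixel hole inside a cluster disappears under `closing`, while an isolated one-pixel speck disappears under `opening`; larger structuring elements remove correspondingly larger defects.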


Vegetation Identification Based On Satellite Imagery, Vamsi K.R. Mantena, Ramu Pedada, Srinivas Jakkula, Yuzhong Shen, Jiang Li, Hamid R. Arabnia (Ed.) Jan 2008

Electrical & Computer Engineering Faculty Publications

Automatic vegetation identification plays an important role in many applications, including remote sensing and high-performance flight simulations. This paper presents a method to automatically identify vegetation based upon satellite imagery. First, we utilize the ISODATA algorithm to cluster pixels in the images where the number of clusters is determined by the algorithm. We then apply morphological operations to the clustered images to smooth the boundaries between clusters and to fill holes inside clusters. After that, we compute six features for each cluster. These six features then go through a feature selection algorithm and three of them are determined to …
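ISODATA lets the data decide the number of clusters by interleaving K-means-style assignment with splitting of spread-out clusters and merging of nearby ones. The sketch below is a deliberately simplified, one-dimensional version (for example, over pixel intensities) and is not the paper's implementation; full ISODATA also uses thresholds such as a minimum cluster size and a maximum number of clusters. The thresholds and the name `isodata_1d` are hypothetical.

```python
import statistics
from typing import Dict, List


def isodata_1d(values: List[float], init_centers: List[float], max_std: float,
               min_dist: float, iters: int = 10) -> List[float]:
    """Simplified ISODATA-style clustering of scalar values; returns the final centers."""
    centers = sorted(init_centers)
    for _ in range(iters):
        # 1. Assign every value to its nearest center.
        groups: Dict[int, List[float]] = {i: [] for i in range(len(centers))}
        for v in values:
            nearest = min(range(len(centers)), key=lambda i: abs(v - centers[i]))
            groups[nearest].append(v)
        # 2. Recompute centers, dropping empty clusters and splitting wide ones.
        candidates: List[float] = []
        for members in groups.values():
            if not members:
                continue
            mean = statistics.fmean(members)
            spread = statistics.pstdev(members)
            if spread > max_std:
                candidates += [mean - spread, mean + spread]
            else:
                candidates.append(mean)
        # 3. Merge neighbouring centers closer than min_dist (unweighted midpoint).
        candidates.sort()
        merged = [candidates[0]]
        for c in candidates[1:]:
            if c - merged[-1] < min_dist:
                merged[-1] = (merged[-1] + c) / 2.0
            else:
                merged.append(c)
        centers = merged
    return centers


pixels = [0.0, 0.5, 1.0, 9.0, 9.5, 10.0]
print(isodata_1d(pixels, init_centers=[5.0], max_std=2.0, min_dist=3.0))  # [0.5, 9.5]
```

Starting from a single center, the first pass splits the wide cluster in two, and the centers then settle on the two intensity groups without the number of clusters being fixed in advance.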


Development And Implementation Of Optimized Energy-Delay Sub-Network Routing Protocol For Wireless Sensor Networks, Maciej Jan Zawodniok, Jagannathan Sarangapani, Steve Eugene Watkins, James W. Fonda Jan 2006

Electrical and Computer Engineering Faculty Research & Creative Works

The development and implementation of the optimized energy-delay sub-network routing (OEDSR) protocol for wireless sensor networks (WSN) is presented. This on-demand routing protocol uses clustering and minimizes a novel link cost factor, defined using available energy, end-to-end (E2E) delay, and distance from a node to the base station (BS), to route information to the BS effectively. Initially, the nodes are either in idle or sleep mode, but once an event is detected, the nodes near the event become active and start forming sub-networks. Formation of the inactive network into a sub-network saves energy because only a portion of …


Modified ART 2A Growing Network Capable Of Generating A Fixed Number Of Nodes, Ji He, Ah-Hwee Tan, Chew-Lim Tan May 2004

Research Collection School Of Computing and Information Systems

This paper introduces the Adaptive Resonance Theory under Constraint (ART-C 2A) learning paradigm based on ART 2A, which is capable of generating a user-defined number of recognition nodes through online estimation of an appropriate vigilance threshold. Empirical experiments quantitatively compare the cluster validity and the learning efficiency of ART-C 2A with those of ART 2A, as well as three closely related clustering methods, namely online K-Means, batch K-Means, and SOM. Besides retaining the online cluster creation capability of ART 2A, ART-C 2A offers an alternative clustering solution that allows direct control over the number of output …