Articles 1 - 30 of 237
Full-Text Articles in Computer Sciences
Combating Financial Crimes With Unsupervised Learning Techniques: Clustering And Dimensionality Reduction For Anti-Money Laundering, Ahmed N. Bakry, Almohammady S. Alsharkawy, Mohamed S. Farag, Kamal R. Raslan
Al-Azhar Bulletin of Science
Anti-Money Laundering (AML) is a crucial task in ensuring the integrity of financial systems. One key challenge in AML is identifying high-risk groups based on their behavior. Unsupervised learning, particularly clustering, is a promising solution for this task. However, using hundreds of features to describe behavior results in a high-dimensional dataset that negatively impacts clustering performance. In this paper, we investigate the effectiveness of combining agglomerative hierarchical clustering with four dimensionality reduction techniques, namely Independent Component Analysis (ICA), Kernel Principal Component Analysis (KPCA), Singular Value Decomposition (SVD), and Locality Preserving Projections (LPP), to overcome the issue of high dimensionality in AML data and …
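The clustering step named in this abstract can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: it uses Euclidean distance and average linkage (the excerpt does not say which linkage the paper uses), the function name `agglomerative` is hypothetical, and it clusters raw feature vectors, whereas the paper's pipeline would first reduce them with ICA, KPCA, SVD, or LPP.

```python
import math
from itertools import combinations


def agglomerative(points, n_clusters, linkage="average"):
    """Naive bottom-up clustering: start from singletons and repeatedly merge
    the two closest clusters until n_clusters remain (O(n^3), for illustration)."""
    clusters = [[i] for i in range(len(points))]

    def cluster_distance(c1, c2):
        # All pairwise Euclidean distances between members of the two clusters
        dists = [math.dist(points[i], points[j]) for i in c1 for j in c2]
        if linkage == "single":
            return min(dists)
        if linkage == "complete":
            return max(dists)
        return sum(dists) / len(dists)  # average linkage (assumed default)

    while len(clusters) > n_clusters:
        # combinations yields (a, b) with a < b, so deleting b leaves index a valid
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda ab: cluster_distance(clusters[ab[0]], clusters[ab[1]]),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    # Convert the list of member lists into one label per input point
    labels = [0] * len(points)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels
```

The exhaustive pairwise search only shows the merge logic; on real AML data with hundreds of features, a library implementation and the dimensionality reduction step would be used instead.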
Splitfed-Based Patient Severity Prediction And Utility Maximization In Industrial Healthcare 4.0, Himanshu Singh, Biken Moirangthem, Ajay Pratap, Shilpi Kumari, Abhishek Kumar, Sajal K. Das
Computer Science Faculty Research & Creative Works
The healthcare industry has transitioned from traditional healthcare 1.0 to AI-powered healthcare 4.0. However, the overall cost of patient treatment remains high and challenging to manage due to the absence of a centralized cost evaluation mechanism before hospital visits. Therefore, in this paper, we devise a cloud-based mechanism to calculate hospitals' star ratings from a questionnaire, applying the Z-score and the K∗ clustering algorithm. To evaluate disease severity in the cloud, the splitfed technique is used in coordination with a Wireless Body Area Network (WBAN). Finally, the cloud calculates provisional treatment costs and finds a preferable hospital with a low payable treatment cost and …
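The abstract mentions applying the Z-score to questionnaire results. A minimal sketch of standardization follows, assuming the population standard deviation; the mapping from z-scores to 1 to 5 stars in `stars_from_z` is purely hypothetical, since the excerpt does not describe how the paper turns scores into ratings, and the K∗ clustering step is not reproduced.

```python
from statistics import mean, pstdev


def z_scores(values):
    """Standardize raw scores: z = (x - mean) / std (population std)."""
    mu = mean(values)
    sigma = pstdev(values)
    if sigma == 0:
        # All scores identical: every hospital sits exactly at the mean
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


def stars_from_z(z):
    # Hypothetical mapping: z = 0 -> 3 stars, each standard deviation shifts
    # the rating by one star, clipped to the range [1, 5].
    return max(1, min(5, round(3 + z)))


scores = [70, 80, 90]  # illustrative questionnaire totals, not data from the paper
ratings = [stars_from_z(z) for z in z_scores(scores)]
```

Standardizing first makes scores comparable across questionnaires with different scales before any rating or clustering step is applied.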
Meta-iCVI: Ensemble Validity Metrics For Concise Labeling Of Correct, Under- Or Over-Partitioning In Streaming Clustering, Niklas M. Melton, Sasha A. Petrenko, Donald C. Wunsch
Electrical and Computer Engineering Faculty Research & Creative Works
Understanding the performance and validity of clustering algorithms is both challenging and crucial, particularly when clustering must be done online. Until recently, most validation methods have relied on batch calculation and have required considerable human expertise in their interpretation. Improving real-time performance and interpretability of cluster validation, therefore, continues to be an important theme in unsupervised learning. Building upon previous work on incremental cluster validity indices (iCVIs), this paper introduces the Meta-iCVI as a tool for explainable and concise labeling of partition quality in online clustering. Leveraging a time-series classifier and data-fusion techniques, the Meta-iCVI combines the outputs …
Model-Based Deep Autoencoders For Clustering Single-Cell RNA Sequencing Data With Side Information, Xiang Lin
Dissertations
Clustering analysis has been conducted extensively in single-cell RNA sequencing (scRNA-seq) studies. scRNA-seq can profile the activities of tens of thousands of genes within a single cell. Thousands or tens of thousands of cells can be captured simultaneously in a typical scRNA-seq experiment. Biologists would like to cluster these cells to explore and elucidate cell types or subtypes. Numerous methods have been designed for clustering scRNA-seq data. Yet single-cell technologies have developed so fast in the past few years that existing methods have not kept up with these rapid changes and fail to fully realize their potential. For instance, besides profiling transcription …
On Hierarchical Clustering-Based Approach For RDDBS Design, Hassan I. Abdalla, Ali A. Amer, Sri Devi Ravana
All Works
Distributed database system (DDBS) design is still an open challenge even after decades of research, especially in a dynamic network setting. Hence, to meet the demands of high-speed data gathering and for the management and preservation of huge systems, it is important to construct a distributed database for real-time data storage. Incidentally, some fragmentation schemes, such as horizontal, vertical, and hybrid, are widely used for DDBS design. At the same time, data allocation cannot be done without first physically fragmenting the data, because the fragmentation process is the foundation of DDBS design. Extensive research has been conducted to …
Structure Estimation Of Adversarial Distributions For Enhancing Model Robustness: A Clustering-Based Approach, Bader Rasheed, Adil Khan, Asad Masood Khattak
All Works
In this paper, we propose an advanced method for adversarial training that focuses on leveraging the underlying structure of adversarial perturbation distributions. Unlike conventional adversarial training techniques that consider adversarial examples in isolation, our approach employs clustering algorithms in conjunction with dimensionality reduction techniques to group adversarial perturbations, effectively constructing a more intricate and structured feature space for model training. Our method incorporates density and boundary-aware clustering mechanisms to capture the inherent spatial relationships among adversarial examples. Furthermore, we introduce a strategy for utilizing adversarial perturbations to enhance the delineation between clusters, leading to the formation of more robust and …
A Proposed Artificial Intelligence Model For Android-Malware Detection, Fatma Taher, Omar Al Fandi, Mousa Al Kfairy, Hussam Al Hamadi, Saed Alrabaee
All Works
There are a variety of reasons why smartphones have grown so pervasive in our daily lives. While their benefits are undeniable, Android users must be vigilant against malicious apps. The goal of this study was to develop a broad framework, named DroidMDetection, for detecting Android malware using multiple deep learning classifiers. To provide precise, dynamic Android malware detection and clustering of different malware families, the framework uses unique methodologies built on deep learning and natural language processing (NLP) techniques. When compared to other similar works, DroidMDetection (1) uses API calls and …
Static Malware Family Clustering Via Structural And Functional Characteristics, David George, Andre Mauldin, Josh Mitchell, Sufiyan Mohammed, Robert Slater
SMU Data Science Review
Static and dynamic analyses are the two primary approaches to analyzing malicious applications. The primary distinction between the two is that the application is analyzed without execution in static analysis, whereas the dynamic approach executes the malware and records the behavior exhibited during execution. Although each approach has advantages and disadvantages, dynamic analysis has been more widely accepted and utilized by the research community whereas static analysis has not seen the same attention. This study aims to apply advancements in static analysis techniques to demonstrate the identification of fine-grained functionality, and show, through clustering, how malicious applications may be grouped …
Comparative Study Of Clustering Techniques On Eye-Tracking In Dynamic 3D Virtual Environments, Scott Johnson
All Graduate Theses and Dissertations, Spring 1920 to Summer 2023
Eye-tracking has been used for decades to understand how and why an individual focuses on particular objects, areas, and elements of space. A vast body of knowledge exists on how eye-tracking is measured. However, historically, eye-tracking has been predominantly studied using 2D environments, with limited work in 3D environments. The purpose of this study is to identify which methods most accurately represent the areas that have captured the participant’s visual attention within a 3D dynamic environment. This will be done by evaluating different methods of clustering fixations using a customized virtual reality tool that collects eye-tracking data. There exist several …
How To Combine Probabilistic And Fuzzy Uncertainty: Theoretical Explanation Of Clustering-Related Empirical Result, Lázló Szilágyi, Olga Kosheleva, Vladik Kreinovich
Departmental Technical Reports (CS)
In contrast to crisp clustering techniques, which assign each object to a single class, fuzzy clustering algorithms assign to each object and each class a degree to which the object belongs to that class. In the most widely used fuzzy clustering algorithm, fuzzy c-means, the degrees corresponding to different classes add up to 1 for each object. From this viewpoint, these degrees act as probabilities. There exist alternative fuzzy-based clustering techniques in which, in line with the general idea of the fuzzy set, the largest of the degrees is equal to 1. In some practical situations, the probability-type fuzzy …
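To make the contrast in this abstract concrete, here is a minimal sketch of the two normalizations: standard fuzzy c-means membership degrees, which sum to 1 for each object, and the same degrees rescaled so that the largest equals 1. The fuzzifier m = 2, the fixed cluster centers, and the function names are assumptions for illustration; this is not the combination rule the report derives.

```python
import math


def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster:
    u_j = 1 / sum_k (d_j / d_k) ** (2 / (m - 1)); the u_j sum to 1."""
    dists = [math.dist(point, c) for c in centers]
    # A point that coincides with a center gets full membership there
    if any(d == 0 for d in dists):
        hits = [1.0 if d == 0 else 0.0 for d in dists]
        total = sum(hits)
        return [h / total for h in hits]
    exponent = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** exponent for dk in dists) for dj in dists]


def max_normalized(degrees):
    """Rescale degrees so the largest one equals 1 (fuzzy-set-style normalization)."""
    top = max(degrees)
    return [d / top for d in degrees]


u = fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)])  # probability-like degrees
v = max_normalized(u)                                       # largest degree is 1
```

For the point (1, 0) with centers at distances 1 and 2, the fuzzy c-means degrees are 0.8 and 0.2; after rescaling they become 1.0 and 0.25, so the ranking is the same but the interpretation differs.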
Machine Learning And Network Embedding Methods For Gene Co-Expression Networks, Niloofar Aghaieabiane
Dissertations
High-throughput technologies such as DNA microarrays and RNA-seq are used to measure the expression levels of large numbers of genes simultaneously. To support the extraction of biological knowledge, individual gene expression levels are transformed into Gene Co-expression Networks (GCNs). GCNs are analyzed to discover gene modules. GCN construction and analysis has been a well-studied topic for nearly two decades. While new types of sequencing and the corresponding data are now available, the software package WGCNA and its most recent variants are still widely used, contributing to biological discovery.
The discovery of biologically significant modules of genes from raw expression data is …
Comparing Igneous Geochemical Data From Hawaii And Southern California Via Machine Learning, Miro Manestar
MS in Computer Science Project Reports
Bi-plots are commonly used in geochemical analyses. However, their use can become cumbersome in the case of multivariate analyses. Therefore, this thesis explores the application of unsupervised machine learning techniques, specifically PCA and K-Means, to analyze large geochemical data sets from two distinct regions, Hawaii and the Peninsular Ranges Batholith (PRB) in Southern California. The IBM Foundational Methodology for Data Science was used to ensure proper data preparation and analysis. PCA provided dimensionality reduction, revealing which features correlated most strongly with the variance within the data. K-Means clustering allowed for deeper interpretation of the data. The analysis yielded valuable insights into the composition and …
Analyzing Ground Motion Records With CVI Fuzzy ART, Dustin Tanksley, Xinzhe Yuan, Genda Chen, Donald C. Wunsch
Civil, Architectural and Environmental Engineering Faculty Research & Creative Works
This paper explores using Cluster Validity Indices Fuzzy Adaptive Resonance Theory (CVI Fuzzy ART) to cluster ground motion records (GMRs). Clustering the features extracted from a supervised network trained to predict structural damage results in less overfitting from the trained network. Using Cluster Validity Indices (CVIs) to evaluate the clustering gives feedback on how well the data is being classified, allowing further separation of the data. By using CVI Fuzzy ART in combination with features extracted from a trained Convolutional Neural Network (CNN), we were able to form additional clusters in the data. Within the primary clusters, accuracy was …
Unsupervised Contrastive Representation Learning For Knowledge Distillation And Clustering, Fei Ding
All Dissertations
Unsupervised contrastive learning has emerged as an important training strategy for learning representations by pulling positive samples closer together and pushing negative samples apart in a low-dimensional latent space. Usually, positive samples are augmented versions of the same input and negative samples come from different inputs. Once the low-dimensional representations are learned, further analysis, such as clustering and classification, can be performed using them. Currently, there are two challenges in this framework. First, empirical studies reveal that even though contrastive learning methods show great progress in representation learning with large models, they do not work well for small …
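The "pull positives closer, push negatives apart" objective described here is commonly implemented with an InfoNCE-style loss. The sketch below is a generic, single-anchor version using cosine similarity and an assumed temperature of 0.1; it is not taken from the dissertation, which may use a different loss, and `info_nce` is an illustrative name.

```python
import math


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def info_nce(anchor, positive, negatives, temperature=0.1):
    """InfoNCE-style contrastive loss for one anchor: the negative log of the
    softmax probability assigned to the positive among all candidates.
    It is small when the anchor is far more similar to its positive than
    to any negative."""
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    # log-sum-exp with max subtraction for numerical stability
    top = max(logits)
    log_denominator = top + math.log(sum(math.exp(l - top) for l in logits))
    return log_denominator - logits[0]


anchor = (1.0, 0.0)
negatives = [(0.0, 1.0), (-1.0, 0.0)]
low = info_nce(anchor, (0.9, 0.1), negatives)  # positive close to the anchor
```

Minimizing this quantity over many anchors is what moves augmented views of the same input together and different inputs apart in the latent space; in practice the vectors would be encoder outputs rather than fixed tuples.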
Missing Value Estimation Using Clustering And Deep Learning Within Multiple Imputation Framework, Manar D. Samad, Sakib Abrar, Norou Diawara
Computer Science Faculty Research
Missing values in tabular data restrict the use and performance of machine learning, requiring the imputation of missing values. Arguably the most popular imputation algorithm is multiple imputation by chained equations (MICE), which estimates missing values from linear conditioning on observed values. This paper proposes methods to improve both the imputation accuracy of MICE and the classification accuracy of imputed data by replacing MICE’s linear regressors with ensemble learning and deep neural networks (DNN). The imputation accuracy is further improved by characterizing individual samples with cluster labels (CISCL) obtained from the training data. Our extensive analyses of six tabular data …
K-Means Clustering Using Gravity Distance, Ajinkya Vishwas Indulkar
Masters Theses & Specialist Projects
Clustering is an important topic in data modeling. K-means clustering is a well-known partitional clustering algorithm, in which a dataset is separated into groups sharing similar properties. Clustering an unbalanced dataset, where some groups have a much larger number of data points than others, is a challenging problem in data modeling. When a K-means clustering algorithm with Euclidean distance is applied to such data, the algorithm fails to form good clusters: during the clustering process, standard K-means tends to split the data evenly into smaller clusters.
We propose a new K-means clustering algorithm to overcome the disadvantage by introducing a different …
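For reference, the standard algorithm this thesis starts from, K-means with Euclidean distance (Lloyd's iterations), can be sketched as follows. Initializing from the first k points is an assumption chosen for determinism, and the proposed gravity distance is not described in this excerpt, so it is not implemented here.

```python
import math


def kmeans(points, k, max_iter=100):
    """Standard K-means (Lloyd's algorithm) with Euclidean distance."""
    centroids = [tuple(p) for p in points[:k]]  # naive init: first k points
    labels = [0] * len(points)
    for _ in range(max_iter):
        # Assignment step: each point goes to its nearest centroid
        new_labels = [
            min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points
        ]
        # Update step: each centroid moves to the mean of its assigned points
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, new_labels) if lab == j]
            if members:
                new_centroids.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        # Stop once neither assignments nor centroids change
        if new_labels == labels and new_centroids == centroids:
            break
        labels, centroids = new_labels, new_centroids
    return labels, centroids


labels, centroids = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], k=2)
```

Because every point simply goes to its nearest centroid regardless of cluster size, boundaries fall midway between centroids, which is one intuition for why a large group can end up split when the data are unbalanced.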
Incremental Non-Greedy Clustering At Scale, Nicholas Monath
Doctoral Dissertations
Clustering is the task of organizing data into meaningful groups. Modern clustering applications such as entity resolution put several demands on clustering algorithms: (1) scalability to massive numbers of points as well as clusters, (2) incremental additions of data, and (3) support for any user-specified similarity function. Hierarchical clusterings are often desired because they represent multiple alternative flat clusterings (e.g., at different granularity levels). These tree-structured clusterings accommodate both fine-grained clusters and uncertainty in the presence of newly arriving data. Previous work on hierarchical clustering does not fully address all three of the aforementioned desiderata. Work on incremental …
Applications Of Unsupervised Machine Learning In Autism Spectrum Disorder Research: A Review, Chelsea Parlett-Pelleriti, Elizabeth Stevens, Dennis R. Dixon, Erik J. Linstead
Engineering Faculty Articles and Research
Large amounts of autism spectrum disorder (ASD) data are created through hospitals, therapy centers, and mobile applications; however, much of this rich data does not have pre-existing classes or labels. Large amounts of data, both genetic and behavioral, that are collected as part of scientific studies or as part of treatment can provide deeper, more nuanced insight into both the diagnosis and treatment of ASD. This paper reviews 43 papers using unsupervised machine learning in ASD, including k-means clustering, hierarchical clustering, model-based clustering, and self-organizing maps. The aim of this review is to provide a survey of the current uses of …
Outlier Detection In Energy Datasets, Stephen Crawford
Honors Projects
In the past decade, numerous datasets have been released with the explicit goal of furthering non-intrusive load monitoring (NILM) research. NILM is an energy measurement strategy that seeks to disaggregate building-scale loads. Disaggregation attempts to break down a building's energy consumption into that of its constituent appliances. NILM algorithms require representative real-world measurements, which has led institutions to publish and share their own datasets. NILM algorithms are designed, trained, and tested using the data presented in a small number of these NILM datasets. Many of the datasets contain arbitrarily selected devices. Likewise, the datasets themselves report aggregate load information from building(s) …
Topological Hierarchies And Decomposition: From Clustering To Persistence, Kyle A. Brown
Browse all Theses and Dissertations
Hierarchical clustering is a class of algorithms commonly used in exploratory data analysis (EDA) and supervised learning. However, they suffer from some drawbacks, including the difficulty of interpreting the resulting dendrogram, arbitrariness in the choice of cut to obtain a flat clustering, and the lack of an obvious way of comparing individual clusters. In this dissertation, we develop the notion of a topological hierarchy on recursively-defined subsets of a metric space. We look to the field of topological data analysis (TDA) for the mathematical background to associate topological structures such as simplicial complexes and maps of covers to clusters in …
A Brief Comparison Of K-Means And Agglomerative Hierarchical Clustering Algorithms On Small Datasets, Hassan I. Abdalla
All Works
In this work, the agglomerative hierarchical clustering and K-means clustering algorithms are implemented on small datasets. Because the choice of similarity measure is a vital factor in data clustering, two measures are used in this study, cosine similarity and Euclidean distance, along with two evaluation metrics, entropy and purity, to assess clustering quality. The datasets used in this work are taken from the UCI machine learning repository. The experimental results indicate that k-means clustering outperformed hierarchical clustering in terms of entropy and purity using the cosine similarity measure. However, hierarchical clustering outperformed k-means clustering …
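The two similarity measures and two evaluation metrics named in this abstract can be sketched as follows. Purity and entropy are given in a common textbook form (entropy as the size-weighted, base-2 class entropy of each cluster); the paper's exact definitions may differ, and the function names are illustrative.

```python
import math
from collections import Counter


def cosine_similarity(u, v):
    """Cosine of the angle between two non-zero vectors (1 = same direction)."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def euclidean_distance(u, v):
    """Straight-line distance between two points."""
    return math.dist(u, v)


def _class_counts(clusters, classes):
    # Map each cluster label to a Counter of the true classes it contains
    by_cluster = {}
    for c, y in zip(clusters, classes):
        by_cluster.setdefault(c, Counter())[y] += 1
    return by_cluster


def purity(clusters, classes):
    """Fraction of points belonging to the majority class of their cluster (higher is better)."""
    counts = _class_counts(clusters, classes)
    return sum(max(cnt.values()) for cnt in counts.values()) / len(classes)


def entropy(clusters, classes):
    """Size-weighted average of each cluster's class entropy in bits (lower is better)."""
    n = len(classes)
    total = 0.0
    for cnt in _class_counts(clusters, classes).values():
        size = sum(cnt.values())
        h = -sum((v / size) * math.log2(v / size) for v in cnt.values())
        total += (size / n) * h
    return total
```

The vectors (1, 2) and (2, 4) point in the same direction, so their cosine similarity is 1 even though their Euclidean distance is sqrt(5); this is why the choice of measure can change which points end up in the same cluster.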
Trajectory Design For UAV-Based Data Collection Using Clustering Model In Smart Farming, Tariq Qayyum, Zouheir Trabelsi, Asad Malik, Kadhim Hayawi
All Works
Unmanned aerial vehicles (UAVs) play an important role in facilitating data collection in remote areas due to their mobility. The collected data require processing close to the end user to support delay-sensitive applications. In this paper, we propose a data collection scheme and scheduling framework for smart farms. We divide the proposed model into two phases: data collection and data scheduling. In the data collection phase, IoT sensors are deployed randomly and form clusters based on their RSSI. The UAV calculates an optimum trajectory in order to gather data from all clusters. The UAV offloads the data to …
Constructing Frameworks For Task-Optimized Visualizations, Ghulam Jilani Abdul Rahim Quadri
USF Tampa Graduate Theses and Dissertations
Visualization is crucial in today’s data-driven world to augment and enhance human understanding and decision-making. Effective visualizations must support accuracy in visual task performance and expressive data communication. Effective visualization design depends on the visual channels used, chart types, or visual tasks. However, design choices and visual judgment are interrelated, and effectiveness is not one-dimensional, leading to a significant need to understand the intersection of these factors to create optimized visualizations. Hence, constructing frameworks that consider both design decisions and the task being performed enables optimizing visualization design to maximize efficacy. This dissertation describes experiments, techniques, and user studies to …
Measuring Data Collection Diligence For Community Healthcare, Galawala Ramesha Samurdhi Karunasena, M. S. Ambiya, Arunesh Sinha, R. Nagar, S. Dalal, Abdullah. H., D. Thakkar, D. Narayanan, M. Tambe
Research Collection School Of Computing and Information Systems
Data analytics has tremendous potential to provide targeted benefit in low-resource communities; however, the availability of high-quality public health data is a significant challenge in developing countries, primarily due to non-diligent data collection by community health workers (CHWs). Our use of the word non-diligence here is to emphasize that poor data collection is often not a deliberate action by CHWs but arises from a myriad of factors, sometimes beyond the control of the CHW. In this work, we define and test a data collection diligence score. This challenging unlabeled data problem is handled by building upon domain experts’ guidance …
Cluster Hire In Social Networks Using Modified Weighted Structural Clustering Algorithm For Networks (MWSCAN), Harshil Patal
Electronic Theses and Dissertations
The concept of effective collaboration within a group is widely used in organizations as a viable means of improving team performance. Any organization or prominent institute that works on multiple projects needs to hire a group of experts who can complete a set of projects. When hiring a group of experts, numerous considerations must be taken into account. In the Cluster Hire problem, we are given a set of experts, each having a set of skills. We are also given a set of projects, each requiring a set of skills. Upon completion of each project, a profit is generated for …
Piecewise Linear Manifold Clustering, Artyom Diky
Dissertations, Theses, and Capstone Projects
This work studies the application of topological analysis to non-linear manifold clustering. A novel method that exploits the clustering structure of the data makes it possible to generate a topological representation of the point dataset. An analysis of the topological construction under different simulated conditions is performed to explore the capabilities and limitations of the method, and it demonstrated statistically significant improvements in performance. Furthermore, we introduce a new information-theoretic validation measure for clustering that exploits geometrical properties of clusters to estimate clustering compressibility, for evaluating clustering goodness-of-fit without any prior information about true class assignments. We show how the new validation measure, when …
Automated Parsing Of Flexible Molecular Systems Using Principal Component Analysis And K-Means Clustering Techniques, Matthew J. Nwerem
Computational and Data Sciences (MS) Theses
Computational investigation of molecular structures and reactions of biological and pharmaceutical interests remains a grand scientific challenge due to the size and conformational flexibility of these systems. The work requires parsing and analyzing thousands of conformations in each molecular state for meaningful chemical information and subjecting the ensemble to costly quantum chemical calculations. The current status quo typically involves a manual process where the investigator must look at each conformation, separating each into structural families. This process is time-intensive and tedious, making this process infeasible in some cases, and limiting the ability of theoreticians to study these systems. However, the …
Analysis Of Music Genre Clustering Algorithms, Samuel Walter Stern
Theses and Dissertations
Classification and clustering of music genres has become an increasingly prevalent focus in recent years, prompting a push for research into relevant algorithms. The most successful algorithms have typically applied the Naive Bayes or k-Nearest Neighbors algorithms, or used Neural Networks to perform classification. This thesis investigates the use of unsupervised clustering algorithms such as K-Means or Hierarchical clustering, and establishes their usefulness in comparison to or in conjunction with established methods.
Identifying The Use Of A Park Based On Clusters Of Visitors' Movements From Mobile Phone Data, Roberto Pierdicca, Marina Paolanti, Raffaele Vaira, Ernesto Marcheggiani, Eva Savina Malinverni, Emanuele Frontoni
Journal of Spatial Information Science
Planning urban parks is a burdensome task, requiring knowledge of countless variables that are impossible to consider all at the same time. One of these variables is the set of people who use the parks. Despite information and communication technologies being a valuable source of data, a standardized method which enables landscape planners to use such information to design urban parks is still broadly missing. The objective of this study is to design an approach that can identify how an urban green park is used by its visitors in order to provide planners and the managing authorities with a standardized …
A Quantitative Validation Of Multi-Modal Image Fusion And Segmentation For Object Detection And Tracking, Nicholas Lahaye, Michael J. Garay, Brian D. Bue, Hesham El-Askary, Erik Linstead
Mathematics, Physics, and Computer Science Faculty Articles and Research
In previous works, we have shown the efficacy of using Deep Belief Networks, paired with clustering, to identify distinct classes of objects within remotely sensed data via cluster analysis and qualitative analysis of the output data in comparison with reference data. In this paper, we quantitatively validate the methodology against datasets currently being generated and used within the remote sensing community, and show the capabilities and benefits of the data fusion methodologies used. The experiments take the output of our unsupervised fusion and segmentation methodology and map it to various labeled datasets at different levels of global …