Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Applications Of Unsupervised Machine Learning In Autism Spectrum Disorder Research: A Review, Chelsea Parlett-Pelleriti, Elizabeth Stevens, Dennis R. Dixon, Erik J. Linstead Jan 2022

Engineering Faculty Articles and Research

Large amounts of autism spectrum disorder (ASD) data are created through hospitals, therapy centers, and mobile applications; however, much of this rich data does not have pre-existing classes or labels. Large amounts of data, both genetic and behavioral, that are collected as part of scientific studies or as part of treatment can provide a deeper, more nuanced insight into both diagnosis and treatment of ASD. This paper reviews 43 papers using unsupervised machine learning in ASD, including k-means clustering, hierarchical clustering, model-based clustering, and self-organizing maps. The aim of this review is to provide a survey of the current uses of …


Measuring Data Collection Diligence For Community Healthcare, Galawala Ramesha Samurdhi Karunasena, M. S. Ambiya, Arunesh Sinha, R. Nagar, S. Dalal, Abdullah. H., D. Thakkar, D. Narayanan, M. Tambe Oct 2021

Research Collection School Of Computing and Information Systems

Data analytics has tremendous potential to provide targeted benefit in low-resource communities; however, the availability of high-quality public health data is a significant challenge in developing countries, primarily due to non-diligent data collection by community health workers (CHWs). Our use of the word non-diligence here is to emphasize that poor data collection is often not a deliberate action by the CHW but arises due to a myriad of factors, sometimes beyond the control of the CHW. In this work, we define and test a data collection diligence score. This challenging unlabeled data problem is handled by building upon domain expert’s guidance …


Global Optimization Algorithms For Image Registration And Clustering, Cuicui Zheng Aug 2020

Dissertations

Global optimization is a classical problem of finding the minimum or maximum value of an objective function. It has applications in many areas, such as biological image analysis, chemistry, mechanical engineering, financial analysis, deep learning and image processing. For practical applications, it is important to understand the efficiency of global optimization algorithms. This dissertation develops and analyzes some new global optimization algorithms and applies them to practical problems, mainly for image registration and data clustering.

First, the dissertation presents a new global optimization algorithm which approximates the optimum using only function values. The basic idea is to use the points …


Robust Graph Learning From Noisy Data, Zhao Kang, Haiqi Pan, Steven C. H. Hoi, Zenglin Xu May 2020

Research Collection School Of Computing and Information Systems

Learning graphs from data automatically has shown encouraging performance on clustering and semisupervised learning tasks. However, real data are often corrupted, which may cause the learned graph to be inexact or unreliable. In this paper, we propose a novel robust graph learning scheme to learn reliable graphs from real-world noisy data by adaptively removing noise and errors in the raw data. We show that our proposed model can also be viewed as a robust version of manifold regularized robust principal component analysis (RPCA), where the quality of the graph plays a critical role. The proposed model is able to …


Developing Agent-Based Models To Study Financial Markets, Saurav Chakraborty Apr 2020

USF Tampa Graduate Theses and Dissertations

This dissertation presents research that employs agent-based modelling to provide a framework to support simulation as a complement to traditional economic models for policy evaluation. It consists of three studies. The first study employs cluster analysis to capture the different types of banks and the associated business models that define their decision-making. The results from study one will help us get an understanding of how different banks behave and provide an insight into their lending practices. Hence, it would be very helpful in evaluating and analyzing the impact of future policies. Study two develops a fine-grained interbank lending model based …


Salience-Aware Adaptive Resonance Theory For Large-Scale Sparse Data Clustering, Lei Meng, Ah-Hwee Tan, Chunyan Miao Dec 2019

Research Collection School Of Computing and Information Systems

Sparse data is known to pose challenges to cluster analysis, as the similarity between data tends to be ill-posed in the high-dimensional Hilbert space. Solutions in the literature typically extend either k-means or spectral clustering with additional steps on representation learning and/or feature weighting. However, adding these usually introduces new parameters and increases computational cost, thus inevitably lowering the robustness of these algorithms when handling massive ill-represented data. To alleviate these issues, this paper presents a class of self-organizing neural networks, called the salience-aware adaptive resonance theory (SA-ART) model. SA-ART extends Fuzzy ART with measures for cluster-wise salient feature modeling. …


Topicsummary: A Tool For Analyzing Class Discussion Forums Using Topic Based Summarizations, Swapna Gottipati, Venky Shankararaman, Renjini Ramesh Oct 2019

Research Collection School Of Computing and Information Systems

This Innovative Practice full paper describes the application of text mining techniques for extracting insights from a course-based online discussion forum through generation of topic-based summaries. Discussions, either in the classroom or online, provide an opportunity for collaborative learning through an exchange of ideas that leads to enhanced learning through active participation. Online discussions offer a number of benefits, namely providing additional time to reflect and synthesize information before writing, providing a natural platform for students to voice their ideas without any one student dominating the conversation, and providing a record of the student’s thoughts. An online discussion forum provides a …


Redpc: A Residual Error-Based Density Peak Clustering Algorithm, Milan Parmar, Di Wang, Xiaofeng Zhang, Ah-Hwee Tan, Chunyan Miao, You Zhou Jul 2019

Research Collection School Of Computing and Information Systems

The density peak clustering (DPC) algorithm was designed to identify arbitrary-shaped clusters by finding density peaks in the underlying dataset. Owing to its relatively low computational complexity and small number of control parameters, DPC soon became widely adopted. However, because DPC takes the entire data space into consideration during the computation of local density, which is then used to generate a decision graph for the identification of cluster centroids, DPC may face difficulty in differentiating overlapping clusters and in dealing with low-density data points. In this paper, we propose a residual error-based density peak clustering …
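The baseline DPC procedure mentioned in this abstract (compute a local density for each point, then use it together with a distance value to pick cluster centers) can be sketched roughly as follows. This is only an illustration of the classic baseline, not the residual error-based variant the paper proposes, whose details are truncated above. The function name `density_peaks`, the cutoff parameter `d_c`, and ranking centers by the product of density and distance are assumptions made for this sketch.

```python
import math


def density_peaks(points, d_c, n_centers):
    """Baseline density peak clustering sketch (hypothetical helper).

    points    : list of equal-length coordinate tuples
    d_c       : cutoff distance used to count neighbours (assumed parameter)
    n_centers : number of cluster centers to select
    """
    n = len(points)
    dist = [[math.dist(p, q) for q in points] for p in points]

    # Local density: number of other points closer than the cutoff.
    rho = [sum(1 for j in range(n) if j != i and dist[i][j] < d_c)
           for i in range(n)]

    # Visit points from highest to lowest density (ties broken by index).
    order = sorted(range(n), key=lambda i: (-rho[i], i))

    delta = [0.0] * n            # distance to nearest point ranked earlier
    nearest_higher = [-1] * n    # index of that point
    for rank, i in enumerate(order):
        if rank == 0:
            # The densest point gets the largest distance in its row.
            delta[i] = max(dist[i])
            continue
        j = min(order[:rank], key=lambda k: dist[i][k])
        delta[i] = dist[i][j]
        nearest_higher[i] = j

    # Points with both high density and high delta are taken as centers.
    centers = sorted(order, key=lambda i: -(rho[i] * delta[i]))[:n_centers]

    labels = [-1] * n
    for cluster_id, c in enumerate(centers):
        labels[c] = cluster_id
    # Every remaining point inherits the label of its nearest denser point,
    # which has already been labelled because of the visiting order.
    for i in order:
        if labels[i] == -1:
            labels[i] = labels[nearest_higher[i]]
    return labels
```

Because density is counted over the whole dataset with a single cutoff, sparse regions tend to receive low `rho` values everywhere, which is consistent with the difficulty with low-density points that the abstract describes.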


Cure: Flexible Categorical Data Representation By Hierarchical Coupling Learning, Songlei Jian, Guansong Pang, Longbing Cao, Kai Lu, Hang Gao May 2019

Research Collection School Of Computing and Information Systems

The representation of categorical data with hierarchical value coupling relationships (i.e., various value-to-value cluster interactions) is very critical yet challenging for capturing complex data characteristics in learning tasks. This paper proposes a novel and flexible coupled unsupervised categorical data representation (CURE) framework, which not only captures the hierarchical couplings but is also flexible enough to be instantiated for contrastive learning tasks. CURE first learns the value clusters of different granularities based on multiple value coupling functions and then learns the value representation from the couplings between the obtained value clusters. With two complementary value coupling functions, CURE is instantiated into …


Using Smart Card Data To Model Commuters’ Responses Upon Unexpected Train Delays, Xiancai Tian, Baihua Zheng Dec 2018

Research Collection School Of Computing and Information Systems

The mass rapid transit (MRT) network is playing an increasingly important role in Singapore's transit network, thanks to its advantages of higher capacity and faster speed. Unfortunately, due to aging infrastructure, increasing demand, and other reasons like adverse weather conditions, commuters in Singapore have recently been facing increasing unexpected train delays (UTDs), which have become a source of frustration for both commuters and operators. Most, if not all, existing works on delay management do not consider commuters' behavior. We dedicate this paper to the study of commuters' behavior during UTDs. We adopt a data-driven approach to analyzing the six-month' real …


Exploiting The Interdependency Of Land Use And Mobility For Urban Planning, Kasthuri Jayarajah, Andrew Tan, Archan Misra Oct 2018

Research Collection School Of Computing and Information Systems

Urban planners and economists alike have a strong interest in understanding the inter-dependency of land use and people flow. The two-pronged problem entails systematic modeling and understanding of how land use impacts crowd flow to an area and, in turn, how the influx of people to an area (or lack thereof) can influence the viability of business entities in that area. With cities becoming increasingly sensor-rich, for example, digitized payments for public transportation and constant trajectory tracking of buses and taxis, understanding and modelling crowd flows at the city scale, as well as at finer granularity such as at the neighborhood …


How Does Developer Interaction Relate To Software Quality? An Examination Of Product Development Data, Subhajit Datta Jun 2018

Research Collection School Of Computing and Information Systems

Industrial software systems are being increasingly developed by large and distributed teams. Tools like collaborative development environments (CDE) are used to facilitate interaction between members of such teams, with the expectation that social factors around the interaction would facilitate team functioning. In this paper, we first identify typically social characteristics of interaction in a software development team: reachability, connection, association, and clustering. We then examine how these factors relate to the quality of software produced by a team, in terms of the number of defects, through an empirical study of 70+ teams, involving 900+ developers in total, spread across 30+ …


Quantitative Phenotype Analysis To Identify, Validate And Compare Rat Disease Models, Yiqing Zhao May 2018

Theses and Dissertations

Introduction

The laboratory rat has been widely used as an animal model in biomedical research. There are many strains exhibiting a wide variety of phenotypes. Capturing these phenotypes in a centralized database provides researchers with an easy method for choosing the appropriate strains for their studies. Current resources such as NBRP and PhysGen provided some preliminary work in rat phenotype databases. However, there are drawbacks in both projects: (1) small number of animals (6 rats) used by NBRP; (2) NBRP project is a one-time effort for each strain; (3) PhysGen web interface only enables queries within a single study – …


Efficient Reduced Bias Genetic Algorithm For Generic Community Detection Objectives, Aditya Karnam Gururaj Rao Apr 2018

Theses

The problem of community structure identification has been an extensively investigated area for biology, physics, social sciences, and computer science in recent years for studying the properties of networks representing complex relationships. Most traditional methods, such as K-means and hierarchical clustering, are based on the assumption that communities have spherical configurations. Lately, Genetic Algorithms (GA) are being utilized for efficient community detection without imposing sphericity. GAs are machine learning methods which mimic natural selection and scale with the complexity of the network. However, traditional GA approaches employ a representation method that dramatically increases the solution space to be searched by …


A Novel Density Peak Clustering Algorithm Based On Squared Residual Error, Milan Parmar, Di Wang, Ah-Hwee Tan, Chunyan Miao, Jianhua Jiang, You Zhou Dec 2017

Research Collection School Of Computing and Information Systems

The density peak clustering (DPC) algorithm is designed to quickly identify intricate-shaped clusters with high dimensionality by finding high-density peaks in a non-iterative manner and using only one threshold parameter. However, DPC has certain limitations in processing low-density data points because it only takes the global data density distribution into account. As such, DPC may be limited in forming low-density data clusters, or in other words, DPC may fail in detecting anomalies and borderline points. In this paper, we analyze the limitations of DPC and propose a novel density peak clustering algorithm to better handle low-density clustering tasks. Specifically, our algorithm …


A Novel Approach For Classifying Gene Expression Data Using Topic Modeling, Soon Jye Kho, Himi Yalamanchili, Michael L. Raymer, Amit Sheth Jan 2017

Kno.e.sis Publications

Understanding the role of differential gene expression in cancer etiology and cellular process is a complex problem that continues to pose a challenge due to the sheer number of genes and inter-related biological processes involved. In this paper, we employ an unsupervised topic model, Latent Dirichlet Allocation (LDA), to mitigate overfitting of high-dimensionality gene expression data and to facilitate understanding of the associated pathways. LDA has recently been applied for clustering and exploring genomic data but not for classification and prediction. Here, we propose to use LDA in clustering as well as in classification of cancer and healthy tissues using lung cancer …


Xic Clustering By Baseyian Network, Kyle J. Handy Jan 2017

Graduate Student Theses, Dissertations, & Professional Papers

No abstract provided.


Applying Ahp And Clustering Approaches For Public Transportation Decisionmaking: A Case Study Of Isfahan City, Alireza Salavati, Hossein Haghshenas, Bahador Ghadirifaraz, Jamshid Laghaei, Ghodrat Eftekhari Dec 2016

Journal of Public Transportation

The main purpose of this paper is to define appropriate criteria for the systematic approach to evaluate and prioritize multiple candidate corridors for public transport investment simultaneously to serve travel demand, regarding supply of current public transportation system and road network conditions of Isfahan, Iran. To optimize resource allocation, policymakers need to identify proper corridors to implement a public transportation system. In fact, the main question is to adopt the best public transportation system for each main corridor of Isfahan. In this regard, 137 questionnaires were completed by experts, directors, and policymakers of Isfahan to identify goals and objectives in …


Unsupervised Learning Framework For Large-Scale Flight Data Analysis Of Cockpit Human Machine Interaction Issues, Abhishek B. Vaidya Apr 2016

Open Access Theses

As the level of automation within an aircraft increases, the interactions between the pilot and autopilot play a crucial role in its proper operation. Issues with human machine interactions (HMI) have been cited as one of the main causes behind many aviation accidents. Due to the complexity of such interactions, it is challenging to identify all possible situations and develop the necessary contingencies. In this thesis, we propose a data-driven analysis tool to identify potential HMI issues in a large-scale Flight Operational Quality Assurance (FOQA) dataset. The proposed tool is developed using a multi-level clustering framework, where a set of basic …


Adaptive Scaling Of Cluster Boundaries For Large-Scale Social Media Data Clustering, Lei Meng, Ah-Hwee Tan, Donald C. Wunsch Dec 2015

Research Collection School Of Computing and Information Systems

The large scale and complex nature of social media data raises the need to scale clustering techniques to big data and make them capable of automatically identifying data clusters with few empirical settings. In this paper, we present our investigation and three algorithms based on the fuzzy adaptive resonance theory (Fuzzy ART) that have linear computational complexity, use a single parameter, i.e., the vigilance parameter, to identify data clusters, and are robust to modest parameter settings. The contribution of this paper lies in two aspects. First, we theoretically demonstrate how complement coding, commonly known as a normalization method, changes the …
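For readers unfamiliar with the terms in this abstract, the following Python sketch shows how complement coding and the vigilance parameter typically interact in standard Fuzzy ART. It is a minimal illustration under stated assumptions, not any of the three algorithms the paper presents. The function names, the choice parameter `alpha`, and the learning rate `beta` (set to fast learning by default) are assumptions for this sketch, and inputs are assumed to be already scaled to the range [0, 1].

```python
def complement_code(x):
    """Append 1 - v for every feature v, so the coded vector sums to len(x)."""
    return list(x) + [1.0 - v for v in x]


def fuzzy_and(a, b):
    """Element-wise minimum, the fuzzy AND operator used by Fuzzy ART."""
    return [min(u, v) for u, v in zip(a, b)]


def fuzzy_art(samples, vigilance, alpha=0.01, beta=1.0):
    """Standard Fuzzy ART sketch (hypothetical helper).

    samples   : iterable of feature tuples with values in [0, 1]
    vigilance : match threshold in [0, 1]; higher values yield more clusters
    alpha     : small choice parameter (assumed default)
    beta      : learning rate, 1.0 meaning fast learning (assumed default)
    Returns (labels, weights).
    """
    weights = []
    labels = []
    for x in samples:
        i_vec = complement_code(x)
        norm_i = sum(i_vec)

        # Rank existing clusters by the choice function, best first.
        ranked = sorted(
            range(len(weights)),
            key=lambda j: -sum(fuzzy_and(i_vec, weights[j]))
            / (alpha + sum(weights[j])),
        )

        for j in ranked:
            overlap = fuzzy_and(i_vec, weights[j])
            # Vigilance test: accept the cluster only if the match is high enough.
            if sum(overlap) / norm_i >= vigilance:
                weights[j] = [beta * o + (1.0 - beta) * w
                              for o, w in zip(overlap, weights[j])]
                labels.append(j)
                break
        else:
            # No cluster passed the vigilance test, so create a new one.
            weights.append(i_vec)
            labels.append(len(weights) - 1)
    return labels, weights
```

Each sample is visited once and compared against the current clusters only, which is where the single-pass, single-parameter character described in the abstract comes from.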


Bioinformatics Approaches To Single-Cell Analysis In Developmental Biology, Dicle Yalcin, Zeynep M. Hakguder, Hasan H. Otu Sep 2015

Department of Electrical and Computer Engineering: Faculty Publications

Individual cells within the same population show various degrees of heterogeneity, which may be better handled with single-cell analysis to address biological and clinical questions. Single-cell analysis is especially important in developmental biology as subtle spatial and temporal differences in cells have significant associations with cell fate decisions during differentiation and with the description of a particular state of a cell exhibiting an aberrant phenotype. Biotechnological advances, especially in the area of microfluidics, have led to robust, massively parallel and multi-dimensional capturing, sorting, and lysis of single cells and amplification of related macromolecules, which have enabled the use of imaging …


Online Multimodal Co-Indexing And Retrieval Of Weakly Labeled Web Image Collections, Lei Meng, Ah-Hwee Tan, Cyril Leung, Liqiang Nie, Tan-Seng Chua, Chunyan Miao Jun 2015

Research Collection School Of Computing and Information Systems

Weak supervisory information of web images, such as captions, tags, and descriptions, makes it possible to better understand images at the semantic level. In this paper, we propose a novel online multimodal co-indexing algorithm based on Adaptive Resonance Theory, named OMC-ART, for the automatic co-indexing and retrieval of images using their multimodal information. Compared with existing studies, OMC-ART has several distinct characteristics. First, OMC-ART is able to perform online learning of sequential data. Second, OMC-ART builds a two-layer indexing structure, in which the first layer co-indexes the images by the key visual and textual features based on the generalized distributions …


Automatically Discovering The Number Of Clusters In Web Page Datasets, Zhongmei Yao Jan 2015

Zhongmei Yao

Clustering is well-suited for Web mining by automatically organizing Web pages into categories, each of which contains Web pages having similar contents. However, one problem in clustering is the lack of general methods to automatically determine the number of categories or clusters. For the Web domain in particular, currently there is no such method suitable for Web page clustering. In an attempt to address this problem, we discover a constant factor that characterizes the Web domain, based on which we propose a new method for automatically determining the number of clusters in Web page data sets. We discover that the …


Dynamic Clustering Of Contextual Multi-Armed Bandits, Trong T. Nguyen, Hady W. Lauw Nov 2014

Research Collection School Of Computing and Information Systems

With the prevalence of the Web and social media, users increasingly express their preferences online. In learning these preferences, recommender systems need to balance the trade-off between exploitation, by providing users with more of the "same", and exploration, by providing users with something "new" so as to expand the systems' knowledge. Multi-armed bandit (MAB) is a framework to balance this trade-off. Most of the previous work in MAB either models a single bandit for the whole population, or one bandit for each user. We propose an algorithm to divide the population of users into multiple clusters, and to customize the …
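The exploitation and exploration trade-off described in this abstract can be illustrated with a single epsilon-greedy bandit, one of the simplest MAB strategies. This sketch is not the clustering algorithm the paper proposes (its details are truncated above); the function name `epsilon_greedy`, the Bernoulli reward model, and the default `epsilon` and `seed` values are assumptions chosen for illustration.

```python
import random


def epsilon_greedy(true_probs, steps, epsilon=0.1, seed=0):
    """Simulate one epsilon-greedy bandit over Bernoulli arms (hypothetical helper).

    true_probs : hidden success probability of each arm (simulation only)
    steps      : number of pulls
    epsilon    : probability of exploring a random arm
    Returns (counts, values): pulls per arm and estimated mean reward per arm.
    """
    rng = random.Random(seed)
    n_arms = len(true_probs)
    counts = [0] * n_arms
    values = [0.0] * n_arms

    for _ in range(steps):
        if rng.random() < epsilon:
            # Exploration: try something "new" to improve the estimates.
            arm = rng.randrange(n_arms)
        else:
            # Exploitation: offer more of the "same" best-looking arm.
            arm = max(range(n_arms), key=lambda a: values[a])

        reward = 1.0 if rng.random() < true_probs[arm] else 0.0
        counts[arm] += 1
        # Incremental update of the running mean reward for this arm.
        values[arm] += (reward - values[arm]) / counts[arm]
    return counts, values
```

Running one such bandit for the whole population, or one per user, corresponds to the two extremes the abstract mentions; sharing a bandit within a cluster of users sits between them.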


Scalable Visual Instance Mining With Threads Of Features, Wei Zhang, Hongzhi Li, Chong-Wah Ngo, Shih-Fu Chang Nov 2014

Research Collection School Of Computing and Information Systems

We address the problem of visual instance mining, which is to extract frequently appearing visual instances automatically from a multimedia collection. We propose a scalable mining method by exploiting Thread of Features (ToF). Specifically, ToF, a compact representation that links consistent features across images, is extracted to reduce noise, discover patterns, and speed up processing. Various instances, especially small ones, can be discovered by exploiting correlated ToFs. Our approach is significantly more effective than other methods in mining small instances. At the same time, it is also more efficient, requiring far fewer hash tables. We compared with several state-of-the-art …


Context Aware Privacy Preserving Clustering And Classification, Nirmal Thapa Jan 2013

Theses and Dissertations--Computer Science

Data are valuable assets to any organization or individual. Data are sources of useful information, which is a big part of decision making. All sectors have the potential to benefit from having information. Commerce, health, and research are some of the fields that have benefited from data. On the other hand, the availability of the data makes it easy for anyone to exploit the data, which in many cases are private confidential data. It is necessary to preserve the confidentiality of the data. We study two categories of privacy: Data Value Hiding and Data Pattern Hiding. Privacy is a huge concern …


Extracting And Normalizing Entity-Actions From Users' Comments, Swapna Gottipati, Jing Jiang Dec 2012

Research Collection School Of Computing and Information Systems

With the growing popularity of opinion-rich resources on the Web, new opportunities and challenges arise and aid people in actively using such information to understand the opinions of others. The opinion mining process currently focuses on extracting the sentiments of the users on products and on social, political and economic issues. In many instances, users not only express their sentiments but also contribute their ideas, requests and suggestions through comments. Such comments are useful for domain experts and are referred to as actionable content. Extracting actionable knowledge from online social media has attracted a growing interest from both academia and the industry. We …


A Generalized Cluster Centroid Based Classifier For Text Categorization, Guansong Pang, Shengyi Jiang Nov 2012

Research Collection School Of Computing and Information Systems

In this paper, a Generalized Cluster Centroid based Classifier (GCCC) and its variants for text categorization are proposed by utilizing a clustering algorithm to integrate two well-known classifiers, i.e., the K-nearest-neighbor (KNN) classifier and the Rocchio classifier. KNN, a lazy learning method, suffers from inefficiency in online categorization while achieving remarkable effectiveness. Rocchio, which has efficient categorization performance, fails to obtain an expressive categorization model due to its inherent linear separability assumption. Our proposed method mainly focuses on two points: one point is that we use a clustering algorithm to strengthen the expressiveness of the Rocchio model; another one is …


Adaptive Grid Based Localized Learning For Multidimensional Data, Sheetal Saini Oct 2012

Doctoral Dissertations

Rapid advances in data-rich domains of science, technology, and business have amplified the computational challenges of "Big Data" synthesis necessary to slow the widening gap between the rate at which the data is being collected and analyzed for knowledge. This has led to the renewed need for efficient and accurate algorithms, frameworks, and algorithmic mechanisms essential for knowledge discovery, especially in the domains of clustering, classification, dimensionality reduction, feature ranking, and feature selection. However, data mining algorithms are frequently challenged by the sparseness due to the high dimensionality of the datasets in such domains, which is particularly detrimental to the …


The Social Network Of Software Engineering Research, Subhajit Datta, Nishant Kumar, Santonu Sarkar Feb 2012

Research Collection School Of Computing and Information Systems

The social network perspective has served as a useful framework for studying scientific research collaboration in different disciplines. Although collaboration in computer science research has received some attention, software engineering research collaboration has remained unexplored to a large extent. In this paper, we examine the collaboration networks based on co-authorship information of papers from ten software engineering publication venues over the 1976-2010 time period. We compare time variations of certain parameters of these networks with corresponding parameters of collaboration networks from other disciplines. We also explore whether software engineering collaboration networks manifest symptoms of the small-world phenomenon, conform to the …