Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 7 of 7

Full-Text Articles in Physical Sciences and Mathematics

Graph-Based Acoustic Clustering And Classification, Justin Youngho Sunu Jan 2023

CGU Theses & Dissertations

The rapid growth of audio data collection in various domains necessitates advanced techniques for efficient analysis and classification. This dissertation proposes new approaches for categorizing acoustic data, using both unsupervised and semi-supervised learning methods. Starting with raw audio, we preprocess the signal to segment it into time windows, each of which we consider as an independent data point. We use the short-time Fourier transform to describe the signal in a given time window as a set of Fourier coefficients. We interpret the resulting frequency signature as a high-dimensional feature description of each data point. We then develop a graph-based approach for …
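The preprocessing step described above (cut raw audio into time windows, then describe each window by its Fourier coefficients) can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions, not the dissertation's code: the window length, hop size, Hann taper, and all function names are hypothetical choices, and a naive DFT stands in for an FFT library.

```python
import cmath
import math


def frame_signal(signal, window_len, hop):
    """Split a 1-D signal into (possibly overlapping) fixed-length windows."""
    return [signal[start:start + window_len]
            for start in range(0, len(signal) - window_len + 1, hop)]


def hann(n):
    """Hann taper of length n (assumed choice), reducing leakage at window edges."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def dft_magnitudes(frame):
    """Magnitudes of the non-negative-frequency DFT coefficients (naive O(n^2))."""
    n = len(frame)
    return [abs(sum(frame[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)))
            for k in range(n // 2 + 1)]


def stft_features(signal, window_len=64, hop=32):
    """One feature vector of |Fourier coefficients| per time window (one data point each)."""
    taper = hann(window_len)
    return [dft_magnitudes([x * w for x, w in zip(frame, taper)])
            for frame in frame_signal(signal, window_len, hop)]
```

For a pure tone completing 8 cycles per 64-sample window, every window's feature vector peaks at frequency bin 8; each such vector is the kind of high-dimensional point that a graph-based method would then connect by similarity.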


In Search Of Star Clusters: An Introduction To The K-Means Algorithm, Marcio Nascimento Jan 2022

Journal of Humanistic Mathematics

This article is a gentle introduction to K-means, a mathematical technique for processing data for further classification. We begin with a brief historical introduction, where we find connections with Plato’s Timæus, von Linné’s binomial classification, and the star clustering concept of Mary Somerville and collaborators. Artificial intelligence algorithms use K-means as a classification methodology to learn about data accurately, because it is a quantitative procedure based on similarities.
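The mechanics of K-means can be shown in a few lines. The sketch below is the standard Lloyd iteration (assign each point to its nearest centroid, then move each centroid to the mean of its points), written in plain Python; the random initialisation from data points, the fixed seed, the iteration cap, and the empty-cluster rule are assumptions for illustration rather than anything taken from the article.

```python
import math
import random


def kmeans(points, k, iters=100, seed=0):
    """Lloyd's algorithm: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # assumed init: k distinct data points
    for _ in range(iters):
        # Assignment step: each point joins the cluster of its nearest centroid.
        labels = [min(range(k), key=lambda j: math.dist(p, centroids[j])) for p in points]
        # Update step: move each centroid to the mean of its assigned points.
        new_centroids = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centroids.append(tuple(sum(c) / len(members) for c in zip(*members)))
            else:
                new_centroids.append(centroids[j])  # keep an empty cluster's centroid
        if new_centroids == centroids:
            break  # converged: no centroid moved
        centroids = new_centroids
    return labels, centroids
```

With two well-separated groups of points, for example `[(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]` and `k=2`, the first three points share one label and the last three share the other. The "similarity" here is plain Euclidean distance, which is the usual choice for K-means.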


Data-Driven Methods For Low-Energy Nuclear Theory, Jordan M.R. Fox Jan 2022

CGU Theses & Dissertations

The term data-driven describes computational methods for numerical problem solving which have been developed by the field of data science; these are at the intersection of computer science, mathematics, and statistics. When applied to a domain science like nuclear physics, especially with the goal of deepening scientific insight, data-driven methods form a core pillar of the computational science endeavor. In this dissertation I explore two problems related to theoretical nuclear physics: one in the framework of numerical statistics, and the other in the framework of machine learning. I) Historically our understanding of the structure of the atomic nucleus, the quantum many-body problem, has been …


Causal Effect Random Forest Of Interaction Trees For Learning Individualized Treatment Regimes In Observational Studies: With Applications To Education Study Data, Luo Li Jan 2020

CGU Theses & Dissertations

Learning individualized treatment regimes (ITR) from observational data is of great interest in various fields, as treatment recommendations based on individual characteristics may improve individual treatment benefits at reduced cost. It has long been observed that different individuals may respond to a given treatment with significant heterogeneity. An ITR can be defined as a mapping from individual characteristics to a treatment assignment. The optimal ITR is the treatment assignment that maximizes expected individual treatment effects. Because the idea is rooted in personalized medicine, many studies and applications of ITR are in medical fields and clinical practice. Heterogeneous responses are also well documented in educational interventions. …
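To make the definition concrete: an ITR is a function d(x) that maps covariates x to a treatment (1) or control (0), and a natural rule is to treat exactly when the estimated treatment effect for x is positive. The sketch below is deliberately much simpler than the causal-effect random forest the abstract describes: it replaces the forest with per-stratum differences in mean outcomes on a single discrete covariate, and it assumes no unmeasured confounding within each stratum, a strong assumption for observational data. All names and the `cost` threshold are hypothetical.

```python
from collections import defaultdict
from statistics import mean


def fit_stratum_effects(records):
    """Estimate a treatment effect per covariate stratum x as
    mean(outcome | treated, x) - mean(outcome | control, x).
    records: iterable of (x, treated in {0, 1}, outcome).
    Assumes no unmeasured confounding within a stratum."""
    groups = defaultdict(lambda: {0: [], 1: []})
    for x, treated, outcome in records:
        groups[x][treated].append(outcome)
    # Only strata observed under both arms yield an effect estimate.
    return {x: mean(g[1]) - mean(g[0]) for x, g in groups.items() if g[0] and g[1]}


def itr(effects, cost=0.0):
    """Treatment rule d(x): treat (1) only if the estimated benefit exceeds cost;
    strata with no estimate default to control."""
    return lambda x: 1 if effects.get(x, 0.0) > cost else 0
```

If the "low" stratum gains 3 points on average under treatment and the "high" stratum loses 2, the resulting rule treats only "low", which is the sense in which an individualized regime can beat treating everyone or no one.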


The Paradox Of Big Data, Gary N. Smith Jan 2019

Pomona Economics

Data mining is often used to discover patterns in Big Data. It is tempting to believe that because an unearthed pattern is unusual it must be meaningful, but patterns are inevitable in Big Data and usually meaningless. The paradox of Big Data is that data mining is most seductive when there are a large number of variables, but a large number of variables exacerbates the perils of data mining.


A Machine Learning Approach To Diagnosis Of Parkinson’s Disease, Sumaiya F. Hashmi Jan 2013

CMC Senior Theses

I will investigate applications of machine learning algorithms to medical data, adaptations to differences in data collection, and the use of ensemble techniques.

Focusing on the binary classification problem of Parkinson’s Disease (PD) diagnosis, I will apply machine learning algorithms to a primary dataset consisting of voice recordings from healthy and PD subjects. Specifically, I will use Artificial Neural Networks, Support Vector Machines, and an Ensemble Learning algorithm to reproduce results from [MS12] and [GM09].

Next, I will adapt a secondary regression dataset of PD recordings and combine it with the primary binary classification dataset, testing various techniques to consolidate …


Using Symbolic Knowledge In The UMLS To Disambiguate Words In Small Datasets With A Naive Bayes Classifier, Gondy Leroy, Thomas C. Rindflesch Jan 2004

CGU Faculty Publications and Research

Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most refer to characteristics of the ambiguous word and surrounding words and are based on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. UMLS semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. A naïve Bayes classifier was trained for 15 words with 100 examples for each. The most frequent sense of a word served as the baseline. The effect of increasingly …
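A naïve Bayes word-sense classifier of the general kind described above scores each sense by its prior frequency times the smoothed probability of each context feature given that sense, and the most-frequent-sense baseline simply returns the commonest sense. The sketch below is a generic multinomial naïve Bayes with add-one smoothing, not the authors' system; features are plain strings, and the `TYPE:` tags in the example stand in hypothetically for semantic-type annotations, since the real UMLS lookup is not shown.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesWSD:
    """Multinomial naive Bayes over bag-of-feature contexts, add-one smoothing."""

    def fit(self, examples):
        # examples: list of (features, sense), where features is a list of strings
        self.sense_counts = Counter(sense for _, sense in examples)
        self.feature_counts = defaultdict(Counter)
        self.vocab = set()
        for features, sense in examples:
            self.feature_counts[sense].update(features)
            self.vocab.update(features)
        return self

    def predict(self, features):
        total = sum(self.sense_counts.values())
        v = len(self.vocab)

        def log_posterior(sense):
            counts = self.feature_counts[sense]
            denom = sum(counts.values()) + v
            score = math.log(self.sense_counts[sense] / total)  # log prior
            for f in features:
                if f in self.vocab:  # ignore features never seen in training
                    score += math.log((counts[f] + 1) / denom)
            return score

        return max(self.sense_counts, key=log_posterior)

    def baseline(self):
        """Most-frequent-sense baseline."""
        return self.sense_counts.most_common(1)[0][0]
```

Trained on a handful of contexts for an ambiguous word such as "cold" (senses "temperature" and "illness"), a context like `["fever", "TYPE:symptom"]` is assigned "illness". With only a few examples per sense, each extra informative feature (such as a semantic-type tag) can shift the posterior noticeably, which is the motivation for adding symbolic knowledge to small datasets.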