Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Chapman University

Theses/Dissertations

Articles 1 - 16 of 16

Full-Text Articles in Data Science

A Novel Correction For The Multivariate Ljung-Box Test, Minhao Huang May 2024

Computational and Data Sciences (PhD) Dissertations

This research introduces an analytical improvement to the Multivariate Ljung-Box test that addresses significant deviations of the original test from the nominal Type I error rates under almost all scenarios. Prior attempts to mitigate this issue have been directed at modification of the test statistics or correction of the test distribution to achieve precise results in finite samples. In previous studies focused on designing corrections to the univariate Ljung-Box test, a method that specifically adjusts the test rejection region has been the most successful in attaining the best Type I error rates. We adopt the same approach for the more complex, …


Random Variable Spaces: Mathematical Properties And An Extension To Programming Computable Functions, Mohammed Kurd-Misto Dec 2023

Computational and Data Sciences (PhD) Dissertations

This dissertation aims to extend the boundaries of Programming Computable Functions (PCF) by introducing a novel collection of categories referred to as Random Variable Spaces. Originating as a generalization of Quasi-Borel Spaces, Random Variable Spaces are rigorously defined as categories where objects are sets paired with a collection of random variables from an underlying measurable space. These spaces offer a theoretical foundation for extending PCF to natively handle stochastic elements.

The dissertation is structured into seven chapters that provide a multi-disciplinary background, from PCF and Measure Theory to Category Theory with special attention to Monads and the Giry Monad. The …


Computational Analysis Of Antibody Binding Mechanisms To The Omicron RBD Of SARS-CoV-2 Spike Protein: Identification Of Epitopes And Hotspots For Developing Effective Therapeutic Strategies, Mohammed Alshahrani Aug 2023

Computational and Data Sciences (PhD) Dissertations

The advent of the Omicron strain of SARS-CoV-2 has elicited apprehension regarding its potential influence on the effectiveness of current vaccines and antibody treatments. The present investigation involved the implementation of mutational scanning analyses to examine the impact of Omicron mutations on the binding affinity of four categories of antibodies that target the Omicron receptor binding domain (RBD) of the Spike protein. The study demonstrates that the Omicron variant harbors 23 unique mutations across the RBD regions I, II, III, and IV. Of these mutations, seven are shared between RBD regions I and II, while three are shared among RBD …


Computational Approaches To Facilitate Automated Interchange Between Music And Art, Rao Hamza Ali May 2022

Computational and Data Sciences (PhD) Dissertations

Recently, there has been a tremendous increase in generating and synthesizing music and art using various computational techniques. An area that is still under-researched, however, is how one medium can be converted into the other while maintaining the overall aesthetics. Over the last few centuries, artists, composers, and scholars have attempted to substitute one form of art for the other: by proposing techniques where music notes are synonymous with colors, by inventing instruments that combine the aesthetics of music and visual art, and by incorporating the two media in live performances. A widely accepted computational approach, for the conversion, …


Causalmodels: An R Library For Estimating Causal Effects, Joshua Wolff Anderson May 2022

Computational and Data Sciences (MS) Theses

Free and open source software for statistical modeling and machine learning has advanced productivity in data science significantly. Packages such as SciPy in Python and caret in R provide fundamental tools for statistical modeling and machine learning in the two most popular programming languages used by data scientists. Unfortunately, comparably robust tools for causal inference are limited. The tools that exist in R lack consistent and standardized methodologies and inputs. R lacks a comprehensive package that offers traditional causal inference methods such as standardization, IP weighting, G-estimation, outcome regression, and propensity matching in one common package. …


An Information-Theoretic Analysis Of Adherence To Physical Exercise Routines, Lily Foster Dec 2021

Computational and Data Sciences (MS) Theses

One of the most common recommendations in healthcare is to simply form healthy habits, but little research has been done to understand the formation and continuation of a healthy habit that isn’t heavily influenced by an individual’s interpretation. Arizona State University’s WalkIT study aimed to analyze how goal setting and financial reinforcement can influence moderate-to-vigorous physical activity (MVPA) in adults, while using data from accelerometers to alleviate individual bias. In this trial, 512 insufficiently active adults were recruited to wear an accelerometer for 1 year and were then randomly assigned to one of the four study groups. Each group had …


Exploring Behaviors Of Software Developers And Their Code Through Computational And Statistical Methods, Elia Eiroa Lledo Aug 2021

Computational and Data Sciences (PhD) Dissertations

As Artificial Intelligence (AI) increasingly penetrates all aspects of society, many obstacles emerge. This thesis identifies and discusses the issues facing Computer Vision and significant deficiencies in the Software Development Life-cycle that need to be resolved to facilitate the evolution toward true artificial intelligence. We explicitly review the concepts behind Convolutional Neural Network (CNN) models, the benchmark for computer vision. Chapter 2 highlights the mechanisms that have popularized CNNs while also specifying significant gaps that could render the model inadequate for future use in safety-critical systems. We put forward two main limitations. Namely, CNNs do not use lack of information …


Multi-Modal Data Fusion, Image Segmentation, And Object Identification Using Unsupervised Machine Learning: Conception, Validation, Applications, And A Basis For Multi-Modal Object Detection And Tracking, Nicholas Lahaye Aug 2021

Computational and Data Sciences (PhD) Dissertations

Remote sensing and instrumentation are constantly improving and increasing in capability. This includes an increase in the number of different instrument types, with various combinations of spatial and spectral resolutions, pointing angles, and various other instrument-specific qualities. While the increase in instruments, and therefore datasets, is a boon for those aiming to study the complexities of the various Earth systems, it can also present a large number of new challenges. With this in mind, our group has aimed to combine datasets with different spatial and spectral resolutions in an effective and as-general-as-possible way, with as little …


Enhancing Microbiome Host Disease Prediction With Variational Autoencoders, Celeste Manughian-Peter Aug 2021

Computational and Data Sciences (MS) Theses

Advancements in genetic sequencing methods for microbiomes in recent decades have permitted the collection of taxonomic and functional profiles of microbial communities, accelerating the discovery of the functional aspects of the microbiome and generating an increased interest among clinicians in applying these techniques with patients. This advancement has coincided with software and hardware improvements in the field of machine learning and deep learning. Combined, these advancements imply further potential for progress in disease diagnosis and treatment in humans. The ability to classify a human microbiome profile into a disease category, and additionally identify the differentiating factors within the profile between …


Assessing The Re-Identification Risk In ECG Datasets And An Application Of Privacy Preserving Techniques In ECG Analysis, Arin Ghazarian May 2021

Computational and Data Sciences (PhD) Dissertations

In this work, we first investigate the use of the ECG signal as a biometric in human identification systems using deep learning models. We train convolutional neural network models on ECG samples from approximately 81k patients. Our models achieved an overall accuracy of 95.69%. Further, we assess the accuracy of our ECG identification model for distinct groups of patients with particular heart conditions and combinations of such conditions. For example, we observed that the identification accuracy was the highest (99.7%) for patients with both ST changes and supraventricular tachycardia. On the other hand, we also found that the identification rate was …


Novel Applications Of Statistical And Machine Learning Methods To Analyze Trial-Level Data From Cognitive Measures, Chelsea Parlett May 2021

Computational and Data Sciences (PhD) Dissertations

Many cognitive tasks and measures can benefit from trial-level analyses including Item Response Theory models as well as other Bayesian and Machine Learning models. Specifically, this dissertation focuses mainly on task-based measures of metamemory and how within-set variability as well as item-level characteristics can improve the inferences researchers make about these measures. First, a clustering analysis of judgements of learning across a task is examined in order to detect different participant strategies on a metamemory task and whether strategy use differs by age. Second, the benefits of using item response theory models to analyze both individual and item-level differences in metamemory …


Optimal Analytical Methods For High Accuracy Cardiac Disease Classification And Treatment Based On ECG Data, Jianwei Zheng May 2021

Computational and Data Sciences (PhD) Dissertations

This work comprises six projects. In the first project, a new research database for 12-lead electrocardiogram signals was created under the auspices of Chapman University and Shaoxing People's Hospital (Shaoxing Hospital Zhejiang University School of Medicine). This database aims to enable the scientific community to conduct new studies on arrhythmia and other cardiovascular conditions. In the second project, we created a new 12-lead ECG database under the auspices of Chapman University and Ningbo First Hospital of Zhejiang University that aims to provide high quality data enabling detection of the distinctions between idiopathic ventricular arrhythmia from right ventricular outflow tract …


Applications Of Machine Learning To Facilitate Software Engineering And Scientific Computing, Natalie Best Jan 2021

Computational and Data Sciences (PhD) Dissertations

The use of machine learning has risen in recent years, though many areas remain unexplored due to lack of data or lack of computational tools. This dissertation explores machine learning approaches in case studies involving image classification and natural language processing. In addition, a software library in the form of a two-way bridge connecting deep learning models in Keras with ones available in the Fortran programming language is also presented.

In Chapter 2, we explore the applicability of transfer learning utilizing models pre-trained on non-software engineering data applied to the problem of classifying software unified modeling language diagrams where data is …


Forecasting The Prices Of Cryptocurrencies Using A Novel Parameter Optimization Of VARIMA Models, Alexander Barrett Jan 2021

Computational and Data Sciences (PhD) Dissertations

This work is a comparative study of different univariate and multivariate time series predictive models as applied to Bitcoin, other cryptocurrencies, and other related financial time series data. ARIMA models, long regarded as the gold standard of univariate financial time series prediction due to both their flexibility and simplicity, are used as a baseline for prediction. Given the highly correlated nature of different cryptocurrencies, this work aims to show the benefit of forecasting with multivariate time series models, primarily focusing on a novel parameter optimization of VARIMA models outlined in this paper.

These models are trained on 3 years of historical data, …


A Novel Correction For The Adjusted Box-Pierce Test — New Risk Factors For Emergency Department Return Visits Within 72 Hours For Children With Respiratory Conditions — General Pediatric Model For Understanding And Predicting Prolonged Length Of Stay, Sidy Danioko Aug 2020

Computational and Data Sciences (PhD) Dissertations

This thesis represents the results of three research projects that underline the breadth and depth of my interests.

Firstly, I devoted some effort to the well-known Box-Pierce goodness-of-fit tests for time series models, which have been an important research topic over the last few decades. All previously proposed tests are focused on changes of the test statistics. Instead, I adopted a different approach that takes the best performing test and modifies the rejection region. Thus, I developed a semiparametric correction of the Adjusted Box-Pierce test that attains the best Type I error rates for all sample sizes and lags and outperforms …


Gaining Computational Insight Into Psychological Data: Applications Of Machine Learning With Eating Disorders And Autism Spectrum Disorder, Natalia Rosenfield Aug 2020

Computational and Data Sciences (PhD) Dissertations

Over the past 100 years, assessment tools have been developed that allow us to explore mental and behavioral processes that could not be measured before. However, conventional statistical models used for psychological data are lacking in thoroughness and predictability. This provides a perfect opportunity to use machine learning to study the data in a novel way. In this paper, we present examples of using machine learning techniques with data in three areas: eating disorders, body satisfaction, and Autism Spectrum Disorder (ASD). We explore clustering algorithms as well as virtual reality (VR).

Our first study employs the k-means clustering algorithm to …