Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Chapman University

Articles 1 - 28 of 28

Full-Text Articles in Data Science

Random Variable Spaces: Mathematical Properties And An Extension To Programming Computable Functions, Mohammed Kurd-Misto Dec 2023

Computational and Data Sciences (PhD) Dissertations

This dissertation aims to extend the boundaries of Programming Computable Functions (PCF) by introducing a novel collection of categories referred to as Random Variable Spaces. Originating as a generalization of Quasi-Borel Spaces, Random Variable Spaces are rigorously defined as categories where objects are sets paired with a collection of random variables from an underlying measurable space. These spaces offer a theoretical foundation for extending PCF to natively handle stochastic elements.

The dissertation is structured into seven chapters that provide a multi-disciplinary background, from PCF and Measure Theory to Category Theory with special attention to Monads and the Giry Monad. The …


Verifying Empirical Predictive Modeling Of Societal Vulnerability To Hazardous Events: A Monte Carlo Experimental Approach, Yi Victor Wang, Seung Hee Kim, Menas C. Kafatos Aug 2023

Institute for ECHO Articles and Research

With the emergence of large amounts of historical records on adverse impacts of hazardous events, empirical predictive modeling has been revived as a foundational paradigm for quantifying disaster vulnerability of societal systems. This paradigm models societal vulnerability to hazardous events as a vulnerability curve indicating an expected loss rate of a societal system with respect to a possible spectrum of intensity measure (IM) of an event. Although the empirical predictive models (EPMs) of societal vulnerability are calibrated on historical data, they should not be experimentally tested with data derived from field experiments on any societal system. Alternatively, in this paper, …


Computational Analysis Of Antibody Binding Mechanisms To The Omicron RBD Of SARS-CoV-2 Spike Protein: Identification Of Epitopes And Hotspots For Developing Effective Therapeutic Strategies, Mohammed Alshahrani Aug 2023

Computational and Data Sciences (PhD) Dissertations

The advent of the Omicron strain of SARS-CoV-2 has elicited apprehension regarding its potential influence on the effectiveness of current vaccines and antibody treatments. The present investigation involved the implementation of mutational scanning analyses to examine the impact of Omicron mutations on the binding affinity of four categories of antibodies that target the Omicron receptor binding domain (RBD) of the Spike protein. The study demonstrates that the Omicron variant harbors 23 unique mutations across the RBD regions I, II, III, and IV. Of these mutations, seven are shared between RBD regions I and II, while three are shared among RBD …


Ideology Prediction From Scarce And Biased Supervision: Learn To Disregard The “What” And Focus On The “How”!, Chen Chen, Dylan Walker, Venkatesh Saligrama Jul 2023

Business Faculty Articles and Research

We propose a novel supervised learning approach for political ideology prediction (PIP) that is capable of predicting out-of-distribution inputs. This problem is motivated by the fact that manual data-labeling is expensive, while self-reported labels are often scarce and exhibit significant selection bias. We propose a novel statistical model that decomposes the document embeddings into a linear superposition of two vectors: a latent neutral context vector independent of ideology, and a latent position vector aligned with ideology. We train an end-to-end model that has intermediate contextual and positional vectors as outputs. At deployment time, our model predicts labels for input documents …


Dense & Attention Convolutional Neural Networks For Toe Walking Recognition, Junde Chen, Rahul Soangra, Marybeth Grant-Beuttler, Y. A. Nanehkaran, Yuxin Wen May 2023

Physical Therapy Faculty Articles and Research

Idiopathic toe walking (ITW) is a gait disorder where children’s initial contacts show limited or no heel touch during the gait cycle. Toe walking can lead to poor balance, increased risk of falling or tripping, leg pain, and stunted growth in children. Early detection and identification can facilitate targeted interventions for children diagnosed with ITW. This study proposes a new one-dimensional (1D) Dense & Attention convolutional network architecture, termed DANet, to detect idiopathic toe walking. The dense block is integrated into the network to maximize information transfer and avoid missed features. Further, the attention modules are …


Text And Data Mining Applications For Teaching Music Bibliography, Taylor Greene, Laurie Sampsel Mar 2023

Library Presentations, Posters, and Audiovisual Materials

Text and data mining (TDM) is a process of increasing interdisciplinary potential and one with many practical applications for music graduate students. TDM, however, remains a topic rarely introduced in the music bibliography course. Understandably, talk of artificial intelligence, algorithms, and programming languages is intimidating to music students, but thanks to software applications, knowledge about these computer science topics is not required to participate in research using TDM. This presentation explores ways to introduce digital humanities to music students through TDM.

In our presentation, we will discuss two approaches to incorporating TDM into the music bibliography course, focusing on two …


Automated Identification Of Astronauts On Board The International Space Station: A Case Study In Space Archaeology, Rao Hamza Ali, Amir Kanan Kashefi, Alice C. Gorman, Justin St. P. Walsh, Erik J. Linstead Aug 2022

Art Faculty Articles and Research

We develop and apply a deep learning-based computer vision pipeline to automatically identify crew members in archival photographic imagery taken on board the International Space Station. Our approach is able to quickly tag thousands of images from public and private photo repositories without human supervision and with high degrees of accuracy, including photographs where crew faces are partially obscured. Using the results of our pipeline, we carry out a large-scale network analysis of the crew, using the imagery data to provide novel insights into the social interactions among crew during their missions.


A Comparative Study On Deep Learning Models For Text Classification Of Unstructured Medical Notes With Various Levels Of Class Imbalance, Hongxia Lu, Louis Ehwerhemuepha, Cyril Rakovski Jul 2022

Mathematics, Physics, and Computer Science Faculty Articles and Research

Background

Discharge medical notes written by physicians contain important information about the health condition of patients. Many deep learning algorithms have been successfully applied to extract important information from unstructured medical notes data that can entail subsequent actionable results in the medical domain. This study aims to explore the model performance of various deep learning algorithms in text classification tasks on medical notes with respect to different disease class imbalance scenarios.

Methods

In this study, we employed seven artificial intelligence models: a CNN (Convolutional Neural Network), a Transformer encoder, a pretrained BERT (Bidirectional Encoder Representations from Transformers), and four typical …


Assessing The Reidentification Risks Posed By Deep Learning Algorithms Applied To ECG Data, Arin Ghazarian, Jianwei Zheng, Daniele Struppa, Cyril Rakovski Jun 2022

Mathematics, Physics, and Computer Science Faculty Articles and Research

ECG (Electrocardiogram) data analysis is one of the most widely used and important tools in cardiology diagnostics. In recent years, the development of advanced deep learning techniques and GPU hardware has made it possible to train neural network models that attain exceptionally high levels of accuracy in complex tasks such as heart disease diagnoses and treatments. We investigate the use of ECGs as biometrics in human identification systems by implementing state-of-the-art deep learning models. We train convolutional neural network models on approximately 81k patients from the US, Germany and China. Currently, this is the largest research project on ECG identification. …


A Large-Scale Sentiment Analysis Of Tweets Pertaining To The 2020 US Presidential Election, Rao Hamza Ali, Gabriela Pinto, Evelyn Lawrie, Erik J. Linstead Jun 2022

Engineering Faculty Articles and Research

We capture the public sentiment towards candidates in the 2020 US Presidential Elections by analyzing 7.6 million tweets sent out between October 31st and November 9th, 2020. We apply a novel approach to first identify tweets and user accounts in our database that were later deleted or suspended from Twitter. This approach allows us to observe the sentiment held for each presidential candidate across various groups of users and tweets: accessible tweets and accounts, deleted tweets and accounts, and suspended or inaccessible tweets and accounts. We compare the sentiment scores calculated for these groups and provide key insights into the …


A Novel Correction For The Adjusted Box-Pierce Test, Sidy Danioko, Jianwei Zheng, Kyle Anderson, Alexander Barrett, Cyril S. Rakovski May 2022

Mathematics, Physics, and Computer Science Faculty Articles and Research

The classical Box-Pierce and Ljung-Box tests for auto-correlation of residuals possess severe deviations from nominal type I error rates. Previous studies have attempted to address this issue by either revising existing tests or designing new techniques. The Adjusted Box-Pierce achieves the best results with respect to attaining type I error rates closer to nominal values. This research paper proposes a further correction to the adjusted Box-Pierce test that possesses near perfect type I error rates. The approach is based on an inflation of the rejection region for all sample sizes and lags calculated via a linear model applied to simulated …
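For readers unfamiliar with the two classical tests named in this abstract, the following is a minimal sketch of their textbook test statistics, computed from the sample autocorrelations of a residual series. It does not implement the Adjusted Box-Pierce test or the correction proposed in the paper, whose details are not given in this listing. The function names and the toy residual series are assumptions chosen for illustration.

```python
def autocorrelations(residuals, max_lag):
    """Sample autocorrelations of the residuals at lags 1..max_lag."""
    n = len(residuals)
    mean = sum(residuals) / n
    centered = [r - mean for r in residuals]
    denom = sum(c * c for c in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / denom
        for k in range(1, max_lag + 1)
    ]


def box_pierce(residuals, max_lag):
    """Classical Box-Pierce statistic: n * sum of squared autocorrelations."""
    n = len(residuals)
    rho = autocorrelations(residuals, max_lag)
    return n * sum(r * r for r in rho)


def ljung_box(residuals, max_lag):
    """Classical Ljung-Box statistic: n(n+2) * sum of rho_k^2 / (n - k)."""
    n = len(residuals)
    rho = autocorrelations(residuals, max_lag)
    return n * (n + 2) * sum(
        r * r / (n - k) for k, r in enumerate(rho, start=1)
    )


# Hypothetical toy residuals with strong negative lag-1 autocorrelation.
toy_residuals = [1, -1, 1, -1, 1, -1]
q_bp = box_pierce(toy_residuals, max_lag=1)
q_lb = ljung_box(toy_residuals, max_lag=1)
```

Under the null hypothesis of no autocorrelation, both statistics are usually compared against a chi-squared reference distribution; the abstract's point is that the actual type I error rates of these tests can deviate from the nominal level.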


Computational Approaches To Facilitate Automated Interchange Between Music And Art, Rao Hamza Ali May 2022

Computational and Data Sciences (PhD) Dissertations

Recently, there has been a tremendous increase in generating and synthesizing music and art using various computational techniques. An area that is still under-researched, however, is how one medium can be converted into the other, while maintaining the overall aesthetics. Over the last few centuries, artists, composers, and scholars have attempted to substitute one form of art for the other: by proposing techniques where music notes are synonymous with colors, by inventing instruments that combine the aesthetics of music and visual art, and by incorporating the two media in live performances. A widely accepted computational approach, for the conversion, …


Causalmodels: An R Library For Estimating Causal Effects, Joshua Wolff Anderson May 2022

Computational and Data Sciences (MS) Theses

Free and open source software for statistical modeling and machine learning has advanced productivity in data science significantly. Packages such as SciPy in Python and caret in R provide fundamental tools for statistical modeling and machine learning in the two most popular programming languages used by data scientists. Unfortunately, robust tools similar to these are limited in terms of causal inference. The tools in R that exist lack consistent and standardized methodologies and inputs. R lacks a comprehensive package that offers traditional causal inference methods such as standardization, IP weighting, G-estimation, outcome regression, and propensity matching in one common package. …


An Information-Theoretic Analysis Of Adherence To Physical Exercise Routines, Lily Foster Dec 2021

Computational and Data Sciences (MS) Theses

One of the most common recommendations in healthcare is to simply form healthy habits, but little research has been done to understand the formation and continuation of a healthy habit that isn’t heavily influenced by an individual’s interpretation. Arizona State University’s WalkIT study aimed to analyze how goal setting and financial reinforcement can influence moderate-to-vigorous physical activity (MVPA) in adults, while using data from accelerometers to alleviate individual bias. In this trial, 512 insufficiently active adults were recruited to wear an accelerometer for 1 year and were then randomly assigned to one of the four study groups. Each group had …


Pre-Earthquake Ionospheric Perturbation Identification Using CSES Data Via Transfer Learning, Pan Xiong, Cheng Long, Huiyu Zhou, Roberto Battiston, Angelo De Santis, Dimitar Ouzounov, Xuemin Zhang, Xuhui Shen Nov 2021

Mathematics, Physics, and Computer Science Faculty Articles and Research

During the lithospheric buildup to an earthquake, complex physical changes occur within the earthquake hypocenter. Data pertaining to the changes in the ionosphere may be obtained by satellites, and the analysis of data anomalies can help identify earthquake precursors. In this paper, we present a deep-learning model, SeqNetQuake, that uses data from the first China Seismo-Electromagnetic Satellite (CSES) to identify ionospheric perturbations prior to earthquakes. SeqNetQuake achieves the best performance [F-measure (F1) = 0.6792 and Matthews correlation coefficient (MCC) = 0.427] when directly trained on the CSES dataset with a spatial window centered on the earthquake epicenter with the Dobrovolsky …
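The two metrics quoted in this abstract, the F-measure (F1) and the Matthews correlation coefficient (MCC), can both be derived from a binary confusion matrix. The sketch below uses the standard definitions; the function names and the example counts are hypothetical and are not taken from the paper or the CSES dataset.

```python
import math


def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN); returns 0.0 when undefined."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def mcc(tp, tn, fp, fn):
    """Matthews correlation coefficient; returns 0.0 when undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


# Hypothetical confusion-matrix counts, for illustration only.
example_f1 = f1_score(tp=40, fp=5, fn=10)
example_mcc = mcc(tp=40, tn=45, fp=5, fn=10)
```

F1 ignores true negatives, while MCC uses all four cells of the confusion matrix, which is one reason the two are often reported together for imbalanced detection problems.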


Exploring Behaviors Of Software Developers And Their Code Through Computational And Statistical Methods, Elia Eiroa Lledo Aug 2021

Computational and Data Sciences (PhD) Dissertations

As Artificial Intelligence (AI) increasingly penetrates all aspects of society, many obstacles emerge. This thesis identifies and discusses the issues facing Computer Vision and significant deficiencies in the Software Development Life-cycle that need to be resolved to facilitate the evolution toward true artificial intelligence. We explicitly review the concepts behind Convolutional Neural Network (CNN) models, the benchmark for computer vision. Chapter 2 highlights the mechanisms that have popularized CNNs while also specifying significant gaps that could render the model inadequate for future use in safety-critical systems. We put forward two main limitations. Namely, CNNs do not use lack of information …


Multi-Modal Data Fusion, Image Segmentation, And Object Identification Using Unsupervised Machine Learning: Conception, Validation, Applications, And A Basis For Multi-Modal Object Detection And Tracking, Nicholas Lahaye Aug 2021

Computational and Data Sciences (PhD) Dissertations

Remote sensing and instrumentation are constantly improving and increasing in capability. This includes an increase in the number of different instrument types, with various combinations of spatial and spectral resolutions, pointing angles, and various other instrument-specific qualities. While the increase in instruments, and therefore datasets, is a boon for those aiming to study the complexities of the various Earth systems, it can also present a large number of new challenges. With this information in mind, our group aims to combine datasets with different spatial and spectral resolutions in an effective and as-general-as-possible way, with as little …


Enhancing Microbiome Host Disease Prediction With Variational Autoencoders, Celeste Manughian-Peter Aug 2021

Computational and Data Sciences (MS) Theses

Advancements in genetic sequencing methods for microbiomes in recent decades have permitted the collection of taxonomic and functional profiles of microbial communities, accelerating the discovery of the functional aspects of the microbiome and generating an increased interest among clinicians in applying these techniques with patients. This advancement has coincided with software and hardware improvements in the field of machine learning and deep learning. Combined, these advancements imply further potential for progress in disease diagnosis and treatment in humans. The ability to classify a human microbiome profile into a disease category, and additionally identify the differentiating factors within the profile between …


Assessing The Re-Identification Risk In ECG Datasets And An Application Of Privacy Preserving Techniques In ECG Analysis, Arin Ghazarian May 2021

Computational and Data Sciences (PhD) Dissertations

In this work, we first investigate the use of the ECG signal as a biometric in human identification systems using deep learning models. We train convolutional neural network models on ECG samples from approximately 81k patients. Our models achieved an overall accuracy of 95.69%. Further, we assess the accuracy of our ECG identification model for distinct groups of patients with particular heart conditions and combinations of such conditions. For example, we observed that the identification accuracy was the highest (99.7%) for patients with both ST changes and supraventricular tachycardia. On the other hand, we also found that the identification rate was …


Novel Applications Of Statistical And Machine Learning Methods To Analyze Trial-Level Data From Cognitive Measures, Chelsea Parlett May 2021

Computational and Data Sciences (PhD) Dissertations

Many cognitive tasks and measures can benefit from trial-level analyses including Item Response Theory models as well as other Bayesian and Machine Learning models. Specifically, this dissertation focuses mainly on task-based measures of metamemory and how within-set variability as well as item-level characteristics can improve the inferences researchers make about these measures. First, a clustering analysis of judgements of learning across a task is examined in order to detect different participant strategies on a metamemory task and whether strategy use differs by age. Second, the benefits of using item response theory models to analyze both individual and item-level differences in metamemory …


Optimal Analytical Methods For High Accuracy Cardiac Disease Classification And Treatment Based On ECG Data, Jianwei Zheng May 2021

Computational and Data Sciences (PhD) Dissertations

This work comprises six projects. In the first project, a newly inaugurated research database for 12-lead electrocardiogram signals was created under the auspices of Chapman University and Shaoxing People's Hospital (Shaoxing Hospital Zhejiang University School of Medicine). This database aims to enable the scientific community to conduct new studies on arrhythmia and other cardiovascular conditions. In the second project, we created a new 12-lead ECG database under the auspices of Chapman University and Ningbo First Hospital of Zhejiang University that aims to provide high quality data enabling detection of the distinctions between idiopathic ventricular arrhythmia from right ventricular outflow tract …


The Agnostic Structure Of Data Science Methods, Domenico Napoletani, Marco Panza, Daniele Struppa Apr 2021

MPP Published Research

In this paper we argue that data science is a coherent and novel approach to empirical problems that, in its most general form, does not build understanding about phenomena. Within the new type of mathematization at work in data science, mathematical methods are not selected because of any relevance for a problem at hand; mathematical methods are applied to a specific problem only by ‘forcing’, i.e. on the basis of their ability to reorganize the data for further analysis and the intrinsic richness of their mathematical structure. In particular, we argue that deep learning neural networks are best understood within …


Applications Of Machine Learning To Facilitate Software Engineering And Scientific Computing, Natalie Best Jan 2021

Computational and Data Sciences (PhD) Dissertations

The use of machine learning has risen in recent years, though many areas remain unexplored due to lack of data or lack of computational tools. This dissertation explores machine learning approaches in case studies involving image classification and natural language processing. In addition, a software library in the form of a two-way bridge connecting deep learning models in Keras with ones available in the Fortran programming language is also presented.

In Chapter 2, we explore the applicability of transfer learning utilizing models pre-trained on non-software engineering data applied to the problem of classifying software unified modeling language diagrams where data is …


Forecasting The Prices Of Cryptocurrencies Using A Novel Parameter Optimization Of VARIMA Models, Alexander Barrett Jan 2021

Computational and Data Sciences (PhD) Dissertations

This work is a comparative study of different univariate and multivariate time series predictive models as applied to Bitcoin, other cryptocurrencies, and other related financial time series data. ARIMA models, long regarded as the gold standard of univariate financial time series prediction due to both their flexibility and simplicity, are used as a baseline for prediction. Given the highly correlated nature of different cryptocurrencies, this work aims to show the benefit of forecasting with multivariate time series models, primarily focusing on a novel parameter optimization of VARIMA models outlined in this paper.

These models are trained on 3 years of historical data, …


Spatial Frequency Implications For Global And Local Processing In Autistic Children, Riya Mody, Ayra Tusneem, Louanne Boyd, Vincent Berardi Dec 2020

Student Scholar Symposium Abstracts and Posters

Visual processing in humans is done by integrating and updating multiple streams of global and local sensory input. Interaction between these two systems can be disrupted in individuals with ASD and other learning disabilities. When this integration is not done smoothly, it becomes difficult to see the “big picture”, which has been found to have implications for emotion recognition, social skills, and conversation skills. An example of this phenomenon is local interference, which is when local details are prioritized over the global features. Previous research in this field has aimed to decrease local interference by developing and evaluating a filter …


A Novel Correction For The Adjusted Box-Pierce Test — New Risk Factors For Emergency Department Return Visits Within 72 Hours For Children With Respiratory Conditions — General Pediatric Model For Understanding And Predicting Prolonged Length Of Stay, Sidy Danioko Aug 2020

Computational and Data Sciences (PhD) Dissertations

This thesis represents the results of three research projects that underline the breadth and depth of my interests.

Firstly, I devoted some effort to the well-known Box-Pierce goodness-of-fit tests for time series models, which have been an important research topic over the last few decades. All previously proposed tests are focused on changes of the test statistics. Instead, I adopted a different approach that takes the best performing test and modifies the rejection region. Thus, I developed a semiparametric correction of the Adjusted Box-Pierce test that attains the best type I error rates for all sample sizes and lags and outperforms …


Gaining Computational Insight Into Psychological Data: Applications Of Machine Learning With Eating Disorders And Autism Spectrum Disorder, Natalia Rosenfield Aug 2020

Computational and Data Sciences (PhD) Dissertations

Over the past 100 years, assessment tools have been developed that allow us to explore mental and behavioral processes that could not be measured before. However, conventional statistical models used for psychological data are lacking in thoroughness and predictability. This provides a perfect opportunity to use machine learning to study the data in a novel way. In this paper, we present examples of using machine learning techniques with data in three areas: eating disorders, body satisfaction, and Autism Spectrum Disorder (ASD). We explore clustering algorithms as well as virtual reality (VR).

Our first study employs the k-means clustering algorithm to …


Agnostic Science. Towards A Philosophy Of Data Analysis, Domenico Napoletani, Marco Panza, Daniele C. Struppa Jun 2010

MPP Published Research

In this paper we will offer a few examples to illustrate the orientation of contemporary research in data analysis and we will investigate the corresponding role of mathematics. We argue that the modus operandi of data analysis is implicitly based on the belief that if we have collected enough and sufficiently diverse data, we will be able to answer most relevant questions concerning the phenomenon itself. This is a methodological paradigm strongly related to, but not limited to, biology, and we label it the microarray paradigm. In this new framework, mathematics provides powerful techniques and general ideas which generate new …