Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Full-Text Articles in Data Science

An Information-Theoretic Analysis Of Adherence To Physical Exercise Routines, Lily Foster Dec 2021

Computational and Data Sciences (MS) Theses

One of the most common recommendations in healthcare is to simply form healthy habits, but little research has been done to understand the formation and continuation of a healthy habit that isn’t heavily influenced by an individual’s interpretation. Arizona State University’s WalkIT study aimed to analyze how goal setting and financial reinforcement can influence moderate-to-vigorous physical activity (MVPA) in adults, while using data from accelerometers to alleviate individual bias. In this trial, 512 insufficiently active adults were recruited to wear an accelerometer for 1 year and were then randomly assigned to one of the four study groups. Each group had …


Exploring Behaviors Of Software Developers And Their Code Through Computational And Statistical Methods, Elia Eiroa Lledo Aug 2021

Computational and Data Sciences (PhD) Dissertations

As Artificial Intelligence (AI) increasingly penetrates all aspects of society, many obstacles emerge. This thesis identifies and discusses the issues facing Computer Vision and significant deficiencies in the Software Development Life Cycle that need to be resolved to facilitate the evolution toward true artificial intelligence. We explicitly review the concepts behind Convolutional Neural Network (CNN) models, the benchmark for computer vision. Chapter 2 highlights the mechanisms that have popularized CNNs while also specifying significant gaps that could render the model inadequate for future use in safety-critical systems. We put forward two main limitations. Namely, CNNs do not use lack of information …


Multi-Modal Data Fusion, Image Segmentation, And Object Identification Using Unsupervised Machine Learning: Conception, Validation, Applications, And A Basis For Multi-Modal Object Detection And Tracking, Nicholas Lahaye Aug 2021

Computational and Data Sciences (PhD) Dissertations

Remote sensing and instrumentation are constantly improving and increasing in capability. This includes a growing number of instrument types, with various combinations of spatial and spectral resolutions, pointing angles, and other instrument-specific qualities. While the increase in instruments, and therefore datasets, is a boon for those aiming to study the complexities of the various Earth systems, it can also present a large number of new challenges. With this in mind, our group has set out to combine datasets with different spatial and spectral resolutions in an effective and as-general-as-possible way, with as little …


Enhancing Microbiome Host Disease Prediction With Variational Autoencoders, Celeste Manughian-Peter Aug 2021

Computational and Data Sciences (MS) Theses

Advancements in genetic sequencing methods for microbiomes in recent decades have permitted the collection of taxonomic and functional profiles of microbial communities, accelerating the discovery of the functional aspects of the microbiome and generating an increased interest among clinicians in applying these techniques with patients. This advancement has coincided with software and hardware improvements in the field of machine learning and deep learning. Combined, these advancements imply further potential for progress in disease diagnosis and treatment in humans. The ability to classify a human microbiome profile into a disease category, and additionally identify the differentiating factors within the profile between …


Assessing The Re-Identification Risk In ECG Datasets And An Application Of Privacy Preserving Techniques In ECG Analysis, Arin Ghazarian May 2021

Computational and Data Sciences (PhD) Dissertations

In this work, we first investigate the use of the ECG signal as a biometric in human identification systems using deep learning models. We train convolutional neural network models on ECG samples from approximately 81k patients. Our models achieved an overall accuracy of 95.69%. Further, we assess the accuracy of our ECG identification model for distinct groups of patients with particular heart conditions and combinations of such conditions. For example, we observed that the identification accuracy was the highest (99.7%) for patients with both ST changes and supraventricular tachycardia. On the other hand, we also found that the identification rate was …


Novel Applications Of Statistical And Machine Learning Methods To Analyze Trial-Level Data From Cognitive Measures, Chelsea Parlett May 2021

Computational and Data Sciences (PhD) Dissertations

Many cognitive tasks and measures can benefit from trial-level analyses, including Item Response Theory models as well as other Bayesian and Machine Learning models. Specifically, this dissertation focuses mainly on task-based measures of metamemory and how within-set variability as well as item-level characteristics can improve the inferences researchers make about these measures. First, a clustering analysis of judgements of learning across a task is examined in order to detect different participant strategies on a metamemory task and whether strategy use differs by age. Second, the benefits of using item response theory models to analyze both individual and item-level differences in metamemory …


Optimal Analytical Methods For High Accuracy Cardiac Disease Classification And Treatment Based On ECG Data, Jianwei Zheng May 2021

Computational and Data Sciences (PhD) Dissertations

This work comprises six projects. In the first project, a new research database for 12-lead electrocardiogram signals was created under the auspices of Chapman University and Shaoxing People's Hospital (Shaoxing Hospital Zhejiang University School of Medicine). This database aims to enable the scientific community to conduct new studies on arrhythmia and other cardiovascular conditions. In the second project, we created a new 12-lead ECG database under the auspices of Chapman University and Ningbo First Hospital of Zhejiang University that aims to provide high-quality data enabling detection of the distinctions between idiopathic ventricular arrhythmia from right ventricular outflow tract …


Applications Of Machine Learning To Facilitate Software Engineering And Scientific Computing, Natalie Best Jan 2021

Computational and Data Sciences (PhD) Dissertations

The use of machine learning has risen in recent years, though many areas remain unexplored due to lack of data or lack of computational tools. This dissertation explores machine learning approaches in case studies involving image classification and natural language processing. In addition, a software library in the form of a two-way bridge connecting deep learning models in Keras with ones available in the Fortran programming language is also presented.

In Chapter 2, we explore the applicability of transfer learning utilizing models pre-trained on non-software engineering data applied to the problem of classifying software unified modeling language diagrams where data is …


Forecasting The Prices Of Cryptocurrencies Using A Novel Parameter Optimization Of VARIMA Models, Alexander Barrett Jan 2021

Computational and Data Sciences (PhD) Dissertations

This work is a comparative study of different univariate and multivariate time series predictive models as applied to Bitcoin, other cryptocurrencies, and other related financial time series data. ARIMA models, long regarded as the gold standard of univariate financial time series prediction due to both their flexibility and simplicity, are used as a baseline for prediction. Given the highly correlated nature of different cryptocurrencies, this work aims to show the benefit of forecasting with multivariate time series models, primarily focusing on a novel parameter optimization of VARIMA models outlined in this paper.

These models are trained on 3 years of historical data, …