Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Articles 1 - 9 of 9

Full-Text Articles in Data Science

Classification In Supervised Statistical Learning With The New Weighted Newton-Raphson Method, Toma Debnath Jan 2024

Electronic Theses and Dissertations

This thesis introduces the Weighted Newton-Raphson Method (WNRM), an optimization technique, into supervised statistical learning for categorization and applies it to a diabetes predictive model to find maximum likelihood estimates. The iterative method, a modification of the ordinary Newton-Raphson algorithm, solves nonlinear systems of equations with singular Jacobian matrices. The quadratic convergence of the WNRM and its high efficiency in optimizing nonlinear likelihood functions whenever the Jacobians are singular allow for easy inclusion in classical categorization and generalized linear models, such as the logistic regression model, in supervised learning. The WNRM is thoroughly investigated …
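For orientation, the ordinary Newton-Raphson iteration that the abstract says the WNRM modifies can be sketched for a logistic regression with one predictor. This is only a minimal illustration of the unweighted baseline, not the thesis's method: the weighting scheme is not described in the abstract and is not reproduced here. The function names and the small data set (`XS`, `YS`) are assumptions made up for the example.

```python
import math

# Hypothetical toy data (not from the thesis): one predictor, binary outcome.
XS = [0.5, 1.0, 1.5, 2.0, 2.5, 3.0, 3.5, 4.0]
YS = [0, 0, 1, 0, 1, 0, 1, 1]


def sigmoid(z):
    """Logistic function mapping a real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic_newton(xs, ys, iterations=25, tol=1e-10):
    """Ordinary (unweighted) Newton-Raphson for intercept b0 and slope b1."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the negative Hessian
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            # The singular case is where the ordinary method breaks down.
            raise ValueError("singular Hessian: ordinary Newton-Raphson cannot proceed")
        # Solve the 2x2 linear system (negative Hessian) * step = gradient.
        s0 = (h11 * g0 - h01 * g1) / det
        s1 = (h00 * g1 - h01 * g0) / det
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1


b0_hat, b1_hat = fit_logistic_newton(XS, YS)
print("intercept:", b0_hat, "slope:", b1_hat)
```

The explicit determinant check marks the point where a singular matrix stops the ordinary iteration, which is the situation the abstract says the WNRM is designed to handle.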


Learning Mortality Risk For Covid-19 Using Machine Learning And Statistical Methods, Shaoshi Zhang Dec 2023

Electronic Thesis and Dissertation Repository

This research investigates the mortality risk of COVID-19 patients across different variant waves, using data from the Centers for Disease Control and Prevention (CDC) websites. By analyzing the available data, including patient medical records, vaccination rates, and hospital capacities, we aim to discern patterns and factors associated with COVID-19-related deaths.

To explore features linked to COVID-19 mortality, we employ different techniques such as Filter, Wrapper, and Embedded methods for feature selection. Furthermore, we apply various machine learning methods, including support vector machines, decision trees, random forests, logistic regression, K-nearest neighbours, naïve Bayes methods, and artificial neural networks, to uncover underlying …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Determining The Proportionality Of Ischemic Stroke Risk Factors To Age, Elizabeth Hunter, John D. Kelleher Jan 2023

Articles

While age is an important risk factor, there are some disadvantages to including it in a stroke risk model: age can dominate the risk score and lead to over- or under-predictions in some age groups. There is evidence to suggest that some of these disadvantages are due to the non-proportionality of other risk factors with age, e.g., risk factors contribute differently to stroke risk based on an individual’s age. In this paper, we present a framework to test whether risk factors are proportional with age. We then apply the framework to a set of risk factors using Framingham Heart Study data …


Examining Bias In Jury Selection For Criminal Trials In Dallas County, Megan Ball, Brandon Birmingham, Matt Farrow, Katherine Mitchell, Bivin Sadler, Lynne Stokes Sep 2022

SMU Data Science Review

One of the hallmarks of the American judicial system is the concept of trial by jury, and that such a trial be heard by an impartial jury of one's peers. Several landmark legal cases in the history of the United States have challenged this notion of equal representation by jury, most notably Batson v. Kentucky, 476 U.S. 79 (1986). Most previous research, focus, and legal precedent have centered on peremptory challenges and on attempting to prove whether bias was at play in excluding certain jurors from serving. Few studies, however, focus on examining challenges for cause based on self-reported biases from the …


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao Jul 2021

Graduate Theses and Dissertations

Industries now collect a massive and exponentially growing amount of data that can be used to extract useful insights for improving various aspects of our lives. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real-world applications. However, it is challenging for resource-limited clients to analyze their data efficiently when its scale is large. Additionally, data resources are increasingly distributed among different owners. Moreover, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular in …


A Bayesian Hierarchical Mixture Model With Continuous-Time Markov Chains To Capture Bumblebee Foraging Behavior, Max Thrush Hukill Jan 2021

Honors Projects

The standard statistical methodology for analyzing complex case-control studies in ethology is often limited by approaches that force researchers to model distinct aspects of biological processes in a piecemeal, disjointed fashion. By developing a hierarchical Bayesian model, this work demonstrates that statistical inference in this context can be done using a single coherent framework. To do this, we construct a continuous-time Markov chain (CTMC) to model bumblebee foraging behavior. To connect the experimental design with the CTMC, we employ a mixture model controlled by a logistic regression on the two-factor design matrix. We then show how to infer these model …
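As a rough illustration of what a continuous-time Markov chain describes, the sketch below simulates one trajectory: the chain waits in each state for an exponentially distributed time and then jumps to another state with probability proportional to the transition rates. The state names and rate values are hypothetical placeholders, not taken from the thesis, and the hierarchical Bayesian inference and mixture model it describes are not shown.

```python
import random


def simulate_ctmc(rates, start, t_end, rng):
    """Simulate one CTMC path up to time t_end.

    rates maps each state to a dict {next_state: transition_rate}.
    Returns a list of (jump_time, state) pairs, starting with (0.0, start).
    """
    t, state = 0.0, start
    path = [(0.0, start)]
    while True:
        out = rates[state]
        total = sum(out.values())
        if total == 0:
            break  # absorbing state: no further jumps
        # Holding time is exponential with rate equal to the total exit rate.
        t += rng.expovariate(total)
        if t >= t_end:
            break
        # Next state is chosen with probability proportional to its rate.
        targets = list(out)
        state = rng.choices(targets, weights=[out[s] for s in targets])[0]
        path.append((t, state))
    return path


# Hypothetical three-state example (states and rates are made up).
RATES = {
    "flower_A": {"flower_B": 0.8, "nest": 0.2},
    "flower_B": {"flower_A": 0.5, "nest": 0.5},
    "nest": {"flower_A": 1.0, "flower_B": 1.0},
}

path = simulate_ctmc(RATES, "nest", t_end=10.0, rng=random.Random(0))
for time, state in path:
    print(round(time, 3), state)
```

Passing an explicit `random.Random` instance keeps the simulation reproducible without touching the global random state.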


Statistical And Machine Learning Approaches To Depressive Disorders Among Adults In The United States: From Factor Discovery To Prediction Evaluation, Minhwa Lee Jan 2021

Senior Independent Study Theses

According to the National Institute of Mental Health (NIMH), depressive disorders (or major depression) are considered one of the most common and serious health risks in the United States. Our study focuses on extracting non-medical factors in the diagnosis of depressive disorders, such as overall health status, health risk behaviors, demography, and healthcare access, using the Behavioral Risk Factor Surveillance System (BRFSS) data set collected by the Centers for Disease Control and Prevention (CDC) in 2018.

Our study of depressive disorder diagnosis in the United States has two objectives. First, we aim to utilize machine learning algorithms …


Supervised Classification Using Finite Mixture Copula, Sumen Sen, Norou Diawara Aug 2017

Mathematics & Statistics Faculty Publications

The use of copulas for statistical classification is recent and gaining popularity. For example, statistical classification using copulas has been proposed for automatic character recognition, medical diagnostics, and most recently data mining. Classical discrimination rules assume normality, but in this data age that assumption is often questionable. In fact, the features of the data could be a mixture of discrete and continuous random variables. In this paper, mixture copula densities are used to model class conditional distributions. Such types of densities are useful when the marginal densities of the vector of features are not normally distributed and are of a mixed …