Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Logistic regression

Articles 1 - 15 of 15

Full-Text Articles in Statistical Models

Differentiation Of Human, Dog, And Cat Hair Fibers Using Dart Tofms And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Development Of Regional Landslide Susceptibility Models: A First Step Towards Model Transferability, Gina M. Belair Jan 2022

Graduate Student Theses, Dissertations, & Professional Papers

Landslides are a globally pervasive problem with the potential to cause significant fatalities and economic losses. Although landslides are widespread, many at-risk regions may not have the high-quality data or resources used in most landslide susceptibility analyses. This study aims to develop regional susceptibility relationships that are versatile and use publicly available data and open-sourced software. Logistic Regression and Frequency Ratio susceptibility relationships were developed in 23 regions in Washington, Utah, North Carolina, and Kentucky, with a region referring to a unique area and data combination. Regions were diverse in their geology, morphology, climate, and nature and quality of their …


A Bayesian Hierarchical Mixture Model With Continuous-Time Markov Chains To Capture Bumblebee Foraging Behavior, Max Thrush Hukill Jan 2021

Honors Projects

The standard statistical methodology for analyzing complex case-control studies in ethology is often limited by approaches that force researchers to model distinct aspects of biological processes in a piecemeal, disjointed fashion. By developing a hierarchical Bayesian model, this work demonstrates that statistical inference in this context can be done using a single coherent framework. To do this, we construct a continuous-time Markov chain (CTMC) to model bumblebee foraging behavior. To connect the experimental design with the CTMC, we employ a mixture model controlled by a logistic regression on the two-factor design matrix. We then show how to infer these model …


An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone Jan 2020

Published and Grey Literature from PhD Candidates

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been used as the standard modeling tool in the financial industry because of its generally strong performance and its interpretability. In this paper, we propose a hybrid bankcard response model, which integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. Then, in the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model …


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

SMU Data Science Review

In this paper, we explore and present a method of finding characteristics of a restaurant from its reviews using machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features, in order to examine how well the algorithms suit the task. Both XGBoost and logistic regression are examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experiences. Our analysis makes use of review data made publicly available by Yelp. Key bigrams extracted were non-specific to the …


Seasonal Resource Selection And Habitat Treatment Use By A Fringe Population Of Greater Sage-Grouse, Rhett Boswell Dec 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Movement and habitat selection by Greater Sage-grouse (Centrocercus urophasianus) are of great interest to wildlife managers tasked with applying conservation measures for this iconic western species. Current technology has produced small, lightweight Global Positioning System (GPS) transmitters that can be attached to sage-grouse. Using GIS software and statistical programs such as Program R, land managers can analyze GPS location data to assess how sage-grouse are geospatially interacting with their habitats. Within the Panguitch Sage-Grouse Management Area (SGMA), thousands of acres of land have been restored or manipulated to enhance sage-grouse habitat; this usually involves removal of pinyon pine …


Supervised Classification Using Finite Mixture Copula, Sumen Sen, Norou Diawara Aug 2017

Mathematics & Statistics Faculty Publications

The use of copulas for statistical classification is recent and gaining popularity. For example, statistical classification using copulas has been proposed for automatic character recognition, medical diagnostics, and most recently data mining. Classical discrimination rules assume normality, but in the current data age this assumption is often questionable. In fact, features of data could be a mixture of discrete and continuous random variables. In this paper, mixture copula densities are used to model class conditional distributions. Such types of densities are useful when the marginal densities of the vector of features are not normally distributed and are of a mixed …


Application Of Support Vector Machine Modeling And Graph Theory Metrics For Disease Classification, Jessica M. Rudd Jul 2017

Published and Grey Literature from PhD Candidates

Disease classification is a crucial element of biomedical research. Recent studies have demonstrated that machine learning techniques, such as Support Vector Machine (SVM) modeling, produce similar or improved predictive capabilities in comparison to the traditional method of Logistic Regression. In addition, it has been found that social network metrics can provide useful predictive information for disease modeling. In this study, we combine simulated social network metrics with SVM to predict diabetes in a sample of data from the Behavioral Risk Factor Surveillance System. In this dataset, Logistic Regression outperformed SVM with ROC index of 81.8 and 81.7 for models with …


A Multi-Indexed Logistic Model For Time Series, Xiang Liu Dec 2016

Electronic Theses and Dissertations

In this thesis, we explore a multi-indexed logistic regression (MILR) model, with particular emphasis given to its application to time series. MILR includes simple logistic regression (SLR) as a special case, and the hope is that it will in some instances also produce significantly better results. To motivate the development of MILR, we consider its application to the analysis of both simulated sine wave data and stock data. We first look at the well-studied SLR and its application in the analysis of time series data. Using a more sophisticated representation of sequential data, we then detail the implementation of MILR. We compare …


Exploring New Models For Seatbelt Use In Survey Data, Mark K. Ledbetter, Norou Diawara, Bryan E. Porter Oct 2016

Virginia Journal of Science

Problem: Several approaches to analyzing seatbelt use have been proposed in the literature. Two methods that have not been explored are the use of unweighted and weighted logistic regression models and the use of item response theory (IRT), or the Rasch model. Since accurate methods for predicting seatbelt use behavior from observed data must include a built-in design method and model, and must overcome computational challenges, the weighted and IRT methods appear to be further options for an observational survey of seatbelt use in the state of Virginia.

Method: The observed data from 136 sites within the Commonwealth …


Interpretation And Prediction Of A Logistic Model, Joseph M. Hilbe Mar 2014

Joseph M Hilbe

A basic overview of how to model and interpret a logistic regression model, as well as how to obtain the predicted probability or fit of the model and calculate its confidence intervals. R code used for all examples; some Stata is provided as a contrast.
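
As a rough illustration of the kind of calculation this overview describes, the sketch below computes a predicted probability and a Wald-type confidence interval from an already fitted logistic model. It is written in Python rather than the R or Stata used in the original work, and the coefficient vector `beta`, covariance matrix `cov`, and covariate row `x` are invented values for illustration only, not results from the paper. One common approach, assumed here, is to build the interval on the linear-predictor (logit) scale and then back-transform it, which keeps both bounds inside (0, 1).

```python
import math


def inv_logit(eta):
    """Map a linear predictor to a probability."""
    return 1.0 / (1.0 + math.exp(-eta))


def predict_with_ci(x, beta, cov, z=1.96):
    """Return (probability, lower, upper) for one covariate row x.

    beta and cov are assumed to come from a previously fitted logistic
    regression; z = 1.96 gives an approximate 95% Wald interval.
    """
    k = len(x)
    # Linear predictor: x' beta
    eta = sum(b * xi for b, xi in zip(beta, x))
    # Variance of the linear predictor: x' Cov(beta) x
    var = sum(x[i] * cov[i][j] * x[j] for i in range(k) for j in range(k))
    se = math.sqrt(var)
    # Interval on the logit scale, back-transformed to probabilities
    return inv_logit(eta), inv_logit(eta - z * se), inv_logit(eta + z * se)


# Hypothetical fitted values: intercept and one slope
beta = [-1.5, 0.8]
cov = [[0.04, -0.01], [-0.01, 0.02]]
x = [1.0, 2.0]  # leading 1.0 is the intercept term

p, lower, upper = predict_with_ci(x, beta, cov)
print(f"p = {p:.3f}, 95% CI = ({lower:.3f}, {upper:.3f})")
```

Building the interval directly on the probability scale is also possible, but it can produce bounds outside the unit interval, which is why the back-transformed form is used in this sketch.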


Identifying The Spatial Distribution Of Three Plethodontid Salamanders In Great Smoky Mountains National Park Using Two Habitat Modeling Methods, Matthew Stephen Kookogey May 2012

Masters Theses

The main objective was to create habitat models of three plethodontid salamander species (Desmognathus conanti, D. ocoee, and Plethodon jordani) in GSMNP. To investigate the relationships between salamanders and their habitats, I used three models for each species (logistic regression with use-availability sampling, logistic regression with case-control sampling, and Mahalanobis distance, D²) to gain a robust view of the relationships. The secondary objective was to compare the different modeling methods within and across the three species. Elevation was the dominant variable for all three species.

D² for D. conanti predicted low elevations, close proximity …
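
For readers unfamiliar with the Mahalanobis distance statistic named in this abstract, the following is a minimal sketch of the squared distance for two habitat variables. The variable choices (elevation and distance to stream), the mean vector, and the covariance matrix are hypothetical and are not taken from the thesis; the function name `mahalanobis_sq_2d` is likewise an illustrative assumption. The statistic measures how far a site lies from the mean conditions at known-use sites, scaled by the covariance of those conditions, so smaller values suggest more similar habitat.

```python
def mahalanobis_sq_2d(x, mean, cov):
    """Squared Mahalanobis distance (x - mean)' Cov^-1 (x - mean) in 2-D."""
    (a, b), (c, d) = cov
    det = a * d - b * c
    if det == 0:
        raise ValueError("covariance matrix is singular")
    d0 = x[0] - mean[0]
    d1 = x[1] - mean[1]
    # The inverse of [[a, b], [c, d]] is (1 / det) * [[d, -b], [-c, a]]
    return (d0 * (d * d0 - b * d1) + d1 * (-c * d0 + a * d1)) / det


# Hypothetical summary of occupied sites: elevation (m), distance to stream (m)
mean = [900.0, 50.0]
cov = [[10000.0, -500.0], [-500.0, 400.0]]

candidate_site = [1000.0, 30.0]
print(mahalanobis_sq_2d(candidate_site, mean, cov))
```

With an identity covariance matrix the statistic reduces to the ordinary squared Euclidean distance, which is a convenient sanity check.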


Regression Trees For Predicting Mortality In Patients With Cardiovascular Disease: What Improvement Is Achieved By Using Ensemble-Based Methods?, Peter C. Austin Jan 2012

Peter Austin

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1991-2001 and …


Bayesian Phase I Dose Finding In Cancer Trials, Lin Yang Aug 2011

Dissertations & Theses (Open Access)

This dissertation explores phase I dose-finding designs in cancer trials from three perspectives: the alternative Bayesian dose-escalation rules, a design based on a time-to-dose-limiting toxicity (DLT) model, and a design based on a discrete-time multi-state (DTMS) model.

We list alternative Bayesian dose-escalation rules and perform a simulation study for the intra-rule and inter-rule comparisons based on two statistical models to identify the most appropriate rule under certain scenarios. We provide evidence that all the Bayesian rules outperform the traditional "3+3" design in the allocation of patients and selection of the maximum tolerated dose.

The design based on a time-to-DLT model …


Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …