Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Theses/Dissertations

Statistics and Probability

Logistic regression

Articles 1 - 29 of 29

Full-Text Articles in Physical Sciences and Mathematics

Classification In Supervised Statistical Learning With The New Weighted Newton-Raphson Method, Toma Debnath Jan 2024

Electronic Theses and Dissertations

In this thesis, the Weighted Newton-Raphson Method (WNRM), an innovative optimization technique, is introduced in statistical supervised learning for categorization and applied to a diabetes predictive model to find maximum likelihood estimates. The iterative optimization method solves nonlinear systems of equations with singular Jacobian matrices and is a modification of the ordinary Newton-Raphson algorithm. The quadratic convergence of the WNRM and its high efficiency in optimizing nonlinear likelihood functions whenever singularity in the Jacobians occurs allow for easy inclusion in classical categorization and generalized linear models, such as the logistic regression model, in supervised learning. The WNRM is thoroughly investigated …


Approaches To Detecting And Modeling Over-And Underdispersion In Alternative Count Data Distributions And An Application Of Logistic Regression And Random Forest Modeling To Improve Screening Tools For Tic Disorders In Children, Rebecca C. Wardrop Jul 2023

Theses and Dissertations

This dissertation focuses on theory and application of discrete data methods, particularly approaches to over- and underdispersion relative to the Poisson distribution and an application of random forest and logistic regression modeling. The first chapter derives a score test for over- and underdispersion in the heaped generalized Poisson distribution. Equi-, over-, and underdispersed heaped generalized Poisson and heaped negative binomial data are simulated to evaluate the performance of the score test by comparing the power it achieves to that of Wald and likelihood ratio tests. We find that the score test we derive performs comparably to both the Wald and …


Development Of Regional Landslide Susceptibility Models: A First Step Towards Model Transferability, Gina M. Belair Jan 2022

Graduate Student Theses, Dissertations, & Professional Papers

Landslides are a globally pervasive problem with the potential to cause significant fatalities and economic losses. Although landslides are widespread, many at-risk regions may not have the high-quality data or resources used in most landslide susceptibility analyses. This study aims to develop regional susceptibility relationships that are versatile and use publicly available data and open-source software. Logistic Regression and Frequency Ratio susceptibility relationships were developed in 23 regions in Washington, Utah, North Carolina, and Kentucky, with a region referring to a unique area and data combination. Regions were diverse in their geology, morphology, climate, and nature and quality of their …


Smoking, Alcohol Consumption, And Depression In Association With Incidence Of Type 2 Diabetes Among Mexican Americans In Starr County, Texas, Gabriela Rubannelsonkumar Dec 2021

Honors Program Theses and Research Projects

Previous studies on conditions like obesity, hypertension, and type 2 diabetes mellitus (T2DM) have explored the correlations between them and various other human conditions, including aortic stiffness, left ventricular hypertrophy, and sleep apnea, as they predict possibilities of developing certain diseases in Mexican Americans. This study aims to observe the correlation between lifestyle decisions that could relate to the onset of depression in normal, prediabetic, and diabetic individuals. These include smoking habits and alcohol consumption. Many papers have previously examined these lifestyle habits as they relate to obesity, hypertension, and diabetes; however, they have done so in a singular …


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao Jul 2021

Graduate Theses and Dissertations

Nowadays, industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our lives. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real-world applications. However, it is challenging for resource-limited clients to analyze their data efficiently when its scale is large. Additionally, data resources are increasingly distributed among different owners, and users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular in …


Developing Prediction Models For Kidney Stone Disease, Joseph Palko Jun 2021

Honors Theses

Kidney stone disease has become more prevalent through the years, leading to high treatment cost and associated health risks. In this study, we explore a large medical database and machine learning methods to extract features and construct models for diagnosing kidney stone disease.

Data of 46,250 patients and 58,976 hospital admissions were extracted and analyzed, including patients’ demographic information, diagnoses, vital signs, and laboratory measurements of the blood and urine. We compared the kidney stone (KDS) patients to patients with abdominal and back pain (ABP), patients diagnosed with nephritis, nephrosis, renal sclerosis, chronic kidney disease, or acute and unspecified renal …


An Examination Into Retention Behavior Of Air Force Female Officers, Jessica M. Astudillo Mar 2021

Theses and Dissertations

Female retention rates in the US military have been considerably lower than those of their male counterparts for numerous years. In the Air Force, women represent 14 percent of officer ranks from O-5 level and above. Comparatively, the overall rate of women officers in service is 20 percent. Understanding the negative factors associated with the attrition rate of this group can help the Air Force leverage positive change. It may also influence adjustments that will increase the number of women serving, and improve diversity throughout both the officer and enlisted ranks. In this study, logistic regression and survival analysis are …


An Examination Of Civilian Retention In The United States Air Force, William F. Wilson Mar 2021

Theses and Dissertations

The backbone of the United States Air Force is undoubtedly the large civilian workforce that supplements the great work that is accomplished. Many research studies have been conducted on officer and enlisted personnel to ensure that the career fields are properly developed and managed to meet the ever growing demands of the military's varied missions, but no recent studies have focused on the civilian workforce. Striking a balance between new and experienced employees is paramount to success given the ever-changing economic and political landscapes where we find ourselves. The first part of the research uses logistic regression to determine the …


Sample Size Formulas For Estimating Risk Ratios With The Modified Poisson Model For Binary Outcomes, Zhenni Xue Feb 2021

Electronic Thesis and Dissertation Repository

Sample size estimation is usually the first step in planning a research study. Too small a study cannot adequately address the objectives, while too large a study may waste resources or be unethical. For binary outcomes, several sample size estimation methods are available based on logistic regression models, which focus on odds ratios. In prospective studies, risk ratios are preferable for ease of interpretation and communication. In this thesis, we compared the power difference between the logistic regression model and the modified Poisson regression model via simulation studies. We then proposed sample size estimation formulas based on the modified Poisson regression …


A Bayesian Hierarchical Mixture Model With Continuous-Time Markov Chains To Capture Bumblebee Foraging Behavior, Max Thrush Hukill Jan 2021

Honors Projects

The standard statistical methodology for analyzing complex case-control studies in ethology is often limited by approaches that force researchers to model distinct aspects of biological processes in a piecemeal, disjointed fashion. By developing a hierarchical Bayesian model, this work demonstrates that statistical inference in this context can be done using a single coherent framework. To do this, we construct a continuous-time Markov chain (CTMC) to model bumblebee foraging behavior. To connect the experimental design with the CTMC, we employ a mixture model controlled by a logistic regression on the two-factor design matrix. We then show how to infer these model …


Statistical And Machine Learning Approaches To Depressive Disorders Among Adults In The United States: From Factor Discovery To Prediction Evaluation, Minhwa Lee Jan 2021

Senior Independent Study Theses

According to the National Institute of Mental Health (NIMH), depressive disorders (or major depression) are considered one of the most common and serious health risks in the United States. Our study focuses on extracting non-medical factors of depressive disorders diagnosis, such as overall health states, health risk behaviors, demography, and healthcare access, using the Behavioral Risk Factor Surveillance System (BRFSS) data set collected by the Centers for Disease Control and Prevention (CDC) in 2018.

We set the two objectives of our study about depressive disorders diagnosis in the United States as follows. First, we aim to utilize machine learning algorithms …


Nonparametric Misclassification Simulation And Extrapolation Method And Its Application, Congjian Liu Jan 2020

Electronic Theses and Dissertations

The misclassification simulation extrapolation (MC-SIMEX) method proposed by Küchenhoff et al. is a general method of handling categorical data with measurement error. It consists of two steps, the simulation and extrapolation steps. In the simulation step, it simulates observations with varying degrees of measurement error. Then parameter estimators for varying degrees of measurement error are obtained based on these observations. In the extrapolation step, it uses a parametric extrapolation function to obtain the parameter estimators for data with no measurement error. However, as shown in many studies, the parameter estimators are still biased as a result of the parametric extrapolation …


Towards Using Model Averaging To Construct Confidence Intervals In Logistic Regression Models, Artem Uvarov Aug 2019

Electronic Thesis and Dissertation Repository

Regression analyses in epidemiological and medical research typically begin with a model selection process, followed by inference assuming the selected model has generated the data at hand. It is well-known that this two-step procedure can yield biased estimates and invalid confidence intervals for model coefficients due to the uncertainty associated with the model selection. To account for this uncertainty, multiple models may be selected as a basis for inference. This method, commonly referred to as model-averaging, is increasingly becoming a viable approach in practice.

Previous research has demonstrated the advantage of model-averaging in reducing bias of parameter estimates. However, there …


Prediction Of High School Graduation With Decision Trees, Andrea M. Lee Aug 2019

MSU Graduate Theses

As an educator for the past fourteen years, I have always looked at data to find ways to help our students. Graduation status is one area of interest. I wanted to apply statistical methods to try to find early indicators of those students who may drop out, making it possible to provide early intervention to those students. With early intervention, we may be able to lower our dropout rate. While studying different methods of pattern recognition, I found that the decision tree method in machine learning was the best for the data that I had collected. Decision trees …


The Impact Of Changing Requirements, James C. Ellis Mar 2018

Theses and Dissertations

The fundamental purpose of an Engineering Change Proposal (ECP) is to change the requirements of a contract. To build in flexibility, the acquisition practice is to estimate a dollar value to hold in reserve after the contract is awarded. There appears to be no empirically based method for estimating this ECP withhold in the literature. Using the Cost Assessment Data Enterprise (CADE) database, 533 contracts were randomly selected to build two regression models: one to predict the likelihood of a contract experiencing an ECP, and the other to determine the expected median percent increase in baseline contract cost if an ECP …


Preference Probability Based On Ranks - A New Approach Using Logistic Regression With Zero Intercept, Oluwagbenga David Agboola Jan 2018

Theses, Dissertations and Capstones

Many probability models have been proposed to describe rankings. One of these is the Bradley-Terry model, which is based on observed pairwise preferences. For this study, we reverse the case and propose a new approach for estimating pairwise preference probabilities based on observed rankings. The new approach uses logistic regression with zero intercept as the statistical model that fits this situation. In order to implement the model, we first estimate the parameter using maximum likelihood estimation. Then we evaluate this estimation using numerical approximation procedures. We consider three such procedures: bisection method, Newton-Raphson method, and improved Newton’s method. Using simulated …


Seasonal Resource Selection And Habitat Treatment Use By A Fringe Population Of Greater Sage-Grouse, Rhett Boswell Dec 2017

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Movement and habitat selection by Greater Sage-grouse (Centrocercus urophasianus) is of great interest to wildlife managers tasked with applying conservation measures for this iconic western species. Current technology has created small and lightweight GPS (Global Positioning Systems) transmitters that can be attached to sage-grouse. Using GIS software and statistical programs such as Program R, land managers can analyze GPS location data to assess how sage-grouse are geospatially interacting with their habitats. Within the Panguitch Sage-Grouse Management Area (SGMA) thousands of acres of land have been restored or manipulated to enhance sage-grouse habitat; this usually involves removal of pinyon pine …


Exact Approaches For Bias Detection And Avoidance With Small, Sparse, Or Correlated Categorical Data, Sarah E. Schwartz Dec 2017

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Every day, traditional statistical methodologies are used worldwide to study a variety of topics and provide insight regarding countless subjects. Each technique is based on a distinct set of assumptions to ensure valid results. Additionally, many statistical approaches rely on large sample behavior and may collapse or degenerate in the presence of small, sparse, or correlated data. This dissertation details several advancements to detect these conditions, avoid their consequences, and analyze data in a different way to yield trustworthy results.

One of the most commonly used modeling techniques for outcomes with only two possible categorical values (e.g., live/die, pass/fail, …


Inference Using Bhattacharyya Distance To Model Interaction Effects When The Number Of Predictors Far Exceeds The Sample Size, Sarah A. Janse Jan 2017

Theses and Dissertations--Statistics

In recent years, statistical analyses, algorithms, and modeling of big data have been constrained due to computational complexity. Further, the added complexity of relationships among response and explanatory variables, such as higher-order interaction effects, makes identifying predictors using standard statistical techniques difficult. These difficulties are only exacerbated in the case of small sample sizes in some studies. Recent analyses have targeted the identification of interaction effects in big data, but the development of methods to identify higher-order interaction effects has been limited by computational concerns. One recently studied method is the Feasible Solutions Algorithm (FSA), a fast, flexible method that …


What Affects Parents’ Choice Of Milk? An Application Of Bayesian Model Averaging, Yingzhe Cheng Dec 2016

Mathematics & Statistics ETDs

This study identifies the factors that influence parents’ choice of milk for their children, using data from a unique survey administered in 2013 in Hunan province, China. In this survey, we identified two brands of milk, which differ in their prices and safety claims by the producer. Data were collected on parents’ choice of milk between the two brands, demographics, attitude towards food safety and behaviors related to food. Stepwise model selection and Bayesian model averaging (BMA) are used to search for influential factors. The two approaches consistently select the same factors suggested by an economic theoretical model, including price …


A Multi-Indexed Logistic Model For Time Series, Xiang Liu Dec 2016

Electronic Theses and Dissertations

In this thesis, we explore a multi-indexed logistic regression (MILR) model, with particular emphasis given to its application to time series. MILR includes simple logistic regression (SLR) as a special case, and the hope is that it will in some instances also produce significantly better results. To motivate the development of MILR, we consider its application to the analysis of both simulated sine wave data and stock data. We looked at well-studied SLR and its application in the analysis of time series data. Using a more sophisticated representation of sequential data, we then detail the implementation of MILR. We compare …


Separation Of Points And Interval Estimation In Mixed Dose-Response Curves With Selective Component Labeling, Darl D. Flake II May 2016

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Dose-response experiments are those that involve giving subjects different amounts of a treatment and observing the outcome. For example, plants may be given fertilizer and their growth could be measured or cancer patients could be given different doses of chemotherapy and their response could be monitored. These experiments are used to understand the relationship between the amount of, and response to, the treatment. Logistic regression models are often used to summarize data from these types of experiments. The dose-response experiment that motivated this dissertation involved treating a grain-pest with a pesticide. Some of the beetles had genes that made them …


Analysis Of Rheumatoid Arthritis Data Using Logistic Regression And Penalized Approach, Wei Chen Nov 2015

USF Tampa Graduate Theses and Dissertations

In this paper, a clinical dataset for a rheumatoid arthritis (RA) medicine with an ordinal response is selected to study this new medicine. The dataset has four features: sex, age, treatment, and preliminary. Sex is a binary categorical variable, with 1 indicating male and 0 indicating female. Age is the numerical age of the patient. Treatment is a binary categorical variable, with 1 indicating has RA and 0 indicating does not have RA. Preliminary is a five-class categorical variable indicating the patient’s RA severity status before taking the medication. The response Y is a five-class ordinal variable that shows …


Generation And Statistical Modeling Of Active Protein Chimeras: A Sequence Based Approach, Nicholas Fico Oct 2013

Open Access Dissertations

Generation of active protein chimeras is a valuable tool to probe the functional space of proteins. Statistical modeling is the next logical step, allowing us to build a model of gene fragment replaceability between species. In this thesis I begin to develop the statistical tools that are needed to systematically describe combinatorial protein libraries. I present three sets of diverse chimeric protein libraries developed using sequence information. The statistical model of the human N-Ras and human K-Ras-4B genes reveals a set of previously unidentified surface residues on the N-Ras G-Domain that may be involved in cellular localization. Statistical modeling of a …


Identifying The Spatial Distribution Of Three Plethodontid Salamanders In Great Smoky Mountains National Park Using Two Habitat Modeling Methods, Matthew Stephen Kookogey May 2012

Masters Theses

The main objective was to create habitat models of three plethodontid salamander species (Desmognathus conanti, D. ocoee, and Plethodon jordani) in Great Smoky Mountains National Park (GSMNP). To investigate the relationships between salamanders and their habitats and gain a robust view of them, I used three models for each species: logistic regression with use-availability sampling, logistic regression with case-control sampling, and Mahalanobis distance (D²). The secondary objective was to compare the different modeling methods within and across the three species. Elevation was the dominant variable for all three species.

D² for D. conanti predicted low elevations, close proximity …


Bayesian Phase I Dose Finding In Cancer Trials, Lin Yang Aug 2011

Dissertations & Theses (Open Access)

This dissertation explores phase I dose-finding designs in cancer trials from three perspectives: the alternative Bayesian dose-escalation rules, a design based on a time-to-dose-limiting toxicity (DLT) model, and a design based on a discrete-time multi-state (DTMS) model.

We list alternative Bayesian dose-escalation rules and perform a simulation study for the intra-rule and inter-rule comparisons based on two statistical models to identify the most appropriate rule under certain scenarios. We provide evidence that all the Bayesian rules outperform the traditional "3+3" design in the allocation of patients and selection of the maximum tolerated dose.

The design based on a time-to-DLT model …


Statistical Analysis Of Fatalities Due To Vehicle Accidents In Las Vegas, NV, Annabelle Marie Mathis Aug 2011

UNLV Theses, Dissertations, Professional Papers, and Capstones

The goal of this thesis is to investigate factors that affect the odds of having a fatality in a vehicle collision. We will be looking at characteristics of the driver that caused the accident (age, gender, behavior, actions, influences, and seat belt worn), the characteristics of the vehicle the driver drove (type of vehicle, and air bag deployment), the characteristics of the environment in which the accident occurred (weather, road condition, lighting, time of day, the day of the week, and month of the year), the characteristics of the crash (direction of accident and how many vehicles were involved), and …


Bayesian Semiparametric Generalizations Of Linear Models Using Polya Trees, Angela Schoergendorfer Jan 2011

University of Kentucky Doctoral Dissertations

In a Bayesian framework, prior distributions on a space of nonparametric continuous distributions may be defined using Polya trees. This dissertation addresses statistical problems for which the Polya tree idea can be utilized to provide efficient and practical methodological solutions.

One problem considered is the estimation of risks, odds ratios, or other similar measures that are derived by specifying a threshold for an observed continuous variable. It has been previously shown that fitting a linear model to the continuous outcome under the assumption of a logistic error distribution leads to more efficient odds ratio estimates. We will show that deviations …


A Logistic Regression Analysis Of Utah Colleges Exit Poll Response Rates Using SAS Software, Clint W. Stevenson Oct 2006

Theses and Dissertations

In this study I examine voter response at an interview level using a dataset of 7562 voter contacts (including responses and nonresponses) in the 2004 Utah Colleges Exit Poll. In 2004, 4908 of the 7562 voters approached responded to the exit poll for an overall response rate of 65 percent. Logistic regression is used to estimate factors that contribute to a success or failure of each interview attempt. This logistic regression model uses interviewer characteristics, voter characteristics (both respondents and nonrespondents), and exogenous factors as independent variables. Voter characteristics such as race, gender, and age are strongly associated with response. …