Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Applied Statistics

Logistic regression

Articles 1 - 30 of 33

Full-Text Articles in Physical Sciences and Mathematics

Classification In Supervised Statistical Learning With The New Weighted Newton-Raphson Method, Toma Debnath Jan 2024

Electronic Theses and Dissertations

In this thesis, the Weighted Newton-Raphson Method (WNRM), an innovative optimization technique, is introduced in statistical supervised learning for categorization and applied to a diabetes predictive model to find maximum likelihood estimates. The iterative optimization method is a modification of the ordinary Newton-Raphson algorithm that solves nonlinear systems of equations with singular Jacobian matrices. The quadratic convergence of the WNRM and its high efficiency in optimizing nonlinear likelihood functions whenever the Jacobians are singular allow for easy inclusion in classical categorization and generalized linear models, such as the logistic regression model in supervised learning. The WNRM is thoroughly investigated …
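For orientation, the ordinary Newton-Raphson iteration that the abstract says the WNRM modifies can be sketched for a one-predictor logistic model as below. This is a minimal illustration, not the thesis's weighted method: the function names, the toy data, and the choice to simply raise an error when the information matrix is singular are all assumptions made here.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic_newton(xs, ys, iterations=25, tol=1e-10):
    """Ordinary (unweighted) Newton-Raphson for logit P(Y=1|x) = b0 + b1*x."""
    b0, b1 = 0.0, 0.0
    for _ in range(iterations):
        # Score vector (g0, g1) and information matrix [[h00, h01], [h01, h11]].
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if abs(det) < 1e-12:
            # The singular case is the one the abstract says the WNRM targets;
            # this plain sketch makes no attempt to handle it.
            raise ValueError("singular information matrix")
        # Newton step: solve the 2x2 system (information) * delta = score.
        d0 = (h11 * g0 - h01 * g1) / det
        d1 = (h00 * g1 - h01 * g0) / det
        b0 += d0
        b1 += d1
        if max(abs(d0), abs(d1)) < tol:
            break
    return b0, b1

# Hypothetical toy data (not from the thesis); the classes overlap,
# so a finite maximum likelihood estimate exists.
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
b0, b1 = fit_logistic_newton(xs, ys)
```

At convergence the score is numerically zero, which is the condition that defines the maximum likelihood estimate.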


Differentiation Of Human, Dog, And Cat Hair Fibers Using Dart Tofms And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Development Of Regional Landslide Susceptibility Models: A First Step Towards Model Transferability, Gina M. Belair Jan 2022

Graduate Student Theses, Dissertations, & Professional Papers

Landslides are a globally pervasive problem with the potential to cause significant fatalities and economic losses. Although landslides are widespread, many at-risk regions may not have the high-quality data or resources used in most landslide susceptibility analyses. This study aims to develop regional susceptibility relationships that are versatile and use publicly available data and open-sourced software. Logistic Regression and Frequency Ratio susceptibility relationships were developed in 23 regions in Washington, Utah, North Carolina, and Kentucky, with a region referring to a unique area and data combination. Regions were diverse in their geology, morphology, climate, and nature and quality of their …


Smoking, Alcohol Consumption, And Depression In Association With Incidence Of Type 2 Diabetes Among Mexican Americans In Starr County, Texas, Gabriela Rubannelsonkumar Dec 2021

Honors Program Theses and Research Projects

Previous studies on conditions like obesity, hypertension, and type 2 diabetes mellitus (T2DM) have explored the correlations between them and various other human conditions, including aortic stiffness, left ventricular hypertrophy, and sleep apnea, as they predict possibilities of developing certain diseases in Mexican Americans. This study aims to observe the correlation between lifestyle decisions that could relate to the onset of depression in normal, prediabetic, and diabetic individuals. These include smoking habits and alcohol consumption. Many papers have previously conducted research on these lifestyle habits as they relate to obesity, hypertension, and diabetes; however, they have done so in a singular …


Logistic Regression Under Sparse Data Conditions, David A. Walker, Thomas J. Smith Sep 2020

Journal of Modern Applied Statistical Methods

This study examined the impact of sparse data conditions among one or more predictor variables in logistic regression and assessed the effectiveness of the Firth (1993) procedure in reducing potential parameter estimation bias. Results indicated that sparseness in binary predictors introduces bias that is substantial with small sample sizes, and that the Firth procedure can effectively correct this bias.


Inferences About The Probability Of Success, Given The Value Of A Covariate, Using A Nonparametric Smoother, Rand Wilcox Jun 2020

Journal of Modern Applied Statistical Methods

For a binary random variable Y, let p(x) = P(Y = 1 | X = x) for some covariate X. The goal of computing a confidence interval for p(x) is considered. In the logistic regression model, even a slight departure difficult to detect via a goodness-of-fit test can yield inaccurate results. The accuracy of a confidence interval can deteriorate as the sample size increases. The goal is to suggest an alternative approach based on a smoother, which provides a more flexible approximation of p(x).
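As a rough illustration of what a smoother-based estimate of p(x) looks like, the sketch below uses a Nadaraya-Watson estimator with a Gaussian kernel. This is a generic smoother chosen here for illustration and is not necessarily the one used in the article; the function name, bandwidth, and data are assumptions, and the sketch gives only a point estimate, not the confidence interval the abstract discusses.

```python
import math

def smooth_probability(x0, xs, ys, bandwidth=1.0):
    """Nadaraya-Watson estimate of p(x0) = P(Y = 1 | X = x0), Gaussian kernel."""
    # Each observation is weighted by how close its covariate is to x0.
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observation carries weight at x0; increase the bandwidth")
    # Weighted average of the binary outcomes, so the result lies in [0, 1].
    return sum(w * y for w, y in zip(weights, ys)) / total

# Hypothetical data (not from the article).
xs = [0, 1, 2, 3, 4, 5]
ys = [0, 0, 1, 0, 1, 1]
p_low = smooth_probability(0.5, xs, ys)
p_high = smooth_probability(4.5, xs, ys)
```

Unlike the logistic model, this estimate imposes no parametric form on p(x); the bandwidth controls how local the average is.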


Investigating The Performance Of Propensity Score Approaches For Differential Item Functioning Analysis, Yan Liu, Chanmin Kim, Amrey D. Wu, Paul Gustafson, Edward Kroc, Bruno D. Zumbo Apr 2020

Journal of Modern Applied Statistical Methods

To evaluate the performance of propensity score approaches for differential item functioning analysis, this simulation study was conducted to assess bias, mean square error, Type I error, and power under different levels of effect size and a variety of model misspecification conditions, including different types and missing patterns of covariates.


An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone Jan 2020

Published and Grey Literature from PhD Candidates

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been used as the standard modeling tool in the financial industry because of its almost always desirable performance and its interpretability. In this paper, we propose a hybrid bankcard response model, which integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. Then, in the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model …


The Price Is Right: Analyzing Bidding Behavior On Contestants’ Row, Paul Kvam May 2019

Department of Math & Statistics Faculty Publications

The TV game show “The Price is Right” features a bidding auction called Contestants’ Row that rewards the player (out of four) who bids closest to an item’s value without overbidding. By exploring 903 game outcomes from the 2000–2001 season, we show how player strategies are significantly inefficient, and compare the empirical results to probability outcomes for optimal bid strategies found in a recent study. Findings show that the last bidder would do better using the naïve strategy of bidding a dollar more than the highest of the three bids. We apply the EM algorithm in a novel way to …


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

SMU Data Science Review

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms to the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experience. Our analysis makes use of review data publicly made available by Yelp. Key bigrams extracted were non-specific to the …


Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler Aug 2018

SMU Data Science Review

Selecting a learning algorithm to implement for a particular application on the basis of performance still remains an ad-hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by evaluating the overall classification performance between random forest and logistic regression for datasets comprised of various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool capable …


Fitting The Rasch Model Under The Logistic Regression Framework To Reduce Estimation Bias, Tianshu Pan Jun 2018

Journal of Modern Applied Statistical Methods

This article showed how and why the Rasch model can be fitted under the logistic regression framework. Then a penalized maximum likelihood (Firth 1993) for logistic regression models can also be used to reduce ML biases when fitting the Rasch model. These conclusions are supported by a simulation study.


On Some Ridge Regression Estimators For Logistic Regression Models, Ulyana P. Williams Mar 2018

FIU Electronic Theses and Dissertations

The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As a performance criterion, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study has been executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of …


The Use Of Item Response Theory In Survey Methodology: Application In Seat Belt Data, Mark K. Ledbetter, Norou Diawara, Bryan E. Porter Jan 2018

Mathematics & Statistics Faculty Publications

Problem: Several approaches to analyzing survey data have been proposed in the literature. One method that is not popular in survey research methodology is the use of item response theory (IRT). Because accurate methods for predicting behavior are based upon observed data, the design model must overcome computational challenges and also account for calibration and proficiency estimation. The IRT model appears to offer the latter options. We review that model and apply it to observational survey data. We then compare the findings with the more popular weighted logistic regression. Method: Apply IRT model to the observed data …


Application Of Support Vector Machine Modeling And Graph Theory Metrics For Disease Classification, Jessica M. Rudd Jul 2017

Published and Grey Literature from PhD Candidates

Disease classification is a crucial element of biomedical research. Recent studies have demonstrated that machine learning techniques, such as Support Vector Machine (SVM) modeling, produce similar or improved predictive capabilities in comparison to the traditional method of Logistic Regression. In addition, it has been found that social network metrics can provide useful predictive information for disease modeling. In this study, we combine simulated social network metrics with SVM to predict diabetes in a sample of data from the Behavioral Risk Factor Surveillance System. In this dataset, Logistic Regression outperformed SVM with ROC index of 81.8 and 81.7 for models with …


What Affects Parents’ Choice Of Milk? An Application Of Bayesian Model Averaging, Yingzhe Cheng Dec 2016

Mathematics & Statistics ETDs

This study identifies the factors that influence parents’ choice of milk for their children, using data from a unique survey administered in 2013 in Hunan province, China. In this survey, we identified two brands of milk, which differ in their prices and safety claims by the producer. Data were collected on parents’ choice of milk between the two brands, demographics, attitude towards food safety and behaviors related to food. Stepwise model selection and Bayesian model averaging (BMA) are used to search for influential factors. The two approaches consistently select the same factors suggested by an economic theoretical model, including price …


A Multi-Indexed Logistic Model For Time Series, Xiang Liu Dec 2016

Electronic Theses and Dissertations

In this thesis, we explore a multi-indexed logistic regression (MILR) model, with particular emphasis given to its application to time series. MILR includes simple logistic regression (SLR) as a special case, and the hope is that it will in some instances also produce significantly better results. To motivate the development of MILR, we consider its application to the analysis of both simulated sine wave data and stock data. We looked at well-studied SLR and its application in the analysis of time series data. Using a more sophisticated representation of sequential data, we then detail the implementation of MILR. We compare …


Exploring New Models For Seatbelt Use In Survey Data, Mark K. Ledbetter, Norou Diawara, Bryan E. Porter Oct 2016

Virginia Journal of Science

Problem: Several approaches to analyzing seatbelt use have been proposed in the literature. Two methods that have not been explored are the use of unweighted and weighted logistic regression models and the use of item response theory (IRT) or the Rasch model. Since accurate methods to predict seatbelt use behavior based upon observed data must include a built-in design method and model, and overcome computational challenges, weighted and IRT methods appear to be other options for an observational survey of seat belt use in the state of Virginia.

Method: The observed data from 136 sites within the Commonwealth …


Liu-Type Logistic Estimators With Optimal Shrinkage Parameter, Yasin Asar May 2016

Journal of Modern Applied Statistical Methods

Multicollinearity in logistic regression affects the variance of the maximum likelihood estimator negatively. In this study, Liu-type estimators are used to reduce the variance and overcome the multicollinearity by applying some existing ridge regression estimators to the case of logistic regression model. A Monte Carlo simulation is given to evaluate the performances of these estimators when the optimal shrinkage parameter is used in the Liu-type estimators, along with an application of real case data.


Interpretation And Prediction Of A Logistic Model, Joseph M. Hilbe Mar 2014

Joseph M Hilbe

A basic overview of how to model and interpret a logistic regression model, as well as how to obtain the predicted probability or fit of the model and calculate its confidence intervals. R code is used for all examples; some Stata code is provided as a contrast.


Regression Trees For Predicting Mortality In Patients With Cardiovascular Disease: What Improvement Is Achieved By Using Ensemble-Based Methods?, Peter C. Austin Jan 2012

Peter Austin

In biomedical research, the logistic regression model is the most commonly used method for predicting the probability of a binary outcome. While many clinical researchers have expressed an enthusiasm for regression trees, this method may have limited accuracy for predicting health outcomes. We aimed to evaluate the improvement that is achieved by using ensemble-based methods, including bootstrap aggregation (bagging) of regression trees, random forests, and boosted regression trees. We analyzed 30-day mortality in two large cohorts of patients hospitalized with either acute myocardial infarction (N = 16,230) or congestive heart failure (N = 15,848) in two distinct eras (1991-2001 and …


Statistical Analysis Of Fatalities Due To Vehicle Accidents In Las Vegas, Nv, Annabelle Marie Mathis Aug 2011

UNLV Theses, Dissertations, Professional Papers, and Capstones

The goal of this thesis is to investigate factors that affect the odds of having a fatality in a vehicle collision. We will be looking at characteristics of the driver that caused the accident (age, gender, behavior, actions, influences, and seat belt worn), the characteristics of the vehicle the driver drove (type of vehicle, and air bag deployment), the characteristics of the environment in which the accident occurred (weather, road condition, lighting, time of day, the day of the week, and month of the year), the characteristics of the crash (direction of accident and how many vehicles were involved), and …


Logistic Regression Models For Higher Order Transition Probabilities Of Markov Chain For Analyzing The Occurrences Of Daily Rainfall Data, Narayan Chanra Sinha, M. Ataharul Islam, Kazi Saleh Ahamed May 2011

Journal of Modern Applied Statistical Methods

Logistic regression models for transition probabilities of higher order Markov models are developed for the sequence of chain dependent repeated observations. To identify the significance of these models and their parameters a test procedure for a likelihood ratio criterion is developed. A method of model selection is suggested on the basis of AIC and BIC procedures. The proposed models and test procedures are applied to analyze the occurrences of daily rainfall data for selected stations in Bangladesh. Based on results from these models, the transition probabilities of first order Markov model for temperature and humidity provided the most suitable option …


Bayesian Semiparametric Generalizations Of Linear Models Using Polya Trees, Angela Schoergendorfer Jan 2011

University of Kentucky Doctoral Dissertations

In a Bayesian framework, prior distributions on a space of nonparametric continuous distributions may be defined using Polya trees. This dissertation addresses statistical problems for which the Polya tree idea can be utilized to provide efficient and practical methodological solutions.

One problem considered is the estimation of risks, odds ratios, or other similar measures that are derived by specifying a threshold for an observed continuous variable. It has been previously shown that fitting a linear model to the continuous outcome under the assumption of a logistic error distribution leads to more efficient odds ratio estimates. We will show that deviations …


Robust Estimators In Logistic Regression: A Comparative Simulation Study, Sanizah Ahmad, Norazan Mohamed Ramli, Habshah Midi Nov 2010

Journal of Modern Applied Statistical Methods

The maximum likelihood estimator (MLE) is commonly used to estimate the parameters of logistic regression models due to its efficiency under a parametric model. However, evidence has shown the MLE has an undue effect on the parameter estimates in the presence of outliers. Robust methods are put forward to rectify this problem. This article examines the performance of the MLE and four existing robust estimators under different outlier patterns, which are investigated using real data sets and Monte Carlo simulation.


Estimation Of Risk For Developing Cardiac Problem In Patients Of Type 2 Diabetes As Obtained By The Technique Of Density Estimation, Ajit Mukherjee, Ajit Mathur, Rakesh Mittal May 2007

Journal of Modern Applied Statistical Methods

High levels of cholesterol and triglyceride are known to be strongly associated with development of cardiac problem in patients of type 2 diabetes. In a hospital-based study, patients showing ECG positive were compared with those who were not. The observations on cholesterol and triglyceride were considered for estimation of risk for developing the cardiac problem. The technique of density estimation employing Epanechnikov kernel was used for estimating bivariate probability density functions with respect to observations on cholesterol and triglyceride of the two groups. Using the odds form of Bayes’ rule, the estimates of posterior odds were computed.
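The calculation described above (kernel density estimates for each group, combined through the odds form of Bayes' rule) can be sketched as follows. This is an illustrative reading only: the product form of the bivariate Epanechnikov kernel, the bandwidths, the prior odds, and all function names and data values are assumptions made here, not details taken from the article.

```python
def epanechnikov(u):
    """Univariate Epanechnikov kernel: 0.75 * (1 - u^2) on [-1, 1], else 0."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde2d(point, sample, hx, hy):
    """Bivariate density estimate at `point` using a product kernel (assumed form)."""
    px, py = point
    total = sum(
        epanechnikov((px - x) / hx) * epanechnikov((py - y) / hy)
        for x, y in sample
    )
    return total / (len(sample) * hx * hy)

def posterior_odds(point, cases, controls, prior_odds, hx, hy):
    """Odds form of Bayes' rule: posterior odds = prior odds * likelihood ratio."""
    f_case = kde2d(point, cases, hx, hy)
    f_ctrl = kde2d(point, controls, hx, hy)
    if f_ctrl == 0.0:
        raise ValueError("control density is zero at this point; odds undefined")
    return prior_odds * f_case / f_ctrl

# Made-up (cholesterol, triglyceride) pairs, for illustration only.
cases = [(240.0, 200.0), (260.0, 220.0), (250.0, 210.0)]
controls = [(190.0, 150.0), (200.0, 160.0), (230.0, 190.0)]
odds = posterior_odds((245.0, 205.0), cases, controls, prior_odds=0.5, hx=40.0, hy=40.0)
```

Because the Epanechnikov kernel has bounded support, the control density can be exactly zero far from the control sample, which is why the sketch guards the division.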


Entropy Criterion In Logistic Regression And Shapley Value Of Predictors, Stan Lipovetsky May 2006

Journal of Modern Applied Statistical Methods

Entropy criterion is used for constructing a binary response regression model with a logistic link. This approach yields a logistic model with coefficients proportional to the coefficients of linear regression. Based on this property, the Shapley value estimation of predictors’ contribution is applied for obtaining robust coefficients of the linear aggregate adjusted to the logistic model. This procedure produces a logistic regression with interpretable coefficients robust to multicollinearity. Numerical results demonstrate theoretical and practical advantages of the entropy-logistic regression.


Comparison Of Statistical Tests In Logistic Regression: The Case Of Hypernatreamia, Stylianos Katsaragakis, Christos Koukouvinos, Stella Stylianou, Eleni-Maria Theodoraki Nov 2005

Journal of Modern Applied Statistical Methods

Logistic regression has become an integral component of any medical data analysis concerning binary responses. The main issue arising after the adoption of the final model is its goodness of fit. The fit of the model is assessed via overall measures and summary statistics, which are compared in the case of hypernatreamia.


Testing The Goodness Of Fit Of Multivariate Multiplicative-Intercept Risk Models Based On Case-Control Data, Biao Zhang May 2005

Journal of Modern Applied Statistical Methods

The validity of the multivariate multiplicative-intercept risk model with I + 1 categories based on case-control data is tested. After reparametrization, the assumed risk model is equivalent to an (I + 1)-sample semiparametric model in which the I ratios of two unspecified density functions have known parametric forms. By identifying this (I + 1)-sample semiparametric model, which is of intrinsic interest in general (I + 1)-sample problems, with an (I + 1)-sample semiparametric selection bias model, we propose a weighted Kolmogorov-Smirnov-type statistic to test the validity of the multivariate multiplicative-intercept risk model. Established are some asymptotic results …


A Generalized Quasi-Likelihood Model Application To Modeling Poverty Of Asian American Women, Jeffrey R. Wilson May 2004

Journal of Modern Applied Statistical Methods

A generalized quasi-likelihood function that does not require the assumption of an underlying distribution when jointly modeling the mean and the variance is introduced to examine poverty among Asian American women living on the West Coast of the United States, using data from the U.S. Census Bureau.