Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Logistic regression

Articles 1 - 30 of 83

Full-Text Articles in Statistics and Probability

Effects Of Maternal Anthropometry On Infant Anthropometry: A Cross-Sectional Study At Public Hospital X In Ternate, Indonesia, Yuni Nurwati, Hardinsyah Hardinsyah, Sri Anna Marliyati, Budi Iman Santoso, Dewi Anggraini Feb 2024

Kesmas

Infant anthropometry is an indicator of neonatal survival. This study aimed to determine the effects of maternal anthropometry on estimating infant anthropometry. This cross-sectional study on 173 pregnant women at Public Hospital X in Ternate, Indonesia, was conducted from August 2018 to March 2023. The eligibility criteria were pregnant women aged ≥18 years, single pregnancy, and antenatal care (ANC) visits to the same hospital. The variables used included maternal anthropometric measurements (body weight, body height, third-trimester weight (TTW)), gestational weight gain (GWG), education, age, ANC visits, and gestational age at delivery (GAD). A logistic regression model was employed to estimate …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. Mcclure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …


Approaches To Detecting And Modeling Over-And Underdispersion In Alternative Count Data Distributions And An Application Of Logistic Regression And Random Forest Modeling To Improve Screening Tools For Tic Disorders In Children, Rebecca C. Wardrop Jul 2023

Theses and Dissertations

This dissertation focuses on theory and application of discrete data methods, particularly approaches to over- and underdispersion relative to the Poisson distribution and an application of random forest and logistic regression modeling. The first chapter derives a score test for over- and underdispersion in the heaped generalized Poisson distribution. Equi-, over-, and underdispersed heaped generalized Poisson and heaped negative binomial data are simulated to evaluate the performance of the score test by comparing the power it achieves to that of Wald and likelihood ratio tests. We find that the score test we derive performs comparably to both the Wald and …


Development Of Regional Landslide Susceptibility Models: A First Step Towards Model Transferability, Gina M. Belair Jan 2022

Graduate Student Theses, Dissertations, & Professional Papers

Landslides are a globally pervasive problem with the potential to cause significant fatalities and economic losses. Although landslides are widespread, many at-risk regions may not have the high-quality data or resources used in most landslide susceptibility analyses. This study aims to develop regional susceptibility relationships that are versatile and use publicly available data and open-sourced software. Logistic Regression and Frequency Ratio susceptibility relationships were developed in 23 regions in Washington, Utah, North Carolina, and Kentucky, with a region referring to a unique area and data combination. Regions were diverse in their geology, morphology, climate, and nature and quality of their …


Smoking, Alcohol Consumption, And Depression In Association With Incidence Of Type 2 Diabetes Among Mexican Americans In Starr County, Texas, Gabriela Rubannelsonkumar Dec 2021

Honors Program Theses and Research Projects

Previous studies on conditions like obesity, hypertension, and type 2 diabetes mellitus (T2DM) have explored the correlations between them and various other human conditions, including aortic stiffness, left ventricular hypertrophy, and sleep apnea, as they predict possibilities of developing certain diseases in Mexican Americans. This study aims to observe the correlation between lifestyle decisions that could relate to the onset of depression in normal, prediabetic, and diabetic individuals. These include smoking habits and alcohol consumption. Many papers have examined these lifestyle habits as they relate to obesity, hypertension, and diabetes; however, they have done so in a singular …


Privacy-Preserving Cloud-Assisted Data Analytics, Wei Bao Jul 2021

Graduate Theses and Dissertations

Nowadays industries are collecting a massive and exponentially growing amount of data that can be utilized to extract useful insights for improving various aspects of our life. Data analytics (e.g., via the use of machine learning) has been extensively applied to make important decisions in various real world applications. However, it is challenging for resource-limited clients to analyze their data in an efficient way when its scale is large. Additionally, the data resources are increasingly distributed among different owners. Nonetheless, users' data may contain private information that needs to be protected.

Cloud computing has become more and more popular in …


Developing Prediction Models For Kidney Stone Disease, Joseph Palko Jun 2021

Honors Theses

Kidney stone disease has become more prevalent through the years, leading to high treatment cost and associated health risks. In this study, we explore a large medical database and machine learning methods to extract features and construct models for diagnosing kidney stone disease.

Data of 46,250 patients and 58,976 hospital admissions were extracted and analyzed, including patients’ demographic information, diagnoses, vital signs, and laboratory measurements of the blood and urine. We compared the kidney stone (KDS) patients to patients with abdominal and back pain (ABP), patients diagnosed with nephritis, nephrosis, renal sclerosis, chronic kidney disease, or acute and unspecified renal …


An Examination Into Retention Behavior Of Air Force Female Officers, Jessica M. Astudillo Mar 2021

Theses and Dissertations

Female retention rates in the US military have been considerably lower than that of their male counterparts for numerous years. In the Air Force, women represent 14 percent of officer ranks from O-5 level and above. Comparatively, the overall rate of women officers in service is 20 percent. Understanding the negative factors associated with the attrition rate of this group can help the Air Force leverage positive change. It may also influence adjustments that will increase the number of women serving, and improve diversity throughout both the officer and enlisted ranks. In this study, logistic regression and survival analysis are …


An Examination Of Civilian Retention In The United States Air Force, William F. Wilson Mar 2021

Theses and Dissertations

The backbone of the United States Air Force is undoubtedly the large civilian workforce that supplements the great work that is accomplished. Many research studies have been conducted on officer and enlisted personnel to ensure that the career fields are properly developed and managed to meet the ever growing demands of the military's varied missions, but no recent studies have focused on the civilian workforce. Striking a balance between new and experienced employees is paramount to success given the ever-changing economic and political landscapes where we find ourselves. The first part of the research uses logistic regression to determine the …


Sample Size Formulas For Estimating Risk Ratios With The Modified Poisson Model For Binary Outcomes, Zhenni Xue Feb 2021

Electronic Thesis and Dissertation Repository

Sample size estimation is usually the first step in planning a research study. Too small a study cannot adequately address the objectives, while too large a study may waste resources or be unethical. For binary outcomes, several sample size estimation methods are available based on logistic regression models, which focus on odds ratios. In prospective studies, risk ratios are preferable for ease of interpretation and communication. In this thesis, we compared the power difference between the logistic regression model and the modified Poisson regression model via simulation studies. We then proposed sample size estimation formulas based on the modified Poisson regression …
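
To make the distinction between odds ratios and risk ratios concrete, the following minimal Python sketch computes both from a 2×2 table. The function names and the cell counts are hypothetical illustrations, not values or methods taken from the thesis.

```python
def risk_ratio(a, b, c, d):
    """Risk ratio from a 2x2 table.

    a, b: exposed subjects with / without the outcome
    c, d: unexposed subjects with / without the outcome
    """
    return (a / (a + b)) / (c / (c + d))


def odds_ratio(a, b, c, d):
    """Odds ratio from the same 2x2 table (cross-product ratio)."""
    return (a * d) / (b * c)


# Hypothetical counts, chosen only for illustration.
rr = risk_ratio(40, 60, 20, 80)    # risks of 0.40 vs 0.20
odds = odds_ratio(40, 60, 20, 80)  # odds of 40/60 vs 20/80
print(f"risk ratio = {rr:.2f}, odds ratio = {odds:.2f}")
```

When the outcome is common, as in these made-up counts, the odds ratio lies farther from 1 than the risk ratio, which is one reason risk ratios are easier to communicate.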


A Bayesian Hierarchical Mixture Model With Continuous-Time Markov Chains To Capture Bumblebee Foraging Behavior, Max Thrush Hukill Jan 2021

Honors Projects

The standard statistical methodology for analyzing complex case-control studies in ethology is often limited by approaches that force researchers to model distinct aspects of biological processes in a piecemeal, disjointed fashion. By developing a hierarchical Bayesian model, this work demonstrates that statistical inference in this context can be done using a single coherent framework. To do this, we construct a continuous-time Markov chain (CTMC) to model bumblebee foraging behavior. To connect the experimental design with the CTMC, we employ a mixture model controlled by a logistic regression on the two-factor design matrix. We then show how to infer these model …


Statistical And Machine Learning Approaches To Depressive Disorders Among Adults In The United States: From Factor Discovery To Prediction Evaluation, Minhwa Lee Jan 2021

Senior Independent Study Theses

According to the National Institute of Mental Health (NIMH), depressive disorders (or major depression) are considered one of the most common and serious health risks in the United States. Our study focuses on extracting non-medical factors of depressive disorders diagnosis, such as overall health states, health risk behaviors, demography, and healthcare access, using the Behavioral Risk Factor Surveillance System (BRFSS) data set collected by the Centers for Disease Control and Prevention (CDC) in 2018.

We set the two objectives of our study about depressive disorders diagnosis in the United States as follows. First, we aim to utilize machine learning algorithms …


Logistic Regression Under Sparse Data Conditions, David A. Walker, Thomas J. Smith Sep 2020

Journal of Modern Applied Statistical Methods

This study examined the impact of sparse data conditions among one or more predictor variables in logistic regression and assessed the effectiveness of the Firth (1993) procedure in reducing potential parameter estimation bias. Results indicated that sparseness in binary predictors introduces bias that is substantial with small sample sizes, and that the Firth procedure can effectively correct this bias.
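
As a rough illustration of why a penalized estimate helps with sparse binary predictors, the sketch below uses a special case: for a logistic model with an intercept and a single binary predictor (a saturated 2×2 table), the Firth-penalized estimate coincides with the maximum likelihood estimate computed after adding 0.5 to every cell. This is not a general implementation of the Firth procedure, and the function names and counts are hypothetical.

```python
import math


def log_odds_ratio_ml(a, b, c, d):
    """Maximum likelihood log odds ratio for a 2x2 table.

    a, b: outcomes 1 / 0 when the predictor is 1
    c, d: outcomes 1 / 0 when the predictor is 0
    Returns +/- infinity when a zero cell makes the estimate diverge.
    """
    num, den = a * d, b * c
    if num == 0 and den == 0:
        return math.nan
    if den == 0:
        return math.inf
    if num == 0:
        return -math.inf
    return math.log(num / den)


def log_odds_ratio_firth_2x2(a, b, c, d):
    """Firth-type estimate for the saturated 2x2 case only:
    equivalent to adding 0.5 to each cell before taking the log odds ratio."""
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


# Hypothetical sparse table with one empty cell.
table = (10, 0, 5, 5)
ml = log_odds_ratio_ml(*table)
firth = log_odds_ratio_firth_2x2(*table)
print(ml, round(firth, 3))
```

With an empty cell the maximum likelihood estimate is infinite, while the penalized estimate stays finite.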


Inferences About The Probability Of Success, Given The Value Of A Covariate, Using A Nonparametric Smoother, Rand Wilcox Jun 2020

Journal of Modern Applied Statistical Methods

For a binary random variable Y, let p(x) = P(Y = 1 | X = x) for some covariate X. The goal is to compute a confidence interval for p(x). Under the logistic regression model, even a slight departure from the model, one that is difficult to detect via a goodness-of-fit test, can yield inaccurate results, and the accuracy of a confidence interval can deteriorate as the sample size increases. An alternative approach based on a smoother, which provides a more flexible approximation of p(x), is suggested.
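
The abstract does not say which smoother is used. As a stand-in, the sketch below estimates p(x) with a Nadaraya-Watson kernel smoother, one common nonparametric choice. The function name, the Gaussian kernel, the bandwidth, and the data are all assumptions made for illustration, and no confidence interval is computed.

```python
import math


def kernel_smoother(x0, xs, ys, bandwidth):
    """Nadaraya-Watson estimate of p(x0) = P(Y = 1 | X = x0).

    Each observation is weighted by a Gaussian kernel centred at x0,
    and the estimate is the weighted mean of the binary responses.
    """
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in xs]
    total = sum(weights)
    return sum(w * y for w, y in zip(weights, ys)) / total


# Hypothetical data: the chance of success tends to rise with x.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [0, 0, 1, 0, 1, 1]
p_mid = kernel_smoother(2.5, xs, ys, bandwidth=1.0)
p_high = kernel_smoother(5.0, xs, ys, bandwidth=1.0)
print(round(p_mid, 3), round(p_high, 3))
```

Unlike a logistic fit, this estimate imposes no functional form on p(x); the bandwidth controls how local the averaging is.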


Investigating The Performance Of Propensity Score Approaches For Differential Item Functioning Analysis, Yan Liu, Chanmin Kim, Amrey D. Wu, Paul Gustafson, Edward Kroc, Bruno D. Zumbo Apr 2020

Journal of Modern Applied Statistical Methods

To evaluate the performance of propensity score approaches for differential item functioning analysis, this simulation study was conducted to assess bias, mean square error, Type I error, and power under different levels of effect size and a variety of model misspecification conditions, including different types and missing patterns of covariates.


An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone Jan 2020

Published and Grey Literature from PhD Candidates

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been the standard modeling tool in the financial industry because of its generally desirable performance and its interpretability. In this paper, we propose a hybrid bankcard response model, which integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. Then, in the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model …


Nonparametric Misclassification Simulation And Extrapolation Method And Its Application, Congjian Liu Jan 2020

Electronic Theses and Dissertations

The misclassification simulation extrapolation (MC-SIMEX) method proposed by Küchenhoff et al. is a general method of handling categorical data with measurement error. It consists of two steps, the simulation and extrapolation steps. In the simulation step, it simulates observations with varying degrees of measurement error. Then parameter estimators for varying degrees of measurement error are obtained based on these observations. In the extrapolation step, it uses a parametric extrapolation function to obtain the parameter estimators for data with no measurement error. However, as shown in many studies, the parameter estimators are still biased as a result of the parametric extrapolation …


Towards Using Model Averaging To Construct Confidence Intervals In Logistic Regression Models, Artem Uvarov Aug 2019

Electronic Thesis and Dissertation Repository

Regression analyses in epidemiological and medical research typically begin with a model selection process, followed by inference assuming the selected model has generated the data at hand. It is well-known that this two-step procedure can yield biased estimates and invalid confidence intervals for model coefficients due to the uncertainty associated with the model selection. To account for this uncertainty, multiple models may be selected as a basis for inference. This method, commonly referred to as model-averaging, is increasingly becoming a viable approach in practice.

Previous research has demonstrated the advantage of model-averaging in reducing bias of parameter estimates. However, there …


Longitudinal Analysis With Modes Of Operation For AES, Dana Geislinger, Cory Thigpen, Daniel W. Engels Aug 2019

SMU Data Science Review

In this paper, we present an empirical evaluation of the randomness of the ciphertext blocks generated by the Advanced Encryption Standard (AES) cipher in Counter (CTR) mode and in Cipher Block Chaining (CBC) mode. Vulnerabilities have been found in the AES cipher that may lead to a reduction in the randomness of the generated ciphertext blocks that can result in a practical attack on the cipher. We evaluate the randomness of the AES ciphertext using the standard key length and NIST randomness tests. We evaluate the randomness through a longitudinal analysis on 200 billion ciphertext blocks using logistic regression and …


Prediction Of High School Graduation With Decision Trees, Andrea M. Lee Aug 2019

MSU Graduate Theses

Having worked as an educator for the past fourteen years, I am always looking at data and determining ways to help our students. Graduation status is one area of interest. I wanted to apply statistical methods to try and find early indicators of those students who may drop out, thus being able to provide early intervention to those students. With early intervention, we may be able to lower our dropout rate. While studying different methods of pattern recognition, I found that the decision tree method in machine learning was the best for the data that I had collected. Decision trees …


The Price Is Right: Analyzing Bidding Behavior On Contestants’ Row, Paul Kvam May 2019

Department of Math & Statistics Faculty Publications

The TV game show “The Price is Right” features a bidding auction called Contestants’ Row that rewards the player (out of four) who bids closest to an item’s value without overbidding. By exploring 903 game outcomes from the 2000–2001 season, we show how player strategies are significantly inefficient, and compare the empirical results to probability outcomes for optimal bid strategies found in a recent study. Findings show that the last bidder would do better using the naïve strategy of bidding a dollar more than the highest of the three bids. We apply the EM algorithm in a novel way to …
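
The bidding rule and the naïve last-bidder strategy described above can be sketched in a few lines of Python. The function names, bids, and prices are hypothetical and are not drawn from the 903 recorded games.

```python
def winning_bidder(bids, price):
    """Index of the bid closest to the price without exceeding it.

    Returns None when every player overbids.
    """
    eligible = [(price - bid, i) for i, bid in enumerate(bids) if bid <= price]
    return min(eligible)[1] if eligible else None


def naive_last_bid(previous_bids):
    """Naive strategy for the fourth bidder: one dollar above the highest earlier bid."""
    return max(previous_bids) + 1


# Hypothetical bids from the first three players.
previous = [500, 650, 700]
bids = previous + [naive_last_bid(previous)]

print(bids[-1])                   # the last bidder's bid
print(winning_bidder(bids, 900))  # price above every bid
print(winning_bidder(bids, 680))  # price below the highest earlier bid
```

With whole-dollar bids, this strategy wins whenever the actual price is at least one dollar above the highest earlier bid, because the last bidder then sits closest to the price from below.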


Leveraging Reviews To Improve User Experience, Anthony Schams, Iram Bakhtiar, Cristina Stanley May 2019

SMU Data Science Review

In this paper, we will explore and present a method of finding characteristics of a restaurant using its reviews through machine learning algorithms. We begin by building models to predict the ratings of individual reviews using text and categorical features. This is to examine the efficacy of the algorithms to the task. Both XGBoost and logistic regression will be examined. With these models, our goal is then to identify key phrases in reviews that are correlated with positive and negative experience. Our analysis makes use of review data publicly made available by Yelp. Key bigrams extracted were non-specific to the …


Logistic Ensemble Models, Bob Vanderheyden, Jennifer L. Priestley Mar 2019

Jennifer L. Priestley

Predictive models that are developed in a regulated industry or a regulated application, like determination of credit worthiness, must be interpretable and “rational” (e.g., improvements in basic credit behavior must result in improved credit worthiness scores). Machine Learning technologies provide very good performance with minimal analyst intervention, so they are well suited to a high volume analytic environment, but the majority are “black box” tools that provide very limited insight or interpretability into key drivers of model performance or predicted model output values. This paper presents a methodology that blends one of the most popular predictive statistical modeling methods with …


Application Of Isotonic Regression In Predicting Business Risk Scores, Linh T. Le, Jennifer L. Priestley Mar 2019

Jennifer L. Priestley

An isotonic regression model fits an isotonic function of the explanatory variables to estimate the expectation of the response variable. In other words, as the function increases, the estimated expectation of the response must be non-decreasing. With this characteristic, isotonic regression could be a suitable option to analyze and predict business risk scores. A current challenge of isotonic regression is the decrease in performance when the model is fitted to a large data set (e.g., one with more than four or five dimensions). This paper attempts to apply isotonic regression models to the prediction of business risk scores using a large data set …
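
To illustrate the non-decreasing constraint, here is a minimal sketch of the pool-adjacent-violators algorithm, a standard way to fit isotonic regression in one dimension with equal weights. The abstract does not say which algorithm the authors use, and the function name and data are hypothetical.

```python
def isotonic_fit(y):
    """Least-squares non-decreasing fit to y (assumed sorted by the predictor).

    Adjacent blocks that violate monotonicity are pooled and replaced
    by their mean until the sequence of block means is non-decreasing.
    """
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block mean exceeds the last one.
        while len(blocks) > 1 and (
            blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]
        ):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count
    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)
    return fitted


# Hypothetical responses ordered by a single explanatory variable.
fitted = isotonic_fit([1, 3, 2, 4])
print(fitted)  # [1.0, 2.5, 2.5, 4.0]
```

The out-of-order pair (3, 2) is pooled into its mean, 2.5, so the fitted values never decrease.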


A Comparison Of Decision Tree With Logistic Regression Model For Prediction Of Worst Non-Financial Payment Status In Commercial Credit, Jessica M. Rudd MPH, GStat, Jennifer L. Priestley Mar 2019

Jennifer L. Priestley

Credit risk prediction is an important problem in the financial services domain. While machine learning techniques such as Support Vector Machines and Neural Networks have been used for improved predictive modeling, the outcomes of such models are not readily explainable and, therefore, difficult to apply within financial regulations. In contrast, Decision Trees are easy to explain, and provide an easy to interpret visualization of model decisions. The aim of this paper is to predict worst non-financial payment status among businesses, and evaluate decision tree model performance against traditional Logistic Regression model for this task. The dataset for analysis is provided …


Binary Classification On Past Due Of Service Accounts Using Logistic Regression And Decision Tree, Yan Wang, Jennifer L. Priestley Mar 2019

Jennifer L. Priestley

This paper aims at predicting businesses’ past due in service accounts as well as determining the variables that impact the likelihood of repayment. Two binary classification approaches, logistic regression and the decision tree, were conducted and compared. Both approaches perform very well with respect to accuracy. However, the decision tree only uses 10 predictors and reaches an accuracy of 96.69% on the validation set while logistic regression includes 14 predictors and reaches an accuracy of 94.58%. Because false negatives are a major concern in the financial industry, the decision tree technique is a better option than logistic regression …
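
Accuracy and the false negative rate measure different things, which is why the abstract weighs both. The sketch below computes them from confusion-matrix counts. The function names and all counts are hypothetical; they are not the paper's validation results.

```python
def accuracy(tp, fp, tn, fn):
    """Share of all cases that were classified correctly."""
    return (tp + tn) / (tp + fp + tn + fn)


def false_negative_rate(tp, fn):
    """Share of truly positive cases that the model missed."""
    return fn / (tp + fn)


# Two hypothetical classifiers evaluated on the same 1,000 accounts,
# 100 of which are truly past due (the positive class).
model_a = {"tp": 80, "fp": 20, "tn": 880, "fn": 20}
model_b = {"tp": 95, "fp": 35, "tn": 865, "fn": 5}

for name, m in (("A", model_a), ("B", model_b)):
    acc = accuracy(**m)
    fnr = false_negative_rate(m["tp"], m["fn"])
    print(f"model {name}: accuracy = {acc:.2%}, false negative rate = {fnr:.2%}")
```

Both made-up models reach the same accuracy, yet the second misses far fewer past-due accounts, so accuracy alone would not separate them.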


An Analysis Of Accuracy Using Logistic Regression And Time Series, Edwin Baidoo, Jennifer L. Priestley Mar 2019

Jennifer L. Priestley

This paper analyzes the accuracy rates for logistic regression and time series models. It also examines a relatively new performance index that takes into consideration the business assumptions of credit markets. Although prior research has focused on evaluation metrics, such as AUC and Gini index, this new measure has a more intuitive interpretation for various managers and decision makers and can be applied to both Logistic and Time Series models.
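
For reference, the two established metrics named above are directly related: the Gini index equals 2 × AUC − 1. The sketch below computes both from scores and labels using pairwise comparisons. The function names and data are hypothetical, and the paper's newer performance index is not shown because the abstract does not define it.

```python
def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison.

    Equals the probability that a randomly chosen positive case
    receives a higher score than a randomly chosen negative case,
    with ties counted as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def gini(scores, labels):
    """Gini index as a rescaling of AUC to the range [-1, 1]."""
    return 2 * auc(scores, labels) - 1


# Hypothetical model scores and true labels.
scores = [0.1, 0.4, 0.35, 0.8]
labels = [0, 0, 1, 1]
print(auc(scores, labels), gini(scores, labels))  # 0.75 0.5
```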


Random Forest Vs Logistic Regression: Binary Classification For Heterogeneous Datasets, Kaitlin Kirasich, Trace Smith, Bivin Sadler Aug 2018

SMU Data Science Review

Selecting a learning algorithm to implement for a particular application on the basis of performance still remains an ad-hoc process using fundamental benchmarks such as evaluating a classifier’s overall loss function and misclassification metrics. In this paper we address the difficulty of model selection by evaluating the overall classification performance between random forest and logistic regression for datasets comprised of various underlying structures: (1) increasing the variance in the explanatory and noise variables, (2) increasing the number of noise variables, (3) increasing the number of explanatory variables, (4) increasing the number of observations. We developed a model evaluation tool capable …


Fitting The Rasch Model Under The Logistic Regression Framework To Reduce Estimation Bias, Tianshu Pan Jun 2018

Journal of Modern Applied Statistical Methods

This article shows how and why the Rasch model can be fitted under the logistic regression framework. As a result, the penalized maximum likelihood method (Firth, 1993) for logistic regression models can also be used to reduce maximum likelihood (ML) bias when fitting the Rasch model. These conclusions are supported by a simulation study.
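
The link between the two models can be seen from the standard form of the Rasch model. The notation below is an assumption, not taken from the article: θ_p is the ability of person p, b_i is the difficulty of item i, and X_pi is the binary response.

```latex
P(X_{pi} = 1 \mid \theta_p, b_i)
  = \frac{\exp(\theta_p - b_i)}{1 + \exp(\theta_p - b_i)}
\quad\Longleftrightarrow\quad
\operatorname{logit} P(X_{pi} = 1 \mid \theta_p, b_i) = \theta_p - b_i
```

The log-odds are linear in the person and item parameters, so the model has the form of a logistic regression on person and item indicator variables.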


On Some Ridge Regression Estimators For Logistic Regression Models, Ulyana P. Williams Mar 2018

FIU Electronic Theses and Dissertations

The purpose of this research is to investigate the performance of some ridge regression estimators for the logistic regression model in the presence of moderate to high correlation among the explanatory variables. As a performance criterion, we use the mean square error (MSE), the mean absolute percentage error (MAPE), the magnitude of bias, and the percentage of times the ridge regression estimator produces a higher MSE than the maximum likelihood estimator. A Monte Carlo simulation study has been executed to compare the performance of the ridge regression estimators under different experimental conditions. The degree of correlation, sample size, number of …