Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Articles 1 - 15 of 15

Full-Text Articles in Statistics and Probability

Logistic Regression Under Sparse Data Conditions, David A. Walker, Thomas J. Smith Sep 2020

Journal of Modern Applied Statistical Methods

This study examined the impact of sparse data conditions in one or more predictor variables in logistic regression and assessed the effectiveness of the Firth (1993) procedure in reducing potential parameter estimation bias. Results indicated that sparseness in binary predictors introduces bias that is substantial with small sample sizes, and that the Firth procedure can effectively correct this bias.
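As a rough illustration of the kind of correction discussed here, the following minimal sketch fits a Firth-penalized logistic regression with an intercept and a single predictor by Fisher scoring on the modified score. The function name `firth_logistic`, its arguments, and the fixed iteration settings are assumptions made for this example; they are not taken from the article, and production implementations typically add safeguards such as step-halving.

```python
import math

def firth_logistic(x, y, iters=100, tol=1e-10):
    """Firth-penalized logistic fit for the design [1, x] (illustrative sketch).

    x, y are equal-length sequences; y holds 0/1 outcomes.
    Returns (intercept, slope).
    """
    b0, b1 = 0.0, 0.0
    for _ in range(iters):
        p = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
        w = [pi * (1.0 - pi) for pi in p]
        # Fisher information X'WX for the two-column design [1, x]
        a = sum(w)
        b = sum(wi * xi for wi, xi in zip(w, x))
        d = sum(wi * xi * xi for wi, xi in zip(w, x))
        det = a * d - b * b
        i00, i01, i11 = d / det, -b / det, a / det  # inverse information
        # Hat-matrix diagonal: h_i = w_i * [1, x_i] (X'WX)^{-1} [1, x_i]'
        h = [wi * (i00 + 2.0 * i01 * xi + i11 * xi * xi) for wi, xi in zip(w, x)]
        # Firth's modified score residual: y_i - p_i + h_i * (0.5 - p_i)
        r = [yi - pi + hi * (0.5 - pi) for yi, pi, hi in zip(y, p, h)]
        u0 = sum(r)
        u1 = sum(ri * xi for ri, xi in zip(r, x))
        # Scoring step: beta <- beta + (X'WX)^{-1} U*
        s0 = i00 * u0 + i01 * u1
        s1 = i01 * u0 + i11 * u1
        b0 += s0
        b1 += s1
        if max(abs(s0), abs(s1)) < tol:
            break
    return b0, b1
```

With a single binary predictor the penalized estimate has a closed form: it equals the log odds ratio computed after adding 0.5 to each cell of the 2 x 2 table, so the estimate stays finite even when one cell is empty.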


Inferences About The Probability Of Success, Given The Value Of A Covariate, Using A Nonparametric Smoother, Rand Wilcox Jun 2020

Journal of Modern Applied Statistical Methods

For a binary random variable Y, let p(x) = P(Y = 1 | X = x) for some covariate X. The goal of computing a confidence interval for p(x) is considered. In the logistic regression model, even a slight departure from the model that is difficult to detect via a goodness-of-fit test can yield inaccurate results, and the accuracy of a confidence interval can deteriorate as the sample size increases. The goal is to suggest an alternative approach based on a smoother, which provides a more flexible approximation of p(x).
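To make the idea of smoothing p(x) concrete, here is a minimal sketch of a Nadaraya-Watson kernel smoother with a Gaussian kernel. This is a generic smoother chosen for illustration, not necessarily the one used in the article; the function name `smooth_probability` and the `bandwidth` parameter are assumptions.

```python
import math

def smooth_probability(x0, xs, ys, bandwidth):
    """Kernel-weighted average of 0/1 outcomes ys near x0 (illustrative sketch)."""
    # Gaussian kernel weights: observations close to x0 count more
    weights = [math.exp(-0.5 * ((x0 - xi) / bandwidth) ** 2) for xi in xs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations carry weight at x0; increase bandwidth")
    # Weighted proportion of successes estimates P(Y = 1 | X = x0)
    return sum(w * yi for w, yi in zip(weights, ys)) / total
```

Because the estimate is a weighted average of 0/1 outcomes, it always lies between 0 and 1 without imposing a logistic shape on p(x).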


Investigating The Performance Of Propensity Score Approaches For Differential Item Functioning Analysis, Yan Liu, Chanmin Kim, Amrey D. Wu, Paul Gustafson, Edward Kroc, Bruno D. Zumbo Apr 2020

Journal of Modern Applied Statistical Methods

This simulation study evaluated the performance of propensity score approaches for differential item functioning analysis by assessing bias, mean square error, Type I error, and power under different levels of effect size and a variety of model misspecification conditions, including different types and missing patterns of covariates.


Prediction Of High School Graduation With Decision Trees, Andrea M. Lee Aug 2019

MSU Graduate Theses

As an educator for the past fourteen years, I have always looked at data to determine ways to help our students. Graduation status is one area of interest. I wanted to apply statistical methods to find early indicators of students who may drop out, so that early intervention can be provided to those students. With early intervention, we may be able to lower our dropout rate. While studying different methods of pattern recognition, I found that the decision tree method in machine learning was the best for the data that I had collected. Decision trees …


Fitting The Rasch Model Under The Logistic Regression Framework To Reduce Estimation Bias, Tianshu Pan Jun 2018

Journal of Modern Applied Statistical Methods

This article shows how and why the Rasch model can be fitted under the logistic regression framework. Penalized maximum likelihood (Firth, 1993) for logistic regression models can then also be used to reduce ML biases when fitting the Rasch model. These conclusions are supported by a simulation study.
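The connection can be sketched in the usual textbook notation, which is an assumption here and not necessarily the article's: let \theta_i denote the ability of person i and b_j the difficulty of item j. The Rasch model is then linear on the logit scale, which is the form a logistic regression with person and item indicator variables takes.

```latex
\begin{align*}
P(X_{ij} = 1 \mid \theta_i, b_j) &= \frac{\exp(\theta_i - b_j)}{1 + \exp(\theta_i - b_j)}, \\
\log \frac{P(X_{ij} = 1)}{1 - P(X_{ij} = 1)} &= \theta_i - b_j .
\end{align*}
```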


Liu-Type Logistic Estimators With Optimal Shrinkage Parameter, Yasin Asar May 2016

Journal of Modern Applied Statistical Methods

Multicollinearity in logistic regression inflates the variance of the maximum likelihood estimator. In this study, Liu-type estimators are used to reduce the variance and overcome the multicollinearity by applying some existing ridge regression estimators to the logistic regression model. A Monte Carlo simulation is given to evaluate the performance of these estimators when the optimal shrinkage parameter is used in the Liu-type estimators, along with an application to real data.


Logistic Regression Models For Higher Order Transition Probabilities Of Markov Chain For Analyzing The Occurrences Of Daily Rainfall Data, Narayan Chanra Sinha, M. Ataharul Islam, Kazi Saleh Ahamed May 2011

Journal of Modern Applied Statistical Methods

Logistic regression models for transition probabilities of higher order Markov models are developed for sequences of chain-dependent repeated observations. To assess the significance of these models and their parameters, a test procedure based on a likelihood ratio criterion is developed. A method of model selection is suggested on the basis of AIC and BIC procedures. The proposed models and test procedures are applied to analyze the occurrences of daily rainfall data for selected stations in Bangladesh. Based on results from these models, the transition probabilities of first order Markov model for temperature and humidity provided the most suitable option …
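For orientation, a second-order transition model and the two selection criteria can be written as follows. The main-effects form of the logit is an assumption made for illustration and may differ from the article's parametrization; \hat{L} denotes the maximized likelihood, k the number of parameters, and n the number of observations.

```latex
\begin{align*}
\operatorname{logit} P(Y_t = 1 \mid Y_{t-1} = y_{t-1},\, Y_{t-2} = y_{t-2}) &= \beta_0 + \beta_1 y_{t-1} + \beta_2 y_{t-2}, \\
\mathrm{AIC} &= -2 \ln \hat{L} + 2k, \\
\mathrm{BIC} &= -2 \ln \hat{L} + k \ln n .
\end{align*}
```

Smaller values of either criterion indicate a preferred model; BIC penalizes additional parameters more heavily than AIC whenever \ln n > 2.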


Robust Estimators In Logistic Regression: A Comparative Simulation Study, Sanizah Ahmad, Norazan Mohamed Ramli, Habshah Midi Nov 2010

Journal of Modern Applied Statistical Methods

The maximum likelihood estimator (MLE) is commonly used to estimate the parameters of logistic regression models due to its efficiency under a parametric model. However, evidence has shown that outliers have an undue effect on the MLE parameter estimates. Robust methods are put forward to rectify this problem. This article examines the performance of the MLE and four existing robust estimators under different outlier patterns, which are investigated using real data sets and Monte Carlo simulation.


Estimation Of Risk For Developing Cardiac Problem In Patients Of Type 2 Diabetes As Obtained By The Technique Of Density Estimation, Ajit Mukherjee, Ajit Mathur, Rakesh Mittal May 2007

Journal of Modern Applied Statistical Methods

High levels of cholesterol and triglyceride are known to be strongly associated with the development of cardiac problems in patients with type 2 diabetes. In a hospital-based study, patients who were ECG positive were compared with those who were not. The observations on cholesterol and triglyceride were considered for estimating the risk of developing a cardiac problem. Density estimation with the Epanechnikov kernel was used to estimate bivariate probability density functions of the cholesterol and triglyceride observations for the two groups. Using the odds form of Bayes’ rule, estimates of the posterior odds were computed.
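The computation described above can be sketched as follows. This is a minimal illustration that assumes a product Epanechnikov kernel with separate bandwidths `hx` and `hy`; the article may have used a different bivariate form, and all function names, parameters, and the `prior_odds` default are hypothetical.

```python
def epanechnikov(u):
    """Epanechnikov kernel K(u) = 0.75 * (1 - u^2) on [-1, 1], zero elsewhere."""
    return 0.75 * (1.0 - u * u) if abs(u) <= 1.0 else 0.0

def kde2d(point, sample, hx, hy):
    """Bivariate density estimate at `point` using a product kernel (assumption)."""
    px, py = point
    total = sum(
        epanechnikov((px - sx) / hx) * epanechnikov((py - sy) / hy)
        for sx, sy in sample
    )
    return total / (len(sample) * hx * hy)

def posterior_odds(point, cases, controls, hx, hy, prior_odds=1.0):
    """Odds form of Bayes' rule: posterior odds = prior odds * f_cases / f_controls."""
    f_cases = kde2d(point, cases, hx, hy)
    f_controls = kde2d(point, controls, hx, hy)
    if f_controls == 0.0:
        raise ValueError("control density is zero at this point; widen the bandwidths")
    return prior_odds * f_cases / f_controls
```

Here `cases` and `controls` would be lists of (cholesterol, triglyceride) pairs for the two groups, and `prior_odds` would reflect the odds of a cardiac problem before the lipid values are taken into account.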


Entropy Criterion In Logistic Regression And Shapley Value Of Predictors, Stan Lipovetsky May 2006

Journal of Modern Applied Statistical Methods

An entropy criterion is used for constructing a binary response regression model with a logistic link. This approach yields a logistic model with coefficients proportional to the coefficients of linear regression. Based on this property, the Shapley value estimation of predictors’ contribution is applied for obtaining robust coefficients of the linear aggregate adjusted to the logistic model. This procedure produces a logistic regression with interpretable coefficients robust to multicollinearity. Numerical results demonstrate theoretical and practical advantages of the entropy-logistic regression.


Comparison Of Statistical Tests In Logistic Regression: The Case Of Hypernatreamia, Stylianos Katsaragakis, Christos Koukouvinos, Stella Stylianou, Eleni-Maria Theodoraki Nov 2005

Journal of Modern Applied Statistical Methods

Logistic regression has become an integral component of any medical data analysis concerning binary responses. The main issue arising after the adoption of the final model is its goodness-of-fit. The fit of the model is assessed via overall measures and summary statistics, which are compared in the case of hypernatraemia.


Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …


Testing The Goodness Of Fit Of Multivariate Multiplicative-Intercept Risk Models Based On Case-Control Data, Biao Zhang May 2005

Journal of Modern Applied Statistical Methods

The validity of the multivariate multiplicative-intercept risk model with I + 1 categories based on case-control data is tested. After reparametrization, the assumed risk model is equivalent to an (I + 1)-sample semiparametric model in which the I ratios of two unspecified density functions have known parametric forms. By identifying this (I + 1)-sample semiparametric model, which is of intrinsic interest in general (I + 1)-sample problems, with an (I + 1)-sample semiparametric selection bias model, we propose a weighted Kolmogorov-Smirnov-type statistic to test the validity of the multivariate multiplicative-intercept risk model. Established are some asymptotic results …


A Generalized Quasi-Likelihood Model Application To Modeling Poverty Of Asian American Women, Jeffrey R. Wilson May 2004

Journal of Modern Applied Statistical Methods

A generalized quasi-likelihood function that does not require the assumption of an underlying distribution when jointly modeling the mean and the variance is introduced to examine poverty among Asian American women living on the West Coast of the United States, using data from the U.S. Census Bureau.


Modeling Strategies In Logistic Regression With SAS, SPSS, SYSTAT, BMDP, Minitab, And Stata, Chao-Ying Joanne Peng, Tak-Shing Harry So May 2002

Journal of Modern Applied Statistical Methods

This paper addresses modeling strategies in logistic regression within the context of a real-world data set. Six commercially available statistical packages were evaluated in how they addressed modeling issues and in the accuracy of their regression results. Recommendations are offered for data analysts in terms of each package's strengths and weaknesses.