Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Logistic Regression Under Sparse Data Conditions, David A. Walker, Thomas J. Smith Sep 2020

Journal of Modern Applied Statistical Methods

This study examined the impact of sparse data conditions among one or more predictor variables in logistic regression and assessed the effectiveness of the Firth (1993) procedure in reducing potential parameter estimation bias. Results indicated that sparseness in binary predictors introduces bias that is substantial with small sample sizes, and that the Firth procedure can effectively correct this bias.
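As a rough illustration of why sparseness matters, consider the simplest case of one binary predictor, where the data reduce to a 2x2 table. When a cell is empty, the maximum likelihood log odds ratio is infinite. In this special case the Firth penalized-likelihood estimate coincides with adding 0.5 to every cell, so it stays finite. The sketch below assumes that special case only; the function names and the example counts are hypothetical and are not taken from the article.

```python
import math


def ml_log_odds_ratio(a, b, c, d):
    """Maximum likelihood log odds ratio from a 2x2 table.

    a, b: events and non-events when the predictor equals 1.
    c, d: events and non-events when the predictor equals 0.
    Returns +/- infinity (or NaN) when empty cells make the estimate undefined.
    """
    num, den = a * d, b * c
    if num == 0 and den == 0:
        return math.nan
    if den == 0:
        return math.inf
    if num == 0:
        return -math.inf
    return math.log(num / den)


def firth_log_odds_ratio(a, b, c, d):
    """Firth-type estimate for the single-binary-predictor case.

    Assumption: with one binary predictor plus an intercept, the penalized
    estimate equals the log odds ratio after adding 0.5 to each cell.
    """
    return math.log(((a + 0.5) * (d + 0.5)) / ((b + 0.5) * (c + 0.5)))


# Hypothetical sparse table: no non-events among the 10 cases with x = 1.
ml_estimate = ml_log_odds_ratio(10, 0, 5, 5)
firth_estimate = firth_log_odds_ratio(10, 0, 5, 5)
print(ml_estimate, firth_estimate)
```

With several predictors there is no such closed form, and the penalized likelihood has to be maximized iteratively.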


Examining The Statistical Relationships Between Volcanic Seismic, Infrasound, And Electrical Signals: A Case Study Of Sakurajima Volcano, 2015, Cassandra M. Smith, Glenn Thompson, Steven Reader, Sonja A. Behnke, Stephen R. McNutt, Ron Thomas, Harald Edens Sep 2020

School of Geosciences Faculty and Staff Publications

Sakurajima volcano in Japan is known for frequent eruptions containing prolific volcanic lightning. Previous studies from eruptions at Redoubt have shown preliminary correlations between seismic, infrasound, and radio frequency signals. This study uses field data collected at Sakurajima from 28 May–7 June 2015 and multivariable statistical modeling to quantify these relationships. We build regression equations to examine each of the following parameters of electrical activity: (1) the presence of electrical activity, (2) the presence of the radio frequency signal called continual radio frequency impulses (CRF), (3) the presence of lightning, (4) the overall duration of electrical activity, and (5) the …


Inferences About The Probability Of Success, Given The Value Of A Covariate, Using A Nonparametric Smoother, Rand Wilcox Jun 2020

Journal of Modern Applied Statistical Methods

For a binary random variable Y, let p(x) = P(Y = 1 | X = x) for some covariate X. The goal of computing a confidence interval for p(x) is considered. In the logistic regression model, even a slight departure from the model that is difficult to detect via a goodness-of-fit test can yield inaccurate results, and the accuracy of a confidence interval can deteriorate as the sample size increases. The goal is to suggest an alternative approach based on a smoother, which provides a more flexible approximation of p(x).
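To make the idea of a smoother-based approximation of p(x) concrete, here is a minimal sketch that uses a Gaussian-kernel (Nadaraya-Watson) weighted average of the binary outcomes. This is a generic smoother chosen for illustration. The abstract does not say which smoother the article uses, and the function name, bandwidth, and data below are hypothetical.

```python
import math


def kernel_smoothed_probability(x_obs, y_obs, x0, bandwidth):
    """Estimate p(x0) = P(Y = 1 | X = x0) as a kernel-weighted mean of y.

    x_obs: observed covariate values.
    y_obs: observed binary outcomes (0 or 1), same length as x_obs.
    bandwidth: positive smoothing parameter (an assumed tuning choice).
    """
    if bandwidth <= 0:
        raise ValueError("bandwidth must be positive")
    weights = [math.exp(-0.5 * ((x - x0) / bandwidth) ** 2) for x in x_obs]
    total = sum(weights)
    if total == 0.0:
        raise ValueError("no observations close enough to x0")
    return sum(w * y for w, y in zip(weights, y_obs)) / total


# Hypothetical data: outcomes switch from 0 to 1 as x increases.
x_values = [0.0, 1.0, 2.0, 3.0]
y_values = [0, 0, 1, 1]
p_mid = kernel_smoothed_probability(x_values, y_values, 1.5, 1.0)
p_high = kernel_smoothed_probability(x_values, y_values, 3.0, 1.0)
print(p_mid, p_high)
```

Unlike the logistic model, this estimate imposes no functional form on p(x). It yields only a point estimate, and a confidence interval would need additional machinery.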


Investigating The Performance Of Propensity Score Approaches For Differential Item Functioning Analysis, Yan Liu, Chanmin Kim, Amrey D. Wu, Paul Gustafson, Edward Kroc, Bruno D. Zumbo Apr 2020

Journal of Modern Applied Statistical Methods

This simulation study evaluated the performance of propensity score approaches for differential item functioning analysis. It assessed bias, mean square error, Type I error, and power under different levels of effect size and a variety of model misspecification conditions, including different types and missing patterns of covariates.


An Automatic Interaction Detection Hybrid Model For Bankcard Response Classification, Yan Wang, Sherry Ni, Brian Stone Jan 2020

Published and Grey Literature from PhD Candidates

Data mining techniques have numerous applications in bankcard response modeling. Logistic regression has been the standard modeling tool in the financial industry because of its generally strong performance and its interpretability. In this paper, we propose a hybrid bankcard response model that integrates decision tree-based chi-square automatic interaction detection (CHAID) into logistic regression. In the first stage of the hybrid model, CHAID analysis is used to detect potential variable interactions. In the second stage, these potential interactions serve as additional input variables in logistic regression. The motivation of the proposed hybrid model …


GDOM: Granulometry For The Detection Of Obfuscated Malware, John A. Aruta, N. Paul Schembari Jan 2020

Journal of Cybersecurity Education, Research and Practice

We describe the results of a master's thesis in malware detection and discuss the connection to the learning goals of the project. As part of the thesis, we studied obfuscation of malware, conversion of files into images, image processing, and machine learning, a process of benefit to both the student and faculty.

Malware detection becomes significantly more difficult when the malicious specimen is obfuscated or transformed in an attempt to avoid detection. However, computer files have been shown to exhibit evidence of structure when converted into images, so with image processing filters such as granulometry, it is possible to generate …


Quantifying The Varying Predictive Value Of Physical Activity Measures Obtained From Wearable Accelerometers On All-Cause Mortality Over Short To Medium Time Horizons In NHANES 2003-2006, Lucia Tabacu, Mark Ledbetter, Andrew Leroux, Ciprian Crainiceanu, Ekaterina Smirnova Jan 2020

Mathematics & Statistics Faculty Publications

Physical activity measures derived from wearable accelerometers have been shown to be highly predictive of all-cause mortality. Prediction models based on traditional risk factors and accelerometry-derived physical activity measures are developed for five time horizons. The data set contains 2978 study participants between 50 and 85 years old with an average of 13.08 years of follow-up in the NHANES 2003–2004 and 2005–2006. Univariate and multivariate logistic regression models were fit separately for five datasets for one- to five-year all-cause mortality as outcome (number of events 46, 94, 155, 218, and 297, respectively). In univariate models the total activity count (TAC) …


Nonparametric Misclassification Simulation And Extrapolation Method And Its Application, Congjian Liu Jan 2020

Electronic Theses and Dissertations

The misclassification simulation extrapolation (MC-SIMEX) method proposed by Küchenhoff et al. is a general method for handling categorical data with measurement error. It consists of two steps: simulation and extrapolation. In the simulation step, observations are simulated with varying degrees of measurement error, and parameter estimators are obtained from these observations for each degree of measurement error. In the extrapolation step, a parametric extrapolation function is used to obtain the parameter estimators for data with no measurement error. However, as shown in many studies, the parameter estimators are still biased as a result of the parametric extrapolation …
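The two steps can be sketched on a toy problem: estimating a proportion from a binary variable that is misclassified with a known, symmetric error rate. The sketch assumes a symmetric 2x2 misclassification matrix with correct-classification probability `pi_correct`, extra error levels λ = 1 and 2, and a quadratic extrapolant through λ = 0, 1, 2 evaluated at λ = -1. All names, the error rate, and the simulated data are hypothetical. This illustrates the parametric variant described above, not the nonparametric method developed in the thesis.

```python
import random


def flip_probability(pi_correct, lam):
    """Off-diagonal entry of the symmetric misclassification matrix raised
    to the power lam, i.e. the flip probability that adds lam units of error."""
    return (1.0 - (2.0 * pi_correct - 1.0) ** lam) / 2.0


def misclassify(values, flip_prob, rng):
    """Flip each binary value independently with probability flip_prob."""
    return [1 - v if rng.random() < flip_prob else v for v in values]


def mean(values):
    return sum(values) / len(values)


def mc_simex_proportion(observed, pi_correct, n_replicates, rng):
    """Return (naive, corrected) estimates of P(X = 1)."""
    naive = mean(observed)
    estimates = {0: naive}
    # Simulation step: add extra misclassification at lam = 1 and lam = 2.
    for lam in (1, 2):
        q = flip_probability(pi_correct, lam)
        replicates = [mean(misclassify(observed, q, rng))
                      for _ in range(n_replicates)]
        estimates[lam] = mean(replicates)
    # Extrapolation step: quadratic through lam = 0, 1, 2, evaluated at -1.
    corrected = 3 * estimates[0] - 3 * estimates[1] + estimates[2]
    return naive, corrected


rng = random.Random(0)
true_p = 0.3
truth = [1 if rng.random() < true_p else 0 for _ in range(20000)]
observed = misclassify(truth, 1 - 0.9, rng)  # 10% of labels are flipped
naive, corrected = mc_simex_proportion(observed, 0.9, 10, rng)
print(naive, corrected)
```

In this toy setting the expected estimate follows an exponential curve in λ, so the quadratic extrapolant removes most of the bias but not all of it.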