Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Articles 1 - 24 of 24

Full-Text Articles in Statistical Models

The Use Of Regularization To Detect Racial Inequities In Pay Equity Studies: An Empirical Study And Reflections On Regulation Methods, Christopher M. Peña Nov 2023

Electronic Theses and Dissertations

Since the late 1970s, multiple linear regression has been the preferred method for identifying discrimination in pay. An empirical study on this topic was conducted using quantitative critical methods. A literature review first examined conflicting views on using multiple linear regression in pay equity studies. The review found that multiple linear regression is used so prevalently in pay equity studies because the courts and practitioners have widely accepted it and because of its simplicity and ability to parse multiple sources of variance simultaneously. Commentaries in the literature cautioned about errors in model specification, the use of tainted variables, and the …
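
As a minimal sketch of how multiple linear regression isolates a pay gap after adjusting for a legitimate factor, the Python example below simulates hypothetical data and recovers the group coefficient with ordinary least squares via NumPy. The variables `years` and `group` and all coefficients are assumptions, not values from the study, and no regularization penalty is shown.

```python
import numpy as np

# Hypothetical simulated data: salary depends on years of experience and on a
# binary group indicator (1 = member of the group under study).
rng = np.random.default_rng(0)
n = 500
years = rng.uniform(0, 20, n)
group = rng.integers(0, 2, n)
salary = 50_000 + 2_000 * years - 3_000 * group + rng.normal(0, 2_000, n)

# Design matrix: intercept, experience, group indicator.
X = np.column_stack([np.ones(n), years, group])
beta, *_ = np.linalg.lstsq(X, salary, rcond=None)

# beta[2] estimates the pay difference associated with the group,
# holding experience fixed.
print(f"adjusted group coefficient: {beta[2]:.0f}")
```

If a tainted variable (one that is itself shaped by discrimination) were added to `X`, it would absorb part of this coefficient. This is one of the specification errors the literature cautions about.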


Statistical Inference On Lung Cancer Screening Using The National Lung Screening Trial Data, Farhin Rahman Aug 2023

Electronic Theses and Dissertations

This dissertation consists of three research projects on cancer screening probability modeling. In these projects, the three key modeling parameters for cancer screening (sensitivity, sojourn time, and transition density) were estimated. The long-term outcomes (including overdiagnosis), the optimal screening time/age, the lead time distribution, and the probability of overdiagnosis at a future screening time were then simulated to provide a statistical perspective on the effectiveness of cancer screening programs. In the first part of this dissertation, statistical inference was conducted for male and female smokers using the National Lung Screening Trial (NLST) chest X-ray data. A …


Network Intrusion Detection Using Deep Reinforcement Learning, Hamed T. Sanusi Jan 2023

Electronic Theses and Dissertations

This thesis addresses cybersecurity by applying deep reinforcement learning (DRL) to network intrusion detection. One advantage of DRL is its ability to adapt to changing network conditions and evolving attack methods, which makes it a promising approach to the challenges involved in intrusion detection. The thesis also discusses the obstacles and benefits of using classification methods for network intrusion detection and the need for high-quality training data. To train and test the proposed method, the NSL-KDD dataset was used. It was converted from a multi-class to a binary classification problem by merging all attack types into a single class. …
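
The multi-class to binary conversion described above can be sketched as a simple label mapping. This is a minimal illustration that assumes the string label column of NSL-KDD uses `"normal"` for benign traffic and an attack name for everything else. The helper `to_binary_label` is hypothetical, and the sample labels are illustrative only.

```python
from collections import Counter

def to_binary_label(label: str) -> int:
    """Hypothetical helper: map an NSL-KDD class label to 0 (normal) or 1 (any attack)."""
    return 0 if label.strip().lower() == "normal" else 1

# A few labels as they appear in NSL-KDD records (illustrative sample only).
labels = ["normal", "neptune", "smurf", "normal", "guess_passwd", "portsweep"]
binary = [to_binary_label(lbl) for lbl in labels]
print(binary)            # per-record binary targets
print(Counter(binary))   # class balance after merging all attacks into one class
```

Merging classes this way usually changes the class balance considerably, so it is worth checking the counts before training.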


Bayesian Methods For Graphical Models With Neighborhood Selection, Sagnik Bhadury Dec 2022

Electronic Theses and Dissertations

Graphical models determine associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models, in which the relationships are formalized by the nonzero entries of the precision matrix. However, in high-dimensional settings, covariance estimates are typically unstable. Moreover, in many realistic applications only a few significant associations are expected to be present. This necessitates incorporating sparsity into the estimation method. Classical frequentist methods, such as GLASSO, use penalization for this purpose. Fully Bayesian methods, by contrast, are slow because they require iteratively sampling over a quadratic …
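
To illustrate how zeros in the precision matrix encode conditional independence, the sketch below converts a precision matrix $\Theta$ into partial correlations, $\rho_{ij} = -\Theta_{ij} / \sqrt{\Theta_{ii}\Theta_{jj}}$. The 3-variable chain matrix is a hypothetical example, not an estimate from the dissertation.

```python
import numpy as np

def partial_correlations(theta: np.ndarray) -> np.ndarray:
    """Partial correlation matrix implied by a precision matrix theta."""
    d = np.sqrt(np.diag(theta))
    pc = -theta / np.outer(d, d)
    np.fill_diagonal(pc, 1.0)
    return pc

# Hypothetical precision matrix for a chain X1 - X2 - X3:
# the zero in position (0, 2) encodes "X1 independent of X3 given X2".
theta = np.array([[2.0, -0.8, 0.0],
                  [-0.8, 2.0, -0.8],
                  [0.0, -0.8, 2.0]])
pc = partial_correlations(theta)
sigma = np.linalg.inv(theta)
print(pc.round(3))
print(sigma.round(3))   # the covariance itself has a nonzero (0, 2) entry
```

The covariance matrix is dense even though the graph is sparse. This is why sparsity is imposed on the precision matrix rather than on the covariance.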


Statistical Methods For Personalized Treatment Selection And Survival Data Analysis Based On Observational Data With High-Dimensional Covariates, Don Ramesh Dinendra Sudaraka Tholkage Aug 2022

Electronic Theses and Dissertations

Due to the wide availability of functional data from multiple disciplines, functional data analysis has become popular in the recent literature. However, related developments for censored survival data have been relatively sparse. In Chapter 2, we consider the problem of analyzing time-to-event data in the presence of functional predictors. We develop a conditional generalized Kaplan-Meier (KM) estimator that incorporates functional predictors using kernel weights, and we rigorously establish its asymptotic properties. In addition, we propose selecting the optimal bandwidth based on a time-dependent Brier score. We then carry out extensive numerical studies to examine the …
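
The dissertation's exact estimator is not reproduced here. As an assumption, this sketch follows the familiar Beran-type construction. Each subject receives a Gaussian kernel weight based on the approximate L2 distance between its discretized predictor curve and a target curve `x0`, and the Kaplan-Meier product-limit formula is then applied with those weights. The function name `kernel_weighted_km` and the parameters `h` and `grid` are hypothetical.

```python
import numpy as np

def kernel_weighted_km(times, events, curves, x0, h, grid):
    """Hypothetical kernel-weighted Kaplan-Meier estimate at predictor curve x0.

    times, events: arrays of length n (events: 1 = event, 0 = censored)
    curves: (n, len(grid)) array of discretized functional predictors
    """
    step = np.diff(grid).mean()
    # Approximate L2 distance between each subject's curve and x0.
    dist = np.sqrt(((curves - x0) ** 2).sum(axis=1) * step)
    w = np.exp(-0.5 * (dist / h) ** 2)
    w = w / w.sum()

    surv, out_t, out_s = 1.0, [], []
    for t in np.unique(times):                 # sorted distinct times
        at_risk = w[times >= t].sum()
        deaths = w[(times == t) & (events == 1)].sum()
        if deaths > 0:
            surv *= 1.0 - deaths / at_risk
        out_t.append(t)
        out_s.append(surv)
    return np.array(out_t), np.array(out_s)

# Sanity check: identical curves give equal weights, i.e. the ordinary KM estimate.
grid = np.linspace(0.0, 1.0, 11)
curves = np.tile(np.sin(grid), (4, 1))
times = np.array([1.0, 2.0, 3.0, 4.0])
events = np.array([1, 0, 1, 1])
t_out, s_out = kernel_weighted_km(times, events, curves, np.sin(grid), h=0.5, grid=grid)
print(s_out)
```

The bandwidth `h` controls how local the estimate is. The abstract proposes choosing it with a time-dependent Brier score, which is not implemented here.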


Finding A Representative Distribution For The Tail Index Alpha, α, For Stock Return Data From The New York Stock Exchange, Jett Burns May 2022

Electronic Theses and Dissertations

Statistical inference is a tool for creating models that can accurately describe real-world events. Special importance is given to financial methods that model risk and large price movements. A parameter that describes tail heaviness, and risk overall, is the tail index α. This research finds a representative distribution that models α. The absolute values of standardized stock returns from the Center for Research in Security Prices (CRSP) are used in this research. The inference is performed using R. Approximations for α are found using the ptsuite package. The GAMLSS package employs maximum likelihood estimation to estimate distribution parameters using the CRSP data. The …
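
The ptsuite package offers several tail-index estimators. As an illustration only (not ptsuite's implementation, and not the thesis's R code), here is the classical Hill estimator in Python. It uses the $k$ largest absolute values and reports $\hat\alpha = 1/\hat\gamma$. The choice `k=2_000` and the synthetic Pareto sample are assumptions.

```python
import numpy as np

def hill_alpha(x, k):
    """Hill estimate of the tail index alpha from the k largest |x| values."""
    s = np.sort(np.abs(np.asarray(x, dtype=float)))[::-1]   # descending order
    log_excess = np.log(s[:k]) - np.log(s[k])               # relative to (k+1)-th largest
    return 1.0 / log_excess.mean()

# Synthetic Pareto(alpha = 2) sample via inverse-transform sampling (hypothetical data).
rng = np.random.default_rng(0)
u = rng.uniform(size=100_000)
sample = (1.0 - u) ** (-1.0 / 2.0)
alpha_hat = hill_alpha(sample, k=2_000)
print(f"Hill estimate: {alpha_hat:.2f}")
```

The estimate depends on `k`. Small `k` gives high variance, and large `k` pulls in non-tail observations, so in practice the estimate is often inspected over a range of `k`.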


Confidence Interval For The Mean Of A Beta Distribution, Sean Rangel Dec 2021

Electronic Theses and Dissertations

Statistical inference for the mean of a beta distribution has become increasingly popular in various fields of academic research. In this study, we develop a novel statistical model based on likelihood techniques to evaluate various confidence interval methods for the mean of a beta distribution. Simulation studies are implemented to compare the performance of the confidence intervals. In addition, we apply the confidence intervals to real biological data gathered by the Department of Biology at Stephen F. Austin State University and provide recommendations on best practice.
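
The likelihood-based intervals from the study are not given in the abstract. The sketch below shows only the simulation framework for comparing intervals, using the simplest normal-approximation (Wald) interval for the mean $a/(a+b)$. The parameters `a=2`, `b=5`, `n=50` and the helper names are hypothetical.

```python
import math
import random
from statistics import mean, stdev

def wald_ci(sample, z=1.959963984540054):
    """Normal-approximation 95% interval for the mean of a sample."""
    m = mean(sample)
    half = z * stdev(sample) / math.sqrt(len(sample))
    return m - half, m + half

def coverage(a, b, n, reps=2_000, seed=1):
    """Fraction of simulated Beta(a, b) samples whose interval covers a / (a + b)."""
    rng = random.Random(seed)
    true_mean = a / (a + b)
    hits = 0
    for _ in range(reps):
        sample = [rng.betavariate(a, b) for _ in range(n)]
        lo, hi = wald_ci(sample)
        hits += lo <= true_mean <= hi
    return hits / reps

cov_hat = coverage(a=2, b=5, n=50)
print(cov_hat)   # estimated coverage of the nominal 95% interval
```

Other interval methods can be compared by swapping `wald_ci` for another function with the same signature. Note that Wald limits are not constrained to $[0, 1]$, which matters for small samples.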


Bayesian Variable Selection Strategies In Longitudinal Mixture Models And Categorical Regression Problems, Md Nazir Uddin Aug 2021

Electronic Theses and Dissertations

In this work, we seek to develop a variable screening and selection method for Bayesian mixture models with longitudinal data. To develop this method, we consider data from the Health and Retirement Study (HRS) conducted by the University of Michigan. Treating yearly out-of-pocket expenditures as the longitudinal response variable, we consider a Bayesian mixture model with $K$ components. The data consist of a large collection of demographic, financial, and health-related baseline characteristics, and we wish to find a subset of these that impact cluster membership. An initial mixture model without any cluster-level predictors is fit to the data through an MCMC …
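
To illustrate what fitting a mixture "through an MCMC" procedure involves, here is a deliberately simplified, hypothetical Gibbs sampler. It assumes a univariate two-component Gaussian mixture with known unit variance, a normal prior on each mean, and a Beta(1, 1) prior on the mixing weight. Unlike the thesis, it has no longitudinal structure, no $K > 2$, and no covariates.

```python
import math
import random

def gibbs_two_component(y, iters=1_000, burn=200, tau2=100.0, seed=1):
    """Gibbs sampler for y_i ~ pi*N(mu1, 1) + (1 - pi)*N(mu2, 1).

    Priors (assumed): mu_k ~ N(0, tau2), pi ~ Beta(1, 1).
    Returns post-burn-in draws of (mu1, mu2, pi).
    """
    rng = random.Random(seed)
    mu = [min(y), max(y)]        # spread-out start reduces label switching
    pi = 0.5
    draws = []
    for it in range(iters):
        # 1. Sample component labels given the current parameters.
        z = []
        for yi in y:
            p1 = pi * math.exp(-0.5 * (yi - mu[0]) ** 2)
            p2 = (1.0 - pi) * math.exp(-0.5 * (yi - mu[1]) ** 2)
            z.append(0 if rng.random() < p1 / (p1 + p2) else 1)
        # 2. Sample each mean from its conjugate normal full conditional.
        for k in (0, 1):
            yk = [yi for yi, zi in zip(y, z) if zi == k]
            prec = 1.0 / tau2 + len(yk)
            mu[k] = rng.gauss(sum(yk) / prec, math.sqrt(1.0 / prec))
        # 3. Sample the mixing weight from its Beta full conditional.
        n1 = z.count(0)
        pi = rng.betavariate(1 + n1, 1 + len(y) - n1)
        if it >= burn:
            draws.append((mu[0], mu[1], pi))
    return draws

# Hypothetical well-separated data: half from N(-3, 1), half from N(3, 1).
data_rng = random.Random(0)
y = [data_rng.gauss(-3, 1) for _ in range(100)] + [data_rng.gauss(3, 1) for _ in range(100)]
draws = gibbs_two_component(y)
post_means = [sum(d[j] for d in draws) / len(draws) for j in range(3)]
print([round(v, 2) for v in post_means])
```

The sampled labels `z` play the role of cluster membership. Variable selection for cluster membership, the thesis's focus, would add predictors to the model for those labels.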


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies, Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the complex biology underlying diseases through high-throughput genomic studies, such as microarrays, bulk RNA-sequencing, and single-cell RNA-sequencing. It also reduces the complexity of statistical analysis and enhances the explanatory power of the results obtained. However, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used in different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …
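
One widely used gene set approach, over-representation analysis, reduces to a one-sided hypergeometric test. The sketch below is offered as a generic example, not as the specific method reviewed in the dissertation. The counts in the usage line are hypothetical.

```python
from math import comb

def ora_pvalue(n_genes, set_size, n_selected, overlap):
    """Upper-tail hypergeometric p-value P(X >= overlap).

    n_genes: genes measured; set_size: genes in the gene set;
    n_selected: genes called significant; overlap: significant genes in the set.
    """
    upper = min(set_size, n_selected)
    hits = sum(comb(set_size, i) * comb(n_genes - set_size, n_selected - i)
               for i in range(overlap, upper + 1))
    return hits / comb(n_genes, n_selected)

# Hypothetical numbers: 20,000 genes, a 100-gene set, 500 significant genes,
# 12 of which fall in the set (about 2.5 would be expected by chance).
print(ora_pvalue(20_000, 100, 500, 12))
```

Python's arbitrary-precision integers keep the binomial coefficients exact. The single float division happens only at the end.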


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide among children under the age of five. This thesis implements binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and to identify associated risk factors. The models are built from the Centers for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data, which contain nearly 4 million live births in the United States. Odds ratios and confidence intervals for risk factors were produced using binary logistic regression.
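
For a single binary risk factor, the odds ratio from binary logistic regression equals the cross-product ratio of the 2x2 table. Its Wald interval on the log scale is the Woolf interval. The sketch below computes both from hypothetical counts, not from the CDC natality data.

```python
import math

def odds_ratio_ci(a, b, c, d, z=1.959963984540054):
    """Odds ratio and Woolf (log-scale Wald) 95% CI from a 2x2 table.

    a: exposed with outcome      b: exposed without outcome
    c: unexposed with outcome    d: unexposed without outcome
    """
    or_hat = (a * d) / (b * c)
    se_log = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(or_hat) - z * se_log)
    upper = math.exp(math.log(or_hat) + z * se_log)
    return or_hat, lower, upper

# Hypothetical counts: premature births by presence of a binary risk factor.
or_hat, lower, upper = odds_ratio_ci(a=120, b=880, c=300, d=4_700)
print(round(or_hat, 3), round(lower, 3), round(upper, 3))
```

With several covariates, the same quantities come from exponentiating the fitted coefficients and their Wald limits. The proportional odds model extends this to ordered outcome levels.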


Variable Selection In Accelerated Failure Time (AFT) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya Jan 2019

Electronic Theses and Dissertations

Variable selection is one of the standard approaches to model selection in large-scale datasets. It has applications in many fields of research, especially in large multi-center clinical trials. One of the prominent methods in variable selection is penalized likelihood, which is both consistent and efficient. However, penalized selection is considerably more challenging in the presence of random (frailty) effects. It is even more complicated when censoring is involved, because the marginal log-likelihood may not have a closed-form solution. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …
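
In an AFT model, log survival time is linear in the covariates. In the simplified special case of no censoring, no frailty, and Gaussian errors, penalized-likelihood selection reduces to the lasso. The sketch below fits that lasso by cyclic coordinate descent with soft thresholding. It is a stand-in built on those stated assumptions, not the PQL procedure of the thesis, and the simulated data are hypothetical.

```python
import numpy as np

def soft_threshold(z, lam):
    return np.sign(z) * max(abs(z) - lam, 0.0)

def lasso_cd(X, y, lam, n_iter=200):
    """Minimize (1/2n)||y - Xb||^2 + lam * ||b||_1 by cyclic coordinate descent."""
    n, p = X.shape
    b = np.zeros(p)
    col_sq = (X ** 2).sum(axis=0) / n
    r = y.copy()                      # residual y - Xb (b starts at zero)
    for _ in range(n_iter):
        for j in range(p):
            r += X[:, j] * b[j]       # add back coordinate j's contribution
            b[j] = soft_threshold(X[:, j] @ r / n, lam) / col_sq[j]
            r -= X[:, j] * b[j]
    return b

# Hypothetical uncensored data: log T = 3*x1 - 2*x2 + noise, plus 5 irrelevant covariates.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 7))
log_t = 3 * X[:, 0] - 2 * X[:, 1] + rng.normal(0, 0.5, 200)
Xc = X - X.mean(axis=0)               # center so no intercept is needed
b_hat = lasso_cd(Xc, log_t - log_t.mean(), lam=0.2)
print(b_hat.round(3))
```

Censoring and frailty break the least-squares form used here. That is what motivates approximations such as PQL.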


Essays On Mixture Models, Trevor R. Camper Jan 2019

Electronic Theses and Dissertations

When a study may sample from populations that are not of interest for its purposes, bivariate mixture models can be used to study the effect that this missampling has on parameter estimation. In this thesis, we examine the effect of bivariate mixture models on two statistical constructs: Cronbach's alpha (Cronbach, 1951) and Spearman's rho (Spearman, 1904). Chapter 1 introduces mixture models and the definition of bias under mixture models, which serves as the central concept of this thesis. Chapter 2 investigates a particular psychometric issue known as insufficient …
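
For reference, Cronbach's alpha for $k$ items is $\alpha = \frac{k}{k-1}\left(1 - \frac{\sum_i \sigma_i^2}{\sigma_X^2}\right)$, where $\sigma_X^2$ is the variance of the total score. The minimal Python sketch below computes it. The 3-item, 5-respondent score matrix is hypothetical. Population variances are used throughout, and because the same denominator appears in the numerator and denominator of the ratio, alpha is unchanged by that choice.

```python
from statistics import pvariance

def cronbach_alpha(items):
    """Cronbach's alpha; items is a list of k score lists, one per item."""
    k = len(items)
    totals = [sum(scores) for scores in zip(*items)]      # each respondent's total
    item_var = sum(pvariance(col) for col in items)
    return k / (k - 1) * (1 - item_var / pvariance(totals))

# Hypothetical 3-item scale answered by 5 respondents.
items = [[2, 3, 3, 4, 5],
         [1, 3, 2, 4, 5],
         [2, 2, 3, 5, 4]]
print(round(cronbach_alpha(items), 3))
```

To study missampling, one could compute alpha on scores pooled from two subpopulations and compare it with the value in the population of interest.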


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis Jan 2019

Electronic Theses and Dissertations

Classifying self-care activities poses significant challenges in identifying children's unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist's experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and detect outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …
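
The abstract does not name the clustering algorithm. As an assumption, this sketch uses k-modes with Hamming distance, a common choice for binary data. Each row is a hypothetical child profile in which every column is a yes/no self-care indicator.

```python
def hamming(a, b):
    """Number of positions where two binary rows differ."""
    return sum(x != y for x, y in zip(a, b))

def k_modes(rows, k, n_iter=10):
    """Cluster binary rows with k-modes (Hamming distance, majority-vote modes)."""
    modes = [list(r) for r in rows[:k]]      # naive initialisation: first k rows
    labels = []
    for _ in range(n_iter):
        labels = [min(range(k), key=lambda c: hamming(r, modes[c])) for r in rows]
        for c in range(k):
            members = [r for r, lab in zip(rows, labels) if lab == c]
            if members:
                # Mode of each binary column = majority value (ties go to 0).
                modes[c] = [int(2 * sum(col) > len(members)) for col in zip(*members)]
    return labels, modes

# Hypothetical binary profiles (1 = difficulty with a given self-care task).
rows = [[1, 1, 1, 0, 0, 0], [0, 0, 0, 1, 1, 1], [1, 1, 0, 0, 0, 0],
        [0, 0, 0, 1, 1, 0], [1, 0, 1, 0, 0, 0], [0, 0, 0, 0, 1, 1]]
labels, modes = k_modes(rows, k=2)
print(labels, modes)
```

A row's Hamming distance to its own cluster mode gives a simple outlier score. Initialising from the first `k` rows is only for reproducibility and is not a recommended strategy.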


Some New And Generalized Distributions Via Exponentiation, Gamma And Marshall-Olkin Generators With Applications, Hameed Abiodun Jimoh Jan 2018

Electronic Theses and Dissertations

Three new generalized distributions, developed via competing risk, gamma generator, Marshall-Olkin generator, and exponentiation techniques, are proposed and studied. Structural properties, including quantile functions, hazard rate functions, moments, conditional moments, mean deviations, Rényi entropy, the distribution of order statistics, and maximum likelihood estimates, are presented. Monte Carlo simulation is employed to examine the performance of the proposed distributions. Applications of the generalized distributions to real lifetime data are presented to illustrate the usefulness of the models.
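
Two of the generators named above have simple closed forms. The Marshall-Olkin generator maps a baseline survival function $\bar F$ to $\alpha \bar F(x) / (1 - (1-\alpha)\bar F(x))$, and exponentiation maps a baseline CDF $F$ to $F(x)^a$. The sketch applies both to an exponential baseline with a hypothetical rate of 0.5. It is not one of the thesis's three new distributions, and the gamma generator is not shown.

```python
import math

def exponential_sf(x, rate):
    """Survival function of the exponential baseline distribution."""
    return math.exp(-rate * x)

def marshall_olkin_sf(x, base_sf, alpha):
    """Marshall-Olkin extended survival: alpha*S(x) / (1 - (1 - alpha)*S(x))."""
    s = base_sf(x)
    return alpha * s / (1.0 - (1.0 - alpha) * s)

def exponentiated_cdf(x, base_cdf, a):
    """Exponentiated ("exp-G") CDF: F(x) ** a."""
    return base_cdf(x) ** a

def base_sf(x):
    return exponential_sf(x, rate=0.5)     # hypothetical baseline rate

def base_cdf(x):
    return 1.0 - base_sf(x)

for x in (0.5, 1.0, 2.0):
    print(x, marshall_olkin_sf(x, base_sf, alpha=2.0), exponentiated_cdf(x, base_cdf, a=3.0))
```

Setting `alpha = 1` or `a = 1` recovers the baseline, which is how such families nest their sub-models.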


Examination And Comparison Of The Performance Of Common Non-Parametric And Robust Regression Models, Gregory F. Malek Aug 2017

Electronic Theses and Dissertations

Gregory Frank Malek, Master's in Statistics Program, Stephen F. Austin State University, Nacogdoches, Texas, U.S.A. (g_m_2002@live.com)

This work investigated common alternatives to the least-squares regression method in the presence of non-normally distributed errors. An initial literature review identified a variety of alternative methods, including Theil Regression, Wilcoxon Regression, Iteratively Re-Weighted Least Squares, Bounded-Influence Regression, and Bootstrapping methods. These methods were evaluated using a simple simulated example data set, as well as various real data sets, including math proficiency data, Belgian telephone call data, and faculty …
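
Of the alternatives listed above, Theil regression is the simplest to sketch. The slope is the median of all pairwise slopes, and the intercept is taken here as the median residual. This is the common Theil-Sen variant and may differ in detail from the thesis's implementation. The toy data, with one gross outlier, are hypothetical.

```python
from itertools import combinations
from statistics import median

def theil_sen(x, y):
    """Theil-Sen line: median pairwise slope, then median intercept."""
    slopes = [(y[j] - y[i]) / (x[j] - x[i])
              for i, j in combinations(range(len(x)), 2) if x[j] != x[i]]
    slope = median(slopes)
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope

# y = 1 + 2x with one gross outlier in the last response.
x = [0, 1, 2, 3, 4, 5]
y = [1, 3, 5, 7, 9, 100]
print(theil_sen(x, y))   # (1.0, 2.0)
```

Least squares on the same data would be pulled strongly toward the outlier. The median of pairwise slopes ignores it as long as fewer than about 29% of the points are contaminated.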


Performance Of Imputation Algorithms On Artificially Produced Missing At Random Data, Tobias O. Oketch May 2017

Electronic Theses and Dissertations

Missing data is one of the main challenges in building valid statistical models. It reduces the representativeness of the data sample; hence, population estimates and model parameters estimated from such data are likely to be biased.

However, the missing data problem is an active area of study, and better alternative statistical procedures have been proposed to mitigate its shortcomings. In this paper, we review causes of missing data and various methods of handling it. Our main focus is evaluating various multiple imputation (MI) methods from the multivariate imputation by chained equations (MICE) package in the statistical software …
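
MICE is an R package, and this Python sketch does not reimplement it. It shows only the final pooling step of multiple imputation (Rubin's rules), which is applied after the analysis model has been fitted to each of the $m$ imputed data sets. The five estimates and variances in the usage line are hypothetical.

```python
from statistics import mean, variance

def rubin_pool(estimates, variances):
    """Pool m completed-data estimates with Rubin's rules.

    Returns the pooled estimate and its total variance T = W + (1 + 1/m) * B.
    """
    m = len(estimates)
    q_bar = mean(estimates)          # pooled point estimate
    w = mean(variances)              # average within-imputation variance
    b = variance(estimates)          # between-imputation variance (m - 1 denominator)
    return q_bar, w + (1 + 1 / m) * b

# Hypothetical regression coefficient from m = 5 imputed data sets.
est = [0.52, 0.47, 0.55, 0.50, 0.49]
var = [0.010, 0.011, 0.009, 0.010, 0.012]
print(rubin_pool(est, var))
```

The between-imputation term $B$ is what carries the extra uncertainty due to missingness. Single imputation effectively sets it to zero.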


A Markov Decision Process Approach To Adaptive Contact Strategies, Artur Grygorian Jan 2017

Electronic Theses and Dissertations

In the field of survey methodology, optimizing contact strategies helps organizations increase response rates within their allocated budget. Markov Decision Processes (MDPs) are widely used to model decision-making in situations where outcomes have a random component. In this research, we use MDPs and adaptive sampling techniques to construct a strategy that suggests the best contact policy based on target audience characteristics. The data we use come from the First Destination Survey conducted by the Office of Career Services at Georgia Southern University. The constructed model is quite flexible and can be used by other organizations to optimize their …
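
As a toy illustration of the MDP idea (not the model built from the First Destination Survey), the sketch below treats the number of unanswered contact attempts as the state and chooses "email", "phone", or "stop" at each attempt. The resulting finite-horizon problem is solved by backward induction. All costs, the reward per completed survey, and the response probabilities are hypothetical.

```python
def backward_induction(p_resp, cost, reward):
    """Finite-horizon MDP for contact attempts, solved by backward induction.

    State t = number of attempts made so far without a response.
    p_resp[t][a]: response probability if mode a is used at attempt t.
    Choosing "stop" ends the process with value 0.
    """
    horizon = len(p_resp)
    value = [0.0] * (horizon + 1)        # no value after the last attempt
    policy = [None] * horizon
    for t in range(horizon - 1, -1, -1):
        best_action, best_value = "stop", 0.0
        for action, p in p_resp[t].items():
            v = -cost[action] + p * reward + (1 - p) * value[t + 1]
            if v > best_value:
                best_action, best_value = action, v
        value[t], policy[t] = best_value, best_action
    return value, policy

# Hypothetical costs, reward per completed survey, and response probabilities.
cost = {"email": 1.0, "phone": 5.0}
p_resp = [{"email": 0.30, "phone": 0.50},
          {"email": 0.10, "phone": 0.30},
          {"email": 0.05, "phone": 0.10}]
value, policy = backward_induction(p_resp, cost, reward=25.0)
print(policy, round(value[0], 4))
```

Conditioning on target audience characteristics would amount to separate `p_resp` tables, one per segment, each yielding its own policy.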


Quasi-Random Action Selection In Markov Decision Processes, Samuel D. Walker Jan 2017

Electronic Theses and Dissertations

In Markov decision processes, an operator exploits known data about the environment it inhabits. The exploited information is learned through random exploration of the state-action space. This paper proposes to optimize exploration by using quasi-random sequences in both discrete and continuous state-action spaces. In the discrete case, a permutation is applied to the indices of the action space to avoid repetitive behavior. In the continuous case, low-discrepancy sequences, such as Halton sequences, are used to disperse the actions more uniformly.
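
The sketch below covers both cases described above. Halton points are built from radical inverses in coprime bases and scaled to a continuous action box. For the discrete case, the abstract does not say how the permutation is chosen, so `permuted_action_cycle` is a hypothetical interpretation: it cycles through one random permutation of the action indices so that every action is tried once per cycle.

```python
import random

def radical_inverse(i, base):
    """Van der Corput radical inverse of integer i in the given base."""
    f, r = 1.0, 0.0
    while i > 0:
        f /= base
        r += f * (i % base)
        i //= base
    return r

def halton_actions(n, low, high, bases=(2, 3)):
    """n low-discrepancy points in the box [low, high]^d, d = len(bases)."""
    return [[low + (high - low) * radical_inverse(i, b) for b in bases]
            for i in range(1, n + 1)]

def permuted_action_cycle(n_actions, n_steps, seed=0):
    """Hypothetical discrete scheme: cycle through a random permutation of action indices."""
    perm = list(range(n_actions))
    random.Random(seed).shuffle(perm)
    return [perm[t % n_actions] for t in range(n_steps)]

print([radical_inverse(i, 2) for i in range(1, 5)])   # [0.5, 0.25, 0.75, 0.125]
print(halton_actions(3, low=-1.0, high=1.0))
print(permuted_action_cycle(4, 8))
```

Successive Halton points fill gaps left by earlier ones, whereas independent uniform draws can cluster. This is the sense in which the actions are dispersed "more uniformly".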


A Multi-Indexed Logistic Model For Time Series, Xiang Liu Dec 2016

Electronic Theses and Dissertations

In this thesis, we explore a multi-indexed logistic regression (MILR) model, with particular emphasis on its application to time series. MILR includes simple logistic regression (SLR) as a special case, and the hope is that in some instances it will produce significantly better results. To motivate the development of MILR, we consider its application to the analysis of both simulated sine-wave data and stock data. We first review the well-studied SLR model and its application to the analysis of time series data. Using a more sophisticated representation of sequential data, we then detail the implementation of MILR. We compare …


Newsvendor Models With Monte Carlo Sampling, Ijeoma W. Ekwegh Aug 2016

Electronic Theses and Dissertations

The newsvendor model is used to solve inventory problems in which demand is random. In this thesis, we focus on a method that uses Monte Carlo sampling to estimate the order quantity that either maximizes revenue or minimizes cost when demand is uncertain. Given data, the Monte Carlo approach is used to sample data over scenarios and to estimate the probability density function. A bootstrapping process yields an empirical distribution for the order quantity that maximizes the expected profit. Finally, this method is used …
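
A minimal sketch of the sample-average idea, under simplifying assumptions. Unsold units have zero salvage value, demand scenarios are given as a list, and the best integer order quantity is found by direct search. A bootstrap over the scenarios then gives an empirical distribution of that order quantity. Prices, costs, and demand scenarios are hypothetical, and this is not the thesis's code.

```python
import random

def avg_profit(q, demands, price, cost):
    """Sample-average profit of ordering q units (unsold units are worthless)."""
    return sum(price * min(q, d) - cost * q for d in demands) / len(demands)

def best_order(demands, price, cost):
    """Order quantity maximizing sample-average profit over integer candidates."""
    candidates = range(min(demands), max(demands) + 1)
    return max(candidates, key=lambda q: avg_profit(q, demands, price, cost))

def bootstrap_orders(demands, price, cost, n_boot=100, seed=0):
    """Empirical distribution of the optimal order via bootstrap resampling."""
    rng = random.Random(seed)
    return [best_order(rng.choices(demands, k=len(demands)), price, cost)
            for _ in range(n_boot)]

# Hypothetical demand scenarios 1..99, selling price 10, unit cost 3.
demands = list(range(1, 100))
print(best_order(demands, price=10, cost=3))   # 70
boot = bootstrap_orders(demands, price=10, cost=3)
print(min(boot), max(boot))
```

The search agrees with the classical critical-fractile rule. With no salvage value, the optimal order is the $(p-c)/p = 0.7$ quantile of demand, which is 70 for these scenarios.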


Multilevel Models For Longitudinal Data, Aastha Khatiwada Aug 2016

Electronic Theses and Dissertations

Longitudinal data arise when individuals are measured several times during an observation period, so the data for each individual are not independent. There are several ways of analyzing longitudinal data when different treatments are compared. Multilevel models are used to analyze data that are clustered in some way. In this work, multilevel models are used to analyze longitudinal data from a case study. Results from other, more commonly used methods are compared with those from multilevel models. The output of two software packages, SAS and R, is also compared. Finally, a method consisting of fitting individual models for each …
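
For reference, a generic two-level growth model of the kind described above can be written as follows. The symbols are illustrative assumptions, since the abstract does not give the case study's model. $y_{ij}$ is the response of individual $i$ at occasion $j$, $t_{ij}$ is the measurement time, and $G_i$ is a treatment-group indicator. Correlation among an individual's repeated measures is carried by the random effects $u_{0i}$ and $u_{1i}$.

```latex
\begin{aligned}
\text{Level 1:}\quad & y_{ij} = \pi_{0i} + \pi_{1i}\, t_{ij} + e_{ij}, && e_{ij} \sim N(0, \sigma^2),\\
\text{Level 2:}\quad & \pi_{0i} = \beta_{00} + \beta_{01} G_i + u_{0i},\\
& \pi_{1i} = \beta_{10} + \beta_{11} G_i + u_{1i}, && (u_{0i}, u_{1i})^\top \sim N(\mathbf{0}, \mathbf{T}).
\end{aligned}
```

Here $\beta_{11}$ is the treatment-by-time interaction. In R's lme4 package, this model corresponds to the formula `y ~ time * group + (time | id)`, where the column names are hypothetical.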


Are Highly Dispersed Variables More Extreme? The Case Of Distributions With Compact Support, Benedict E. Adjogah May 2014

Electronic Theses and Dissertations

We consider discrete and continuous symmetric random variables $X$ taking values in $[0, 1]$, and thus having expected value $1/2$. The main thrust of this investigation is to study the relationship between the variance $\mathrm{Var}(X)$ of $X$ and the expected maximum $E(M_n) = E(\max\{X_1, \dots, X_n\})$ of $n$ independent and identically distributed random variables $X_1, X_2, \dots, X_n$, each distributed as $X$. Many special cases are studied, some leading to very interesting alternating sums, and some progress is made towards a general theory.
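
For a discrete $X$ with sorted support $v_1 < \dots < v_m$ and CDF $F$, $P(M_n = v_k) = F(v_k)^n - F(v_{k-1})^n$, so $E(M_n)$ can be computed exactly. The sketch uses exact fractions to compare two hypothetical symmetric distributions on $[0, 1]$ with different variances.

```python
from fractions import Fraction

def expected_max(values, probs, n):
    """E[max(X_1, ..., X_n)] for i.i.d. discrete X with sorted support values."""
    e, cdf_prev = Fraction(0), Fraction(0)
    for v, p in zip(values, probs):
        cdf = cdf_prev + p
        e += v * (cdf ** n - cdf_prev ** n)     # P(max == v) = F(v)^n - F(v-)^n
        cdf_prev = cdf
    return e

def variance(values, probs):
    mu = sum(v * p for v, p in zip(values, probs))
    return sum(p * (v - mu) ** 2 for v, p in zip(values, probs))

half = Fraction(1, 2)
two_point = ([0, 1], [half, half])                                    # Var = 1/4
three_point = ([0, half, 1], [Fraction(1, 4), half, Fraction(1, 4)])  # Var = 1/8
for vals, ps in (two_point, three_point):
    print(variance(vals, ps), expected_max(vals, ps, n=2))
```

In this pair, the more dispersed distribution also has the larger expected maximum ($3/4$ versus $11/16$ for $n = 2$). The thesis asks when such an ordering holds in general.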


Generalized Weibull And Inverse Weibull Distributions With Applications, Valeriia Sherina Jan 2014

Electronic Theses and Dissertations

In this thesis, new classes of Weibull and inverse Weibull distributions, including the generalized new modified Weibull (GNMW), gamma-generalized inverse Weibull (GGIW), weighted proportional inverse Weibull (WPIW), and inverse new modified Weibull (INMW) distributions, are introduced. The GNMW contains several sub-models, including the new modified Weibull (NMW), generalized modified Weibull (GMW), modified Weibull (MW), Weibull (W), and exponential (E) distributions, to mention a few. The class of WPIW distributions contains several special cases, such as the length-biased, hazard, and reverse hazard proportional inverse Weibull, proportional inverse Weibull, inverse Weibull, inverse exponential, inverse Rayleigh, and Fréchet distributions. Included …
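
The sub-model relationships above rest on two basic facts, sketched here with hypothetical parameter values. A Weibull with shape $k = 1$ is exponential. If $X$ is Weibull, then $Y = 1/X$ is inverse Weibull, with $P(Y \le y) = P(X \ge 1/y)$. The shape/scale parameterization is an assumption, since parameterizations vary across references.

```python
import math

def weibull_cdf(x, k, lam):
    """Weibull(shape k, scale lam) CDF for x >= 0."""
    return 1.0 - math.exp(-((x / lam) ** k))

def inverse_weibull_cdf(y, k, lam):
    """CDF of Y = 1/X when X ~ Weibull(k, lam), for y > 0: P(Y <= y) = P(X >= 1/y)."""
    return math.exp(-((1.0 / (y * lam)) ** k))

# Sub-model check: shape k = 1 reduces the Weibull to the exponential distribution.
print(weibull_cdf(2.0, k=1, lam=3.0), 1.0 - math.exp(-2.0 / 3.0))
# Reciprocal relationship between the two families.
print(inverse_weibull_cdf(0.5, k=2, lam=1.5), 1.0 - weibull_cdf(2.0, k=2, lam=1.5))
```

Generalized families such as the GNMW add parameters on top of these baselines, and setting those parameters to fixed values recovers the listed sub-models.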


An Investigation Of Sensitivity Of An F Test In Locating Change Points In Linear Regression, Jing Sun Jan 2014

Electronic Theses and Dissertations

A change point is a statistical phenomenon with many direct applications in climatology, bioinformatics, finance, oceanography, and medical imaging. In this thesis, we investigate the sensitivity of the F-test for detecting change points in linear regression, using a two-phase linear regression model. This model offers an effective way to detect "undocumented" change points using a form of the F-test. Using simulated data, we explore its sensitivity and accuracy with respect to different parameters in the model.
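
A minimal sketch of the scan described above, under the assumption of an unconstrained two-phase model. A single line (2 parameters) is compared with separate lines before and after each candidate split (4 parameters), and the split with the largest F statistic is kept. The simulated data and `min_seg` are hypothetical, and the thesis's exact form of the F-test may differ.

```python
import numpy as np

def rss_line(x, y):
    """Residual sum of squares of a simple linear regression of y on x."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    return float(resid @ resid)

def two_phase_f_scan(x, y, min_seg=5):
    """Scan candidate change points; return the split with the largest F statistic.

    Reduced model: one line (2 parameters). Full model: separate lines before
    and after the split (4 parameters), so F has (2, n - 4) degrees of freedom.
    """
    n = len(x)
    rss_one = rss_line(x, y)
    best_c, best_f = None, -np.inf
    for c in range(min_seg, n - min_seg + 1):
        rss_two = rss_line(x[:c], y[:c]) + rss_line(x[c:], y[c:])
        f_stat = ((rss_one - rss_two) / 2) / (rss_two / (n - 4))
        if f_stat > best_f:
            best_c, best_f = c, f_stat
    return best_c, best_f

# Hypothetical data: slope changes from 1 to 3 at x = 50.
rng = np.random.default_rng(0)
x = np.arange(100, dtype=float)
y = np.where(x < 50, x, 50 + 3 * (x - 50)) + rng.normal(0, 0.5, x.size)
print(two_phase_f_scan(x, y))
```

Because the split is chosen by maximizing F, the maximum does not follow an F(2, n - 4) distribution. Critical values for the undocumented case have to account for the search.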