Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Electronic Theses and Dissertations

Articles 1 - 30 of 52

Full-Text Articles in Statistical Models

Bayesian Strategies For Propensity Score Estimation In Causal Inference., Uthpala I. Wanigasekara Dec 2023

Electronic Theses and Dissertations

Causal inference is a method used in various fields to draw causal conclusions based on data. It involves using assumptions, study designs, and estimation strategies to minimize the impact of confounding variables. Propensity scores are used to estimate outcome effects, through matching methods, stratification, weighting methods, and the Covariate Balancing Propensity Score method. However, they can be sensitive to estimation techniques and can lead to unstable findings. Researchers have proposed integrating weighting with regression adjustment in parametric models to improve causal inference validity. The first project focuses on Bayesian joint and two-stage methods for propensity score analysis. Propensity score modeling …


The Use Of Regularization To Detect Racial Inequities In Pay Equity Studies: An Empirical Study And Reflections On Regulation Methods, Christopher M. Peña Nov 2023

Electronic Theses and Dissertations

Since the late 1970s, multiple linear regression has been the preferred method for identifying discrimination in pay. An empirical study on this topic was conducted using quantitative critical methods. A literature review first examined conflicting views on using multiple linear regression in pay equity studies. The review found that multiple linear regression is used so prevalently in pay equity studies because the courts and practitioners have widely accepted it and because of its simplicity and ability to parse multiple sources of variance simultaneously. Commentaries in the literature cautioned about errors in model specification, the use of tainted variables, and the …


Indirect Aggression And Victimization: Investigating Instrument Psychometrics, Gender Differences, And Its Relationship To Social Information Processing, Taylor Steeves Aug 2023

Electronic Theses and Dissertations

The study of indirect bullying behaviors, relational aggression and social aggression, has been of theoretical importance and interest to researchers and psychologists within the last few decades. In this investigation, using a convenience sample of 451 late adolescents attending a private university in the mid-Atlantic U.S., I examined the factor structure of two measures of indirect bullying, the Young Adult Social Behavior Scale – Victim (YASB-V) and the Young Adult Social Behavior Scale – Perpetrator (YASB-P). Using confirmatory factor analysis (CFA), I found that the YASB-V comprised a four-factor model, differing from the model that had been identified in the …


Statistical Inference On Lung Cancer Screening Using The National Lung Screening Trial Data., Farhin Rahman Aug 2023

Electronic Theses and Dissertations

This dissertation consists of three research projects on cancer screening probability modeling. In these projects, the three key modeling parameters (sensitivity, sojourn time, transition density) for cancer screening were estimated, and the long-term outcomes (including overdiagnosis as one outcome), the optimal screening time/age, the lead time distribution, and the probability of overdiagnosis at the future screening time were simulated to provide a statistical perspective on the effectiveness of cancer screening programs. In the first part of this dissertation, a statistical inference was conducted for male and female smokers using the National Lung Screening Trial (NLST) chest X-ray data. A …


Factors Affecting Apothecia Production And Primary Infection By Monilinia Vaccinii-Corymbosi On Vaccinium Angustifolium, Ian Leonard May 2023

Electronic Theses and Dissertations

Mummy berry, caused by Monilinia vaccinii-corymbosi (MVC), is a prolific disease of Vaccinium angustifolium (wild blueberry) leading to decreased yield in wild blueberry fields throughout the Downeast (DE) and Midcoast (MC) regions of Maine (ME). This study aimed to identify factors affecting primary inoculum production and infection by MVC on wild blueberry, and what bud stages of wild blueberry are most susceptible to infection. Through common garden (CGE), field and incubation experiments conducted in 2021 and 2022, factors affecting carpogenic germination of MVC pseudosclerotia and relationships between susceptible wild blueberry buds and environmental factors were analyzed. The CGE conducted in …


An Analysis Of Changes In Seasonal Dynamics And Generational Differences In The Maine Lobster Fishery, Emily Fitting May 2023

Electronic Theses and Dissertations

The American lobster (Homarus americanus) supports the most valuable single species fishery in the US. Lobster landings have been increasing steadily for the last three decades, but before that landings were more variable. The high value of the lobster fishery combined with the decline of other commercially important species in this region has created increasing dependence on the resource, and previous research questions the resilience of the fishery in the face of social and environmental changes.

Important lobster life history processes, including migration patterns, growth rates, and reproduction, are driven by ocean bottom temperature, which creates a strong seasonal cycle …


Network Intrusion Detection Using Deep Reinforcement Learning, Hamed T. Sanusi Jan 2023

Electronic Theses and Dissertations

This thesis delves into cybersecurity by applying Deep Reinforcement Learning (DRL) in network intrusion detection. One advantage of DRL is the ability to adapt to changing network conditions and evolving attack methods, making it a promising solution for addressing the challenges involved in intrusion detection. The thesis will also discuss the obstacles and benefits of using classification methods for network intrusion detection and the need for high-quality training data. To train and test our proposed method, the NSL-KDD dataset was used and then adjusted by converting it from a multi-classification to a binary classification, achieved by joining all attacks into one. …
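
The label conversion described in this abstract can be sketched as follows. This is a minimal illustration, not the thesis's own preprocessing code: the function name `to_binary_label`, the sample labels, and the assumption that benign traffic is labeled `normal` are ours.

```python
def to_binary_label(label: str) -> int:
    """Map a multi-class traffic label to 0 (normal) or 1 (any attack).

    Assumption: benign records carry the label "normal"; every other
    label is treated as an attack and merged into a single class.
    """
    return 0 if label.strip().lower() == "normal" else 1


# Hypothetical sample of multi-class labels (illustrative only).
labels = ["normal", "neptune", "smurf", "normal", "portsweep"]
binary = [to_binary_label(x) for x in labels]
print(binary)
```

Merging all attack types into one class turns the task into a two-class problem, at the cost of no longer distinguishing between attack families.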


The Influence Of Urban Forms And Street Infrastructure On Pedestrian-Motorist Collisions, Taylor J. Foreman Jan 2023

Electronic Theses and Dissertations

Unwalkable cities are afflicted by serious issues such as increasing rates of pedestrian traffic accidents, public health concerns, and the denied right to have an accessible city. This study examines how different types of urban forms and street infrastructure contribute to the prevalence of traffic accidents in two major metropolitan cities in the United States: Atlanta, Georgia, and Boston, Massachusetts. This study utilizes geospatial analysis through the Average Nearest Neighbor and Optimized Hot Spot Analysis tools to determine the spatial distribution of traffic accidents throughout both cities. Additionally, statistical tests were conducted to explore the relationships between the number of …


Bayesian Methods For Graphical Models With Neighborhood Selection., Sagnik Bhadury Dec 2022

Electronic Theses and Dissertations

Graphical models determine associations between variables through the notion of conditional independence. Gaussian graphical models are a widely used class of such models, where the relationships are formalized by non-null entries of the precision matrix. However, in high-dimensional cases, covariance estimates are typically unstable. Moreover, it is natural to expect only a few significant associations to be present in many realistic applications. This necessitates the injection of sparsity techniques into the estimation method. Classical frequentist methods, like GLASSO, use penalization techniques for this purpose. Fully Bayesian methods, on the contrary, are slow because they require iteratively sampling over a quadratic …


Computer Aided Diagnosis System For Breast Cancer Using Deep Learning., Asma Baccouche Aug 2022

Electronic Theses and Dissertations

The recent rise of big data technology surrounding the electronic systems and developed toolkits gave birth to new promises for Artificial Intelligence (AI). With the continuous use of data-centric systems and machines in our lives, such as social media, surveys, emails, reports, etc., there is no doubt that data has become the center of attention for scientists and motivated them to provide more decision-making and operational support systems across multiple domains. With the recent breakthroughs in artificial intelligence, the use of machine learning and deep learning models has achieved remarkable advances in computer vision, ecommerce, cybersecurity, and healthcare. Particularly, numerous …


Statistical Methods For Personalized Treatment Selection And Survival Data Analysis Based On Observational Data With High-Dimensional Covariates., Don Ramesh Dinendra Sudaraka Tholkage Aug 2022

Electronic Theses and Dissertations

Due to the wide availability of functional data from multiple disciplines, the studies of functional data analysis have become popular in the recent literature. However, the related development in censored survival data has been relatively sparse. In Chapter 2, we consider the problem of analyzing time-to-event data in the presence of functional predictors. We develop a conditional generalized Kaplan-Meier (KM) estimator that incorporates functional predictors using kernel weights and rigorously establish its asymptotic properties. In addition, we propose to select the optimal bandwidth based on a time-dependent Brier score. We then carry out extensive numerical studies to examine the …


Finding A Representative Distribution For The Tail Index Alpha, Α, For Stock Return Data From The New York Stock Exchange, Jett Burns May 2022

Electronic Theses and Dissertations

Statistical inference is a tool for creating models that can accurately display real-world events. Special importance is given to the financial methods that model risk and large price movements. A parameter that describes tail heaviness, and risk overall, is α. This research finds a representative distribution that models α. The absolute value of standardized stock returns from the Center for Research on Security Prices are used in this research. The inference is performed using R. Approximations for α are found using the ptsuite package. The GAMLSS package employs maximum likelihood estimation to estimate distribution parameters using the CRSP data. The …


Confidence Interval For The Mean Of A Beta Distribution, Sean Rangel Dec 2021

Electronic Theses and Dissertations

Statistical inference for the mean of a beta distribution has become increasingly popular in various fields of academic research. In this study, we developed a novel statistical model from likelihood-based techniques to evaluate various confidence interval techniques for the mean of a beta distribution. Simulation studies will be implemented to compare the performance of the confidence intervals. In addition to the development and study involving confidence intervals, we will also apply the confidence intervals to real biological data that was gathered by the Department of Biology at Stephen F. Austin State University and provide recommendations on the best practice.


Predictive Modeling Of Clinical Outcomes For Hospitalized Covid-19 Patients Utilizing Cytof And Clinical Data., Onajia Stubblefield Aug 2021

Electronic Theses and Dissertations

In December 2019, an outbreak of a novel coronavirus initiated a global pandemic. Severe acute respiratory syndrome coronavirus 2 (SARS-CoV-2) is a virus that causes the disease coronavirus disease 2019 (COVID-19). Symptoms of infection with COVID-19 vary widely between individuals. While some infected individuals are asymptomatic, others need more extensive care and require hospitalization. Indeed, the COVID-19 pandemic was characterized by a shortage of hospital beds which presented additional complications in providing adequate care for patients. In this study, we used a combination of T cell population data collected from mass cytometry analysis and clinical markers to form a predictive …


Bayesian Variable Selection Strategies In Longitudinal Mixture Models And Categorical Regression Problems., Md Nazir Uddin Aug 2021

Electronic Theses and Dissertations

In this work, we seek to develop a variable screening and selection method for Bayesian mixture models with longitudinal data. To develop this method, we consider data from the Health and Retirement Survey (HRS) conducted by University of Michigan. Considering yearly out-of-pocket expenditures as the longitudinal response variable, we consider a Bayesian mixture model with $K$ components. The data consist of a large collection of demographic, financial, and health-related baseline characteristics, and we wish to find a subset of these that impact cluster membership. An initial mixture model without any cluster-level predictors is fit to the data through an MCMC …


Assessing The Variations Of Educational Attainment At National And Subnational Levels Using Hierarchical Linear Models, Bingxin Qi Jan 2021

Electronic Theses and Dissertations

Education is a human right, and equal access to education is not only crucial for an individual’s well-being, but also essential for eradicating poverty, ensuring long-term prosperity for all, transforming the society, and achieving sustainable development. Measuring education development, especially the variations of educational attainment, in a timely and accurate manner can help educators, practitioners, scientists, and policymakers compare and evaluate various education indicators at both subnational and national levels. This research presents an approach that combines multi-source and multidimensional data including population distribution, human settlement, and education data to assess and explore educational attainment trajectories at both national and …


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Measuring The Connective Action Of Black Lives Matter Activists: A Psychometric Investigation Into Twitter Data, Paige Alfonzo Jan 2020

Electronic Theses and Dissertations

Many protest movements from the last twenty-first century have become increasingly networked and personalized. Several scholars have tapped into this change coining terms such as participatory action, digitally mediated action, computer-mediated communication, issue-based organization, and what I focus on in this project, connective action. Building on the ideas percolating across the literary landscape at the time, Bennett and Segerberg (2012) introduced the logic of connective action based on emergent characteristics they observed in post-2010 large-scale social movements. Both the logic of connective action and related work have become deeply ingrained in today's social movement scholarship. As such, I felt it …


Assessing Robustness Of The Rasch Mixture Model To Detect Differential Item Functioning - A Monte Carlo Simulation Study, Jinjin Huang Jan 2020

Electronic Theses and Dissertations

Measurement invariance is crucial for an effective and valid measure of a construct. Invariance holds when the latent trait varies consistently across subgroups; in other words, the mean differences among subgroups are only due to true latent ability differences. Differential item functioning (DIF) occurs when measurement invariance is violated. There are two kinds of traditional tools for DIF detection: non-parametric methods and parametric methods. Mantel Haenszel (MH), SIBTEST, and standardization are examples of non-parametric DIF detection methods. The majority of parametric DIF detection methods are item response theory (IRT) based. Both non-parametric methods and parametric methods compare differences among subgroups …


Identifying Risk Factors Related To Premature Birth Through Binary Logistic And Proportional Odds Ordinal Logistic Regression, Clayton Elwood Aug 2019

Electronic Theses and Dissertations

Premature birth has been identified as the single greatest cause of death worldwide in children under the age of five. This thesis will implement binary logistic regression and proportional odds ordinal logistic regression to predict different levels of premature birth and identify associated risk factors. The models will be built from the Center for Disease Control and Prevention's 2014 Vital Statistics Natality Birth Data containing nearly 4 million live births within the United States. Odds ratios and confidence intervals on risk factors were produced utilizing binary logistic regression.


Robustness Of Semi-Parametric Survival Model: Simulation Studies And Application To Clinical Data, Isaac Nwi-Mozu Aug 2019

Electronic Theses and Dissertations

An efficient way of analyzing survival clinical data such as cancer data is a great concern to health experts. In this study, we investigate and propose an efficient way of handling survival clinical data. Simulation studies were conducted to compare the performance of various survival modeling techniques using the R package "survsim". Model performance was assessed with varying sample sizes as small ($n5000$). For small and mild samples, the semi-parametric model outperforms or approximates the parametric model. However, for large samples, the parametric model outperforms the semi-parametric model. We compared the effectiveness and reliability …


Paper Structure Formation Simulation, Tyler R. Seekins May 2019

Electronic Theses and Dissertations

On the surface, paper appears simple, but closer inspection yields a rich collection of chaotic dynamics and random variables. Predictive simulation of paper product properties is desirable for screening candidate experiments and optimizing recipes but existing models are inadequate for practical use. We present a novel structure simulation and generation system designed to narrow the gap between mathematical model and practical prediction. Realistic inputs to the system are preserved as randomly distributed variables. Rapid fiber placement (~1 second/fiber) is achieved with probabilistic approximation of chaotic fluid dynamics and minimization of potential energy to determine flexible fiber conformations. Resulting digital packed …


Data Patterns Discovery Using Unsupervised Learning, Rachel A. Lewis Jan 2019

Electronic Theses and Dissertations

Self-care activities classification poses significant challenges in identifying children’s unique functional abilities and needs within the exceptional children healthcare system. The accuracy of diagnosing a child's self-care problem, such as toileting or dressing, is highly influenced by an occupational therapist’s experience and time constraints. Thus, there is a need for objective means to detect and predict in advance the self-care problems of children with physical and motor disabilities. We use clustering to discover interesting information from self-care problems, perform automatic classification of binary data, and discover outliers. The advantages are twofold: the advancement of knowledge on identifying self-care problems in …


Essays On Mixture Models, Trevor R. Camper Jan 2019

Electronic Theses and Dissertations

When considering statistical scenarios where one can sample from populations that are not of interest for the purposes of a study, bivariate mixture models can be used to study the effect that this missampling can have on parameter estimation. In this thesis, we will examine the behavior that bivariate mixture models have on two statistical constructs: Cronbach's alpha \cite{C51}, and Spearman's rho \cite{S04}. Chapter 1 will introduce notions of mixture models and the definition of bias under mixture models which will serve as the central concept of this thesis. Chapter 2 will investigate a particular psychometric issue known as insufficient …


Variable Selection In Accelerated Failure Time (Aft) Frailty Models: An Application Of Penalized Quasi-Likelihood, Sarbesh R. Pandeya Jan 2019

Electronic Theses and Dissertations

Variable selection is one of the standard ways of selecting models in large scale datasets. It has applications in many fields of research study, especially in large multi-center clinical trials. One of the prominent methods in variable selection is the penalized likelihood, which is both consistent and efficient. However, the penalized selection is significantly challenging under the influence of random (frailty) covariates. It is even more complicated when there is involvement of censoring as it may not have a closed-form solution for the marginal log-likelihood. Therefore, we applied the penalized quasi-likelihood (PQL) approach that approximates the solution for such a …


Effectiveness Of Prescribed Fire On Meeting Fuel Load And Wildlife Habitat Management Objectives In East Texas National Forests, Trey Wall Dec 2018

Electronic Theses and Dissertations

Using standardized methodology outlined by the United States Forest Service and the National Forests and Grasslands in Texas’ Fire Monitoring Program for data collection, the efficacy of current Forest Service prescribed burn regimes was analyzed for 24 study sites in East Texas National Forests. Study sites were located within Sam Houston, Davy Crockett, and Angelina/Sabine National Forests. Efficacy was determined by comparing defined management objectives established by the Forest Service to the data collected at the study sites. The results indicate that most objectives, as outlined by the Forest Service, are not being met with the current practices. Re-visitation of …


Wald Confidence Intervals For A Single Poisson Parameter And Binomial Misclassification Parameter When The Data Is Subject To Misclassification, Nishantha Janith Chandrasena Poddiwala Hewage Aug 2018

Electronic Theses and Dissertations

This thesis is based on a Poisson model that uses both error-free data and error-prone data subject to misclassification in the form of false-negative and false-positive counts. We present maximum likelihood estimators (MLEs), Fisher's Information, and Wald statistics for the Poisson rate parameter and the two misclassification parameters. Next, we invert the Wald statistics to get asymptotic confidence intervals for the Poisson rate parameter and the false-negative rate parameter. The coverage and width properties for various sample size and parameter configurations are studied via a simulation study. Finally, we apply the MLEs and confidence intervals to one real data set and another realistic …


Distribution Of A Sum Of Random Variables When The Sample Size Is A Poisson Distribution, Mark Pfister Aug 2018

Electronic Theses and Dissertations

A probability distribution is a statistical function that describes the probability of possible outcomes in an experiment or occurrence. There are many different probability distributions that give the probability of an event happening, given some sample size n. An important question in statistics is to determine the distribution of the sum of independent random variables when the sample size n is fixed. For example, it is known that the sum of n independent Bernoulli random variables with success probability p follows a Binomial distribution with parameters n and p. However, this is not true when the sample size …
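
The fixed-n fact cited in this abstract can be checked numerically. The sketch below is our own illustration, not code from the thesis: it builds the distribution of a sum of n independent Bernoulli(p) variables by repeated convolution and compares it with the Binomial(n, p) formula. The function names and the choice n = 10, p = 0.3 are assumptions made for the example.

```python
from math import comb


def bernoulli_sum_pmf(n: int, p: float) -> list[float]:
    """PMF of the sum of n independent Bernoulli(p) variables, by convolution."""
    pmf = [1.0]  # the empty sum equals 0 with probability 1
    for _ in range(n):
        nxt = [0.0] * (len(pmf) + 1)
        for k, mass in enumerate(pmf):
            nxt[k] += mass * (1 - p)   # next trial fails: sum stays at k
            nxt[k + 1] += mass * p     # next trial succeeds: sum moves to k + 1
        pmf = nxt
    return pmf


def binomial_pmf(n: int, p: float) -> list[float]:
    """Closed-form Binomial(n, p) PMF for k = 0..n."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]


# Illustrative parameters (assumed, not taken from the thesis).
n, p = 10, 0.3
conv = bernoulli_sum_pmf(n, p)
closed = binomial_pmf(n, p)
max_gap = max(abs(a - b) for a, b in zip(conv, closed))
print(f"largest difference between the two PMFs: {max_gap:.2e}")
```

This covers only the case where n is fixed; it does not address the random-sample-size setting the thesis goes on to study.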


Generalized Spatiotemporal Modeling And Causal Inference For Assessing Treatment Effects For Multiple Groups For Ordinal Outcome., Soutik Ghosal Aug 2018

Electronic Theses and Dissertations

This dissertation consists of three projects and can be categorized in two broad research areas: generalized spatiotemporal modeling and causal inference based on observational data. In the first project, I introduce a Bayesian hierarchical mixed effect hurdle model with a nested random effect structure to model the count for primary care providers and understand their spatial and temporal variation. This study further enables us to identify the health professional shortage areas and the possible impacting factors. In the second project, I have unified popular parametric and nonparametric propensity score-based methods to assess the treatment effect of multiple groups for ordinal …


Spatio-Temporal Dynamics Of Atlantic Cod Bycatch In The Maine Lobster Fishery And Its Impacts On Stock Assessment, Robert E. Boenish May 2018

Electronic Theses and Dissertations

One of the most iconic fish species in the world, the Atlantic cod (Gadus morhua, hereafter, cod) has been a mainstay in the North Atlantic for centuries. While many global fish stocks have received increased pressure with the advent of new, more efficient fishing technology in the mid-20th century, exceptional pressure has been placed on this prized gadoid. Bycatch, or the unintended catch of organisms, is one of the biggest global fisheries issues. As a direct result of the failed recovery of cod in the Gulf of Maine (GoM), attention has turned to possible sources of unaccounted catch. Among the most …