Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons

Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics

Theses/Dissertations

2020

Institution
Keyword
Publication

Articles 1 - 18 of 18

Full-Text Articles in Statistical Models

Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, Alfieri Daniel Ek Dec 2020

Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, Alfieri Daniel Ek

Graduate Theses and Dissertations

Proper allocation of law enforcement agencies falls under the umbrella of risk terrainmodeling (Caplan et al., 2011, 2015; Drawve, 2016) that primarily focuses on crime prediction and prevention by spatially aggregating response and predictor variables of interest. Although mental health incidents demand resource allocation from law enforcement agencies and the city, relatively less emphasis has been placed on building spatial models for mental health incidents events. Analyzing spatial mental health events in Little Rock, AR over 2015 to 2018, we found evidence of spatial heterogeneity via Moran’s I statistic. A spatial modeling framework is then built using generalized linear models, …


Incorporating Shear Resistance Into Debris Flow Triggering Model Statistics, Noah J. Lyman Dec 2020

Incorporating Shear Resistance Into Debris Flow Triggering Model Statistics, Noah J. Lyman

Master's Theses

Several regions of the Western United States utilize statistical binary classification models to predict and manage debris flow initiation probability after wildfires. As the occurrence of wildfires and large intensity rainfall events increase, so has the frequency in which development occurs in the steep and mountainous terrain where these events arise. This resulting intersection brings with it an increasing need to derive improved results from existing models, or develop new models, to reduce the economic and human impacts that debris flows may bring. Any development or change to these models could also theoretically increase the ease of collection, processing, and …


Statistical Methods With A Focus On Joint Outcome Modeling And On Methods For Fire Science, Da Zhong Xi Nov 2020

Statistical Methods With A Focus On Joint Outcome Modeling And On Methods For Fire Science, Da Zhong Xi

Electronic Thesis and Dissertation Repository

Understanding the dynamics of wildfires contributes significantly to the development of fire science. Challenges in the analysis of historical fire data include defining fire dynamics within existing statistical frameworks, modeling the duration and size of fires as joint outcomes, identifying the how fires are grouped into clusters of subpopulations, and assessing the effect of environmental variables in different modeling frameworks. We develop novel statistical methods to consider outcomes related to fire science jointly. These methods address these challenges by linking univariate models for separate outcomes through shared random effects, an approach referred to as joint modeling. Comparisons with existing …


A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega Aug 2020

A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega

MSU Graduate Theses

The Big Four Springs region hosts four major first-order magnitude springs in southern Missouri and northern Arkansas. These springs are Big Spring (Carter County, MO), Greer Spring (Oregon County, MO), Mammoth Spring (Fulton County, AR), and Hodgson Mill Spring (Ozark County, MO). Based on historic dye traces and hydrogeological investigations, these springs drain an area of approximately 1500 square miles and collectively discharge an average of 780 million gallons of water per day. The rocks from youngest to oldest that are found in Big Four Springs region are the Cotter and Jefferson City Dolomite (Ordovician), Roubidoux Formation (Ordovician), Gasconade Dolomite …


Latent Class Models For At-Risk Populations, Shuaimin Kang Jul 2020

Latent Class Models For At-Risk Populations, Shuaimin Kang

Doctoral Dissertations

Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete network. Unfortunately, much network data is collected through sampling, and therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …


Point Process Modelling Of Objects In The Star Formation Complexes Of The M33 Galaxy, Dayi Li Apr 2020

Point Process Modelling Of Objects In The Star Formation Complexes Of The M33 Galaxy, Dayi Li

Electronic Thesis and Dissertation Repository

In this thesis, Gibbs point process (GPP) models are constructed to study the spatial distribution of objects in the star formation complexes of the M33 galaxy. The GPP models circumvent the limitations of the two-point correlation function employed in the current astronomy literature by naturally accounting for the inhomogeneous distribution of these objects. The spatial distribution of these objects serves as a sensitive probe in understanding the star formation process, which is crucial in understanding the formation of galaxies and the Universe. The objects under study include the CO filament structure, giant molecular clouds (GMCs) and young stellar cluster candidates …


Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown Jan 2020

Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown

Murray State Theses and Dissertations

Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …


The Analysis Of Neural Heterogeneity Through Mathematical And Statistical Methods, Kyle Wendling Jan 2020

The Analysis Of Neural Heterogeneity Through Mathematical And Statistical Methods, Kyle Wendling

Theses and Dissertations

Diversity of intrinsic neural attributes and network connections is known to exist in many areas of the brain and is thought to significantly affect neural coding. Recent theoretical and experimental work has argued that in uncoupled networks, coding is most accurate at intermediate levels of heterogeneity. I explore this phenomenon through two distinct approaches: a theoretical mathematical modeling approach and a data-driven statistical modeling approach.

Through the mathematical approach, I examine firing rate heterogeneity in a feedforward network of stochastic neural oscillators utilizing a high-dimensional model. The firing rate heterogeneity stems from two sources: intrinsic (different individual cells) and network …


Zero-Inflated Longitudinal Mixture Model For Stochastic Radiographic Lung Compositional Change Following Radiotherapy Of Lung Cancer, Viviana A. Rodríguez Romero Jan 2020

Zero-Inflated Longitudinal Mixture Model For Stochastic Radiographic Lung Compositional Change Following Radiotherapy Of Lung Cancer, Viviana A. Rodríguez Romero

Theses and Dissertations

Compositional data (CD) is mostly analyzed as relative data, using ratios of components, and log-ratio transformations to be able to use known multivariable statistical methods. Therefore, CD where some components equal zero represent a problem. Furthermore, when the data is measured longitudinally, observations are spatially related and appear to come from a mixture population, the analysis becomes highly complex. For this matter, a two-part model was proposed to deal with structural zeros in longitudinal CD using a mixed-effects model. Furthermore, the model has been extended to the case where the non-zero components of the vector might a two component mixture …


Statistical Analysis Of Demographic Effects On Insurance Coverage Of Perinatal And Neonatal Morbidity, Madeline Durbin Jan 2020

Statistical Analysis Of Demographic Effects On Insurance Coverage Of Perinatal And Neonatal Morbidity, Madeline Durbin

Undergraduate Honors Thesis Projects

In the United States of America, Ohio has one of the worst neonatal and perinatal death rates. Within Ohio, Montgomery County has an above average neonatal and perinatal death rate. This statistic can be lowered if more women in Montgomery County have health insurance. They would be more likely to seek out prenatal health care, since they would no longer have to pay as much money out-of-pocket. This would allow medical professionals to be able to diagnose and treat any potential issues in the mother or child earlier. Having health insurance would also prevent mothers-to-be from seeking out other potentially …


Semiparametric And Nonparametric Methods For Comparing Biomarker Levels Between Groups, Yuntong Li Jan 2020

Semiparametric And Nonparametric Methods For Comparing Biomarker Levels Between Groups, Yuntong Li

Theses and Dissertations--Statistics

Comparing the distribution of biomarker measurements between two groups under either an unpaired or paired design is a common goal in many biomarker studies. However, analyzing biomarker data is sometimes challenging because the data may not be normally distributed and contain a large fraction of zero values or missing values. Although several statistical methods have been proposed, they either require data normality assumption, or are inefficient. We proposed a novel two-part semiparametric method for data under an unpaired setting and a nonparametric method for data under a paired setting. The semiparametric method considers a two-part model, a logistic regression for …


Phenotype Extraction: Estimation And Biometrical Genetic Analysis Of Individual Dynamics, Kevin L. Mckee Jan 2020

Phenotype Extraction: Estimation And Biometrical Genetic Analysis Of Individual Dynamics, Kevin L. Mckee

Theses and Dissertations

Within-person data can exhibit a virtually limitless variety of statistical patterns, but it can be difficult to distinguish meaningful features from statistical artifacts. Studies of complex traits have previously used genetic signals like twin-based heritability to distinguish between the two. This dissertation is a collection of studies applying state-space modeling to conceptualize and estimate novel phenotypic constructs for use in psychiatric research and further biometrical genetic analysis. The aims are to: (1) relate control theoretic concepts to health-related phenotypes; (2) design statistical models that formally define those phenotypes; (3) estimate individual phenotypic values from time series data; (4) consider hierarchical …


Aggregate Loss Model With Poisson-Tweedie Loss Frequency, Si Chen Jan 2020

Aggregate Loss Model With Poisson-Tweedie Loss Frequency, Si Chen

Theses and Dissertations (Comprehensive)

The aggregate loss model has applications in various areas such as financial risk management and actuarial science. The aggregate loss is the summation of all random losses occurred in a period, and it is governed by both the loss severity and the loss frequency. While the impact of the loss severity on aggregate loss is well studied, less focus is paid on the influence of loss frequency on aggregate loss, which motivates our study. In this thesis, we enrich the aggregate loss framework by introducing the Poisson-Tweedie distribution as a candidate for modelling loss frequency, prove the closedness of Poisson-Tweedie …


Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu Jan 2020

Nonparametric Tests Of Lack Of Fit For Multivariate Data, Yan Xu

Theses and Dissertations--Statistics

A common problem in regression analysis (linear or nonlinear) is assessing the lack-of-fit. Existing methods make parametric or semi-parametric assumptions to model the conditional mean or covariance matrices. In this dissertation, we propose fully nonparametric methods that make only additive error assumptions. Our nonparametric approach relies on ideas from nonparametric smoothing to reduce the test of association (lack-of-fit) problem into a nonparametric multivariate analysis of variance. A major problem that arises in this approach is that the key assumptions of independence and constant covariance matrix among the groups will be violated. As a result, the standard asymptotic theory is not …


Statistical Intervals For Various Distributions Based On Different Inference Methods, Yixuan Zou Jan 2020

Statistical Intervals For Various Distributions Based On Different Inference Methods, Yixuan Zou

Theses and Dissertations--Statistics

Statistical intervals (e.g., confidence, prediction, or tolerance) are widely used to quantify uncertainty, but complex settings can create challenges to obtain such intervals that possess the desired properties. My thesis will address diverse data settings and approaches that are shown empirically to have good performance. We first introduce a focused treatment on using a single-layer bootstrap calibration to improve the coverage probabilities of two-sided parametric tolerance intervals for non-normal distributions. We then turn to zero-inflated data, which are commonly found in, among other areas, pharmaceutical and quality control applications. However, the inference problem often becomes difficult in the presence of …


Bayesian Kinetic Modeling For Tracer-Based Metabolomic Data, Xu Zhang Jan 2020

Bayesian Kinetic Modeling For Tracer-Based Metabolomic Data, Xu Zhang

Theses and Dissertations--Statistics

Kinetic modeling of the time dependence of metabolite concentrations including the unstable isotope labeled species is an important approach to simulate metabolic pathway dynamics. It is also essential for quantitative metabolic flux analysis using tracer data. However, as the metabolic networks are complex including extensive compartmentation and interconnections, the parameter estimation for enzymes that catalyze individual reactions needed for kinetic modeling is challenging. As the pa- rameter space is large and multi-dimensional while kinetic data are comparatively sparse, the estimation procedure (especially the point estimation methods) often en- counters multiple local maximum such that standard maximum likelihood methods may yield …


How Machine Learning And Probability Concepts Can Improve Nba Player Evaluation, Harrison Miller Jan 2020

How Machine Learning And Probability Concepts Can Improve Nba Player Evaluation, Harrison Miller

CMC Senior Theses

In this paper I will be breaking down a scholarly article, written by Sameer K. Deshpande and Shane T. Jensen, that proposed a new method to evaluate NBA players. The NBA is the highest level professional basketball league in America and stands for the National Basketball Association. They proposed to build a model that would result in how NBA players impact their teams chances of winning a game, using machine learning and probability concepts. I preface that by diving into these concepts and their mathematical backgrounds. These concepts include building a linear model using ordinary least squares method, the bias …