Open Access. Powered by Scholars. Published by Universities.®

Statistical Models Commons


Theses/Dissertations

2020



Full-Text Articles in Statistical Models

Multi-Level Small Area Estimation Based On Calibrated Hierarchical Likelihood Approach Through Bias Correction With Applications To Covid-19 Data, Nirosha Rathnayake Dec 2020


Theses & Dissertations

Small area estimation (SAE) has been widely used in a variety of applications to draw estimates in geographic domains represented as a metropolitan area, district, county, or state. The direct estimation methods provide accurate estimates when the sample size of study participants within each area unit is sufficiently large, but it might not always be realistic to have large sample sizes of study participants when considering small geographical regions. Meanwhile, high dimensional socio-ecological data exist at the community level, providing an opportunity for model-based estimation by incorporating rich auxiliary information at the individual and area levels. Thus, it is critical …


Gene Set Testing By Distance Correlation, Sho-Hsien Su Dec 2020


Graduate Theses and Dissertations

Pathways are the functional building blocks of complex diseases such as cancers. Pathway-level studies may provide insights into important biological processes. The gene set test is an important tool for studying the differential expression of a gene set between two groups, e.g., cancer vs. normal. The differential expression of a gene set could be due to a difference in mean, variability, or both. However, most existing gene set tests target only the mean difference and overlook other types of differential expression. In this thesis, we propose to use the recently developed distance correlation for gene set testing. To assess the …


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020


Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Incorporating Shear Resistance Into Debris Flow Triggering Model Statistics, Noah J. Lyman Dec 2020


Master's Theses

Several regions of the Western United States utilize statistical binary classification models to predict and manage debris flow initiation probability after wildfires. As the occurrence of wildfires and high-intensity rainfall events increases, so does the frequency with which development occurs in the steep and mountainous terrain where these events arise. This resulting intersection brings with it an increasing need to derive improved results from existing models, or develop new models, to reduce the economic and human impacts that debris flows may bring. Any development or change to these models could also theoretically increase the ease of collection, processing, and …


A Management Strategy Evaluation Of The Impacts Of Interspecific Competition And Recreational Fishery Dynamics On Vermilion Snapper (Rhomboplites Aurorubens) In The Gulf Of Mexico, Megumi C. Oshima Dec 2020


Dissertations

In the Gulf of Mexico (GOM), Vermilion Snapper (Rhomboplites aurorubens) are believed to compete with Red Snapper directly for prey and habitat. The two species share similar diets and have significant spatial overlap in the Gulf. Red Snapper are thought to be the dominant competitor, forcing Vermilion Snapper to feed on less nutritious prey when local resources are depleted. In addition to ecological pressures, GOM Vermilion Snapper support substantial commercial and recreational fisheries. Over the past decade, recreational landings have steadily increased, reaching a historical high in 2018. One cause may be stricter regulations for similar target species such as …


Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, Alfieri Daniel Ek Dec 2020


Graduate Theses and Dissertations

Proper allocation of law enforcement agencies falls under the umbrella of risk terrain modeling (Caplan et al., 2011, 2015; Drawve, 2016) that primarily focuses on crime prediction and prevention by spatially aggregating response and predictor variables of interest. Although mental health incidents demand resource allocation from law enforcement agencies and the city, relatively little emphasis has been placed on building spatial models for mental health incidents. Analyzing spatial mental health events in Little Rock, AR from 2015 to 2018, we found evidence of spatial heterogeneity via Moran’s I statistic. A spatial modeling framework is then built using generalized linear models, …


Statistical Methods With A Focus On Joint Outcome Modeling And On Methods For Fire Science, Da Zhong Xi Nov 2020


Electronic Thesis and Dissertation Repository

Understanding the dynamics of wildfires contributes significantly to the development of fire science. Challenges in the analysis of historical fire data include defining fire dynamics within existing statistical frameworks, modeling the duration and size of fires as joint outcomes, identifying how fires are grouped into clusters of subpopulations, and assessing the effect of environmental variables in different modeling frameworks. We develop novel statistical methods to consider outcomes related to fire science jointly. These methods address these challenges by linking univariate models for separate outcomes through shared random effects, an approach referred to as joint modeling. Comparisons with existing …


Renewable-Energy Resources, Economic Growth And Their Causal Link, Yiyang Chen Aug 2020


Electronic Thesis and Dissertation Repository

This thesis examines the presence and strength of a predictive causal relationship between renewable energy prices and economic growth. We look for evidence by investigating the cases of Norway, New Zealand, and Canada’s two provinces of Alberta and Ontario. The usual vector autoregressive model (VAR) and its various improved versions still assume constant parameters over time. We devise a Markov-switching VAR (MS-VAR) model in order to accommodate the observed time-dependent causal relation changes. Our proposed modelling approach is induced by the hidden Markov model methodologies in terms of an online parameter estimation through recursive filtering. The parameters of the MS-VAR model are governed by …


D-Vine Pair-Copula Models For Longitudinal Binary Data, Huihui Lin Aug 2020


Mathematics & Statistics Theses & Dissertations

Dependent longitudinal binary data are prevalent in a wide range of scientific disciplines, including healthcare and medicine. A popular method for analyzing such data is the multivariate probit (MP) model. The motivation for this dissertation stems from the fact that the MP model can fail even when the binary correlations are within the feasible range. The reason is that the underlying correlation matrix of the latent variables in the MP model may not be positive definite. In this dissertation, we study alternatives that are based on D-vine pair-copula models. We consider both the serial dependence modeled by the first-order autoregressive (AR(1)) and …


A Geochemical And Statistical Investigation Of The Big Four Springs Region In Southern Missouri, Jordan Jasso Vega Aug 2020


MSU Graduate Theses

The Big Four Springs region hosts four major first-order magnitude springs in southern Missouri and northern Arkansas. These springs are Big Spring (Carter County, MO), Greer Spring (Oregon County, MO), Mammoth Spring (Fulton County, AR), and Hodgson Mill Spring (Ozark County, MO). Based on historic dye traces and hydrogeological investigations, these springs drain an area of approximately 1500 square miles and collectively discharge an average of 780 million gallons of water per day. The rocks found in the Big Four Springs region are, from youngest to oldest, the Cotter and Jefferson City Dolomite (Ordovician), Roubidoux Formation (Ordovician), Gasconade Dolomite …


Machine Learning Approaches For Improving Prediction Performance Of Structure-Activity Relationship Models, Gabriel Idakwo Aug 2020


Dissertations

In silico bioactivity prediction studies are designed to complement in vivo and in vitro efforts to assess the activity and properties of small molecules. In silico methods such as Quantitative Structure-Activity/Property Relationship (QSAR) are used to correlate the structure of a molecule to its biological property in drug design and toxicological studies. In this body of work, I started with two in-depth reviews into the application of machine learning based approaches and feature reduction methods to QSAR, and then investigated solutions to three common challenges faced in machine learning based QSAR studies.

First, to improve the prediction accuracy of learning …


Latent Class Models For At-Risk Populations, Shuaimin Kang Jul 2020


Doctoral Dissertations

Clustering Network Tree Data From Respondent-Driven Sampling With Application to Opioid Users in New York City: There is great interest in finding meaningful subgroups of attributed network data. There are many available methods for clustering complete networks. Unfortunately, much network data is collected through sampling and is therefore incomplete. Respondent-driven sampling (RDS) is a widely used method for sampling hard-to-reach human populations based on tracing links in the underlying unobserved social network. The resulting data therefore have a tree structure representing a sub-sample of the network, along with many nodal attributes. In this paper, we introduce an approach to adjust mixture models …


Causal Inference And Prediction On Observational Data With Survival Outcomes, Xiaofei Chen Jul 2020


Statistical Science Theses and Dissertations

Infants with hypoplastic left heart syndrome require an initial Norwood operation, followed some months later by a stage 2 palliation (S2P). The timing of S2P is critical for the operation’s success and the infant’s survival, but the optimal timing, if one exists, is unknown. We attempt to estimate the optimal timing of S2P by analyzing data from the Single Ventricle Reconstruction Trial (SVRT), which randomized patients between two different types of Norwood procedure. In the SVRT, the timing of the S2P was chosen by the medical team; thus with respect to this exposure, the trial constitutes an observational study, and …


Structural Analysis Of The Multifunctional SpoIIE Regulatory Protein Of Clostridioides Difficile., Blythe Emily Bunkers Jul 2020


Graduate Theses and Dissertations

Clostridioides (formerly Clostridium) difficile is a medically relevant pathogen pertinent to infectious disease research. C. difficile is distinctly known for its ability to produce two toxins, enterotoxin A and cytotoxin B, and its propensity to colonize the mammalian gastrointestinal tract. It is known that metabolism is tightly correlated with sporulation in endospore producers such as C. difficile, but an interesting and novel regulatory relationship found by the Ivey lab has yet to be understood. The relationship explored in this study is one in which the sporulation factor SpoIIE represses expression of an ABC peptide transporter, app. In this study, two …


Extensions Of Classification Method Based On Quantiles, Yuanhao Lai Jun 2020


Electronic Thesis and Dissertation Repository

This thesis deals with the problem of classification in general, with a particular focus on heavy-tailed or skewed data. The classification problem is first formalized by statistical learning theory and several important classification methods are reviewed; among these, the distance-based classifiers, including the median-based classifier and the quantile-based classifier (QC), are especially useful for heavy-tailed or skewed inputs. However, QC is limited by its model capacity and the issue of high-dimensional accumulated errors. The objective of this study is to investigate more general methods while retaining the merits of QC.

We present four extensions of QC, which appear in chronological …


Generalized 4/2 Factor Model, Yuyang Cheng Jun 2020


Electronic Thesis and Dissertation Repository

We investigate portfolio optimization, risk management, and derivative pricing for a factor stochastic model that considers the 4/2 stochastic volatility on the common/systematic factor as well as on the intrinsic factor. This setting allows us to capture stochastic volatility and stochastic covariation among assets. The model is also a generalization of existing models in the literature as it includes the mean reverting property and spillover effect to capture wider types of financial assets. At a theoretical level we identify conditions for well-defined changes of measure. A quasi-closed form solution within a 4/2 structured model is obtained for a portfolio optimization …


Edge-Cloud IoT Data Analytics: Intelligence At The Edge With Deep Learning, Ananda Mohon M. Ghosh May 2020


Electronic Thesis and Dissertation Repository

Rapid growth in numbers of connected devices, including sensors, mobile, wearable, and other Internet of Things (IoT) devices, is creating an explosion of data that are moving across the network. To carry out machine learning (ML), IoT data are typically transferred to the cloud or another centralized system for storage and processing; however, this causes latencies and increases network traffic. Edge computing has the potential to remedy those issues by moving computation closer to the network edge and data sources. On the other hand, edge computing is limited in terms of computational power and thus is not well suited for …


Statistical Models And Analysis Of Univariate And Multivariate Degradation Data, Lochana Palayangoda May 2020


Statistical Science Theses and Dissertations

For degradation data in reliability analysis, estimation of the first-passage time (FPT) distribution to a threshold provides valuable information on reliability characteristics. Recently, Balakrishnan and Qin (2019; Applied Stochastic Models in Business and Industry, 35:571-590) studied a nonparametric method to approximate the FPT distribution of such degradation processes if the underlying process type is unknown. In this thesis, we propose improved techniques based on saddlepoint approximation, which improve upon their suggested methods. Numerical examples and Monte Carlo simulation studies are used to illustrate the advantages of the proposed techniques. Limitations of the improved techniques are discussed and some possible solutions …


The Primary Volatile Composition Of Comet C/2015 ER61 (PANSTARRS), Aaron Butler May 2020


Theses

At the outer edges of the solar system lie two regions: the Kuiper belt and the Oort cloud. These two regions contain large numbers of icy bodies (comets) orbiting the Sun. Comets located within the Oort cloud and Kuiper belt contain an ancient codex of the solar system's contents, dating from before its formation. Presented are near-infrared, high-resolution (λ/Δλ ~40000) data obtained from the immersion-grating echelle spectrograph iSHELL at the 3 m NASA Infrared Telescope Facility (IRTF) on Maunakea, Hawaii of the Oort cloud comet C/2015 ER61 (PANSTARRS). Observations took place on April 15 and 17 in 2017 while …


Predictive Modeling Of Asynchronous Event Sequence Data, Jin Shang May 2020


LSU Doctoral Dissertations

Large volumes of temporal event data, such as online check-ins and electronic records of hospital admissions, are becoming increasingly available in a wide variety of applications including healthcare analytics, smart cities, and social network analysis. Those temporal events are often asynchronous and interdependent, and exhibit self-exciting properties. For example, in patient diagnosis events, an elevated risk exists for a patient who has recently been at risk. Machine learning that leverages event sequence data can improve the prediction accuracy of future events and provide valuable services. For example, in e-commerce and network traffic diagnosis, the analysis of user activities can be …


Modeling Species Distribution And Habitat Suitability Of American Ginseng (Panax Quinquefolius) In Virginia, Jacob D. J. Peters May 2020


Masters Theses, 2020-current

American ginseng (Panax quinquefolius) is a well-known and sought-after medicinal plant native to North America that is facing increased threat of extinction due to overharvesting, herbivory, and habitat loss. Species distribution and habitat suitability models may be valuable to landowners interested in sustainable harvest or to institutions interested in the conservation and restoration of the species. With unequal sampling efforts across a region of interest, it is likely that some locations with appropriate habitat may be misrepresented in model predictions. This study refined a state-derived species distribution model for ginseng through increased sampling effort across the Cumberland Plateau …


Applications Of Machine Learning In High-Frequency Trade Direction Classification, Jared E. Hansen May 2020


All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

The correct assignment of trades as buyer-initiated or seller-initiated is paramount in many quantitative finance studies. Simple decision rule methods have been used for signing trades since many data sets available to researchers do not include the sign of each trade executed. By utilizing these decision rule methods, as well as engineering new variables from available data, we have demonstrated that machine learning models outperform prior methods for accurately signing trades as buys and sells, achieving state-of-the-art results. The best model developed was 4.5 percentage points more accurate than older methods when predicting on unseen data. Since finance and economics …


An Analysis Of Dredge Efficiency For Surfclam And Ocean Quahog Commercial Dredges, Leanne Poussard May 2020


Master's Theses

Between 1997 and 2011, the National Marine Fisheries Service conducted 50 depletion experiments to estimate survey gear efficiency and stock density for Atlantic surfclam (Spisula solidissima) and ocean quahog (Arctica islandica) populations using commercial hydraulic dredges. The Patch Model was formulated to estimate gear efficiency and organism density from the data. The range of efficiencies estimated is substantial, leading to uncertainty in the application of these estimates in stock assessment. Analysis of depletion experiment simulations showed that uncertainty in the estimates of gear efficiency from depletion experiments was reduced by higher numbers of dredge tows per experiment, more tow overlap …


Predicting Disease Progression Using Deep Recurrent Neural Networks And Longitudinal Electronic Health Record Data, Seunghwan Kim May 2020


McKelvey School of Engineering Theses & Dissertations

Electronic Health Records (EHR) are widely adopted and used throughout healthcare systems and are able to collect and store longitudinal information data that can be used to describe patient phenotypes. From the underlying data structures used in the EHR, discrete data can be extracted and analyzed to improve patient care and outcomes via tasks such as risk stratification and prospective disease management. Temporality in EHR is innately present given the nature of these data, however, and traditional classification models are limited in this context by the cross-sectional nature of training and prediction processes. Finding temporal patterns in EHR is …


Point Process Modelling Of Objects In The Star Formation Complexes Of The M33 Galaxy, Dayi Li Apr 2020


Electronic Thesis and Dissertation Repository

In this thesis, Gibbs point process (GPP) models are constructed to study the spatial distribution of objects in the star formation complexes of the M33 galaxy. The GPP models circumvent the limitations of the two-point correlation function employed in the current astronomy literature by naturally accounting for the inhomogeneous distribution of these objects. The spatial distribution of these objects serves as a sensitive probe in understanding the star formation process, which is crucial in understanding the formation of galaxies and the Universe. The objects under study include the CO filament structure, giant molecular clouds (GMCs) and young stellar cluster candidates …


Data-Driven Investment Decisions In P2P Lending: Strategies Of Integrating Credit Scoring And Profit Scoring, Yan Wang Apr 2020


Doctor of Data Science and Analytics Dissertations

In this dissertation, we develop and discuss several loan evaluation methods to guide the investment decisions for peer-to-peer (P2P) lending. In evaluating loans, credit scoring and profit scoring are the two widely utilized approaches. Credit scoring aims at minimizing the risk while profit scoring aims at maximizing the profit. This dissertation addresses the strengths and weaknesses of each scoring method by integrating them in various ways in order to provide the optimal investment suggestions for different investors. Before developing the methods for loan evaluation at the individual level, we applied the state-of-the-art method called the Long Short Term Memory (LSTM) …


Boom Or Bust: Examining The Relationship Between High School Recruiting Rankings And The NFL Draft, Nicholas E. Tice Apr 2020


Senior Theses

The goal of this thesis is to model the probability of a high school football player being drafted based on information taken from their recruiting profile. The response variable is binary and defined as drafted (1) or undrafted (0). The independent variables were collected by scraping data from recruiting websites and include height, weight, position, hometown, recruiting grade and other socioeconomic factors based on the player’s high school. 247Sports and ESPN were the two recruiting services used and compared in this study. Because of the binary nature of the dependent variable, logistic regression and decision trees were chosen …


Bayesian Methods For The Assessment Of Reporting Errors For Data-Sparse Population-Periods With Applications To Estimating Mortality, Emily Peterson Mar 2020


Doctoral Dissertations

Population-level mortality data are often subject to substantial reporting errors due to misclassification of cause of death, misclassification of death status, or age reporting errors. Accuracy of error-prone data sources can be assessed by comparing such data to gold-standard data for the same population-period. We present Bayesian methods for assessing the extent of reporting errors across different population-periods and generalizing those to settings where gold-standard data are lacking. Firstly, we investigate misclassification errors of maternal cause of death reporting in civil registration vital statistics data. We use a Bayesian hierarchical bivariate random-walk model to estimate country-year specific sensitivity …


Evaluating An Ordinal Output Using Data Modeling, Algorithmic Modeling, And Numerical Analysis, Martin Keagan Wynne Brown Jan 2020


Murray State Theses and Dissertations

Data and algorithmic modeling are two different approaches used in predictive analytics. The models discussed from these two approaches include the proportional odds logit model (POLR), the vector generalized linear model (VGLM), the classification and regression tree model (CART), and the random forests model (RF). Patterns in the data were analyzed using trigonometric polynomial approximations and Fast Fourier Transforms. Predictive modeling is used frequently in statistics and data science to find the relationship between the explanatory (input) variables and a response (output) variable. Both approaches prove advantageous in different cases depending on the data set. In our case, the data …


The Analysis Of Neural Heterogeneity Through Mathematical And Statistical Methods, Kyle Wendling Jan 2020


Theses and Dissertations

Diversity of intrinsic neural attributes and network connections is known to exist in many areas of the brain and is thought to significantly affect neural coding. Recent theoretical and experimental work has argued that in uncoupled networks, coding is most accurate at intermediate levels of heterogeneity. I explore this phenomenon through two distinct approaches: a theoretical mathematical modeling approach and a data-driven statistical modeling approach.

Through the mathematical approach, I examine firing rate heterogeneity in a feedforward network of stochastic neural oscillators utilizing a high-dimensional model. The firing rate heterogeneity stems from two sources: intrinsic (different individual cells) and network …