Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistics and Probability

2020

Theses/Dissertations

Articles 1 - 30 of 258

Full-Text Articles in Physical Sciences and Mathematics

Variation In Personality Among Semi-Wild Myanmar Timber Elephants, Sateesh Venkatesh Dec 2020

Theses and Dissertations

This study examines two personality traits: exploration and neophobia, which could influence human-elephant conflicts. Thirty-one semi-wild elephants were tested over two trials using a custom novel puzzle tube containing three tasks and three rewards. Our studies show that elephants do vary significantly between individuals in both exploration and neophobia.


Bayesian Semi-Supervised Keyphrase Extraction And Jackknife Empirical Likelihood For Assessing Heterogeneity In Meta-Analysis, Guanshen Wang Dec 2020

Statistical Science Theses and Dissertations

This dissertation investigates: (1) A Bayesian Semi-supervised Approach to Keyphrase Extraction with Only Positive and Unlabeled Data, (2) Jackknife Empirical Likelihood Confidence Intervals for Assessing Heterogeneity in Meta-analysis of Rare Binary Events.

In the big data era, people are blessed with a huge amount of information. However, the availability of information may also pose great challenges. One big challenge is how to extract useful yet succinct information in an automated fashion. As one of the first few efforts, keyphrase extraction methods summarize an article by identifying a list of keyphrases. Many existing keyphrase extraction methods focus on the unsupervised setting, …


Examining Multiple Imputation For Measurement Error Correction In Count Data With Excess Zeros, Shalima Zalsha Dec 2020

Statistical Science Theses and Dissertations

Measurement error and missing data are two common problems in wildlife population surveys. These data are collected from the environment and may be missing or measured with error when the observer’s ability to see the animal is obscured. Methods such as video transects for estimating red snapper abundance and aerial surveys for estimating moose population sizes are highly affected by these problems since total abundance will be underestimated if missing/mismeasured counts are ignored. We shall refer to this problem as visibility bias; it occurs when the true counts are observed when visibility is high, partially observed when visibility is low …


Improved Statistical Methods For Time-Series And Lifetime Data, Xiaojie Zhu Dec 2020

Statistical Science Theses and Dissertations

In this dissertation, improved statistical methods for time-series and lifetime data are developed. First, an improved trend test for time series data is presented. Then, robust parametric estimation methods based on system lifetime data with known system signatures are developed.

In the first part of this dissertation, we consider a test for the monotonic trend in time series data proposed by Brillinger (1989). It has been shown that when there are highly correlated residuals or short record lengths, Brillinger’s test procedure tends to have a significance level much higher than the nominal level. This could be related to the discrepancy between …


Bayesian Modeling For Longitudinal Count Data: Applications In Biomedical Research, Morshed Alam Dec 2020

Theses & Dissertations

Biomedical count data, such as the number of seizures for epilepsy patients, the number of new tumors at each visit, or the number of vomiting episodes after each chemo-radiation for cancer patients, are common. Often these counts are measured longitudinally from patients or within clusters in multi-site trials. The Poisson and negative binomial models may not be adequate when data exhibit over- or under-dispersion, respectively. In contrast, a variety of dispersion conditions in count data can be captured by the Conway-Maxwell Poisson (CMP) model.

This doctoral dissertation relates to developing a statistical methodology to model longitudinal count data distributed as CMP via …
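
To illustrate why the CMP model can capture different dispersion conditions, here is a minimal sketch of the standard CMP probability mass function, P(X = x) proportional to lambda^x / (x!)^nu. It is not taken from the dissertation: the function names, the truncation of the normalizing sum at 200 terms, and the parameter values in the usage note are all assumptions made for illustration.

```python
import math


def cmp_pmf(x, lam, nu, terms=200):
    """CMP probability P(X = x), with the normalizing constant
    approximated by a truncated sum (the truncation is an assumption)."""
    # Work in log space: log(lam^j / (j!)^nu) = j*log(lam) - nu*lgamma(j + 1)
    log_terms = [j * math.log(lam) - nu * math.lgamma(j + 1) for j in range(terms)]
    m = max(log_terms)
    log_z = m + math.log(sum(math.exp(t - m) for t in log_terms))
    return math.exp(x * math.log(lam) - nu * math.lgamma(x + 1) - log_z)


def cmp_mean_var(lam, nu, terms=200):
    """Mean and variance computed from the truncated pmf."""
    probs = [cmp_pmf(x, lam, nu, terms) for x in range(terms)]
    mean = sum(x * p for x, p in enumerate(probs))
    var = sum((x - mean) ** 2 * p for x, p in enumerate(probs))
    return mean, var
```

With nu = 1 the pmf reduces to the Poisson distribution; calling cmp_mean_var(3.0, 2.0) gives a variance below the mean (under-dispersion), while cmp_mean_var(3.0, 0.5) gives a variance above the mean (over-dispersion).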


Multi-Level Small Area Estimation Based On Calibrated Hierarchical Likelihood Approach Through Bias Correction With Applications To COVID-19 Data, Nirosha Rathnayake Dec 2020

Theses & Dissertations

Small area estimation (SAE) has been widely used in a variety of applications to draw estimates in geographic domains represented as a metropolitan area, district, county, or state. The direct estimation methods provide accurate estimates when the sample size of study participants within each area unit is sufficiently large, but it might not always be realistic to have large sample sizes of study participants when considering small geographical regions. Meanwhile, high dimensional socio-ecological data exist at the community level, providing an opportunity for model-based estimation by incorporating rich auxiliary information at the individual and area levels. Thus, it is critical …


Application Of Crowdsourced Data In Transportation Operations And Safety, Nima Hoseinzadeh Dec 2020

Doctoral Dissertations

Crowdsourcing refers to the acquisition of data from users who contribute their information via smartphone, social media, or the internet. In transportation systems, crowdsourcing turns users into real-time sensors, providing data on traffic speed, travel time, miles traveled, incidents, roadway conditions, weather severity, irregularities in traffic patterns, and hazards. These data can be collected actively or passively in quantitative or qualitative forms. With the emergence of smartphones and navigation apps, crowdsourced data are gaining increased attention in transportation. Crowdsourced data have advantages over traditional fixed-location sensors and camera monitoring: low implementation costs, extended geographic coverage, high resolution, real-time application, increased …


Nonparametric Bayesian Deep Learning For Scientific Data Analysis, Devanshu Agrawal Dec 2020

Doctoral Dissertations

Deep learning (DL) has emerged as the leading paradigm for predictive modeling in a variety of domains, especially those involving large volumes of high-dimensional spatio-temporal data such as images and text. With the rise of big data in scientific and engineering problems, there is now considerable interest in the research and development of DL for scientific applications. The scientific domain, however, poses unique challenges for DL, including special emphasis on interpretability and robustness. In particular, a priority of the Department of Energy (DOE) is the research and development of probabilistic ML methods that are robust to overfitting and offer reliable …


Root Stage Distributions And Their Importance In Plant-Soil Feedback Models, Tyler Poppenwimer Dec 2020

Doctoral Dissertations

Roots are fundamental to PSFs, being a key mediator of these feedbacks by interacting with and affecting the soil environment and soil microbial communities. However, most PSF models aggregate roots into a homogeneous component or only implicitly simulate roots via functions. Roots are not homogeneous and root traits (nutrient and water uptake, turnover rate, respiration rate, mycorrhizal colonization, etc.) vary with age, branch order, and diameter. Trait differences among a plant’s roots lead to variation in root function and roots can be disaggregated according to their function. The impact on plant growth and resource cycling of changes in the distribution …


Survival Analysis: An Exact Method For Rare Events, Kristina Reutzel Dec 2020

All Graduate Plan B and other Reports, Spring 1920 to Spring 2023

Conventional asymptotic methods for survival analysis work well when sample sizes are at least moderately sufficient. When dealing with small sample sizes or rare events, the results from these methods have the potential to be inaccurate or misleading. To handle such data, an exact method is proposed and compared against two other methods: 1) the Cox proportional hazards model and 2) stratified logistic regression for discrete survival analysis data.


Delta Hedging Of Financial Options Using Reinforcement Learning And An Impossibility Hypothesis, Ronak Tali Dec 2020

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

In this thesis we take a fresh perspective on delta hedging of financial options as undertaken by market makers. The current industry standard of delta hedging relies on the famous Black-Scholes formulation that prescribes continuous time hedging in a way that allows the market maker to remain risk neutral at all times. But the Black-Scholes formulation is a deterministic model that comes with several strict assumptions such as zero transaction costs, log normal distribution of the underlying stock prices, etc. In this paper we employ Reinforcement Learning to redesign the delta hedging problem in a way that allows us …


Dynamic Neuromechanical Sets For Locomotion, Aravind Sundararajan Dec 2020

Doctoral Dissertations

Most biological systems employ multiple redundant actuators, which is a complicated problem of controls and analysis. Unless assumptions about how the brain and body work together, and assumptions about how the body prioritizes tasks are applied, it is not possible to find the actuator controls. The purpose of this research is to develop computational tools for the analysis of arbitrary musculoskeletal models that employ redundant actuators. Instead of relying primarily on optimization frameworks and numerical methods or task prioritization schemes used typically in biomechanics to find a singular solution for actuator controls, tools for feasible sets analysis are instead developed …


Random Search Plus: A More Effective Random Search For Machine Learning Hyperparameters Optimization, Bohan Li Dec 2020

Masters Theses

Machine learning hyperparameter optimization has always been key to improving model performance. There are many methods of hyperparameter optimization; popular ones include grid search, random search, manual search, Bayesian optimization, and population-based optimization. Random search requires less computation than grid search, but at a cost in accuracy. This thesis proposes a more effective random search method, named random search plus, based on traditional random search and hyperparameter space separation, and empirically shows that random search plus is more effective than random search. There are some case …


Comparative Evaluation Of Statistical Dependence Measures, Eman Abdel Rahman Ibrahim Dec 2020

Graduate Theses and Dissertations

Measuring and testing dependence between random variables is of great importance in many scientific fields. In the case of linearly correlated variables, Pearson’s correlation coefficient is a commonly used measure of the correlation strength. In the case of nonlinear correlation, several innovative measures have been proposed, such as distance-based correlation, rank-based correlations, and information theory-based correlation. This thesis focuses on the statistical comparison of several important correlations, including Spearman’s correlation, mutual information, maximal information coefficient, biweight midcorrelation, distance correlation, and copula correlation, under various simulation settings such as correlative patterns and the level of random noise. Furthermore, we apply those …


Gene Set Testing By Distance Correlation, Sho-Hsien Su Dec 2020

Graduate Theses and Dissertations

Pathways are the functional building blocks of complex diseases such as cancers. Pathway-level studies may provide insights into some important biological processes. The gene set test is an important tool to study the differential expression of a gene set between two groups, e.g., cancer vs. normal. The differential expression of a gene set could be due to a difference in mean, variability, or both. However, most existing gene set tests only target the mean difference and overlook other types of differential expression. In this thesis, we propose to use the recently developed distance correlation for gene set testing. To assess the …


Developing A Tourism Opportunity Index Regarding The Prospective Of Overtourism In Nepal, Susan Phuyal Dec 2020

MSU Graduate Theses

This research explores Nepal's overtourism scenario based on the capacity of a locality to manage sustainable tourism practices. Environmental degradation, local infrastructure degradation, negative tourist experience, and local resident responses regarding visitors are the four main variables used in this study to analyze overtourism. To analyze the case study of overtourism, we select the three top tourist cities of Nepal (Kathmandu, Pokhara, and Chitwan) based on the number of annual visitors. Nepal's case analysis of overtourism conditions reviews the overall threat of overtourism and establishes a metric by which tourism can be viewed as potentially detrimental to sustainability. …


Development Of An Effect Size To Classify The Magnitude Of DIF In Dichotomous And Polytomous Items, James D. Weese Dec 2020

Graduate Theses and Dissertations

A standardized effect size for the SIBTEST/POLYSIBTEST procedure is proposed, allowing for Differential Item Functioning (DIF) to be classified with a single set of DIF heuristics regardless of whether data are dichotomous or polytomous. This proposed standardized effect size accounts for both variability in responses and whether participants are included in the SIBTEST/POLYSIBTEST calculations. First, a new set of unstandardized effect size heuristics is established for dichotomous data that is more aligned with Educational Testing Service (ETS) standards using two- and three-parameter logistic (2PL and 3PL) models. Second, a standardized effect size is proposed and compared to other DIF …


Inference And Estimation In Change Point Models For Censored Data, Kristine Gierz Dec 2020

Mathematics & Statistics Theses & Dissertations

In general, the change point problem considers inference of a change in distribution for a set of time-ordered observations. This has applications in a large variety of fields and can also apply to survival data. With improvements to medical diagnoses and treatments, incidences and mortality rates have changed. However, the most commonly used analysis methods do not account for such distributional changes. In survival analysis, change point problems can concern a shift in a distribution for a set of time-ordered observations, potentially under censoring or truncation.

In this dissertation, we first propose a sequential testing approach for detecting multiple change …


Gardasil Vaccine Trends Within Nevada, California, And The U.S.: A Comparative Study, Karen S. Gutierrez Dec 2020

UNLV Theses, Dissertations, Professional Papers, and Capstones

Despite the decreasing incidence of cervical cancer in the U.S., public health concern about cervical cancer continues to increase worldwide. Recent studies report that individuals are disproportionately affected based on region, sex, and race. Additionally, human papillomavirus (HPV)-attributable cancers may be reduced by between 70% and 90% through the universal use of HPV vaccines. In order to expand current knowledge and implement intervention programs in Nevada, it is critical to examine the associations among the Gardasil vaccine, cervical cancer screening, and adverse events following immunization as well as to understand the different socio-demographic subgroups affected. To our …


Statistical Approaches Of Gene Set Analysis With Quantitative Trait Loci For High-Throughput Genomic Studies., Samarendra Das Dec 2020

Electronic Theses and Dissertations

Recently, gene set analysis has become the first choice for gaining insights into the underlying complex biology of diseases through high-throughput genomic studies, such as Microarrays, bulk RNA-Sequencing, single cell RNA-Sequencing, etc. It also reduces the complexity of statistical analysis and enhances the explanatory power of the obtained results. Further, the statistical structure and steps common to these approaches have not yet been comprehensively discussed, which limits their utility. Hence, a comprehensive overview of the available gene set analysis approaches used for different high-throughput genomic studies is provided. The analysis of gene sets is usually carried out based on …


Modified-Half-Normal Distribution And Different Methods To Estimate Average Treatment Effect., Jingchao Sun Dec 2020

Electronic Theses and Dissertations

This dissertation consists of three projects related to Modified-Half-Normal distribution and causal inference. In my first project, a new distribution called Modified-Half-Normal distribution was introduced. I explored a few of its distributional properties, the procedures for generating random samples based on Bayesian approaches, and the parameter estimation based on the method of moments. The second project deals with the problem of selection bias of average treatment effect (ATE) if we use the observational data. I combined the propensity score based inverse probability of treatment weighting (IPTW) method and the directed acyclic graph (DAG) to solve this problem. The third project …


Incorporating Shear Resistance Into Debris Flow Triggering Model Statistics, Noah J. Lyman Dec 2020

Master's Theses

Several regions of the Western United States utilize statistical binary classification models to predict and manage debris flow initiation probability after wildfires. As the occurrence of wildfires and high-intensity rainfall events increases, so does the frequency with which development occurs in the steep and mountainous terrain where these events arise. This resulting intersection brings with it an increasing need to derive improved results from existing models, or develop new models, to reduce the economic and human impacts that debris flows may bring. Any development or change to these models could also theoretically increase the ease of collection, processing, and …


Aspects Of Causal Inference., John A. Craycroft Dec 2020

Electronic Theses and Dissertations

Observational studies differ from experimental studies in that assignment of subjects to treatments is not randomized but rather occurs due to natural mechanisms, which are usually hidden from researchers. Yet objectives of the two studies are frequently the same: identify the causal – rather than merely associational – relationship between some treatment or exposure and an outcome. The statistical issues that arise in properly analyzing observational data for this goal are numerous and fascinating, and these issues are encompassed in the domain of causal inference. The research presented in this dissertation explores several distinct aspects of causal inference. This dissertation …


A Management Strategy Evaluation Of The Impacts Of Interspecific Competition And Recreational Fishery Dynamics On Vermilion Snapper (Rhomboplites Aurorubens) In The Gulf Of Mexico, Megumi C. Oshima Dec 2020

Dissertations

In the Gulf of Mexico (GOM), Vermilion Snapper (Rhomboplites aurorubens) are believed to compete with Red Snapper directly for prey and habitat. The two species share similar diets and have significant spatial overlap in the Gulf. Red Snapper are thought to be the dominant competitor, forcing Vermilion Snapper to feed on less nutritious prey when local resources are depleted. In addition to ecological pressures, GOM Vermilion Snapper support substantial commercial and recreational fisheries. Over the past decade, recreational landings have steadily increased, reaching a historical high in 2018. One cause may be stricter regulations for similar target species such as …


Quantifying The Simultaneous Effect Of Socio-Economic Predictors And Build Environment On Spatial Crime Trends, Alfieri Daniel Ek Dec 2020

Graduate Theses and Dissertations

Proper allocation of law enforcement agencies falls under the umbrella of risk terrain modeling (Caplan et al., 2011, 2015; Drawve, 2016) that primarily focuses on crime prediction and prevention by spatially aggregating response and predictor variables of interest. Although mental health incidents demand resource allocation from law enforcement agencies and the city, relatively less emphasis has been placed on building spatial models for mental health incident events. Analyzing spatial mental health events in Little Rock, AR from 2015 to 2018, we found evidence of spatial heterogeneity via Moran’s I statistic. A spatial modeling framework is then built using generalized linear models, …


Conditional Distance Correlation Test For Gene Expression Level, DNA Methylation Level And Copy Number, Shanshan Zhang Dec 2020

Graduate Theses and Dissertations

Over the past years, efforts have been devoted to the genome-wide analysis of genetic and epigenetic profiles to better understand the underlying biological mechanisms of complex diseases such as cancer. It is of great importance to unravel the complex dependence structure between biological factors, and many conditional dependence tests have been developed to meet this need. The traditional partial correlation method can only capture the linear partial correlation, but not the nonlinear correlation. To overcome this limitation, we propose to use the innovative conditional distance correlation (CDC), which measures the conditional dependence between random vectors and detects nonlinear relations. In …


Bayesian Variable Selection Methods For Genome-Wide Association Studies With Categorical Phenotypes, Benazir Rowe Dec 2020

UNLV Theses, Dissertations, Professional Papers, and Capstones

Genome-wide association studies (GWAS) attempt to find the associations between genetic markers and studied traits (phenotypes). The problem of GWAS is complex and various methods have been developed to approach it. One such method is Bayesian variable selection (BVS). We describe the BVS methods in detail and demonstrate the ability of the BVS method Posterior Inference via Model Averaging and Subset Selection (piMASS) to improve the power of detecting phenotype-associated genetic loci, potentially leading to new discoveries from existing data without increasing the sample size.

We present several ways to improve and extend the applicability of piMASS for GWAS. The …


On Simes’s Second Conjecture: An Extended Single-Step Simes Test Procedure For Multiple Testing, Matthew G. Hudson Dec 2020

Dissertations

One of the major concerns with multiple tests of significance is controlling the family-wise error rate. Various methods have been developed to ensure that the false positive rate is maintained at some prespecified level, one of the best known being the Bonferroni procedure. Simes presented an improved Bonferroni procedure for testing the global hypothesis that is more powerful and less conservative, especially with positively correlated tests. While Simes’s procedure is more powerful, it does not allow for making inferences on the individual hypotheses. However, the Simes procedure has since become the foundation of many p-value based multiple testing …
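
As a point of reference for the comparison above, here is a minimal sketch of the classical Bonferroni and Simes tests of the global null hypothesis. It shows only the standard textbook procedures, not the extended single-step procedure developed in the dissertation; the function names and the p-values in the usage note are hypothetical.

```python
def bonferroni_global(pvalues, alpha=0.05):
    """Reject the global null if the smallest p-value is at most alpha / n."""
    n = len(pvalues)
    return min(pvalues) <= alpha / n


def simes_global(pvalues, alpha=0.05):
    """Reject the global null if p_(i) <= i * alpha / n for at least one i,
    where p_(1) <= ... <= p_(n) are the ordered p-values."""
    n = len(pvalues)
    ordered = sorted(pvalues)
    return any(p <= i * alpha / n for i, p in enumerate(ordered, start=1))
```

For the hypothetical p-values [0.02, 0.03, 0.04] at alpha = 0.05, bonferroni_global returns False because 0.02 exceeds 0.05 / 3, while simes_global returns True because the second ordered p-value, 0.03, is below 2 * 0.05 / 3. Neither function says which individual hypotheses are false, which is the limitation the abstract describes.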


Statistical Methods With A Focus On Joint Outcome Modeling And On Methods For Fire Science, Da Zhong Xi Nov 2020

Electronic Thesis and Dissertation Repository

Understanding the dynamics of wildfires contributes significantly to the development of fire science. Challenges in the analysis of historical fire data include defining fire dynamics within existing statistical frameworks, modeling the duration and size of fires as joint outcomes, identifying how fires are grouped into clusters of subpopulations, and assessing the effect of environmental variables in different modeling frameworks. We develop novel statistical methods to consider outcomes related to fire science jointly. These methods address these challenges by linking univariate models for separate outcomes through shared random effects, an approach referred to as joint modeling. Comparisons with existing …


An Analysis Of Growth Of The Community Integration Psychological Score In An Ethnically Diverse Population Experiencing Homelessness In A Permanent Supportive Housing Program Using Hierarchical Mixed Modeling, Leah Hollis Puglisi Nov 2020

Mathematics & Statistics ETDs

Hierarchical models are becoming increasingly common in epidemiological and psychological research. When analyzing data from such studies, the nested structure of the data must be taken into account. Mixed modeling in conjunction with hierarchical mixed modeling allows researchers to ask broad questions about the population of interest. Modeling under restricted maximum likelihood estimation (REML), as opposed to full maximum likelihood estimation (ML), increases the accuracy of estimates for the random effects in the model. We use hierarchical mixed modeling under REML estimation to analyze which factors increase “community integration”, a concept and a construct developed and used in the mental …