Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 14 of 14

Full-Text Articles in Physical Sciences and Mathematics

Exploring Experimental Design And Multivariate Analysis Techniques For Evaluating Community Structure Of Bacteria In Microbiome Data, Kelsey Karnik Aug 2023

Department of Statistics: Dissertations, Theses, and Student Work

The gut microbiome plays a crucial role in human health, and by working collaboratively with microbiologists, we aim to further our understanding of the human gut and its impact on human health. Promoting a diverse microbiome is emphasized throughout the microbiology literature, and involving a statistician in designing experiments that relate gut bacteria to a measured health outcome is crucial for ensuring valid and accurate results. By adopting new experimental design and analysis methods, researchers can begin to gain a deeper understanding of how the genetics of our food affect the composition of taxa within the gut microbiome. This dissertation is …


Examining The Effect Of Word Embeddings And Preprocessing Methods On Fake News Detection, Jessica Hauschild May 2023

Department of Statistics: Dissertations, Theses, and Student Work

The words people choose hold a great deal of power, whether in spreading truth or deception. As listeners and readers, we do our best to understand how words are being used. There are many current methods in the computer science literature for embedding words as numerical information for statistical analyses. Some of these embedding methods, such as Bag of Words, treat words as independent, while others, such as Word2Vec, attempt to capture information about the context of words. It is of interest to compare how well these various methods of translating text into numerical data work specifically …
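To make the "treat words as independent" point concrete, here is a minimal Bag of Words sketch. The preprocessing (lowercasing and whitespace splitting) and the function name `bag_of_words` are assumptions for illustration, not the thesis's pipeline. Two sentences with opposite meanings receive identical vectors because word order is discarded.

```python
from collections import Counter


def bag_of_words(docs):
    """Map each document to a vector of word counts over a shared vocabulary.

    Hypothetical minimal preprocessing: lowercase, then split on whitespace.
    Word order is discarded, so each word is counted independently of context.
    """
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({w for toks in tokenized for w in toks})
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[w] for w in vocab])
    return vocab, vectors


vocab, vecs = bag_of_words(["the dog bit the man", "the man bit the dog"])
print(vocab)  # ['bit', 'dog', 'man', 'the']
print(vecs)   # [[1, 1, 1, 2], [1, 1, 1, 2]]
```

Context-based embeddings such as Word2Vec are designed to avoid exactly this collapse, at the cost of a trained model.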


Statistical Methodology To Establish A Benchmark For Evaluating Antimicrobial Resistance Genes Through Real-Time PCR Assay, Enakshy Dutta Jul 2020

Department of Statistics: Dissertations, Theses, and Student Work

Novel diagnostic tests are usually compared with gold standard tests to evaluate diagnostic accuracy. For assessing antimicrobial resistance (AMR) in bovine respiratory disease (BRD) pathogens, the phenotypic broth microdilution method is used as the gold standard (GS). The objective of the thesis is to evaluate the optimal cycle threshold (Ct), generated by real-time polymerase chain reaction (rtPCR) for genes that confer resistance, that will translate to the phenotypic classification of AMR. Data from two different methodologies are assessed to identify a Ct that discriminates between resistance (R) and susceptibility (S). First, the receiver operating characteristic (ROC) curve was used to determine the …
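The abstract is truncated before it names the cutoff criterion, so the sketch below assumes Youden's J (sensitivity + specificity − 1), one common way to pick a threshold from an ROC curve. It also assumes a sample is called resistant when its Ct is at or below the cutoff (lower Ct means more target copies). The Ct values and labels are hypothetical.

```python
def youden_cutoff(ct_values, is_resistant):
    """Return (cutoff, J) maximizing Youden's J over observed Ct values.

    Assumption: call R when Ct <= cutoff, S otherwise; the gold-standard
    labels in is_resistant must contain both R and S samples.
    """
    n_pos = sum(is_resistant)
    n_neg = len(is_resistant) - n_pos
    best_cut, best_j = None, float("-inf")
    for cut in sorted(set(ct_values)):
        tp = sum(1 for ct, r in zip(ct_values, is_resistant) if r and ct <= cut)
        tn = sum(1 for ct, r in zip(ct_values, is_resistant) if not r and ct > cut)
        j = tp / n_pos + tn / n_neg - 1  # sensitivity + specificity - 1
        if j > best_j:
            best_cut, best_j = cut, j
    return best_cut, best_j


# Hypothetical Ct values with gold-standard (broth microdilution) labels
ct = [18.2, 20.5, 22.1, 24.0, 27.3, 29.8, 31.5, 33.0]
resistant = [True, True, True, False, True, False, False, False]
cut, j = youden_cutoff(ct, resistant)
print(cut, j)  # 22.1 0.75
```

Here 27.3 ties with 22.1 on J; the loop keeps the first maximum, so a real analysis would need an explicit tie-breaking rule.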


Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Department of Statistics: Dissertations, Theses, and Student Work

Shrinkage methods are estimation techniques that optimize an objective function to determine which variables to include in an analysis, typically a linear regression. The general form of these objective functions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property, meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …
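Written out, the "empirical risk plus complexity penalty" form might look like the following. The symbols $L$, $p_\lambda$, and $\lambda$ are assumed notation rather than the dissertation's; the two penalties shown are standard examples (a count of nonzero parameters, and the lasso).

```latex
\hat{\beta} = \operatorname*{arg\,min}_{\beta \in \mathbb{R}^p}
  \Big\{ \underbrace{\frac{1}{n}\sum_{i=1}^{n} L\big(y_i,\, x_i^\top \beta\big)}_{\text{empirical risk}}
  + \underbrace{\sum_{j=1}^{p} p_\lambda\big(|\beta_j|\big)}_{\text{complexity penalty}} \Big\},
\qquad
p_\lambda(t) = \lambda\,\mathbb{1}(t \neq 0) \ \text{(parameter count)}, \quad
p_\lambda(t) = \lambda\, t \ \text{(lasso)}.
```

In the log-likelihood setting, the empirical risk term is replaced by $-\tfrac{1}{n}\,\ell(\beta)$.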


Group Testing Identification: Objective Functions, Implementation, And Multiplex Assays, Brianna D. Hitt Apr 2020

Department of Statistics: Dissertations, Theses, and Student Work

Group testing is the process of combining items into groups to test for a binary characteristic. One of its most widely used applications is infectious disease testing. In this context, specimens (e.g., blood, urine) are amalgamated into groups and tested. For groups that test positive, there are many algorithmic retesting procedures available to identify positive individuals. The appeal of group testing is that, when disease prevalence is small and an appropriate algorithm is chosen, the overall number of tests needed is substantially smaller than for individual testing. Group testing has a number of applications beyond infectious disease testing, such as …
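The savings claim can be checked with the simplest retesting algorithm, two-stage Dorfman testing. This is not necessarily an algorithm studied in the dissertation. The sketch assumes independent specimens with prevalence `p` and a perfectly accurate assay. Each group of `k` is tested once, and every member of a positive group is retested individually.

```python
def dorfman_tests_per_individual(p, k):
    """Expected tests per individual under two-stage Dorfman testing.

    Assumes independent specimens with prevalence p and a perfect assay.
    """
    if k == 1:
        return 1.0  # individual testing
    prob_group_positive = 1 - (1 - p) ** k
    # one test per group (1/k per person) plus a retest when the group is positive
    return 1 / k + prob_group_positive


def best_group_size(p, max_k=50):
    """Group size minimizing the expected number of tests per individual."""
    return min(range(1, max_k + 1), key=lambda k: dorfman_tests_per_individual(p, k))


k_opt = best_group_size(0.01)
print(k_opt, round(dorfman_tests_per_individual(0.01, k_opt), 3))  # 11 0.196
```

At 1% prevalence, about 0.2 tests per person are expected, roughly an 80% reduction relative to individual testing. The advantage shrinks as `p` grows.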


Optimal Design For A Causal Structure, Zaher Kmail Aug 2019

Department of Statistics: Dissertations, Theses, and Student Work

Linear models and mixed models are important statistical tools. However, many natural phenomena involve more than one endogenous variable, and these variables are related in complex ways. Structural Equation Modeling (SEM) is often used to model the complex relationships between the endogenous and exogenous variables. It was first implemented in research to estimate the strength and direction of direct and indirect effects among variables and to measure the relative magnitude of each causal factor.

Traditionally, optimal design theory has focused on univariate linear, nonlinear, and mixed models. There is no current literature on the subject of …
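To illustrate what "direct and indirect effects" mean in SEM, the sketch below simulates the simplest recursive path model, X → M → Y plus a direct path X → Y. It then estimates each equation by ordinary least squares. The path coefficients are hypothetical, and full SEM software typically estimates all equations jointly (e.g., by maximum likelihood). This example does not touch the dissertation's optimal-design question.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
a, b, c_direct = 0.5, 0.8, 0.3  # hypothetical true path coefficients

x = rng.normal(size=n)                          # exogenous variable
m = a * x + rng.normal(size=n)                  # endogenous mediator
y = c_direct * x + b * m + rng.normal(size=n)   # endogenous outcome


def ols(design, response):
    """Least-squares coefficients (no intercept; simulated variables have mean 0)."""
    coef, *_ = np.linalg.lstsq(design, response, rcond=None)
    return coef


a_hat = ols(np.column_stack([x]), m)[0]
c_hat, b_hat = ols(np.column_stack([x, m]), y)
indirect = a_hat * b_hat   # effect of X on Y transmitted through M
total = c_hat + indirect   # direct + indirect effect
print(a_hat, b_hat, c_hat, indirect, total)
```

With this sample size, the estimates should fall close to 0.5, 0.8, 0.3, and an indirect effect of 0.4. The exact digits depend on the random draw.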


Methods To Account For Breed Composition In A Bayesian GWAS Method Which Utilizes Haplotype Clusters, Danielle F. Wilson-Wells Aug 2016

Department of Statistics: Dissertations, Theses, and Student Work

In livestock, prediction of an animal’s genetic merit using genomic information is becoming increasingly common. The models used to make these predictions typically assume that we are sampling from a homogeneous population. However, in both commercial and experimental populations the sire and dam of an individual may be a mixture of different breeds. Haplotype models can capture this population structure.

Two models based on breed-specific haplotype clusters were developed to account for differences across multiple breeds. The first model utilizes the breed composition of the individual, while the second utilizes the breed composition of the sire and dam. Haplotype …


Beta-Binomial Kriging: A New Approach To Modeling Spatially Correlated Proportions, Aimee Schwab Aug 2015

Department of Statistics: Dissertations, Theses, and Student Work

Spatially correlated count data sets appear often in applied data analysis problems, but there is little consensus in the literature about how best to analyze the data. The two prevailing approaches provide accurate parameter estimates and predictions, at the cost of model interpretability and simplicity. This dissertation presents a new approach to modeling spatially correlated binomial observations: beta-binomial kriging. The model proposed here is a modified form of spatial kriging which assumes the data are generated from a correlated beta-binomial distribution. Given this assumption, the spatial parameters and predicted values can be estimated using simple matrix algebra. Beta-binomial kriging …
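The sketch below covers only the beta-binomial distribution itself, not the spatial kriging step or the correlation structure from the dissertation. It shows why the beta-binomial suits overdispersed proportions. If the success probability varies as Beta(α, β), the count variance is inflated by a factor of 1 + (n − 1)ρ relative to the binomial, where ρ = 1/(α + β + 1).

```python
from math import comb, exp, lgamma


def log_beta(a, b):
    """Log of the Beta function B(a, b)."""
    return lgamma(a) + lgamma(b) - lgamma(a + b)


def beta_binomial_pmf(k, n, alpha, beta):
    """P(Y = k) when p ~ Beta(alpha, beta) and Y | p ~ Binomial(n, p)."""
    return comb(n, k) * exp(log_beta(k + alpha, n - k + beta) - log_beta(alpha, beta))


def beta_binomial_moments(n, alpha, beta):
    """Mean and variance; rho is the intra-class correlation."""
    mu = alpha / (alpha + beta)
    rho = 1 / (alpha + beta + 1)
    return n * mu, n * mu * (1 - mu) * (1 + (n - 1) * rho)


n, alpha, beta = 10, 2.0, 3.0
mean, var = beta_binomial_moments(n, alpha, beta)
print(mean, var)  # mean 4.0, variance 6.0 (a Binomial(10, 0.4) would have variance 2.4)
```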


A New Approach To Modeling Multivariate Time Series On Multiple Temporal Scales, Tucker Zeleny May 2015

Department of Statistics: Dissertations, Theses, and Student Work

In certain situations, observations on a multivariate time series are collected at a particular temporal scale. However, there may also exist underlying time series behavior on a larger temporal scale that is of interest. Often, identifying the behavior of the data over the course of the larger scale is the key objective. Because this large-scale trend is not directly observed, describing the trends of the data on this scale can be more difficult. To further complicate matters, the observed data on the smaller time scale may be unevenly spaced from one larger-scale time point to the …


New Statistical Methods For Analysis Of Historical Data From Wildlife Populations, Trevor Hefley Mar 2014

Department of Statistics: Dissertations, Theses, and Student Work

Wildlife biologists, often with the help of ordinary citizens, have developed and maintained long-term datasets for monitoring the status of wildlife populations. These datasets can range from a collection of citizen-reported sightings of a rare species to datasets collected by biologists using standardized methods. What they have in common is that they span temporal and spatial scales beyond the scope of most scientific studies. Ensuring the continued persistence of wildlife populations requires predictions of the impact of human actions. Regardless of whether the predictions are quantitative or qualitative, the best we can do is use the past data to …


A Test For Detecting Changes In Closed Networks Based On The Number Of Communications Between Nodes, Christopher S. Wichman Jul 2013

Department of Statistics: Dissertations, Theses, and Student Work

This dissertation presents a formal method for detecting changes in a closed communications network based on an “abnormal” shift in the number of communications between some of the nodes. The method relies on the analyst’s ability to define the network of interest, to capture the number of communications between nodes, and to establish a history of normal communications flow between nodes over fixed intervals of time. A metric multidimensional scaling technique is then used to represent the network at each time interval with a k-dimensional (k = 1, 2, …) configuration. The affine bidimensional regression coefficient of determination (aR²) …
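One standard metric multidimensional scaling technique is classical (Torgerson) MDS. The sketch below assumes that variant, which may differ from the one used in the dissertation. It maps an n × n dissimilarity matrix between nodes to a k-dimensional configuration. How communication counts are turned into dissimilarities is left open here, and the aR² comparison step is not shown.

```python
import numpy as np


def classical_mds(dist, k):
    """Classical (Torgerson) metric MDS: embed n nodes in k dimensions.

    dist is an n x n symmetric matrix of dissimilarities between nodes.
    Returns an n x k configuration whose Euclidean distances approximate dist.
    """
    d2 = np.asarray(dist, dtype=float) ** 2
    n = d2.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    gram = -0.5 * centering @ d2 @ centering      # double-centered Gram matrix
    eigvals, eigvecs = np.linalg.eigh(gram)       # eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1][:k]         # keep the k largest
    scale = np.sqrt(np.clip(eigvals[order], 0, None))
    return eigvecs[:, order] * scale


# Three nodes whose dissimilarities are exactly Euclidean in 2-D (a 3-4-5 triangle)
pts = np.array([[0.0, 0.0], [3.0, 0.0], [0.0, 4.0]])
D = np.linalg.norm(pts[:, None, :] - pts[None, :, :], axis=-1)
X = classical_mds(D, 2)
```

The recovered configuration is unique only up to rotation, reflection, and translation. This is one reason to compare configurations across time intervals with an affine fit such as bidimensional regression rather than coordinate by coordinate.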


Informative Retesting For Hierarchical Group Testing, Michael S. Black Jun 2013

Department of Statistics: Dissertations, Theses, and Student Work

Group testing is the process of pooling samples (e.g., blood, chemical compounds) from multiple sources and testing the pooled material for some binary characteristic. It is used in pathogen screening for humans and animals, drug discovery studies, electrical systems testing, and many other applications. Group testing has traditionally been used for two main types of investigations: 1) the identification of positive specimens and 2) the estimation of a characteristic’s prevalence in a population. This dissertation focuses on the identification process. We propose new identification procedures that exploit the heterogeneity among samples in order to reduce the number of tests needed …


A Comparison Of Spatial Prediction Techniques Using Both Hard And Soft Data, Megan L. Liedtke Tesar May 2011

Department of Statistics: Dissertations, Theses, and Student Work

The overall goal of this research, which is common to most spatial studies, is to predict a value of interest at an unsampled location based on measured values at nearby sampled locations. To accomplish this goal, ordinary kriging can be used to obtain the best linear unbiased predictor. However, there is often a large amount of variability surrounding the measurements of environmental variables, and traditional prediction methods, such as ordinary kriging, do not account for an attribute with more than one level of uncertainty. This dissertation addresses this limitation by introducing a new methodology called weighted kriging. This prediction technique …
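For reference, the ordinary kriging baseline mentioned above can be sketched as follows. The dissertation's weighted kriging is not reproduced. The exponential covariance model and its `sill` and `range_` parameters are assumptions chosen for illustration. The weights solve the kriging system with a Lagrange multiplier that forces them to sum to one, which makes the predictor unbiased.

```python
import numpy as np


def exp_cov(h, sill=1.0, range_=1.0):
    """Assumed exponential covariance model C(h) = sill * exp(-h / range_)."""
    return sill * np.exp(-h / range_)


def ordinary_kriging(coords, values, target, sill=1.0, range_=1.0):
    """Predict at target; returns (prediction, weights, kriging variance)."""
    coords = np.asarray(coords, float)
    values = np.asarray(values, float)
    n = len(values)
    dists = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    # [C 1; 1^T 0] [w; mu] = [c0; 1], where mu is the Lagrange multiplier
    lhs = np.ones((n + 1, n + 1))
    lhs[:n, :n] = exp_cov(dists, sill, range_)
    lhs[n, n] = 0.0
    rhs = np.ones(n + 1)
    rhs[:n] = exp_cov(np.linalg.norm(coords - np.asarray(target, float), axis=1), sill, range_)
    sol = np.linalg.solve(lhs, rhs)
    weights, mu = sol[:n], sol[n]
    prediction = weights @ values
    variance = sill - weights @ rhs[:n] - mu  # ordinary kriging variance
    return prediction, weights, variance


coords = [[0, 0], [1, 0], [0, 1]]
values = [1.0, 2.0, 3.0]  # hypothetical "hard" measurements
pred, w, var = ordinary_kriging(coords, values, [0.5, 0.5])
```

Without a nugget term, the predictor interpolates exactly: at a sampled location, it returns the observed value with zero kriging variance. Ordinary kriging treats every measurement as equally certain, which is the limitation the abstract describes.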


Sequence Comparison And Stochastic Model Based On Multi-Order Markov Models, Xiang Fang Nov 2009

Department of Statistics: Dissertations, Theses, and Student Work

This dissertation presents two statistical methodologies based on multi-order Markov models. First, we introduce an alignment-free sequence comparison method, which represents a sequence using a multi-order transition matrix (MTM). The MTM contains information on multi-order dependencies and provides a comprehensive representation of the heterogeneous composition within a sequence. Based on the MTM, a distance measure is developed for pairwise comparison of sequences. The new method is compared with the traditional maximum likelihood (ML) method, the complete composition vector (CCV) method, and the improved version of the complete composition vector (ICCV) method using simulated sequences. We further illustrate the application of …
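The sketch below shows the kind of information a multi-order transition representation captures. It estimates next-symbol probabilities for every preceding context of length 1 through `max_order`. The dictionary layout and the function name are hypothetical. This is not the dissertation's exact MTM construction, and its distance measure is not shown.

```python
from collections import defaultdict


def multi_order_transitions(seq, max_order):
    """Estimate P(next symbol | preceding context) for context lengths 1..max_order.

    Hypothetical layout: {context: {next_symbol: probability}}.
    """
    counts = defaultdict(lambda: defaultdict(int))
    for order in range(1, max_order + 1):
        for i in range(order, len(seq)):
            context = seq[i - order:i]
            counts[context][seq[i]] += 1
    probs = {}
    for context, nxt in counts.items():
        total = sum(nxt.values())
        probs[context] = {sym: c / total for sym, c in nxt.items()}
    return probs


probs = multi_order_transitions("ACGTACGA", 2)
print(probs["G"])   # {'T': 0.5, 'A': 0.5}
print(probs["CG"])  # {'T': 0.5, 'A': 0.5}
print(probs["AC"])  # {'G': 1.0}
```

Because no alignment is required, two sequences of different lengths can be compared through these context-conditional distributions.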