Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 29 of 29
Full-Text Articles in Physical Sciences and Mathematics
Exploring Experimental Design And Multivariate Analysis Techniques For Evaluating Community Structure Of Bacteria In Microbiome Data, Kelsey Karnik
Department of Statistics: Dissertations, Theses, and Student Work
The gut microbiome plays a crucial role in human health, and by working collaboratively with microbiologists, we aim to further our understanding of the human gut and its impact on human health. Promoting a diverse microbiome is emphasized throughout microbiology literature, and involving a statistician in designing experiments to relate gut bacteria and some measured health outcome is crucial for ensuring valid and accurate results. By adopting new experimental design and analysis methods, researchers can begin to gain a deeper understanding of how the genetics of our food affect the composition of taxa within the gut microbiome. This dissertation is …
Examining The Effect Of Word Embeddings And Preprocessing Methods On Fake News Detection, Jessica Hauschild
Department of Statistics: Dissertations, Theses, and Student Work
The words people choose to use hold a lot of power, whether that be in spreading truth or deception. As listeners and readers, we do our best to understand how words are being used. There are many current methods in computer science literature attempting to embed words into numerical information for statistical analyses. Some of these embedding methods, such as Bag of Words, treat words as independent, while others, such as Word2Vec, attempt to gain information about the context of words. It is of interest to compare how well these various methods of translating text into numerical data work specifically …
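The independence assumption behind Bag of Words can be made concrete with a short sketch; the toy sentences and the function name below are invented for illustration:

```python
from collections import Counter

def bag_of_words(docs):
    """Embed each document as a vector of word counts over a shared vocabulary."""
    vocab = sorted({w for d in docs for w in d.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for d in docs:
        v = [0] * len(vocab)
        for w, n in Counter(d.lower().split()).items():
            v[index[w]] = n
        vectors.append(v)
    return vocab, vectors

docs = ["the quick brown fox", "the lazy dog", "the quick dog"]
vocab, vecs = bag_of_words(docs)
```

Each document becomes a count vector over a shared vocabulary; word order and context are discarded entirely, which is exactly the information Word2Vec-style embeddings attempt to retain.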
Human Perception Of Exponentially Increasing Data Displayed On A Log Scale Evaluated Through Experimental Graphics Tasks, Emily Robinson
Department of Statistics: Dissertations, Theses, and Student Work
Log scales are often used to display data over several orders of magnitude within one graph. We conducted a series of three graphical studies to evaluate the impact displaying data on the log scale has on human perception of exponentially increasing trends compared to displaying data on the linear scale. Each study was related to a different graphical task, each requiring a different level of interaction and cognitive use of the data being presented. The first experiment evaluated whether our ability to perceptually notice differences in exponentially increasing trends is impacted by the choice of scale. Participants were shown a …
Comparing Machine Learning Techniques With State-Of-The-Art Parametric Prediction Models For Predicting Soybean Traits, Susweta Ray
Department of Statistics: Dissertations, Theses, and Student Work
Soybean is a significant source of protein and oil and is also widely used as animal feed. Thus, developing lines that are superior in yield, protein, and oil content is important for feeding the ever-growing population. In contrast to high-cost phenotyping, genotyping is both cost- and time-efficient for breeders, while evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use marker and environment data effectively to predict yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (GBLUP), a …
Factors Influencing Student Outcomes In A Large, Online Simulation-Based Introductory Statistics Course, Ella M. Burnham
Department of Statistics: Dissertations, Theses, and Student Work
The demand for statistical knowledge and skills is growing in many disciplines, so more students are enrolling in introductory statistics courses (Blair, Kirkman, & Maxwell, 2018). At the same time, institutions are seeking course delivery methods that allow for greater flexibility for students, especially following the onset of the COVID-19 pandemic; therefore, there is more interest in the development and delivery of online introductory statistics courses.
To address this, I collaboratively designed an online introductory statistics course which focuses on simulation-based inference for the University of Nebraska-Lincoln. The course design was informed by the Community of Inquiry framework (Garrison, Anderson, …
Statistical Methodology To Establish A Benchmark For Evaluating Antimicrobial Resistance Genes Through Real Time Pcr Assay, Enakshy Dutta
Department of Statistics: Dissertations, Theses, and Student Work
Novel diagnostic tests are usually compared with gold standard tests to evaluate diagnostic accuracy. For assessing antimicrobial resistance (AMR) in bovine respiratory disease (BRD) pathogens, the phenotypic broth microdilution method is used as the gold standard (GS). The objective of this thesis is to identify the optimal cycle threshold (Ct) generated by real-time polymerase chain reaction (rtPCR) for resistance-conferring genes that translates to the phenotypic classification of AMR. Data from two different methodologies are assessed to identify a Ct that discriminates between resistance (R) and susceptibility (S). First, the receiver operating characteristic (ROC) curve was used to determine the …
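One common way to pick a cutoff from an ROC analysis (not necessarily the criterion used in this thesis) is Youden's J statistic. A minimal sketch, assuming invented Ct values and that lower Ct indicates resistance:

```python
def youden_optimal_threshold(values, labels):
    """Scan candidate cutoffs; return the one maximizing J = sensitivity + specificity - 1.
    labels: 1 = resistant (positive), 0 = susceptible. A sample is classified positive
    when its value <= cutoff (lower Ct means more target DNA)."""
    best = None
    for c in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v <= c and y == 1)
        fn = sum(1 for v, y in zip(values, labels) if v > c and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v > c and y == 0)
        fp = sum(1 for v, y in zip(values, labels) if v <= c and y == 0)
        sens = tp / (tp + fn)
        spec = tn / (tn + fp)
        j = sens + spec - 1.0
        if best is None or j > best[1]:
            best = (c, j)
    return best

ct_values = [18, 20, 22, 31, 24, 30, 33, 36]   # invented Ct data
labels    = [1,  1,  1,  1,  0,  0,  0,  0]    # 1 = resistant, 0 = susceptible
best_ct, best_j = youden_optimal_threshold(ct_values, labels)
```

With these toy data the cutoff Ct = 22 maximizes J, trading one missed resistant sample for perfect specificity.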
Using Stability To Select A Shrinkage Method, Dean Dustin
Department of Statistics: Dissertations, Theses, and Student Work
Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …
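The one-dimensional lasso, one member of this class, makes the selection mechanism explicit: minimizing 0.5*(z - b)**2 + lam*abs(b) has a closed-form soft-threshold solution. A minimal sketch with invented coefficient values:

```python
def soft_threshold(z, lam):
    """Minimizer of the one-dimensional lasso objective 0.5*(z - b)**2 + lam*abs(b)."""
    if z > lam:
        return z - lam
    if z < -lam:
        return z + lam
    return 0.0

ols = [2.5, -0.3, 1.1, 0.05]   # made-up least-squares coefficients
lam = 0.5
shrunk = [soft_threshold(z, lam) for z in ols]
```

The penalty zeroes small coefficients (variable selection) and shrinks large ones toward zero by lam; oracle-property penalties such as SCAD modify this so large coefficients are left nearly unshrunk.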
Community Impact On The Home Advantage Within Ncaa Men's Basketball, Erin O'Donnell
Department of Statistics: Dissertations, Theses, and Student Work
The home advantage is a commonly accepted truth throughout sports. This paper investigates the magnitude of the home advantage among NCAA Men’s Basketball teams and then draws relationships between that magnitude and community aspects such as attendance, location, past program success, and social media presence. Univariate and multivariate models are investigated.
Advisor: Walter S Stroup
Group Testing Identification: Objective Functions, Implementation, And Multiplex Assays, Brianna D. Hitt
Department of Statistics: Dissertations, Theses, and Student Work
Group testing is the process of combining items into groups to test for a binary characteristic. One of its most widely used applications is infectious disease testing. In this context, specimens (e.g., blood, urine) are amalgamated into groups and tested. For groups that test positive, there are many algorithmic retesting procedures available to identify positive individuals. The appeal of group testing is that the overall number of tests needed is significantly less than for individual testing when disease prevalence is small and an appropriate algorithm is chosen. Group testing has a number of applications beyond infectious disease testing, such as …
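The savings can be quantified for the simplest procedure, Dorfman's two-stage algorithm (one of many retesting procedures the field considers): with pool size k and prevalence p, the expected number of tests per individual is 1/k + 1 - (1-p)**k. A sketch:

```python
def dorfman_tests_per_person(p, k):
    """Expected tests per individual under two-stage Dorfman pooling: one pooled test
    shared by k people, plus k individual retests when the pool is positive,
    which happens with probability 1 - (1-p)**k."""
    return 1.0 / k + 1.0 - (1.0 - p) ** k

prevalence = 0.01
best_k = min(range(2, 51), key=lambda k: dorfman_tests_per_person(prevalence, k))
rate = dorfman_tests_per_person(prevalence, best_k)
```

At 1% prevalence the optimal pool size is 11 and roughly 0.196 tests per person are needed, an ~80% reduction versus individual testing; at high prevalence (say 30%) pooling costs more than one test per person, matching the abstract's caveat that savings require small prevalence.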
Optimal Design For A Causal Structure, Zaher Kmail
Department of Statistics: Dissertations, Theses, and Student Work
Linear models and mixed models are important statistical tools. But in many natural phenomena, there is more than one endogenous variable involved and these variables are related in a sophisticated way. Structural Equation Modeling (SEM) is often used to model the complex relationships between the endogenous and exogenous variables. It was first implemented in research to estimate the strength and direction of direct and indirect effects among variables and to measure the relative magnitude of each causal factor.
Historically, traditional optimal design theory focuses on univariate linear, nonlinear, and mixed models. There is no current literature on the subject of …
Role Of Misclassification Estimates In Estimating Disease Prevalence And A Non-Linear Approach To Study Synchrony Using Heart Rate Variability In Chickens, Dola Pathak
Department of Statistics: Dissertations, Theses, and Student Work
Infectious disease assays can be imperfect. When estimating disease prevalence, these imperfections are accounted for by incorporating assay sensitivity and specificity into point and variance estimates. Unfortunately, these accuracy measures are often treated as fixed constants, rather than acknowledging that they are estimates from an assay validation process. The purpose of this study is to show the detrimental effect of not taking into account this sampling variability when samples are obtained through group testing (also known as pooled testing). We show that confidence interval coverage can dramatically decline as the sample size increases for the main sample of interest. As a remedy …
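The standard correction that treats sensitivity and specificity as fixed constants (the very practice this study examines) is the Rogan-Gladen estimator; a sketch with invented numbers:

```python
def rogan_gladen(apparent, sensitivity, specificity):
    """True-prevalence estimate from apparent prevalence and assay accuracy:
    (apparent + Sp - 1) / (Se + Sp - 1)."""
    return (apparent + specificity - 1.0) / (sensitivity + specificity - 1.0)

# Invented values: 12% of samples test positive, Se = 95%, Sp = 98%
estimate = rogan_gladen(0.12, 0.95, 0.98)   # about 0.1075
```

Plugging in point estimates of Se and Sp this way ignores their sampling variability; the study's point is that interval coverage degrades when that variability is ignored, particularly with pooled samples.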
A Characterization Of A Value Added Model And A New Multi-Stage Model For Estimating Teacher Effects Within Small School Systems, Julie M. Garai
Department of Statistics: Dissertations, Theses, and Student Work
At both the national and state level there is increasing pressure to develop metrics to determine if school systems are meeting educational objectives. All states mandate some form of assessment by standardized tests. One method currently used to model student test scores is Value Added Modeling (VAM), which models student scores as a product of classroom and school environments. One VAM approach is the Tennessee Value Added Assessment System (TVAAS) which models student gains from year to year. Teacher effects are included in this layered model, which estimates the teacher’s added value to a student score through best linear unbiased …
Methods To Account For Breed Composition In A Bayesian Gwas Method Which Utilizes Haplotype Clusters, Danielle F. Wilson-Wells
Department of Statistics: Dissertations, Theses, and Student Work
In livestock, prediction of an animal’s genetic merit using genomic information is becoming increasingly common. The models used to make these predictions typically assume that we are sampling from a homogeneous population. However, in both commercial and experimental populations the sire and dam of an individual may be a mixture of different breeds. Haplotype models can capture this population structure.
Two models based on breed-specific haplotype clusters were developed to account for differences across multiple breeds. The first model utilizes the breed composition of the individual, while the second utilizes the breed composition from the sire and dam. Haplotype …
Simulations Of A New Response-Adaptive Biased Coin Design, Aleksandra Stein
Department of Statistics: Dissertations, Theses, and Student Work
Modern medical experiments accrue and treat patients--hence obtain treatment response data--throughout a trial. Designs which prospectively plan to modify patient allocation by leveraging accumulating data are response-adaptive randomization (RAR) designs. Many such designs attempt to balance the desire to bias assignment proportions towards a treatment which is performing better against the need to maintain randomization in the face of continued equipoise.
This dissertation consists of simulated investigations into frequentist and ethical properties of a new RAR biased coin design. Chapter 2 proposes a new adaptive design for phase III clinical trials, a modification of the 2001 Bandyopadhyay and Biswas biased …
Beta-Binomial Kriging: A New Approach To Modeling Spatially Correlated Proportions, Aimee Schwab
Department of Statistics: Dissertations, Theses, and Student Work
Spatially correlated count data sets appear often in applied data analysis problems, but there is little consensus in the literature about how best to analyze the data. The two prevailing approaches provide accurate parameter estimates and predictions, at the cost of model interpretability and simplicity. This dissertation will present a new approach to modeling spatially correlated binomial observations: beta-binomial kriging. The model proposed here is a modified form of spatial kriging which assumes the data are generated from a correlated beta-binomial distribution. Given this assumption, the spatial parameters and predicted values can be estimated using simple matrix algebra. Beta-binomial kriging …
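Part of the appeal of the beta-binomial is its simple overdispersion structure: with a Beta(alpha, beta) success probability, Var(Y) = n*p*(1-p)*[1 + (n-1)*rho] with rho = 1/(alpha + beta + 1). A sketch with invented parameters:

```python
def beta_binomial_variance(n, alpha, beta):
    """Variance of a beta-binomial count: the binomial variance inflated
    by the factor 1 + (n-1)*rho, where rho is the within-unit correlation."""
    p = alpha / (alpha + beta)
    rho = 1.0 / (alpha + beta + 1.0)
    return n * p * (1.0 - p) * (1.0 + (n - 1.0) * rho)

binomial_var = 20 * 0.5 * 0.5                   # 5.0 if the 20 trials were independent
bb_var = beta_binomial_variance(20, 2.0, 2.0)   # inflated to 24.0 by correlation
```

The extra-binomial variance is exactly what independent-binomial models miss when proportions are spatially correlated.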
A New Approach To Modeling Multivariate Time Series On Multiple Temporal Scales, Tucker Zeleny
Department of Statistics: Dissertations, Theses, and Student Work
In certain situations, observations are collected on a multivariate time series at a certain temporal scale. However, there may also exist underlying time series behavior on a larger temporal scale that is of interest. Oftentimes, identifying the behavior of the data over the course of the larger scale is the key objective. Because this large-scale trend is not being directly observed, describing the trends of the data on this scale can be more difficult. To further complicate matters, the observed data on the smaller time scale may be unevenly spaced from one larger-scale time point to the …
Modeling The Dynamic Processes Of Challenge And Recovery (Stress And Strain) Over Time, Fan Yang
Department of Statistics: Dissertations, Theses, and Student Work
A dynamic process with challenge and recovery is an important branch in the family of stochastic processes. The dependent data of such processes are often observed over time, and hence, are time dependent. The purpose of this dissertation is to develop methods to characterize a dynamic process with challenge and recovery under different dimensionalities and error assumptions. In this dissertation, a univariate dynamic process under Gaussian assumption is discussed first and a bi-logistic model is developed by three different methods: compartment, additive, and Bayesian. Then the discussion is extended to a bivariate hysteresis system with challenge and recovery. Three methods: …
A Reduced Bias Method Of Estimating Variance Components In Generalized Linear Mixed Models, Elizabeth A. Claassen
Department of Statistics: Dissertations, Theses, and Student Work
In small samples it is well known that the standard methods for estimating variance components in a generalized linear mixed model (GLMM), pseudo-likelihood and maximum likelihood, yield estimates that are biased downward. An important consequence of this is that inferences on fixed effects will have inflated Type I error rates because their precision is overstated. We introduce a new method for estimating parameters in GLMMs that applies a Firth bias adjustment to the maximum likelihood-based GLMM estimating algorithm. We apply this technique to one- and two-treatment logistic regression models with a single random effect. We show simulation results that demonstrate …
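The downward bias is easiest to see in the simplest analogue (a one-sample Gaussian variance, not the dissertation's GLMM setting): the ML estimator divides the residual sum of squares by n rather than n - 1, understating the variance, which is the kind of bias REML and Firth-type adjustments target. A simulation sketch:

```python
import random

random.seed(42)
true_var = 4.0
n, reps = 5, 20000
ml_sum = reml_sum = 0.0
for _ in range(reps):
    x = [random.gauss(0.0, true_var ** 0.5) for _ in range(n)]
    m = sum(x) / n
    ss = sum((xi - m) ** 2 for xi in x)
    ml_sum += ss / n          # ML estimate: biased downward by the factor (n-1)/n
    reml_sum += ss / (n - 1)  # REML-style estimate: unbiased in this simple case
ml_avg = ml_sum / reps        # averages near 3.2, well below the true 4.0
reml_avg = reml_sum / reps    # averages near 4.0
```

Understated variance components translate directly into overstated precision for fixed effects, hence the inflated Type I error rates the abstract describes.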
New Statistical Methods For Analysis Of Historical Data From Wildlife Populations, Trevor Hefley
Department of Statistics: Dissertations, Theses, and Student Work
Wildlife biologists, many times with the help of ordinary citizens, have developed and maintained long-term datasets for monitoring the status of wildlife populations. These datasets can range from a collection of citizen-reported sightings of a rare species, to datasets collected by biologists using standardized methods. The commonality is that these datasets span a temporal and spatial scale that is beyond the scope of most scientific studies. Ensuring the continued persistence of wildlife populations requires predictions of the impact of human actions. Regardless if the predictions are quantitative or qualitative, the best we can do is use the past data to …
A Test For Detecting Changes In Closed Networks Based On The Number Of Communications Between Nodes, Christopher S. Wichman
Department of Statistics: Dissertations, Theses, and Student Work
This dissertation presents a formal method for detecting changes in a closed communications network based on an “abnormal” shift in the number of communications between some of the nodes. The method relies on the analyst’s ability to define the network of interest; capture the number of communications between nodes; and to establish a history of normal communications flow between nodes over fixed intervals of time. A metric multi-dimensional scaling technique is then used to represent the network at each time interval with a k-dimensional (k = 1, 2, …) configuration. The affine bi-dimensional regression coefficient of determination (aR²) …
Informative Retesting For Hierarchical Group Testing, Michael S. Black
Department of Statistics: Dissertations, Theses, and Student Work
Group testing is the process of pooling samples (e.g., blood, chemical compounds) from multiple sources and testing the pooled material for some binary characteristic. It is used in pathogen screening for humans and animals, drug discovery studies, electrical systems testing, and many other applications. Group testing has traditionally been used for two main types of investigations: 1) the identification of positive specimens and 2) the estimation of a characteristic’s prevalence in a population. This dissertation focuses on the identification process. We propose new identification procedures that exploit the heterogeneity among samples in order to reduce the number of tests needed …
Group Testing Regression Models, Boan Zhang
Department of Statistics: Dissertations, Theses, and Student Work
Group testing, where groups of individual specimens are composited to test for the presence or absence of a disease (or some other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Statistical research in group testing has traditionally focused on a homogeneous population, where individuals are assumed to have the same probability of having a disease. However, individuals often have different risks of positivity, so recent research has examined regression models that allow for heterogeneity among individuals within the population. This dissertation focuses on two problems involving group testing regression models. …
Studying The Handling Of Heat Stressed Cattle Using The Additive Bi-Logistic Model To Fit Body Temperature, Fan Yang
Department of Statistics: Dissertations, Theses, and Student Work
Daily activities consume the energy of heifers, subsequently causing an elevation of body temperature, depending on the ambient conditions. A better understanding of the dynamics of body temperature (Tb) would be helpful when deciding how to process and handle heifers. It would also lead to specific recommendations on moving heifers under different ambient conditions, especially during the summer. In this study, a bi-logistic mixed model is used to describe the dynamics of Tb during the moving event. Data were taken from heifers in pens located at different distances from the heifer work station on four separate summer days under hot …
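A bi-logistic curve can be sketched as the sum of two logistic terms, one capturing the challenge-driven rise in Tb and one (with negative amplitude) the recovery; the parameter values below are invented for illustration, not estimates from the heifer data:

```python
import math

def bi_logistic(t, a1, k1, t1, a2, k2, t2):
    """Sum of two logistic curves: amplitude a, rate k, inflection time t for each."""
    return (a1 / (1.0 + math.exp(-k1 * (t - t1)))
            + a2 / (1.0 + math.exp(-k2 * (t - t2))))

# Hypothetical rise of 1.5 degrees centered at t = 2 h, with a matching
# recovery (negative amplitude) centered at t = 6 h
temps = [bi_logistic(t, 1.5, 2.0, 2.0, -1.5, 1.0, 6.0) for t in range(0, 13)]
```

The curve starts near baseline, peaks after the moving event, and returns to baseline, which is the challenge-and-recovery shape the model is meant to capture.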
A Comparison Of Spatial Prediction Techniques Using Both Hard And Soft Data, Megan L. Liedtke Tesar
Department of Statistics: Dissertations, Theses, and Student Work
The overall goal of this research, which is common to most spatial studies, is to predict a value of interest at an unsampled location based on measured values at nearby sampled locations. To accomplish this goal, ordinary kriging can be used to obtain the best linear unbiased predictor. However, there is often a large amount of variability surrounding the measurements of environmental variables, and traditional prediction methods, such as ordinary kriging, do not account for an attribute with more than one level of uncertainty. This dissertation addresses this limitation by introducing a new methodology called weighted kriging. This prediction technique …
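Before generalizing it, the baseline is worth seeing concretely: ordinary kriging weights come from one small linear system. A sketch, assuming an invented exponential covariance C(h) = exp(-h) and a two-site layout:

```python
import math

def solve(a, b):
    """Gauss-Jordan elimination with partial pivoting for small dense systems."""
    n = len(a)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col and m[r][col] != 0.0:
                f = m[r][col] / m[col][col]
                for c in range(col, n + 1):
                    m[r][c] -= f * m[col][c]
    return [m[i][n] / m[i][i] for i in range(n)]

def ordinary_kriging_weights(cov, cov0):
    """Solve [[C, 1], [1', 0]] [w; mu] = [c0; 1]; the constraint sum(w) = 1
    makes the predictor unbiased, and mu is the Lagrange multiplier."""
    n = len(cov)
    a = [row[:] + [1.0] for row in cov] + [[1.0] * n + [0.0]]
    b = list(cov0) + [1.0]
    return solve(a, b)[:n]

# Sampled sites at x = 0 and x = 1; predict at x = 0.25 with C(h) = exp(-h)
cov = [[1.0, math.exp(-1.0)], [math.exp(-1.0), 1.0]]
cov0 = [math.exp(-0.25), math.exp(-0.75)]
w = ordinary_kriging_weights(cov, cov0)
```

The weights sum to one and the nearer site dominates; what ordinary kriging cannot do, and what motivates this dissertation, is down-weight a measurement because it is itself uncertain ("soft").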
Estimating Teacher Effects Using Value-Added Models, Jennifer L. Green
Department of Statistics: Dissertations, Theses, and Student Work
Value-added modeling is an alternative approach to test-based accountability systems based on the proportions of students scoring at or above pre-determined proficiency levels. Value-added modeling techniques provide opportunities to estimate an individual teacher’s effect on student learning, while allowing for the possibility to control for the effect of non-educational factors beyond a school system’s control, such as socioeconomic status. However, numerous considerations exist when using value-added models to estimate teacher effects and defining what the teacher effects really describe. Chapter 2 provides an introduction to value-added methodology by describing several value-added models available for estimating teacher effects and their respective …
Fully Exponential Laplace Approximation Em Algorithm For Nonlinear Mixed Effects Models, Meijian Zhou
Department of Statistics: Dissertations, Theses, and Student Work
Nonlinear mixed effects models provide a flexible and powerful platform for the analysis of clustered data that arise in numerous fields, such as pharmacology, biology, agriculture, forestry, and economics. This dissertation focuses on fitting parametric nonlinear mixed effects models with single- and multi-level random effects. A new, efficient, and accurate method that gives an error of order O(1/n²), the fully exponential Laplace approximation EM algorithm (FELA-EM), for obtaining restricted maximum likelihood (REML) estimates in nonlinear mixed effects models is developed. Sample code for implementing the FELA-EM algorithm in R is given. Simulation studies have been conducted to evaluate …
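The flavor of a Laplace approximation (here in its standard form, whose relative error is O(1/n); the fully exponential variant improves this to O(1/n²)) shows up in the classic Stirling example, n! = ∫ x^n e^(-x) dx:

```python
import math

def laplace_factorial(n):
    """Laplace approximation to n! = integral of exp(g(x)) with g(x) = n*ln(x) - x.
    g peaks at x = n with g''(n) = -1/n, giving sqrt(2*pi*n) * (n/e)**n."""
    return math.sqrt(2.0 * math.pi * n) * (n / math.e) ** n

exact = float(math.factorial(10))
approx = laplace_factorial(10)
rel_err = abs(approx - exact) / exact   # relative error shrinks like 1/(12n)
```

At n = 10 the relative error is under 1%, and it halves as n doubles, consistent with the O(1/n) rate of the standard approximation.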
Sequence Comparison And Stochastic Model Based On Multi-Order Markov Models, Xiang Fang
Department of Statistics: Dissertations, Theses, and Student Work
This dissertation presents two statistical methodologies developed on multi-order Markov models. First, we introduce an alignment-free sequence comparison method, which represents a sequence using a multi-order transition matrix (MTM). The MTM contains information of multi-order dependencies and provides a comprehensive representation of the heterogeneous composition within a sequence. Based on the MTM, a distance measure is developed for pair-wise comparison of sequences. The new method is compared with the traditional maximum likelihood (ML) method, the complete composition vector (CCV) method and the improved version of the complete composition vector (ICCV) method using simulated sequences. We further illustrate the application of …
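The core object, order-k transition probabilities estimated from a sequence, can be sketched briefly; the Euclidean distance below is a stand-in, since the dissertation's actual distance measure is not spelled out in this excerpt:

```python
from collections import defaultdict
from math import sqrt

def transition_probs(seq, k):
    """Order-k transition probabilities: P(next symbol | preceding k symbols)."""
    counts = defaultdict(lambda: defaultdict(int))
    for i in range(len(seq) - k):
        counts[seq[i:i + k]][seq[i + k]] += 1
    probs = {}
    for ctx, nxt in counts.items():
        total = sum(nxt.values())
        for s, c in nxt.items():
            probs[(ctx, s)] = c / total
    return probs

def mtm_distance(seq1, seq2, k):
    """Euclidean distance between two order-k transition-probability vectors,
    treating a transition absent from a sequence as probability zero."""
    p1, p2 = transition_probs(seq1, k), transition_probs(seq2, k)
    keys = set(p1) | set(p2)
    return sqrt(sum((p1.get(key, 0.0) - p2.get(key, 0.0)) ** 2 for key in keys))
```

Because no alignment is needed, the comparison is insensitive to insertions and rearrangements that break position-by-position methods.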
Detecting Differentially Expressed Genes While Controlling The False Discovery Rate For Microarray Data, Shuo Jiao
Department of Statistics: Dissertations, Theses, and Student Work
Microarray technology enables researchers to investigate the expression levels of thousands of genes at the same time. One common goal of microarray data analysis is to detect differentially expressed genes while controlling the false discovery rate. This dissertation consists of four papers written to address this goal. The dissertation is organized as follows: In Chapter 1, a brief introduction of the Affymetrix GeneChip microarray technology is provided. The concept of differentially expressed genes and the definition of the false discovery rate are also introduced. In Chapter 2, a literature review of the related works on this …
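The canonical FDR controller in this setting is the Benjamini-Hochberg step-up procedure; a sketch with made-up p-values:

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Return indices of hypotheses rejected at FDR level q (BH step-up):
    find the largest rank k with p_(k) <= k*q/m, then reject the k smallest."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank
    return sorted(order[:k_max])

# Invented p-values for eight genes
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.74, 0.9]
rejected = benjamini_hochberg(pvals)
```

Here only the two smallest p-values survive the step-up comparison, even though five are below 0.05, which is the multiplicity control the procedure provides.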
Spatial Clustering Using The Likelihood Function, April Kerby
Department of Statistics: Dissertations, Theses, and Student Work
Researchers have been using clustering algorithms for many years to group similar observations based on a set of recorded characteristics. The majority of these algorithms maximize the similarity of the observations within a cluster while simultaneously maximizing the dissimilarity with observations in other clusters. However, nearly all current clustering algorithms do not take into account the actual geographic location of the observation during the clustering process. This dissertation consists of three papers which propose a method to incorporate the geographical location of an observation into the clustering algorithm, known as spatial clustering.
The first paper …