Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

University of Nebraska - Lincoln

Department of Statistics: Dissertations, Theses, and Student Work

Articles 1 - 29 of 29

Full-Text Articles in Physical Sciences and Mathematics

Exploring Experimental Design And Multivariate Analysis Techniques For Evaluating Community Structure Of Bacteria In Microbiome Data, Kelsey Karnik Aug 2023

The gut microbiome plays a crucial role in human health, and by working collaboratively with microbiologists, we aim to further our understanding of the human gut and its impact on human health. Promoting a diverse microbiome is emphasized throughout microbiology literature, and involving a statistician in designing experiments to relate gut bacteria and some measured health outcome is crucial for ensuring valid and accurate results. By adopting new experimental design and analysis methods, researchers can begin to gain a deeper understanding of how the genetics of our food affect the composition of taxa within the gut microbiome. This dissertation is …


Examining The Effect Of Word Embeddings And Preprocessing Methods On Fake News Detection, Jessica Hauschild May 2023

The words people choose to use hold a lot of power, whether that be in spreading truth or deception. As listeners and readers, we do our best to understand how words are being used. There are many current methods in computer science literature attempting to embed words into numerical information for statistical analyses. Some of these embedding methods, such as Bag of Words, treat words as independent, while others, such as Word2Vec, attempt to gain information about the context of words. It is of interest to compare how well these various methods of translating text into numerical data work specifically …


Human Perception Of Exponentially Increasing Data Displayed On A Log Scale Evaluated Through Experimental Graphics Tasks, Emily Robinson Aug 2022

Log scales are often used to display data over several orders of magnitude within one graph. We conducted a series of three graphical studies to evaluate the impact displaying data on the log scale has on human perception of exponentially increasing trends compared to displaying data on the linear scale. Each study was related to a different graphical task, each requiring a different level of interaction and cognitive use of the data being presented. The first experiment evaluated whether our ability to perceptually notice differences in exponentially increasing trends is impacted by the choice of scale. Participants were shown a …


Comparing Machine Learning Techniques With State-Of-The-Art Parametric Prediction Models For Predicting Soybean Traits, Susweta Ray Dec 2021

Soybean is a significant source of protein and oil and is also widely used as animal feed. Thus, developing lines that are superior in terms of yield, protein, and oil content is important to feed the ever-growing population. In contrast to high-cost phenotyping, genotyping is both cost and time efficient for breeders, whereas evaluating new lines in different environments (location-year combinations) can be costly. Several genomic prediction (GP) methods have been developed to use the marker and environment data effectively to predict the yield or other relevant phenotypic traits of crops. Our study compares a conventional GP method (GBLUP), a …


Factors Influencing Student Outcomes In A Large, Online Simulation-Based Introductory Statistics Course, Ella M. Burnham Aug 2021

The demand for statistical knowledge and skills is growing in many disciplines, so more students are enrolling in introductory statistics courses (Blair, Kirkman, & Maxwell, 2018). At the same time, institutions are seeking course delivery methods that allow for greater flexibility for students, especially following the onset of the COVID-19 pandemic; therefore, there is more interest in the development and delivery of online introductory statistics courses.

To address this, I collaboratively designed an online introductory statistics course which focuses on simulation-based inference for the University of Nebraska-Lincoln. The course design was informed by the Community of Inquiry framework (Garrison, Anderson, …


Statistical Methodology To Establish A Benchmark For Evaluating Antimicrobial Resistance Genes Through Real Time PCR Assay, Enakshy Dutta Jul 2020

Novel diagnostic tests are usually compared with gold standard tests to evaluate diagnostic accuracy. For assessing antimicrobial resistance (AMR) in bovine respiratory disease (BRD) pathogens, the phenotypic broth microdilution method is used as the gold standard (GS). The objective of the thesis is to evaluate the optimal cycle threshold (Ct), generated by real-time polymerase chain reaction (rtPCR) for genes that confer resistance, that will translate to the phenotypic classification of AMR. Data from two different methodologies are assessed to identify the Ct that will discriminate between resistance (R) and susceptibility (S). First, the receiver operating characteristic (ROC) curve was used to determine the …


Using Stability To Select A Shrinkage Method, Dean Dustin May 2020

Shrinkage methods are estimation techniques based on optimizing expressions to find which variables to include in an analysis, typically a linear regression. The general form of these expressions is the sum of an empirical risk plus a complexity penalty based on the number of parameters. Many shrinkage methods are known to satisfy an ‘oracle’ property meaning that asymptotically they select the correct variables and estimate their coefficients efficiently. In Section 1.2, we show oracle properties in two general settings. The first uses a log likelihood in place of the empirical risk and allows a general class of penalties. The second …


Community Impact On The Home Advantage Within NCAA Men's Basketball, Erin O'Donnell Apr 2020

The home advantage is a commonly accepted truth throughout sports. This paper investigates the magnitude of the home advantage among NCAA Men’s Basketball teams. It then looks to draw relationships between the magnitude of the home advantage and community aspects such as attendance, location, past program success, and social media presence. Univariate and multivariate models are investigated.

Advisor: Walter S Stroup


Group Testing Identification: Objective Functions, Implementation, And Multiplex Assays, Brianna D. Hitt Apr 2020

Group testing is the process of combining items into groups to test for a binary characteristic. One of its most widely used applications is infectious disease testing. In this context, specimens (e.g., blood, urine) are amalgamated into groups and tested. For groups that test positive, there are many algorithmic retesting procedures available to identify positive individuals. The appeal of group testing is that the overall number of tests needed is significantly less than for individual testing when disease prevalence is small and an appropriate algorithm is chosen. Group testing has a number of applications beyond infectious disease testing, such as …


Optimal Design For A Causal Structure, Zaher Kmail Aug 2019

Linear models and mixed models are important statistical tools. But in many natural phenomena, there is more than one endogenous variable involved and these variables are related in a sophisticated way. Structural Equation Modeling (SEM) is often used to model the complex relationships between the endogenous and exogenous variables. It was first implemented in research to estimate the strength and direction of direct and indirect effects among variables and to measure the relative magnitude of each causal factor.

Historically, traditional optimal design theory focuses on univariate linear, nonlinear, and mixed models. There is no current literature on the subject of …


Role Of Misclassification Estimates In Estimating Disease Prevalence And A Non-Linear Approach To Study Synchrony Using Heart Rate Variability In Chickens, Dola Pathak Dec 2018

Infectious disease assays can be imperfect. When estimating disease prevalence, these imperfections are accounted for by incorporating assay sensitivity and specificity into point and variance estimates. Unfortunately, these accuracy measures are often treated as fixed constants, rather than as estimates from an assay validation process. The purpose of this study is to show the detrimental effect of ignoring this sampling variability when samples are obtained through group testing (also known as pooled testing). We show that confidence interval coverage can dramatically decline as the sample size increases for the main sample of interest. As a remedy …


A Characterization Of A Value Added Model And A New Multi-Stage Model For Estimating Teacher Effects Within Small School Systems, Julie M. Garai Aug 2017

At both the national and state level there is increasing pressure to develop metrics to determine if school systems are meeting educational objectives. All states mandate some form of assessment by standardized tests. One method currently used to model student test scores is Value Added Modeling (VAM), which models student scores as a product of classroom and school environments. One VAM approach is the Tennessee Value Added Assessment System (TVAAS) which models student gains from year to year. Teacher effects are included in this layered model, which estimates the teacher’s added value to a student score through best linear unbiased …


Methods To Account For Breed Composition In A Bayesian GWAS Method Which Utilizes Haplotype Clusters, Danielle F. Wilson-Wells Aug 2016

In livestock, prediction of an animal’s genetic merit using genomic information is becoming increasingly common. The models used to make these predictions typically assume that we are sampling from a homogeneous population. However, in both commercial and experimental populations the sire and dam of an individual may be a mixture of different breeds. Haplotype models can capture this population structure.

Two models based on breed-specific haplotype clusters were developed to account for differences across multiple breeds. The first model utilizes the breed composition of the individual, while the second utilizes the breed composition from the sire and dam. Haplotype …


Simulations Of A New Response-Adaptive Biased Coin Design, Aleksandra Stein Dec 2015

Modern medical experiments accrue and treat patients, and hence obtain treatment response data, throughout a trial. Designs that prospectively plan to modify patient allocation by leveraging accumulating data are response-adaptive randomization (RAR) designs. Many such designs attempt to balance the desire to bias assignment proportions towards a treatment that is performing better against the need to maintain randomization in the face of continued equipoise.

This dissertation consists of simulated investigations into frequentist and ethical properties of a new RAR biased coin design. Chapter 2 proposes a new adaptive design for phase III clinical trials, a modification of the 2001 Bandyopadhyay and Biswas biased …


Beta-Binomial Kriging: A New Approach To Modeling Spatially Correlated Proportions, Aimee Schwab Aug 2015

Spatially correlated count data sets appear often in applied data analysis problems, but there is little consensus in the literature about how best to analyze the data. The two prevailing approaches provide accurate parameter estimates and predictions, at the cost of model interpretability and simplicity. This dissertation will present a new approach to modeling spatially correlated binomial observations: beta-binomial kriging. The model proposed here is a modified form of spatial kriging which assumes the data are generated from a correlated beta-binomial distribution. Given this assumption, the spatial parameters and predicted values can be estimated using simple matrix algebra. Beta-binomial kriging …


A New Approach To Modeling Multivariate Time Series On Multiple Temporal Scales, Tucker Zeleny May 2015

In certain situations, observations are collected on a multivariate time series at a certain temporal scale. However, there may also exist underlying time series behavior on a larger temporal scale that is of interest. Often, identifying the behavior of the data over the course of the larger scale is the key objective. Because this large-scale trend is not directly observed, describing the trends of the data on this scale can be more difficult. To further complicate matters, the observed data on the smaller time scale may be unevenly spaced from one larger scale time point to the …


Modeling The Dynamic Processes Of Challenge And Recovery (Stress And Strain) Over Time, Fan Yang Jan 2015

A dynamic process with challenge and recovery is an important branch in the family of stochastic processes. The dependent data of such processes are often observed over time, and hence, are time dependent. The purpose of this dissertation is to develop methods to characterize a dynamic process with challenge and recovery under different dimensionalities and error assumptions. In this dissertation, a univariate dynamic process under Gaussian assumption is discussed first and a bi-logistic model is developed by three different methods: compartment, additive, and Bayesian. Then the discussion is extended to a bivariate hysteresis system with challenge and recovery. Three methods: …


A Reduced Bias Method Of Estimating Variance Components In Generalized Linear Mixed Models, Elizabeth A. Claassen May 2014

In small samples it is well known that the standard methods for estimating variance components in a generalized linear mixed model (GLMM), pseudo-likelihood and maximum likelihood, yield estimates that are biased downward. An important consequence of this is that inferences on fixed effects will have inflated Type I error rates because their precision is overstated. We introduce a new method for estimating parameters in GLMMs that applies a Firth bias adjustment to the maximum likelihood-based GLMM estimating algorithm. We apply this technique to one- and two-treatment logistic regression models with a single random effect. We show simulation results that demonstrate …


New Statistical Methods For Analysis Of Historical Data From Wildlife Populations, Trevor Hefley Mar 2014

Wildlife biologists, often with the help of ordinary citizens, have developed and maintained long-term datasets for monitoring the status of wildlife populations. These datasets can range from a collection of citizen-reported sightings of a rare species to datasets collected by biologists using standardized methods. The commonality is that these datasets span a temporal and spatial scale that is beyond the scope of most scientific studies. Ensuring the continued persistence of wildlife populations requires predictions of the impact of human actions. Regardless of whether the predictions are quantitative or qualitative, the best we can do is use the past data to …


A Test For Detecting Changes In Closed Networks Based On The Number Of Communications Between Nodes, Christopher S. Wichman Jul 2013

This dissertation presents a formal method for detecting changes in a closed communications network based on an “abnormal” shift in the number of communications between some of the nodes. The method relies on the analyst’s ability to define the network of interest; capture the number of communications between nodes; and to establish a history of normal communications flow between nodes over fixed intervals of time. A metric multi-dimensional scaling technique is then used to represent the network at each time interval with a k-dimensional (k = 1, 2, …) configuration. The affine bi-dimensional regression coefficient of determination (aR2) …


Informative Retesting For Hierarchical Group Testing, Michael S. Black Jun 2013

Group testing is the process of pooling samples (e.g., blood, chemical compounds) from multiple sources and testing the pooled material for some binary characteristic. It is used in pathogen screening for humans and animals, drug discovery studies, electrical systems testing, and many other applications. Group testing has traditionally been used for two main types of investigations: 1) the identification of positive specimens and 2) the estimation of a characteristic’s prevalence in a population. This dissertation focuses on the identification process. We propose new identification procedures that exploit the heterogeneity among samples in order to reduce the number of tests needed …


Group Testing Regression Models, Boan Zhang Nov 2012

Group testing, where groups of individual specimens are composited to test for the presence or absence of a disease (or some other binary characteristic), is a procedure commonly used to reduce the costs of screening a large number of individuals. Statistical research in group testing has traditionally focused on a homogeneous population, where individuals are assumed to have the same probability of having a disease. However, individuals often have different risks of positivity, so recent research has examined regression models that allow for heterogeneity among individuals within the population. This dissertation focuses on two problems involving group testing regression models. …
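The cost reduction described in this abstract can be illustrated with a minimal sketch of two-stage (Dorfman) pooling under the traditional homogeneous-population setting. This is not the dissertation's method: it assumes a perfect assay, independent individuals, and a common prevalence p, and the function names `expected_tests_per_person` and `best_pool_size` are hypothetical.

```python
def expected_tests_per_person(p: float, k: int) -> float:
    """Expected number of tests per individual under two-stage pooling.

    Assumptions (illustrative only): perfect assay, independent
    individuals, common prevalence p, pool size k >= 2.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability")
    if k < 2:
        raise ValueError("pool size k must be at least 2")
    # One pooled test is shared by k individuals.
    pooled_share = 1.0 / k
    # If the pool is positive, every member is retested individually.
    prob_pool_positive = 1.0 - (1.0 - p) ** k
    return pooled_share + prob_pool_positive


def best_pool_size(p: float, max_k: int = 50) -> int:
    """Pool size in 2..max_k that minimizes expected tests per person."""
    return min(range(2, max_k + 1),
               key=lambda k: expected_tests_per_person(p, k))


if __name__ == "__main__":
    for prevalence in (0.01, 0.05, 0.20):
        k = best_pool_size(prevalence)
        cost = expected_tests_per_person(prevalence, k)
        print(f"p={prevalence:.2f}: pool size {k}, {cost:.3f} tests per person")
```

A value below 1 test per person means pooling is cheaper than individual testing. The savings shrink as prevalence grows, which is one reason heterogeneity in individual risk matters for choosing pools.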


Studying The Handling Of Heat Stressed Cattle Using The Additive Bi-Logistic Model To Fit Body Temperature, Fan Yang Dec 2011

Daily activities consume the energy of heifers, subsequently causing an elevation of body temperature, depending on the ambient conditions. A better understanding of the dynamics of body temperature (Tb) would be helpful when deciding how to process and handle heifers. It would also lead to specific recommendations on moving heifers under different ambient conditions, especially during the summer. In this study, a bi-logistic mixed model is used to describe the dynamics of Tb during the moving event. Data were taken from heifers in pens located at different distances from the heifer work station on four separate summer days under hot …


A Comparison Of Spatial Prediction Techniques Using Both Hard And Soft Data, Megan L. Liedtke Tesar May 2011

The overall goal of this research, which is common to most spatial studies, is to predict a value of interest at an unsampled location based on measured values at nearby sampled locations. To accomplish this goal, ordinary kriging can be used to obtain the best linear unbiased predictor. However, there is often a large amount of variability surrounding the measurements of environmental variables, and traditional prediction methods, such as ordinary kriging, do not account for an attribute with more than one level of uncertainty. This dissertation addresses this limitation by introducing a new methodology called weighted kriging. This prediction technique …


Estimating Teacher Effects Using Value-Added Models, Jennifer L. Green Aug 2010

Value-added modeling is an alternative approach to test-based accountability systems based on the proportions of students scoring at or above pre-determined proficiency levels. Value-added modeling techniques provide opportunities to estimate an individual teacher’s effect on student learning, while allowing for the possibility to control for the effect of non-educational factors beyond a school system’s control, such as socioeconomic status. However, numerous considerations exist when using value-added models to estimate teacher effects and defining what the teacher effects really describe. Chapter 2 provides an introduction to value-added methodology by describing several value-added models available for estimating teacher effects and their respective …


Fully Exponential Laplace Approximation EM Algorithm For Nonlinear Mixed Effects Models, Meijian Zhou Dec 2009

Nonlinear mixed effects models provide a flexible and powerful platform for the analysis of clustered data that arise in numerous fields, such as pharmacology, biology, agriculture, forestry, and economics. This dissertation focuses on fitting parametric nonlinear mixed effects models with single- and multi-level random effects. A new, efficient, and accurate method, the fully exponential Laplace approximation EM algorithm (FELA-EM), which gives an error of order O(1/n^2), is developed for obtaining restricted maximum likelihood (REML) estimates in nonlinear mixed effects models. Sample code for implementing the FELA-EM algorithm in R is given. Simulation studies have been conducted to evaluate …


Sequence Comparison And Stochastic Model Based On Multi-Order Markov Models, Xiang Fang Nov 2009

This dissertation presents two statistical methodologies developed on multi-order Markov models. First, we introduce an alignment-free sequence comparison method, which represents a sequence using a multi-order transition matrix (MTM). The MTM contains information of multi-order dependencies and provides a comprehensive representation of the heterogeneous composition within a sequence. Based on the MTM, a distance measure is developed for pair-wise comparison of sequences. The new method is compared with the traditional maximum likelihood (ML) method, the complete composition vector (CCV) method and the improved version of the complete composition vector (ICCV) method using simulated sequences. We further illustrate the application of …


Detecting Differentially Expressed Genes While Controlling The False Discovery Rate For Microarray Data, Shuo Jiao Jan 2009

Microarray is an important technology that enables people to investigate the expression levels of thousands of genes at the same time. One common goal of microarray data analysis is to detect differentially expressed genes while controlling the false discovery rate. This dissertation consists of four papers written to address this goal. The dissertation is organized as follows: In Chapter 1, a brief introduction of the Affymetrix GeneChip microarray technology is provided. The concept of differentially expressed genes and the definition of the false discovery rate are also introduced. In Chapter 2, a literature review of the related works on this …


Spatial Clustering Using The Likelihood Function, April Kerby Jan 2009

Researchers have been using clustering algorithms for many years to group similar observations based on a set of recorded characteristics. The majority of these algorithms maximize the similarity of the observations within a cluster, while at the same time maximizing the dissimilarity with observations in other clusters. However, nearly all current clustering algorithms fail to take into account the actual geographic location of the observation during the clustering process. This dissertation consists of three papers that propose a method, known as spatial clustering, to incorporate the geographical location of an observation into the clustering algorithm.

The first paper …