Open Access. Powered by Scholars. Published by Universities.®

Statistical Models


Articles 1 - 20 of 20

Full-Text Articles in Design of Experiments and Sample Surveys

UConn Baseball Batting Order Optimization, Gavin Rublewski May 2023

Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified or refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …
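The simulate-and-search loop the abstract describes can be sketched in a few lines. Everything below is hypothetical: the on-base probabilities, the four-player roster, and the toy scoring rule stand in for the paper's data-driven simulator.

```python
import itertools
import random

# Hypothetical on-base probabilities for a toy roster
# (illustrative numbers, not the paper's UConn data).
ON_BASE = {"A": 0.42, "B": 0.38, "C": 0.35, "D": 0.33}

def simulate_game(order, rng, innings=9):
    """Toy run simulator: each batter either reaches base or makes an out,
    and a run scores whenever a batter reaches with the bases loaded.
    The paper's simulator tracks real base-out states; this is a stand-in."""
    runs, batter = 0, 0
    for _ in range(innings):
        outs, on_base = 0, 0
        while outs < 3:
            player = order[batter % len(order)]
            batter += 1
            if rng.random() < ON_BASE[player]:
                if on_base == 3:
                    runs += 1      # bases loaded: the lead runner scores
                else:
                    on_base += 1
            else:
                outs += 1
    return runs

def expected_runs(order, trials=300, seed=0):
    """Monte Carlo estimate of mean runs per game for a batting order."""
    rng = random.Random(seed)
    return sum(simulate_game(order, rng) for _ in range(trials)) / trials

# Small rosters permit exhaustive search over all orders; larger rosters
# need heuristics, and Monte Carlo noise demands more trials near the top.
best_order = max(itertools.permutations(ON_BASE), key=expected_runs)
```

With a fixed seed the search is reproducible, which makes rankings of nearby orders easier to audit before trusting small run-production gaps.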


The Impact Of Truncating Data On The Predictive Ability For Single-Step Genomic Best Linear Unbiased Prediction, Jeremy T. Howard, Thomas A. Rathje, Caitlyn E. Bruns, Danielle F. Wilson-Wells, Stephen D. Kachman, Matthew L. Spangler Jan 2018

Department of Animal Science: Faculty Publications

Simulated and swine industry data sets were utilized to assess the impact of removing older data on the predictive ability of selection candidate estimated breeding values (EBV) when using single-step genomic best linear unbiased prediction (ssGBLUP). Simulated data included thirty replicates designed to mimic the structure of swine data sets. For the simulated data, varying amounts of data were truncated based on the number of ancestral generations back from the selection candidates. The swine data sets consisted of phenotypic and genotypic records for three traits across two breeds on animals born from 2003 to 2017. Phenotypes and genotypes were iteratively …


Models For HSV Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret Jan 2016

UW Biostatistics Working Paper Series

We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, appropriately considered all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …


Best Practice Recommendations For Data Screening, Justin A. Desimone, Peter D. Harms, Alice J. Desimone Feb 2015

Department of Management: Faculty Publications

Survey respondents differ in their levels of attention and effort when responding to items. There are a number of methods researchers may use to identify respondents who fail to exert sufficient effort in order to increase the rigor of analysis and enhance the trustworthiness of study results. Screening techniques are organized into three general categories, which differ in impact on survey design and potential respondent awareness. Assumptions and considerations regarding appropriate use of screening techniques are discussed along with descriptions of each technique. The utility of each screening technique is a function of survey design and administration. Each technique has …
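One widely used screening technique from this literature is longstring analysis, which flags respondents who give long runs of identical answers. A minimal sketch, where the cutoff and the data layout are illustrative assumptions rather than the paper's recommendations:

```python
def longstring(responses):
    """Length of the longest run of identical consecutive answers.
    Very long runs can indicate straight-lining; this is one screening
    signal among several, not a complete screening toolkit."""
    if not responses:
        return 0
    best = run = 1
    for prev, cur in zip(responses, responses[1:]):
        run = run + 1 if cur == prev else 1
        best = max(best, run)
    return best

def flag_careless(data, max_run=8):
    """data: {respondent_id: list of item responses}. The cutoff is a
    judgment call that should reflect the survey's design and length."""
    return [rid for rid, resp in data.items() if longstring(resp) >= max_run]
```

As the abstract stresses, the usefulness of any such flag depends on survey design: a long block of reverse-coded items makes straight-lining far more diagnostic than a short homogeneous scale does.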


Statistical Inference For Data Adaptive Target Parameters, Mark J. Van Der Laan, Alan E. Hubbard, Sara Kherad Pajouh Jun 2013

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose one observes n i.i.d. copies of a random variable with a probability distribution known to be an element of a particular statistical model. In order to define our statistical target, we partition the sample into V equal-size sub-samples, and use this partitioning to define V splits into an estimation sample (one of the V sub-samples) and its complementary parameter-generating sample, which is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample into a target parameter mapping, which represents the statistical target parameter generated by that parameter-generating …
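The V-fold splitting scheme can be sketched as follows, with a deliberately simple, hypothetical data-adaptive target (the coordinate with the largest training-sample mean) standing in for the paper's general target-parameter mappings:

```python
import random

def cv_target_estimate(data, V=5, seed=0):
    """Sample-splitting sketch: for each of V folds, the parameter-generating
    sample (the other V-1 folds) defines the target data-adaptively, and the
    held-out estimation sample estimates it. Assumes len(data) >= V rows of
    equal-length numeric tuples; the target rule here is purely illustrative."""
    rng = random.Random(seed)
    idx = list(range(len(data)))
    rng.shuffle(idx)
    folds = [idx[v::V] for v in range(V)]
    estimates = []
    for v in range(V):
        held_out = set(folds[v])
        train = [data[i] for i in idx if i not in held_out]
        # parameter-generating sample picks the target: the coordinate
        # with the largest training-sample mean (a toy data-adaptive rule)
        k = max(range(len(data[0])),
                key=lambda j: sum(row[j] for row in train) / len(train))
        # estimation sample estimates that coordinate's mean
        est_rows = [data[i] for i in folds[v]]
        estimates.append(sum(row[k] for row in est_rows) / len(est_rows))
    # report the average of the V fold-specific estimates
    return sum(estimates) / V
```

The point of the split is that the target is never chosen on the same observations used to estimate it, which is what makes valid inference for data-adaptive parameters possible.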


Some Ratio Type Estimators Under Measurement Errors, Florentin Smarandache, Mukesh Kumar, Rajesh Singh, Ashish K. Singh Jan 2011

Branch Mathematics and Statistics Faculty and Staff Publications

This article addresses the problem of estimating the population mean using auxiliary information in the presence of measurement errors.
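For context, the error-free classical ratio estimator that such work builds on looks like this; it is a textbook baseline, not one of the estimators proposed in the article:

```python
def ratio_estimate(y_sample, x_sample, x_pop_mean):
    """Classical ratio estimator of the population mean of y, exploiting a
    known population mean of an auxiliary variable x:
        y_R = ybar * (Xbar / xbar).
    The article studies how measurement error in x and y perturbs
    estimators of this type; this sketch shows only the error-free case."""
    y_bar = sum(y_sample) / len(y_sample)
    x_bar = sum(x_sample) / len(x_sample)
    return y_bar * x_pop_mean / x_bar
```

The estimator gains precision over the plain sample mean when y and x are strongly positively correlated, which is exactly the structure measurement error degrades.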


Route Choice Behavior In Risky Networks With Real-Time Information, Michael D. Razo Jan 2010

Masters Theses 1911 - February 2014

This research investigates route choice behavior in networks with risky travel times and real-time information. A stated preference survey is conducted in which subjects use PC-based interactive maps to choose routes link-by-link in various scenarios. The scenarios include two types of maps: the first presenting a choice between one stochastic route and one deterministic route, and the second with real-time information and an available detour. The first type measures the basic risk attitude of the subject. The second type allows for strategic planning, and measures the effect of this opportunity on subjects' choice behavior.

Results from each subject are …


The Effects Of The Use Of Technology In Mathematics Instruction On Student Achievement, Ron Y. Myers Mar 2009

FIU Electronic Theses and Dissertations

The purpose of this study was to examine the effects of the use of technology on students’ mathematics achievement, particularly the Florida Comprehensive Assessment Test (FCAT) mathematics results. Eleven schools within the Miami-Dade County Public School System participated in a pilot program on the use of Geometer's Sketchpad (GSP). Three of these schools were randomly selected for this study. Each school sent a teacher to a summer in-service training program on how to use GSP to teach geometry. In each school, the GSP class and a traditional geometry class taught by the same teacher were the study participants. Students’ mathematics …


Confidence Intervals For The Population Mean Tailored To Small Sample Sizes, With Applications To Survey Sampling, Michael Rosenblum, Mark J. Van Der Laan Jun 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes.

We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability …
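A generic version of a Bernstein-based interval for [0, 1]-valued observations can be sketched as follows. The variance bound and the two-sided inversion are standard; the paper's tailored intervals refine this idea, so treat this as the textbook starting point.

```python
import math

def bernstein_ci(sample, alpha=0.05, var_bound=0.25):
    """Two-sided confidence interval for the mean of i.i.d. values in [0, 1],
    from Bernstein's tail inequality
        P(|xbar - mu| >= t) <= 2 exp(-n t^2 / (2 sigma^2 + 2t/3)),
    solved for t at level alpha. var_bound must upper-bound Var(X); 1/4 is
    always valid on [0, 1]. Coverage holds for every sample size n, which is
    the property the abstract emphasizes for small samples."""
    n = len(sample)
    mean = sum(sample) / n
    L = math.log(2.0 / alpha)
    t = L / (3 * n) + math.sqrt((L / (3 * n)) ** 2 + 2 * var_bound * L / n)
    return max(0.0, mean - t), min(1.0, mean + t)
```

The price of guaranteed coverage is width: for moderate n these intervals are wider than normal-approximation intervals, and they only become competitive when the central limit theorem approximation is unreliable.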


The Time Invariance Principle, Ecological (Non)Chaos, And A Fundamental Pitfall Of Discrete Modeling, Bo Deng Mar 2007

Department of Mathematics: Faculty Publications

This paper shows that most discrete models used for population dynamics in ecology are inherently pathological in that their predictions cannot be independently verified by experiments, because the models violate a fundamental principle of physics. The result is used to tackle an ongoing controversy regarding ecological chaos. Another implication of the result is that all continuous dynamical systems must be modeled by differential equations. It therefore suggests that research based on discrete modeling must be closely scrutinized and that the teaching of calculus and differential equations must be emphasized for students of biology.


New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski May 2005

COBRA Preprint Series

As the field of functional genetics and genomics is beginning to mature, we become confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.

Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic patterns. Moreover, to better understand the underlying biological …


Censored Linear Regression For Case-Cohort Studies, Bin Nan, Menggang Yu, Jack Kalbfleisch Oct 2004

The University of Michigan Department of Biostatistics Working Paper Series

Right censored data from a classical case-cohort design and a stratified case-cohort design are considered. In the classical case-cohort design, the subcohort is obtained as a simple random sample of the entire cohort, whereas in the stratified design, the subcohort is selected by independent Bernoulli sampling with arbitrary selection probabilities. For each design and under a linear regression model, methods for estimating the regression parameters are proposed and analyzed. These methods are derived by modifying the linear ranks tests and estimating equations that arise from full-cohort data using methods that are similar to the "pseudo-likelihood" estimating equation that has been …


Overlap Bias In The Case-Crossover Design, With Application To Air Pollution Exposures, Holly Janes, Lianne Sheppard, Thomas Lumley Jan 2004

UW Biostatistics Working Paper Series

The case-crossover design uses cases only, and compares exposures just prior to the event times to exposures at comparable control, or “referent” times, in order to assess the effect of short-term exposure on the risk of a rare event. It has commonly been used to study the effect of air pollution on the risk of various adverse health events. Proper selection of referents is crucial, especially with air pollution exposures, which are shared, highly seasonal, and often have a long term time trend. Hence, careful referent selection is important to control for time-varying confounders, and in order to ensure that …


Robust Likelihood-Based Analysis Of Multivariate Data With Missing Values, Rod Little, An Hyonggin Dec 2003

The University of Michigan Department of Biostatistics Working Paper Series

The model-based approach to inference from multivariate data with missing values is reviewed. Regression prediction is most useful when the covariates are predictive of the missing values and the probability of being missing, and in these circumstances predictions are particularly sensitive to model misspecification. The use of penalized splines of the propensity score is proposed to yield robust model-based inference under the missing at random (MAR) assumption, assuming monotone missing data. Simulation comparisons with other methods suggest that the method works well in a wide range of populations, with little loss of efficiency relative to parametric models when the latter …


Marginalized Transition Models For Longitudinal Binary Data With Ignorable And Nonignorable Dropout, Brenda F. Kurland, Patrick J. Heagerty Dec 2003

UW Biostatistics Working Paper Series

We extend the marginalized transition model of Heagerty (2002) to accommodate nonignorable monotone dropout. Using a selection model, weakly identified dropout parameters are held constant and their effects evaluated through sensitivity analysis. For data missing at random (MAR), efficiency of inverse probability of censoring weighted generalized estimating equations (IPCW-GEE) is as low as 40% compared to a likelihood-based marginalized transition model (MTM) with comparable modeling burden. MTM and IPCW-GEE regression parameters both display misspecification bias for MAR and nonignorable missing data, and both reduce bias noticeably by improving model fit.


To Model Or Not To Model? Competing Modes Of Inference For Finite Population Sampling, Rod Little Nov 2003

The University of Michigan Department of Biostatistics Working Paper Series

Finite population sampling is perhaps the only area of statistics where the primary mode of analysis is based on the randomization distribution, rather than on statistical models for the measured variables. This article reviews the debate between design-based and model-based inference. The basic features of the two approaches are illustrated using the case of inference about the mean from stratified random samples. Strengths and weaknesses of design-based and model-based inference for surveys are discussed. It is suggested that models that take into account the sample design and make weak parametric assumptions can produce reliable and efficient inferences in survey settings. …
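The design-based estimator for the stratified-sample case the abstract uses as its illustration is simple to state; this is a textbook sketch, not tied to the article's notation:

```python
def stratified_mean(strata):
    """Design-based estimator of the population mean from a stratified
    simple random sample. strata: list of (N_h, sample_h) pairs, where N_h
    is the stratum population size and sample_h the drawn values.
        ybar_st = sum_h (N_h / N) * ybar_h
    The weights N_h / N come from the design itself; no model for the
    outcome values is assumed, which is the design-based viewpoint."""
    N = sum(N_h for N_h, _ in strata)
    return sum((N_h / N) * (sum(s) / len(s)) for N_h, s in strata)
```

A model-based analysis of the same data would instead predict the unobserved units from a model; the two approaches coincide here when the model treats stratum means as unrelated parameters.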


Inference For The Population Total From Probability-Proportional-To-Size Samples Based On Predictions From A Penalized Spline Nonparametric Model, Hui Zheng, Rod Little Aug 2003

The University of Michigan Department of Biostatistics Working Paper Series

Inference about the finite population total from probability-proportional-to-size (PPS) samples is considered. In previous work (Zheng and Little, 2003), penalized spline (p-spline) nonparametric model-based estimators were shown to generally outperform the Horvitz-Thompson (HT) and generalized regression (GR) estimators in terms of the root mean squared error. In this article we develop model-based, jackknife and balanced repeated replicate variance estimation methods for the p-spline based estimators. Asymptotic properties of the jackknife method are discussed. Simulations show that p-spline point estimators and their jackknife standard errors lead to inferences that are superior to HT or GR based inferences. This suggests that nonparametric …
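The delete-one jackknife the abstract refers to can be illustrated generically; this sketch shows the basic replication idea and omits the survey-design adaptations the paper develops for p-spline estimators:

```python
def jackknife_se(estimator, sample):
    """Delete-one jackknife standard error of a statistic.
    Recompute the estimator with each observation left out, then scale the
    spread of the replicates:  var = (n-1)/n * sum_i (theta_(i) - mean)^2."""
    n = len(sample)
    reps = [estimator(sample[:i] + sample[i + 1:]) for i in range(n)]
    rep_mean = sum(reps) / n
    var = (n - 1) / n * sum((r - rep_mean) ** 2 for r in reps)
    return var ** 0.5
```

For the sample mean this reproduces the usual standard error s/sqrt(n) exactly; its value lies in handling estimators, like spline-based predictors, whose variance has no closed form.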


Mixtures Of Varying Coefficient Models For Longitudinal Data With Discrete Or Continuous Non-Ignorable Dropout, Joseph W. Hogan, Xihong Lin, Benjamin A. Herman May 2003

The University of Michigan Department of Biostatistics Working Paper Series

The analysis of longitudinal repeated measures data is frequently complicated by missing data due to informative dropout. We describe a mixture model for the joint distribution of longitudinal repeated measures, where the dropout distribution may be continuous and the dependence between response and dropout is semiparametric. Specifically, we assume that responses follow a varying coefficient random effects model conditional on dropout time, where the regression coefficients depend on dropout time through unspecified nonparametric functions that are estimated using step functions when dropout time is discrete (e.g., for panel data) and using smoothing splines when dropout time is continuous. Inference under the …


Semiparametric Regression Models With Missing Data: The Mathematics In The Work Of Robins Et Al., Menggang Yu, Bin Nan May 2003

The University of Michigan Department of Biostatistics Working Paper Series

This review is an attempt to understand the landmark papers of Robins, Rotnitzky, and Zhao (1994) and Robins and Rotnitzky (1992). We revisit their main results and corresponding proofs using the theory outlined in the monograph by Bickel, Klaassen, Ritov, and Wellner (1993). We also discuss an illustrative example to show the details of applying these theoretical results.


Penalized Spline Nonparametric Mixed Models For Inference About A Finite Population Mean From Two-Stage Samples, Hui Zheng, Rod Little Mar 2003

The University of Michigan Department of Biostatistics Working Paper Series

Samplers often distrust model-based approaches to survey inference due to concerns about model misspecification when applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total in probability sampling designs. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator …
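The Horvitz-Thompson estimator discussed in this abstract is a one-liner; a textbook sketch:

```python
def horvitz_thompson_total(y_sample, inclusion_probs):
    """Horvitz-Thompson estimator of the finite population total:
        T_hat = sum over the sample of y_i / pi_i,
    where pi_i is unit i's inclusion probability. It is design-unbiased
    whenever every pi_i is positive. As the abstract notes, it performs
    well when the ratios y_i / pi_i are roughly exchangeable, and can be
    very inefficient when that exchangeability fails."""
    return sum(y / p for y, p in zip(y_sample, inclusion_probs))
```

Dividing each observation by its inclusion probability up-weights units the design was unlikely to sample, which is exactly what makes the estimator unbiased over repeated draws of the design.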