Open Access. Powered by Scholars. Published by Universities.®
Design of Experiments and Sample Surveys Commons™
Articles 1 - 20 of 20
Full-Text Articles in Design of Experiments and Sample Surveys
Uconn Baseball Batting Order Optimization, Gavin Rublewski
Honors Scholar Theses
Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo Simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …
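The pipeline the abstract describes — a baseball simulator driven by player statistics, evaluated by Monte Carlo over many games — can be sketched in miniature. This is a toy model, not the thesis's actual simulator: the player names and on-base probabilities below are made up, every on-base event is treated as a single, and a runner scores after three subsequent singles in the same inning.

```python
import random

# Hypothetical per-player on-base probabilities (illustrative, not real UConn data).
PLAYERS = {"A": 0.42, "B": 0.38, "C": 0.35, "D": 0.33,
           "E": 0.31, "F": 0.30, "G": 0.28, "H": 0.27, "I": 0.25}

def simulate_game(order, rng):
    """Toy model: every on-base event is a single; each single advances all
    runners one base, and a runner scores on the third advance."""
    runs, batter = 0, 0
    for _ in range(9):                       # nine innings
        outs, runners = 0, []
        while outs < 3:
            p = PLAYERS[order[batter % 9]]
            batter += 1
            if rng.random() < p:
                runners = [r + 1 for r in runners] + [0]
                runs += sum(1 for r in runners if r >= 3)
                runners = [r for r in runners if r < 3]
            else:
                outs += 1
    return runs

def mean_runs(order, n_games=2000, seed=0):
    """Monte Carlo estimate of expected runs per game for a batting order."""
    rng = random.Random(seed)
    return sum(simulate_game(order, rng) for _ in range(n_games)) / n_games

# Evaluate one candidate order: best on-base percentage first.
best_first = sorted(PLAYERS, key=PLAYERS.get, reverse=True)
print(mean_runs(best_first))
```

In the full problem one would score many permutations of the nine-player order this way and compare their estimated run production.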
The Impact Of Truncating Data On The Predictive Ability For Single-Step Genomic Best Linear Unbiased Prediction, Jeremy T. Howard, Thomas A. Rathje, Caitlyn E. Bruns, Danielle F. Wilson-Wells, Stephen D. Kachman, Matthew L. Spangler
Department of Animal Science: Faculty Publications
Simulated and swine industry data sets were utilized to assess the impact of removing older data on the predictive ability of selection candidate estimated breeding values (EBV) when using single-step genomic best linear unbiased prediction (ssGBLUP). Simulated data included thirty replicates designed to mimic the structure of swine data sets. For the simulated data, varying amounts of data were truncated based on the number of ancestral generations back from the selection candidates. The swine data sets consisted of phenotypic and genotypic records for three traits across two breeds on animals born from 2003 to 2017. Phenotypes and genotypes were iteratively …
Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret
UW Biostatistics Working Paper Series
We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …
Best Practice Recommendations For Data Screening, Justin A. Desimone, Peter D. Harms, Alice J. Desimone
Department of Management: Faculty Publications
Survey respondents differ in their levels of attention and effort when responding to items. Researchers may use a number of methods to identify respondents who fail to exert sufficient effort, in order to increase the rigor of analysis and enhance the trustworthiness of study results. Screening techniques are organized into three general categories, which differ in impact on survey design and potential respondent awareness. Assumptions and considerations regarding appropriate use of screening techniques are discussed along with descriptions of each technique. The utility of each screening technique is a function of survey design and administration. Each technique has …
Statistical Inference For Data Adaptive Target Parameters, Mark J. Van Der Laan, Alan E. Hubbard, Sara Kherad Pajouh
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose one observes n i.i.d. copies of a random variable with a probability distribution known to be an element of a particular statistical model. In order to define our statistical target, we partition the sample into V equal-sized subsamples, and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and a corresponding complementary parameter-generating sample that is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample into a target-parameter mapping, which represents the statistical target parameter generated by that parameter-generating …
Some Ratio Type Estimators Under Measurement Errors, Florentin Smarandache, Mukesh Kumar, Rajesh Singh, Ashish K. Singh
Branch Mathematics and Statistics Faculty and Staff Publications
This article addresses the problem of estimating the population mean using auxiliary information in the presence of measurement errors.
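The classical ratio estimator is the natural starting point for this setting: the sample mean of y is scaled by the ratio of the known population mean of the auxiliary variable x to its sample mean. The sketch below is a generic illustration with simulated data and additive measurement error on x, not the specific estimators proposed in the article.

```python
import random

def ratio_estimate(y, x, x_pop_mean):
    """Classical ratio estimator of the population mean of y:
    (sample mean of y) * (known population mean of x) / (sample mean of x)."""
    y_bar = sum(y) / len(y)
    x_bar = sum(x) / len(x)
    return y_bar * (x_pop_mean / x_bar)

# Illustrative population: y roughly proportional to x.
rng = random.Random(1)
x_pop = [rng.uniform(10, 50) for _ in range(1000)]
y_pop = [2.0 * xi + rng.gauss(0, 3) for xi in x_pop]
X_bar = sum(x_pop) / len(x_pop)          # auxiliary mean, assumed known

idx = rng.sample(range(1000), 50)
y_s = [y_pop[i] for i in idx]
# Observed x is contaminated with additive measurement error:
x_s = [x_pop[i] + rng.gauss(0, 1) for i in idx]

print(ratio_estimate(y_s, x_s, X_bar))
```

Measurement error in x biases the sample ratio, which is exactly the complication the article's modified estimators are designed to address.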
Route Choice Behavior In Risky Networks With Real-Time Information, Michael D. Razo
Masters Theses 1911 - February 2014
This research investigates route choice behavior in networks with risky travel times and real-time information. A stated preference survey is conducted in which subjects use a PC-based interactive map to choose routes link-by-link in various scenarios. The scenarios include two types of maps: the first presenting a choice between one stochastic route and one deterministic route, and the second with real-time information and an available detour. The first type measures the basic risk attitude of the subject. The second type allows for strategic planning, and measures the effect of this opportunity on subjects' choice behavior.
Results from each subject are …
The Effects Of The Use Of Technology In Mathematics Instruction On Student Achievement, Ron Y. Myers
FIU Electronic Theses and Dissertations
The purpose of this study was to examine the effects of the use of technology on students’ mathematics achievement, particularly the Florida Comprehensive Assessment Test (FCAT) mathematics results. Eleven schools within the Miami-Dade County Public School System participated in a pilot program on the use of Geometer's Sketchpad (GSP). Three of these schools were randomly selected for this study. Each school sent a teacher to a summer in-service training program on how to use GSP to teach geometry. In each school, the GSP class and a traditional geometry class taught by the same teacher were the study participants. Students’ mathematics …
Confidence Intervals For The Population Mean Tailored To Small Sample Sizes, With Applications To Survey Sampling, Michael Rosenblum, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes.
We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability …
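The construction the abstract outlines — inverting a finite-sample tail bound on the sample mean — can be illustrated with the plain Bernstein inequality the authors cite. For variables bounded in [0, 1] with variance bound σ², Bernstein's inequality gives P(|mean − μ| ≥ t) ≤ 2·exp(−n t² / (2σ² + 2t/3)); setting the right side to α and solving the resulting quadratic in t yields a half-width valid at every sample size. This is only the basic bound, not the authors' tailored intervals.

```python
import math

def bernstein_ci(sample, alpha=0.05, var_bound=0.25):
    """Finite-sample CI for the mean of variables bounded in [0, 1].
    Inverts Bernstein's tail bound; var_bound = 1/4 is the worst case
    for a [0, 1]-valued variable."""
    n = len(sample)
    mean = sum(sample) / n
    L = math.log(2.0 / alpha)
    # Setting 2*exp(-n t^2 / (2*var_bound + 2t/3)) = alpha gives the
    # quadratic n t^2 - (2/3) L t - 2 var_bound L = 0; take the positive root.
    t = ((2.0 / 3.0) * L
         + math.sqrt((4.0 / 9.0) * L ** 2 + 8.0 * n * var_bound * L)) / (2.0 * n)
    return max(0.0, mean - t), min(1.0, mean + t)

data = [0.1, 0.4, 0.35, 0.6, 0.2, 0.5, 0.45, 0.3, 0.55, 0.25]
print(bernstein_ci(data))
```

For n = 10 the interval is wide, which is the price of guaranteed (rather than asymptotic) coverage; the paper's contribution is sharpening such intervals for small samples.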
The Time Invariance Principle, Ecological (Non)Chaos, And A Fundamental Pitfall Of Discrete Modeling, Bo Deng
Department of Mathematics: Faculty Publications
This paper shows that most discrete models used for population dynamics in ecology are inherently pathological, in that their predictions cannot be independently verified by experiments because the models violate a fundamental principle of physics. The result is used to tackle an ongoing controversy regarding ecological chaos. Another implication of the result is that all continuous dynamical systems must be modeled by differential equations. It therefore suggests that research based on discrete modeling must be closely scrutinized, and that the teaching of calculus and differential equations must be emphasized for students of biology.
New Statistical Paradigms Leading To Web-Based Tools For Clinical/Translational Science, Knut M. Wittkowski
COBRA Preprint Series
As the field of functional genetics and genomics is beginning to mature, we become confronted with new challenges. The constant drop in price for sequencing and gene expression profiling as well as the increasing number of genetic and genomic variables that can be measured makes it feasible to address more complex questions. The success with rare diseases caused by single loci or genes has provided us with a proof-of-concept that new therapies can be developed based on functional genomics and genetics.
Common diseases, however, typically involve genetic epistasis, genomic pathways, and proteomic patterns. Moreover, to better understand the underlying biological …
Censored Linear Regression For Case-Cohort Studies, Bin Nan, Menggang Yu, Jack Kalbfleisch
The University of Michigan Department of Biostatistics Working Paper Series
Right censored data from a classical case-cohort design and a stratified case-cohort design are considered. In the classical case-cohort design, the subcohort is obtained as a simple random sample of the entire cohort, whereas in the stratified design, the subcohort is selected by independent Bernoulli sampling with arbitrary selection probabilities. For each design and under a linear regression model, methods for estimating the regression parameters are proposed and analyzed. These methods are derived by modifying the linear rank tests and estimating equations that arise from full-cohort data using methods that are similar to the "pseudo-likelihood" estimating equation that has been …
Overlap Bias In The Case-Crossover Design, With Application To Air Pollution Exposures, Holly Janes, Lianne Sheppard, Thomas Lumley
UW Biostatistics Working Paper Series
The case-crossover design uses cases only, and compares exposures just prior to the event times to exposures at comparable control, or “referent” times, in order to assess the effect of short-term exposure on the risk of a rare event. It has commonly been used to study the effect of air pollution on the risk of various adverse health events. Proper selection of referents is crucial, especially with air pollution exposures, which are shared, highly seasonal, and often have a long term time trend. Hence, careful referent selection is important to control for time-varying confounders, and in order to ensure that …
Robust Likelihood-Based Analysis Of Multivariate Data With Missing Values, Rod Little, An Hyonggin
The University of Michigan Department of Biostatistics Working Paper Series
The model-based approach to inference from multivariate data with missing values is reviewed. Regression prediction is most useful when the covariates are predictive of the missing values and the probability of being missing, and in these circumstances predictions are particularly sensitive to model misspecification. The use of penalized splines of the propensity score is proposed to yield robust model-based inference under the missing at random (MAR) assumption, assuming monotone missing data. Simulation comparisons with other methods suggest that the method works well in a wide range of populations, with little loss of efficiency relative to parametric models when the latter …
Marginalized Transition Models For Longitudinal Binary Data With Ignorable And Nonignorable Dropout, Brenda F. Kurland, Patrick J. Heagerty
UW Biostatistics Working Paper Series
We extend the marginalized transition model of Heagerty (2002) to accommodate nonignorable monotone dropout. Using a selection model, weakly identified dropout parameters are held constant and their effects evaluated through sensitivity analysis. For data missing at random (MAR), efficiency of inverse probability of censoring weighted generalized estimating equations (IPCW-GEE) is as low as 40% compared to a likelihood-based marginalized transition model (MTM) with comparable modeling burden. MTM and IPCW-GEE regression parameters both display misspecification bias for MAR and nonignorable missing data, and both reduce bias noticeably by improving model fit.
To Model Or Not To Model? Competing Modes Of Inference For Finite Population Sampling, Rod Little
The University of Michigan Department of Biostatistics Working Paper Series
Finite population sampling is perhaps the only area of statistics where the primary mode of analysis is based on the randomization distribution, rather than on statistical models for the measured variables. This article reviews the debate between design-based and model-based inference. The basic features of the two approaches are illustrated using the case of inference about the mean from stratified random samples. Strengths and weaknesses of design-based and model-based inference for surveys are discussed. It is suggested that models that take into account the sample design and make weak parametric assumptions can produce reliable and efficient inferences in survey settings. …
Inference For The Population Total From Probability-Proportional-To-Size Samples Based On Predictions From A Penalized Spline Nonparametric Model, Hui Zheng, Rod Little
The University of Michigan Department of Biostatistics Working Paper Series
Inference about the finite population total from probability-proportional-to-size (PPS) samples is considered. In previous work (Zheng and Little, 2003), penalized spline (p-spline) nonparametric model-based estimators were shown to generally outperform the Horvitz-Thompson (HT) and generalized regression (GR) estimators in terms of the root mean squared error. In this article we develop model-based, jackknife and balanced repeated replicate variance estimation methods for the p-spline based estimators. Asymptotic properties of the jackknife method are discussed. Simulations show that p-spline point estimators and their jackknife standard errors lead to inferences that are superior to HT or GR based inferences. This suggests that nonparametric …
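The jackknife variance estimation the abstract mentions is, in its simplest delete-one form, generic: recompute the estimator on each leave-one-out sample and scale the spread of those recomputations. The sketch below shows that generic recipe, not the paper's replicate-weight version for PPS designs.

```python
def jackknife_se(estimator, data):
    """Delete-one jackknife standard error of a generic estimator:
    se^2 = (n-1)/n * sum_i (theta_(-i) - mean of theta_(-i))^2."""
    n = len(data)
    loo = [estimator(data[:i] + data[i + 1:]) for i in range(n)]
    loo_mean = sum(loo) / n
    var = (n - 1) / n * sum((t - loo_mean) ** 2 for t in loo)
    return var ** 0.5

sample_mean = lambda xs: sum(xs) / len(xs)
data = [2.0, 4.0, 6.0, 8.0, 10.0]
print(jackknife_se(sample_mean, data))   # for the mean, equals s / sqrt(n)
```

For the sample mean the jackknife reproduces the usual standard error exactly; its value is that the same recipe applies to estimators, such as spline-based predictors, with no closed-form variance.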
Mixtures Of Varying Coefficient Models For Longitudinal Data With Discrete Or Continuous Non-Ignorable Dropout, Joseph W. Hogan, Xihong Lin, Benjamin A. Herman
The University of Michigan Department of Biostatistics Working Paper Series
The analysis of longitudinal repeated measures data is frequently complicated by missing data due to informative dropout. We describe a mixture model for the joint distribution of longitudinal repeated measures, where the dropout distribution may be continuous and the dependence between response and dropout is semiparametric. Specifically, we assume that responses follow a varying coefficient random effects model conditional on dropout time, where the regression coefficients depend on dropout time through unspecified nonparametric functions that are estimated using step functions when dropout time is discrete (e.g., for panel data) and using smoothing splines when dropout time is continuous. Inference under the …
Semiparametric Regression Models With Missing Data: The Mathematics In The Work Of Robins Et Al., Menggang Yu, Bin Nan
The University of Michigan Department of Biostatistics Working Paper Series
This review is an attempt to understand the landmark papers of Robins, Rotnitzky, and Zhao (1994) and Robins and Rotnitzky (1992). We revisit their main results and corresponding proofs using the theory outlined in the monograph by Bickel, Klaassen, Ritov, and Wellner (1993). We also discuss an illustrative example to show the details of applying these theoretical results.
Penalized Spline Nonparametric Mixed Models For Inference About A Finite Population Mean From Two-Stage Samples, Hui Zheng, Rod Little
The University of Michigan Department of Biostatistics Working Paper Series
Samplers often distrust model-based approaches to survey inference due to concerns about model misspecification when applied to large samples from complex populations. We suggest that the model-based paradigm can work very successfully in survey settings, provided models are chosen that take into account the sample design and avoid strong parametric assumptions. The Horvitz-Thompson (HT) estimator is a simple design-unbiased estimator of the finite population total in probability sampling designs. From a modeling perspective, the HT estimator performs well when the ratios of the outcome values and the inclusion probabilities are exchangeable. When this assumption is not met, the HT estimator …
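The Horvitz-Thompson estimator discussed above is simple enough to state in a few lines: each sampled unit's outcome is weighted by the inverse of its inclusion probability, making the estimator design-unbiased for the population total. A minimal sketch with made-up numbers:

```python
def horvitz_thompson_total(y_sample, pi_sample):
    """Horvitz-Thompson estimator of a finite-population total:
    sum over the sample of y_i / pi_i, where pi_i is unit i's
    inclusion probability."""
    return sum(y / pi for y, pi in zip(y_sample, pi_sample))

# Equal-probability sample of 4 units from a population of 100 (pi = 4/100),
# illustrative values only:
y = [12.0, 7.0, 9.0, 11.0]
pi = [0.04] * 4
print(horvitz_thompson_total(y, pi))   # ≈ 975.0 = (12 + 7 + 9 + 11) / 0.04
```

From the modeling perspective the abstract takes, this estimator is efficient only when the ratios y_i / pi_i are exchangeable; the paper's p-spline mixed models target the cases where they are not.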