Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons


Articles 1 - 9 of 9

Full-Text Articles in Statistics and Probability

A Small Sample Correction For Estimating Attributable Risk In Case-Control Studies, Daniel B. Rubin Dec 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

The attributable risk, often called the population attributable risk, is in many epidemiological contexts a more relevant measure of exposure-disease association than the excess risk, relative risk, or odds ratio. For estimating attributable risk with case-control data and a rare disease, we present a simple correction to the standard approach that makes it essentially unbiased and also less noisy. As with the analogous corrections given in Jewell (1986) for other measures of association, the adjustment often will not make a substantial difference unless the sample size is very small or point estimates are desired within fine strata, but we discuss the possible utility …
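For reference, the standard (uncorrected) case-control estimator of attributable risk under the rare-disease assumption can be sketched in a few lines. The 2x2 cell labels below are generic conventions, and the paper's small-sample correction itself is not reproduced here:

```python
def attributable_risk_cc(a, c, b, d):
    """Standard (uncorrected) attributable-risk estimate from a
    case-control 2x2 table, under the rare-disease assumption.

    a, c: exposed / unexposed cases
    b, d: exposed / unexposed controls

    AR-hat = 1 - (fraction of cases unexposed) / (fraction of controls unexposed)
    """
    frac_unexposed_cases = c / (a + c)
    frac_unexposed_controls = d / (b + d)
    return 1.0 - frac_unexposed_cases / frac_unexposed_controls


# Example: 20/80 exposed/unexposed cases, 10/90 exposed/unexposed controls.
ar = attributable_risk_cc(20, 80, 10, 90)
```

With very small cell counts this plug-in estimator is biased, which is the situation the paper's correction addresses.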


Confidence Intervals For Negative Binomial Random Variables Of High Dispersion, David Shilane, Alan E. Hubbard, S N. Evans Aug 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable based upon sampled data. When the sample size is large, we traditionally rely upon a Normal distribution approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials converges slowly in distribution to the Normal as a function of the sample size. As a result, confidence intervals for the mean constructed with standard techniques (such as the Normal approximation and the bootstrap) will typically be too narrow and significantly undercover in the case of high …
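The Normal-approximation (Wald) interval whose undercoverage motivates the paper can be sketched as follows; this is the textbook construction, not the authors' proposed remedy:

```python
import math

def wald_ci(data, z=1.96):
    """Standard Normal-approximation (Wald) confidence interval for the
    mean: sample mean +/- z * sqrt(sample variance / n).  For highly
    dispersed count data this interval tends to be too narrow."""
    n = len(data)
    mean = sum(data) / n
    var = sum((x - mean) ** 2 for x in data) / (n - 1)
    half = z * math.sqrt(var / n)
    return mean - half, mean + half


# Example with overdispersed counts (variance far exceeding the mean):
lo, hi = wald_ci([0, 0, 1, 3, 10, 0, 2, 4])
```

The interval is symmetric about the sample mean by construction; for skewed, heavy-tailed counts at moderate n, that symmetry is part of why coverage suffers.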


FDR Controlling Procedure For Multi-Stage Analyses, Catherine Tuglus, Mark J. Van Der Laan Jul 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple testing has become an integral component in genomic analyses involving microarray experiments, where large numbers of hypotheses are tested simultaneously. However, before applying more computationally intensive methods, it is often desirable to complete an initial truncation of the variable set using a simpler and faster supervised method such as univariate regression. Once such a truncation is completed, multiple testing methods applied to any subsequent analysis no longer control the appropriate Type I error rates. Here we propose a modified marginal Benjamini & Hochberg step-up FDR controlling procedure for multi-stage analyses (FDR-MSA), which correctly controls Type I error in terms …
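The classical marginal Benjamini & Hochberg step-up procedure that FDR-MSA modifies can be implemented in a few lines; the multi-stage modification is the paper's contribution and is not reproduced here:

```python
def benjamini_hochberg(pvals, q=0.05):
    """Benjamini-Hochberg step-up procedure at FDR level q: reject the
    hypotheses with the k smallest p-values, where k is the largest rank
    such that p_(k) <= k * q / m.  Returns indices of rejected hypotheses."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    k = 0
    for rank, i in enumerate(order, start=1):  # rank = 1..m in increasing p
        if pvals[i] <= rank * q / m:
            k = rank
    return sorted(order[:k])


# Example: thresholds are 0.025, 0.05, 0.075, 0.1 at q = 0.1, m = 4.
rejected = benjamini_hochberg([0.01, 0.02, 0.03, 0.5], q=0.1)
```

Note the step-up character: once the largest qualifying rank k is found, all k smallest p-values are rejected, even those above their own thresholds.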


Supervised Distance Matrices: Theory And Applications To Genomics, Katherine S. Pollard, Mark J. Van Der Laan Jun 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a new approach to studying the relationship between a very high dimensional random variable and an outcome. Our method is based on a novel concept, the supervised distance matrix, which quantifies pairwise similarity between variables based on their association with the outcome. A supervised distance matrix is derived in two stages. The first stage involves a transformation based on a particular model for association. In particular, one might regress the outcome on each variable and then use the residuals or the influence curve from each regression as a data transformation. In the second stage, a choice of distance …
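The two-stage construction can be illustrated with one plausible instantiation: simple linear regression of the outcome on each variable in stage one, and Euclidean distance between residual vectors in stage two. The specific regression and distance choices here are illustrative assumptions, not necessarily those of the paper:

```python
import math

def simple_lm_residuals(y, x):
    """Residuals from least-squares regression of y on x (with intercept)."""
    n = len(y)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    b = sxy / sxx
    a = my - b * mx
    return [yi - (a + b * xi) for xi, yi in zip(x, y)]

def supervised_distance_matrix(y, X):
    """Stage 1: transform each variable in X (a list of n-vectors) into the
    residual vector from regressing the outcome y on it.  Stage 2: pairwise
    Euclidean distance between residual vectors."""
    R = [simple_lm_residuals(y, x) for x in X]
    p = len(X)
    D = [[0.0] * p for _ in range(p)]
    for j in range(p):
        for k in range(p):
            D[j][k] = math.sqrt(sum((rj - rk) ** 2
                                    for rj, rk in zip(R[j], R[k])))
    return D
```

Two variables with similar association patterns leave similar residuals and so end up close in this supervised metric, regardless of their marginal similarity.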


Confidence Intervals For The Population Mean Tailored To Small Sample Sizes, With Applications To Survey Sampling, Michael Rosenblum, Mark J. Van Der Laan Jun 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes.

We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability …
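One concrete version of such a tail-bound interval can be sketched from Bernstein's inequality, assuming observations bounded in [0, b] and using the conservative worst-case variance bound sigma^2 <= b^2/4. Both assumptions are for illustration; the paper's actual construction may differ:

```python
import math

def bernstein_ci(data, b, alpha=0.05):
    """Confidence interval for the mean from Bernstein's inequality,
    assuming each observation lies in [0, b].  Solving
    2*exp(-n t^2 / (2 sigma^2 + 2 b t / 3)) = alpha for t gives a
    half-width valid at every sample size (no CLT approximation)."""
    n = len(data)
    mean = sum(data) / n
    L = math.log(2.0 / alpha)
    sigma2 = b * b / 4.0  # conservative worst-case variance on [0, b]
    t = (2 * b * L / 3
         + math.sqrt((2 * b * L / 3) ** 2 + 8 * n * sigma2 * L)) / (2 * n)
    return mean - t, mean + t


# Example: 20 observations in [0, 1].
lo, hi = bernstein_ci([0.2, 0.5, 0.9, 0.1, 0.4] * 4, b=1.0)
```

The guaranteed coverage comes at the cost of width: at small n these intervals are much wider than Wald intervals, which is the price of correctness without asymptotics.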


Doubly Robust Ecological Inference, Daniel B. Rubin, Mark J. Van Der Laan May 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

The ecological inference problem is a famous longstanding puzzle that arises in many disciplines. The usual formulation in epidemiology is that we would like to quantify an exposure-disease association by obtaining disease rates among the exposed and unexposed, but only have access to exposure rates and disease rates for several regions. The problem is generally intractable, but can be attacked under the assumptions of King's (1997) extended technique if we can correctly specify a model for a certain conditional distribution. We introduce a procedure that is a valid approach if either this original model is correct or if we …


The Construction And Analysis Of Adaptive Group Sequential Designs, Mark J. Van Der Laan Mar 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

In order to answer scientific questions of interest, one often carries out an ordered sequence of experiments generating the appropriate data over time. The design of each experiment involves making various decisions, such as 1) what variables to measure on the randomly sampled experimental unit, 2) how regularly to monitor the unit, and for how long, and 3) how to randomly assign a treatment or drug dose to the unit, among others. That is, the design of each experiment involves selecting a so-called treatment mechanism/monitoring mechanism/missingness/censoring mechanism, where these mechanisms represent a formally defined conditional distribution of one of these …


Covariate Adjustment For The Intention-To-Treat Parameter With Empirical Efficiency Maximization, Daniel B. Rubin, Mark J. Van Der Laan Feb 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

In randomized experiments, the intention-to-treat parameter is defined as the difference in expected outcomes between groups assigned to treatment and control arms. There is a large literature focusing on how (possibly misspecified) working models can sometimes exploit baseline covariate measurements to gain precision, although covariate adjustment is not strictly necessary. In Rubin and van der Laan (2008), we proposed the technique of empirical efficiency maximization for improving estimation by forming nonstandard fits of such working models. Considering a more realistic randomization scheme than in our original article, we suggest a new class of working models for utilizing covariate information, show …
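A textbook ANCOVA-style adjustment conveys the basic idea of gaining precision from a baseline covariate. This is a standard illustrative adjustment, not the empirical-efficiency-maximization fit the paper proposes:

```python
def itt_unadjusted(y1, y0):
    """Unadjusted intention-to-treat estimate: difference in mean
    outcomes between treatment (y1) and control (y0) arms."""
    return sum(y1) / len(y1) - sum(y0) / len(y0)

def itt_ancova(y1, w1, y0, w0):
    """ANCOVA-style covariate-adjusted ITT estimate (textbook version,
    for illustration): subtract beta times the chance imbalance in the
    baseline covariate w, where beta is the pooled within-arm slope of
    outcome on covariate."""
    def slope_parts(y, w):
        n = len(y)
        mw, my = sum(w) / n, sum(y) / n
        sxy = sum((wi - mw) * (yi - my) for wi, yi in zip(w, y))
        sxx = sum((wi - mw) ** 2 for wi in w)
        return sxy, sxx
    s1, x1 = slope_parts(y1, w1)
    s0, x0 = slope_parts(y0, w0)
    beta = (s1 + s0) / (x1 + x0)
    imbalance = sum(w1) / len(w1) - sum(w0) / len(w0)
    return itt_unadjusted(y1, y0) - beta * imbalance
```

Because randomization makes the covariate imbalance zero in expectation, the adjustment does not bias the ITT estimate even when the working model is misspecified; it only changes precision.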


Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan Jan 2008


U.C. Berkeley Division of Biostatistics Working Paper Series

Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis tests of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …
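A model-free benchmark for such tests is the permutation test of no treatment effect on the mean, sketched below. The paper's contribution is instead to characterize when model-based tests remain asymptotically valid despite misspecification, which this sketch does not implement:

```python
import random

def permutation_test(y1, y0, n_perm=10000, seed=0):
    """Two-sided permutation test of the null of no treatment effect on
    the mean: reassign arm labels at random and compare the observed
    difference in means to its permutation distribution.  Returns the
    estimated p-value."""
    rng = random.Random(seed)
    pooled = list(y1) + list(y0)
    n1 = len(y1)
    obs = sum(y1) / n1 - sum(y0) / len(y0)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = (sum(pooled[:n1]) / n1
                - sum(pooled[n1:]) / (len(pooled) - n1))
        if abs(diff) >= abs(obs):
            hits += 1
    return hits / n_perm
```

Under randomization, this test controls Type I error exactly in finite samples, making it a natural standard against which to judge model-based tests.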