Keywords
- ANCOVA (1)
- Adaptive design (1)
- Bernstein's Inequality; Chi Square Distribution; Confidence Intervals; Gamma Distribution; Negative Binomial Distribution; Serial Analysis of Gene Expression (SAGE). (1)
- Bernstein's inequality; central limit theorem; confidence interval; influence curve; normal distribution; survey sampling (1)
- Causal effect (1)
- Causal inference (1)
- Clinical trials (1)
- Clustering; distance matrix; gene expression; IPCW estimators; survival (1)
- Ecological inference; ecological regression; ecological fallacy; double robustness; missing data; marginal structural models (1)
- Empirical efficiency maximization (1)
- Group sequential designs (1)
- Intention-to-treat parameter (1)
- Inverse probability of censoring weighting (1)
- Locally efficient (1)
- Martingale central limit theorem (1)
- Maximum likelihood estimation (1)
- Multiple Testing; False Discovery Rate; Variable Importance (1)
- Randomized trial (1)
- Regression (1)
Articles 1 - 9 of 9
Full-Text Articles in Statistics and Probability
A Small Sample Correction For Estimating Attributable Risk In Case-Control Studies, Daniel B. Rubin
U.C. Berkeley Division of Biostatistics Working Paper Series
The attributable risk, often called the population attributable risk, is in many epidemiological contexts a more relevant measure of exposure-disease association than the excess risk, relative risk, or odds ratio. For estimating attributable risk with case-control data and a rare disease, we present a simple correction to the standard approach that makes it essentially unbiased and also less noisy. As with analogous corrections given in Jewell (1986) for other measures of association, the adjustment often won't make a substantial difference unless the sample size is very small or point estimates are desired within fine strata, but we discuss the possible utility …
Confidence Intervals For Negative Binomial Random Variables Of High Dispersion, David Shilane, Alan E. Hubbard, S. N. Evans
U.C. Berkeley Division of Biostatistics Working Paper Series
This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable based upon sampled data. When the sample size is large, we traditionally rely upon a Normal distribution approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials exhibits slow convergence in distribution to the Normal as a function of the sample size. As a result, standard techniques (such as the Normal approximation and bootstrap) that construct confidence intervals for the mean will typically be too narrow and significantly undercover in the case of high …
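The undercoverage phenomenon the abstract describes is easy to reproduce in a minimal simulation sketch (not from the paper; the parameter choices below are illustrative assumptions): draw small samples from a highly dispersed Negative Binomial and check how often the nominal 95% Normal-approximation interval actually covers the true mean.

```python
# Simulation sketch: Normal-approximation CIs for the mean of a highly
# dispersed Negative Binomial undercover at moderate sample sizes.
# Parameters (r, p, n) are illustrative, not taken from the paper.
import numpy as np

rng = np.random.default_rng(0)
r, p = 0.1, 0.1                       # small r => very high dispersion
true_mean = r * (1 - p) / p           # NB mean = r(1-p)/p = 0.9
n, reps = 20, 5000

covered = 0
for _ in range(reps):
    x = rng.negative_binomial(r, p, size=n)
    xbar, s = x.mean(), x.std(ddof=1)
    half = 1.96 * s / np.sqrt(n)      # nominal 95% Normal-approximation CI
    if xbar - half <= true_mean <= xbar + half:
        covered += 1

coverage = covered / reps
print(f"empirical coverage of nominal 95% CI: {coverage:.3f}")
```

With most draws equal to zero and occasional large values, the sampling distribution of the mean is heavily skewed, so the symmetric Normal interval misses the true mean far more often than 5% of the time.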
FDR Controlling Procedure For Multi-Stage Analyses, Catherine Tuglus, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Multiple testing has become an integral component in genomic analyses involving microarray experiments, where large numbers of hypotheses are tested simultaneously. However, before applying more computationally intensive methods, it is often desirable to complete an initial truncation of the variable set using a simpler and faster supervised method such as univariate regression. Once such a truncation is completed, multiple testing methods applied to any subsequent analysis no longer control the appropriate Type I error rates. Here we propose a modified marginal Benjamini & Hochberg step-up FDR controlling procedure for multi-stage analyses (FDR-MSA), which correctly controls Type I error in terms …
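For reference, the standard Benjamini & Hochberg step-up procedure that FDR-MSA modifies can be sketched in a few lines. This is plain single-stage BH, not the multi-stage correction proposed in the paper:

```python
# Standard Benjamini-Hochberg step-up procedure (single-stage BH only;
# the FDR-MSA modification from the paper is not implemented here).
import numpy as np

def bh_rejections(pvals, alpha=0.05):
    """Boolean mask of hypotheses rejected by BH at FDR level alpha."""
    p = np.asarray(pvals, dtype=float)
    m = p.size
    order = np.argsort(p)
    thresholds = alpha * np.arange(1, m + 1) / m    # i * alpha / m
    below = p[order] <= thresholds
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.where(below)[0])              # largest i with p_(i) <= i*alpha/m
        reject[order[: k + 1]] = True               # step-up: reject all smaller p-values
    return reject

print(bh_rejections([0.01, 0.02, 0.03, 0.50]))      # rejects the three smallest
```

The point of the abstract is that applying this procedure only to variables surviving an earlier screening stage no longer controls the FDR at the stated level.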
Supervised Distance Matrices: Theory And Applications To Genomics, Katherine S. Pollard, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We propose a new approach to studying the relationship between a very high dimensional random variable and an outcome. Our method is based on a novel concept, the supervised distance matrix, which quantifies pairwise similarity between variables based on their association with the outcome. A supervised distance matrix is derived in two stages. The first stage involves a transformation based on a particular model for association. In particular, one might regress the outcome on each variable and then use the residuals or the influence curve from each regression as a data transformation. In the second stage, a choice of distance …
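The two-stage construction can be sketched as follows; the residual transformation is the example given in the abstract, while the correlation-based distance in stage two and all variable names are illustrative assumptions, not the paper's implementation:

```python
# Sketch of a supervised distance matrix in two stages (illustrative only):
# stage 1 transforms each variable via its univariate regression against
# the outcome; stage 2 applies a correlation-based distance to the
# transformed variables.
import numpy as np

rng = np.random.default_rng(1)
n, d = 50, 5
X = rng.normal(size=(n, d))           # variables (e.g. gene expression)
y = X[:, 0] + rng.normal(size=n)      # outcome associated with variable 0

# Stage 1: residuals from regressing the outcome on each variable separately.
transformed = np.empty_like(X)
for j in range(d):
    slope, intercept = np.polyfit(X[:, j], y, 1)
    transformed[:, j] = y - (slope * X[:, j] + intercept)

# Stage 2: distance = 1 - |correlation| between transformed columns,
# so variables with similar outcome associations end up close together.
corr = np.corrcoef(transformed, rowvar=False)
D = 1.0 - np.abs(corr)

print(D.shape)                         # symmetric d x d matrix, zero diagonal
```

The resulting matrix can then be fed to standard clustering or multidimensional scaling routines in place of an unsupervised distance.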
Confidence Intervals For The Population Mean Tailored To Small Sample Sizes, With Applications To Survey Sampling, Michael Rosenblum, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes.
We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability …
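A minimal sketch of the idea, under simplifying assumptions the paper does not make (observations bounded in [0, 1] and a known variance bound): invert Bernstein's tail bound to obtain a half-width valid at every sample size.

```python
# Sketch: inverting Bernstein's inequality
#   P(|Xbar - mu| >= t) <= 2 exp(-n t^2 / (2 sigma^2 + (2/3) b t))
# to get a CI half-width for the mean of observations with |X_i - mu| <= b.
# `var_bound` is assumed known here; the paper's construction differs.
import math

def bernstein_halfwidth(n, var_bound, alpha=0.05, b=1.0):
    """Smallest t with 2*exp(-n t^2 / (2*var_bound + (2/3)*b*t)) <= alpha."""
    L = math.log(2.0 / alpha)
    # Setting the bound equal to alpha gives the quadratic
    #   n t^2 - (2/3) b L t - 2 var_bound L = 0; take the positive root.
    return ((2.0 / 3.0) * b * L
            + math.sqrt((4.0 / 9.0) * (b * L) ** 2
                        + 8.0 * n * var_bound * L)) / (2.0 * n)

t = bernstein_halfwidth(n=100, var_bound=0.25)
print(t)   # interval: sample mean +/- t, with coverage guaranteed at every n
```

Because the tail bound holds for all n, the resulting interval's coverage guarantee does not rely on a central limit theorem approximation, at the price of being wider than the asymptotic interval.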
Doubly Robust Ecological Inference, Daniel B. Rubin, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
The ecological inference problem is a famous longstanding puzzle that arises in many disciplines. The usual formulation in epidemiology is that we would like to quantify an exposure-disease association by obtaining disease rates among the exposed and unexposed, but only have access to exposure rates and disease rates for several regions. The problem is generally intractable, but can be attacked under the assumptions of King's (1997) extended technique if we can correctly specify a model for a certain conditional distribution. We introduce a procedure that is valid if either this original model is correct or if we …
The Construction And Analysis Of Adaptive Group Sequential Designs, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
In order to answer scientific questions of interest, one often carries out an ordered sequence of experiments generating the appropriate data over time. The design of each experiment involves making various decisions, such as: 1) what variables to measure on the randomly sampled experimental unit; 2) how regularly to monitor the unit, and for how long; 3) how to randomly assign a treatment or drug dose to the unit; among others. That is, the design of each experiment involves selecting a so-called treatment mechanism/monitoring mechanism/missingness/censoring mechanism, where these mechanisms represent a formally defined conditional distribution of one of these …
Covariate Adjustment For The Intention-To-Treat Parameter With Empirical Efficiency Maximization, Daniel B. Rubin, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
In randomized experiments, the intention-to-treat parameter is defined as the difference in expected outcomes between groups assigned to treatment and control arms. There is a large literature focusing on how (possibly misspecified) working models can sometimes exploit baseline covariate measurements to gain precision, although covariate adjustment is not strictly necessary. In Rubin and van der Laan (2008), we proposed the technique of empirical efficiency maximization for improving estimation by forming nonstandard fits of such working models. Considering a more realistic randomization scheme than in our original article, we suggest a new class of working models for utilizing covariate information, show …
Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on testing whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …