Open Access. Powered by Scholars. Published by Universities.®
- Keyword
-
- Bernstein's Inequality; Chi Square Distribution; Confidence Intervals; Gamma Distribution; Negative Binomial Distribution; Serial Analysis of Gene Expression (SAGE). (1)
- Bernstein's inequality; central limit theorem; confidence interval; influence curve; normal distribution; survey sampling (1)
- Causal effect (1)
- Clustering; distance matrix; gene expression; IPCW estimators; survival (1)
- Randomized trial (1)
Articles 1 - 4 of 4
Full-Text Articles in Statistical Models
Confidence Intervals For Negative Binomial Random Variables Of High Dispersion, David Shilane, Alan E. Hubbard, S N. Evans
Confidence Intervals For Negative Binomial Random Variables Of High Dispersion, David Shilane, Alan E. Hubbard, S N. Evans
U.C. Berkeley Division of Biostatistics Working Paper Series
This paper considers the problem of constructing confidence intervals for the mean of a Negative Binomial random variable based upon sampled data. When the sample size is large, we traditionally rely upon a Normal distribution approximation to construct these intervals. However, we demonstrate that the sample mean of highly dispersed Negative Binomials exhibits a slow convergence to the Normal in distribution as a function of the sample size. As a result, standard techniques (such as the Normal approximation and bootstrap) that construct confidence intervals for the mean will typically be too narrow and significantly undercover in the case of high …
Supervised Distance Matrices: Theory And Applications To Genomics, Katherine S. Pollard, Mark J. Van Der Laan
Supervised Distance Matrices: Theory And Applications To Genomics, Katherine S. Pollard, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
We propose a new approach to studying the relationship between a very high dimensional random variable and an outcome. Our method is based on a novel concept, the supervised distance matrix, which quantifies pairwise similarity between variables based on their association with the outcome. A supervised distance matrix is derived in two stages. The first stage involves a transformation based on a particular model for association. In particular, one might regress the outcome on each variable and then use the residuals or the influence curve from each regression as a data transformation. In the second stage, a choice of distance …
Confidence Intervals For The Population Mean Tailored To Small Sample Sizes, With Applications To Survey Sampling, Michael Rosenblum, Mark J. Van Der Laan
Confidence Intervals For The Population Mean Tailored To Small Sample Sizes, With Applications To Survey Sampling, Michael Rosenblum, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
The validity of standard confidence intervals constructed in survey sampling is based on the central limit theorem. For small sample sizes, the central limit theorem may give a poor approximation, resulting in confidence intervals that are misleading. We discuss this issue and propose methods for constructing confidence intervals for the population mean tailored to small sample sizes.
We present a simple approach for constructing confidence intervals for the population mean based on tail bounds for the sample mean that are correct for all sample sizes. Bernstein's inequality provides one such tail bound. The resulting confidence intervals have guaranteed coverage probability …
Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan
Using Regression Models To Analyze Randomized Trials: Asymptotically Valid Hypothesis Tests Despite Incorrectly Specified Models, Michael Rosenblum, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Regression models are often used to test for cause-effect relationships from data collected in randomized trials or experiments. This practice has deservedly come under heavy scrutiny, since commonly used models such as linear and logistic regression will often not capture the actual relationships between variables, and incorrectly specified models potentially lead to incorrect conclusions. In this paper, we focus on hypothesis test of whether the treatment given in a randomized trial has any effect on the mean of the primary outcome, within strata of baseline variables such as age, sex, and health status. Our primary concern is ensuring that such …