Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Articles 1 - 12 of 12

Full-Text Articles in Statistics and Probability

Targeted Maximum Likelihood Learning, Mark J. van der Laan, Daniel Rubin, Oct 2006

Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one has available an estimate of the density of the data generating distribution such as a maximum likelihood estimator according to a given or data adaptively selected model. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter of the density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and …


Diagnosing Bias In The Inverse Probability Of Treatment Weighted Estimator Resulting From Violation Of Experimental Treatment Assignment, Yue Wang, Maya L. Petersen, David Bangsberg, Mark J. van der Laan, Sep 2006

Inverse probability of treatment weighting (IPTW) is frequently used to estimate the causal effects of treatments and interventions. The consistency of the IPTW estimator relies not only on the well-recognized assumption of no unmeasured confounders (Sequential Randomization Assumption or SRA), but also on the assumption of experimentation in the assignment of treatment (Experimental Treatment Assignment or ETA). In finite samples, violations of the ETA assumption can occur simply by chance; certain treatments become rare or non-existent for certain strata of the population. Such practical violations of the ETA assumption occur frequently in real data, and can result in significant …
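
To make the weighting idea concrete, here is a minimal sketch of an IPTW estimate of a counterfactual mean for a single time point, assuming one discrete baseline covariate and a treatment mechanism estimated by stratum-specific proportions. The function name `iptw_mean`, the `(w, a, y)` data layout, and the error raised for empty strata are illustrative assumptions; this is not the diagnostic procedure developed in the paper.

```python
from collections import defaultdict


def iptw_mean(data, treatment):
    """Estimate the counterfactual mean E[Y_a] by IPTW.

    data: list of (w, a, y) tuples with a discrete covariate w,
          an observed treatment a, and an outcome y.
    treatment: the treatment level a whose counterfactual mean is wanted.

    The treatment mechanism g(a | w) is estimated by the proportion of
    subjects in stratum w who received the treatment (an assumption of
    this sketch, not a requirement of IPTW in general).
    """
    n_w = defaultdict(int)    # subjects per stratum
    n_aw = defaultdict(int)   # subjects per stratum receiving `treatment`
    for w, a, _ in data:
        n_w[w] += 1
        if a == treatment:
            n_aw[w] += 1

    # Practical ETA check: every stratum must contain the treatment,
    # otherwise the weight 1 / g(a | w) is undefined for that stratum.
    empty = sorted(w for w in n_w if n_aw[w] == 0)
    if empty:
        raise ValueError(f"treatment {treatment!r} never observed in strata {empty}")

    total = 0.0
    for w, a, y in data:
        if a == treatment:
            g = n_aw[w] / n_w[w]   # estimated probability of treatment
            total += y / g         # weight each treated subject by 1 / g
    return total / len(data)


# Hypothetical toy data: two strata with different treatment probabilities.
toy = [
    (0, 1, 1.0), (0, 1, 1.0), (0, 0, 0.0), (0, 0, 0.0),
    (1, 1, 3.0), (1, 0, 2.0), (1, 0, 2.0), (1, 0, 2.0),
]
effect = iptw_mean(toy, 1) - iptw_mean(toy, 0)
```

In the toy data only one of four subjects in stratum 1 is treated, so that subject carries a weight of 4. As a stratum's treatment probability approaches zero the weights grow without bound, which is the kind of practical ETA violation the abstract describes.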


Extending Marginal Structural Models Through Local, Penalized, And Additive Learning, Daniel Rubin, Mark J. van der Laan, Sep 2006

Marginal structural models (MSMs) allow one to form causal inferences from data, by specifying a relationship between a treatment and the marginal distribution of a corresponding counterfactual outcome. Following their introduction in Robins (1997), MSMs have typically been fit after assuming a semiparametric model, and then estimating a finite dimensional parameter. van der Laan and Dudoit (2003) proposed to instead view MSM fitting not as a task of semiparametric parameter estimation, but of nonparametric function approximation. They introduced a class of causal effect estimators based on mapping loss functions suitable for the unavailable counterfactual data to those suitable for the …


Statistical Learning Of Origin-Specific Statically Optimal Individualized Treatment Rules, Mark J. van der Laan, Maya L. Petersen, Sep 2006

Consider a longitudinal observational or controlled study in which one collects chronological data over time on n randomly sampled subjects. The time-dependent process one observes on each randomly sampled subject contains time-dependent covariates, time-dependent treatment actions, and an outcome process or single final outcome of interest. A statically optimal individualized treatment rule (as introduced in van der Laan, Petersen & Joffe (2005), Petersen & van der Laan (2006)) is an (unknown) treatment rule which at any point in time conditions on a user-supplied subset of the past, computes the future static treatment regimen that maximizes a (conditional) mean future outcome …


Doubly Robust Censoring Unbiased Transformations, Daniel Rubin, Mark J. van der Laan, Jun 2006

We consider random design nonparametric regression when the response variable is subject to right censoring. Following the work of Fan and Gijbels (1994), a common approach to this problem is to apply what has been termed a censoring unbiased transformation to the data to obtain surrogate responses, and then enter these surrogate responses with covariate data into standard smoothing algorithms. Existing censoring unbiased transformations generally depend on either the conditional survival function of the response of interest, or that of the censoring variable. We show that a mapping introduced in another statistical context is in fact a censoring unbiased transformation …


A Method To Increase The Power Of Multiple Testing Procedures Through Sample Splitting, Daniel Rubin, Sandrine Dudoit, Mark J. van der Laan, Jun 2006

Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the expected number of false positives does not exceed a user-supplied threshold. Among such multiple testing procedures, we derive the "most powerful" method, meaning the test statistic cutoffs that maximize the expected number of true positives. Unfortunately, these optimal cutoffs depend on the true unknown data generating distribution, so could never be used …


Individualized Treatment Rules: Generating Candidate Clinical Trials, Maya L. Petersen, Steven G. Deeks, Mark J. van der Laan, May 2006

Statistical methods have rarely been applied to learn individualized treatment rules, or rules for altering treatments over time in response to changes in individual covariates. Termed dynamic treatment regimes in the statistical literature, such individualized treatment rules are of primary importance in the practice of clinical medicine. History-Adjusted Marginal Structural Models (HA-MSM) estimate individualized treatment rules that assign, at each time point, the first action of the future static treatment plan that optimizes expected outcome given a patient's covariates. However, as we discuss here, the optimality of these rules can depend on the way in which treatment was assigned in …


Super Learning: An Application To Prediction Of HIV-1 Drug Susceptibility, Sandra E. Sinisi, Maya L. Petersen, Mark J. van der Laan, Apr 2006

Many statistical methods exist that can be used to learn a predictor based on observed data. Examples include decision trees, neural networks, support vector regression, least angle regression, Logic Regression, and the Deletion/Substitution/Addition algorithm. The optimal algorithm for prediction will vary depending on the underlying data-generating distribution. In this article, we introduce a "super learner," a prediction algorithm that applies any set of candidate learners and uses cross-validation to select among them. Theory shows that asymptotically the super learner performs essentially as well as or better than any of the candidate learners. We briefly present the theory behind the super learner, …
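
The selection step described in this abstract can be sketched as a cross-validated selector over candidate learners. The sketch below assumes a single numeric predictor, squared-error loss, and two toy candidates; the names `fit_mean`, `fit_line`, `cv_risk`, and `select_learner` are hypothetical, and the candidates stand in for the richer learners the abstract lists. It illustrates only the "select by cross-validation" idea and should not be read as the paper's implementation.

```python
def fit_mean(xs, ys):
    """Candidate 1: ignore x and always predict the training mean of y."""
    mean_y = sum(ys) / len(ys)
    return lambda x: mean_y


def fit_line(xs, ys):
    """Candidate 2: ordinary least squares line in one predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:  # degenerate design: fall back to the mean
        return lambda x: mean_y
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx
    intercept = mean_y - slope * mean_x
    return lambda x: intercept + slope * x


def cv_risk(fit, xs, ys, k=5):
    """K-fold cross-validated mean squared error of one candidate learner."""
    n = len(xs)
    total = 0.0
    for fold in range(k):
        train = [i for i in range(n) if i % k != fold]
        valid = [i for i in range(n) if i % k == fold]
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        total += sum((ys[i] - predict(xs[i])) ** 2 for i in valid)
    return total / n


def select_learner(candidates, xs, ys, k=5):
    """Return the name of the candidate with the smallest CV risk, plus all risks."""
    risks = {name: cv_risk(fit, xs, ys, k) for name, fit in candidates.items()}
    best = min(risks, key=risks.get)
    return best, risks


# Hypothetical toy data with an exactly linear relationship.
xs = [float(i) for i in range(20)]
ys = [2.0 * x + 1.0 for x in xs]
best, risks = select_learner({"mean": fit_mean, "line": fit_line}, xs, ys)
final_predictor = fit_line(xs, ys) if best == "line" else fit_mean(xs, ys)
```

After selection, the chosen candidate is refit on the full sample, as in the last line. Folds are assigned deterministically by index here to keep the example reproducible; random fold assignment is the more common choice in practice.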


Empirical Bayes Approach To Controlling Familywise Error: An Application To HIV Resistance Data, Rhoderick N. Machekano, Alan E. Hubbard, Apr 2006

Statistical challenges arise in identifying meaningful patterns and structures from high dimensional genomic data sets. Relating HIV genotype (sequence of amino acids) to phenotypic resistance presents a typical problem. When the HIV virus is under antiretroviral drug pressure, unfavorable mutations of the target genes often lead to greatly increased resistance of the virus to drugs, including drugs the virus has not been exposed to. Identification of mutation combinations and their correlation to drug resistance is critical in guiding efficient prescription of HIV drugs. The identification of a subset of codons associated with drug resistance from a set of several hundreds …


Causal Effect Models For Intention To Treat And Realistic Individualized Treatment Rules, Mark J. van der Laan, Mar 2006

An important class of models in causal inference is the so-called marginal structural models, which model the comparison between counterfactual outcome distributions corresponding with a static treatment intervention, conditional on user-supplied baseline covariates, based on observing a longitudinal data structure on a sample of n independent and identically distributed experimental units. Identification of a static treatment regimen specific outcome distribution based on observational data requires, beyond the so-called sequential randomization assumption, that each experimental unit has positive probability of following the static treatment regimen. The latter assumption is called the experimental treatment assignment assumption (ETA) (which is parameter specific). …


A General Framework For Statistical Performance Comparison Of Evolutionary Computation Algorithms, David Shilane, Jarno Martikainen, Sandrine Dudoit, Seppo Ovaska, Mar 2006

This paper proposes a statistical methodology for comparing the performance of evolutionary computation algorithms. A two-fold sampling scheme for collecting performance data is introduced, and these data are analyzed using bootstrap-based multiple hypothesis testing procedures. The proposed method is sufficiently flexible to allow the researcher to choose how performance is measured, does not rely upon distributional assumptions, and can be extended to analyze many other randomized numeric optimization routines. As a result, this approach offers a convenient, flexible, and reliable technique for comparing algorithms in a wide variety of applications.


Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. van der Laan, Mar 2006

We propose a general and formal statistical framework for the multiple tests of associations between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known fixed gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating genome-wide transcript levels or DNA copy numbers to possibly censored biological and …