Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

U.C. Berkeley Division of Biostatistics Working Paper Series

Articles 1 - 30 of 243

Full-Text Articles in Physical Sciences and Mathematics

Evaluation Of Progress Towards The UNAIDS 90-90-90 HIV Care Cascade: A Description Of Statistical Methods Used In An Interim Analysis Of The Intervention Communities In The SEARCH Study, Laura Balzer, Joshua Schwab, Mark J. Van Der Laan, Maya L. Petersen Feb 2017

WHO guidelines call for universal antiretroviral treatment, and UNAIDS has set a global target to virally suppress most HIV-positive individuals. Accurate estimates of population-level coverage at each step of the HIV care cascade (testing, treatment, and viral suppression) are needed to assess the effectiveness of "test and treat" strategies implemented to achieve this goal. The data available to inform such estimates, however, are susceptible to informative missingness: the number of HIV-positive individuals in a population is unknown; individuals tested for HIV may not be representative of those whom a testing intervention fails to reach, and HIV-positive individuals with a viral …


Online Cross-Validation-Based Ensemble Learning, David Benkeser, Samuel D. Lendle, Cheng Ju, Mark J. Van Der Laan Oct 2016

Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to …


Doubly-Robust Nonparametric Inference On The Average Treatment Effect, David Benkeser, Marco Carone, Mark J. Van Der Laan, Peter Gilbert Oct 2016

Doubly-robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double-robustness does not readily extend to inference. We present a general theoretical study of the behavior of doubly-robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different approaches for constructing such estimators and investigate the extent to which they may be modified to also allow doubly-robust …


Performance-Constrained Binary Classification Using Ensemble Learning: An Application To Cost-Efficient Targeted PrEP Strategies, Wenjing Zheng, Laura Balzer, Maya L. Petersen, Mark J. Van Der Laan Oct 2016

Binary classification problems are ubiquitous in health and social science applications. In many cases, one wishes to balance two conflicting criteria for an optimal binary classifier. For instance, in resource-limited settings, an HIV prevention program based on offering Pre-Exposure Prophylaxis (PrEP) to select high-risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) with the total number of PrEP regimens that is financially and logistically feasible for the program to deliver. In this article, we consider a general class of performance-constrained binary classification problems wherein the objective function and the …


Practical Targeted Learning From Large Data Sets By Survey Sampling, Patrice Bertail, Antoine Chambaz, Emilien Joly Jul 2016

We address the practical construction of asymptotic confidence intervals for smooth (i.e., pathwise differentiable), real-valued statistical parameters by targeted learning from independent and identically distributed data in contexts where sample size is so large that it poses computational challenges. We observe some summary measure of all data and select a sub-sample from the complete data set by Poisson rejective sampling with unequal inclusion probabilities based on the summary measures. Targeted learning is carried out from the easier to handle sub-sample. We derive a central limit theorem for the targeted minimum loss estimator (TMLE) which enables the construction of …


Scalable Collaborative Targeted Learning For High-Dimensional Data, Cheng Ju, Susan Gruber, Samuel D. Lendle, Antoine Chambaz, Jessica M. Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. Van Der Laan Jun 2016

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the C-TMLE procedure.

The original C-TMLE procedure can be presented as a greedy forward stepwise algorithm. It does not scale well when the number $p$ …


Propensity Score Prediction For Electronic Healthcare Databases Using Super Learner And High-Dimensional Propensity Score Methods, Cheng Ju, Mary Combs, Samuel D. Lendle, Jessica M. Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. Van Der Laan Jun 2016

The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a "library" of candidate prediction models. The SL is not restricted to a single prediction model, but uses the strengths of a variety of learning algorithms to adapt to different databases. While the SL has been shown to perform well in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated …


TMLE For Marginal Structural Models Based On An Instrument, Boriska Toth, Mark J. Van Der Laan Jun 2016

We consider estimation of a causal effect of a possibly continuous treatment when treatment assignment is potentially subject to unmeasured confounding, but an instrumental variable is available. Our focus is on estimating heterogeneous treatment effects, so that the treatment effect can be a function of an arbitrary subset of the observed covariates. One setting where this framework is especially useful is with clinical outcomes. Allowing the causal dose-response curve to depend on a subset of the covariates, we define our parameter of interest to be the projection of the true dose-response curve onto a user-supplied working marginal structural model. We …


Data-Adaptive Inference Of The Optimal Treatment Rule And Its Mean Reward. The Masked Bandit, Antoine Chambaz, Wenjing Zheng, Mark J. Van Der Laan Apr 2016

This article studies the data-adaptive inference of an optimal treatment rule. A treatment rule is an individualized treatment strategy in which treatment assignment for a patient is based on her measured baseline covariates. Eventually, a reward is measured on the patient. We also infer the mean reward under the optimal treatment rule. We do so in the so-called non-exceptional case, i.e., assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption.

Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually …


One-Step Targeted Minimum Loss-Based Estimation Based On Universal Least Favorable One-Dimensional Submodels, Mark J. Van Der Laan, Susan Gruber Mar 2016

Consider a study in which one observes n independent and identically distributed random variables whose probability distribution is known to be an element of a particular statistical model, and one is concerned with estimation of a particular real valued pathwise differentiable target parameter of this data probability distribution. The targeted maximum likelihood estimator (TMLE) is an asymptotically efficient substitution estimator obtained by constructing a so called least favorable parametric submodel through an initial estimator with score, at zero fluctuation of the initial estimator, that spans the efficient influence curve, and iteratively maximizing the corresponding parametric likelihood till no more updates …


Marginal Structural Models With Counterfactual Effect Modifiers, Wenjing Zheng, Zhehui Luo, Mark J. Van Der Laan Mar 2016

In health and social sciences, research questions often involve systematic assessment of the modification of treatment causal effect by patient characteristics, in longitudinal settings with time-varying or post-intervention effect modifiers of interest. In this work, we investigate the robust and efficient estimation of the so-called Counterfactual-History-Adjusted Marginal Structural Model (van der Laan and Petersen (2007)), which models the conditional intervention-specific mean outcome given modifier history in an ideal experiment where, possibly contrary to fact, the subject was assigned the intervention of interest, including the treatment sequence in the conditioning history. We establish the semiparametric efficiency theory for these models, and …


Evaluating The Impact Of A HIV Low-Risk Express Care Task-Shifting Program: A Case Study Of The Targeted Learning Roadmap, Linh Tran, Constantin T. Yiannoutsos, Beverly S. Musick, Kara K. Wools-Kaloustian, Abraham Siika, Sylvester Kimaiyo, Mark J. Van Der Laan, Maya L. Petersen Mar 2016

In conducting studies on an exposure of interest, a systematic roadmap should be applied for translating causal questions into statistical analyses and interpreting the results. In this paper we describe an application of one such roadmap to estimating the joint effect of both time to availability of a nurse-based triage system (low risk express care (LREC)) and individual enrollment in the program among HIV patients in East Africa. Our study population comprises 16,513 subjects found eligible for this task-shifting program within 15 clinics in Kenya between 2006 and 2009, with each clinic starting the LREC program between …


Semi-Parametric Estimation And Inference For The Mean Outcome Of The Single Time-Point Intervention In A Causally Connected Population, Oleg Sofrygin, Mark J. Van Der Laan Dec 2015

We study the framework for semi-parametric estimation and statistical inference for the sample average treatment-specific mean effects in observational settings where data are collected on a single network of connected units (e.g., in the presence of interference or spillover). Despite recent advances, many of the current statistical methods rely on estimation techniques that assume a particular parametric model for the outcome, even though some of the most important statistical assumptions required by these models are most likely violated in the observational network settings, often resulting in invalid and anti-conservative statistical inference. In this manuscript, we rely on the recent methodological …


A Generally Efficient Targeted Minimum Loss Based Estimator, Mark J. Van Der Laan Dec 2015

Suppose we observe n independent and identically distributed observations of a finite dimensional bounded random variable. This article is concerned with the construction of an efficient targeted minimum loss-based estimator (TMLE) of a pathwise differentiable target parameter based on a realistic statistical model.

The canonical gradient of the target parameter at a particular data distribution will depend on the data distribution through an infinite dimensional nuisance parameter which can be defined as the minimizer of the expectation of a loss function (e.g., log-likelihood loss). For many models and target parameters the nuisance parameter can be split up in two components, …


Computerizing Efficient Estimation Of A Pathwise Differentiable Target Parameter, Mark J. Van Der Laan, Marco Carone, Alexander R. Luedtke Jul 2015

Frangakis et al. (2015) proposed a numerical method for computing the efficient influence function of a parameter in a nonparametric model at a specified distribution and observation (provided such an influence function exists). Their approach is based on the assumption that the efficient influence function is given by the directional derivative of the target parameter mapping in the direction of a perturbation of the data distribution defined as the convex line from the data distribution to a pointmass at the observation. In our discussion paper Luedtke et al. (2015) we propose a regularization of this procedure and establish the validity …


Drawing Valid Targeted Inference When Covariate-Adjusted Response-Adaptive RCT Meets Data-Adaptive Loss-Based Estimation, With An Application To The Lasso, Wenjing Zheng, Antoine Chambaz, Mark J. Van Der Laan Jul 2015

Adaptive clinical trial design methods have garnered growing attention in recent years, in large part due to their greater flexibility over their traditional counterparts. One such design is the so-called covariate-adjusted, response-adaptive (CARA) randomized controlled trial (RCT). In a CARA RCT, the treatment randomization schemes are allowed to depend on the patient’s pre-treatment covariates, and the investigators have the opportunity to adjust these schemes during the course of the trial based on accruing information (including previous responses), in order to meet a pre-specified optimality criterion, while preserving the validity of the trial in learning its primary study parameter.

In …


One-Step Targeted Minimum Loss-Based Estimation Based On Universal Least Favorable One-Dimensional Submodels, Mark J. Van Der Laan Jun 2015

Consider a study in which one observes n independent and identically distributed random variables whose probability distribution is known to be an element of a particular statistical model, and one is concerned with estimation of a particular real valued pathwise differentiable target parameter of this data probability distribution. The canonical gradient of the pathwise derivative of the target parameter, also called the efficient influence curve, defines an asymptotically efficient estimator as an estimator that is asymptotically linear with influence curve equal to the efficient influence curve. The targeted maximum likelihood estimator is a two stage estimator obtained by constructing a so …


Second Order Inference For The Mean Of A Variable Missing At Random, Ivan Diaz, Marco Carone, Mark J. Van Der Laan May 2015

We present a second order estimator of the mean of a variable subject to missingness, under the missing at random assumption. The estimator improves upon existing methods by using an approximate second order expansion of the parameter functional, in addition to the first order expansion employed by standard doubly robust methods. This results in weaker assumptions about the convergence rates necessary to establish consistency, local efficiency, and asymptotic linearity. The general estimation strategy is developed under the targeted minimum loss based estimation (TMLE) framework. We present a simulation comparing the sensitivity of the first and second order estimators to the …


Adaptive Pre-Specification In Randomized Trials With And Without Pair-Matching, Laura B. Balzer, Mark J. Van Der Laan, Maya L. Petersen May 2015

In randomized trials, adjustment for measured covariates during the analysis can reduce variance and increase power. To avoid misleading inference, the analysis plan must be pre-specified. However, it is unclear a priori which baseline covariates (if any) should be included in the analysis. Consider, for example, the Sustainable East Africa Research in Community Health (SEARCH) trial for HIV prevention and treatment. There are 16 matched pairs of communities and many potential adjustment variables, including region, HIV prevalence, male circumcision coverage and measures of community-level viral load. In this paper, we propose a rigorous procedure to data-adaptively select the adjustment set …


Double Robust Estimation Of Encouragement-Design Intervention Effects Transported Across Sites, Kara E. Rudolph, Mark J. Van Der Laan May 2015

We develop double robust targeted maximum likelihood estimators (TMLE) for transporting intervention effects from one population to another. Specifically, we develop TMLE estimators for three transported estimands: intent-to-treat average treatment effect (ATE) and complier ATE, which are relevant for encouragement-design interventions and instrumental variable analyses, and the ATE of the exposure on the outcome, which is applicable to any randomized or observational study. We demonstrate finite sample performance of these TMLE estimators using simulation, including in the presence of practical violations of the positivity assumption. We then apply these methods to the Moving to Opportunity trial, a multi-site, encouragement-design intervention …


Targeted Estimation And Inference For The Sample Average Treatment Effect, Laura B. Balzer, Maya L. Petersen, Mark J. Van Der Laan Mar 2015

While the population average treatment effect has been the subject of extensive methods and applied research, less consideration has been given to the sample average treatment effect: the mean difference in the counterfactual outcomes for the study units. The sample parameter is easily interpretable and is arguably the most relevant when the study units are not representative of a greater population or when the exposure's impact is heterogeneous. Formally, the sample effect is not identifiable from the observed data distribution. Nonetheless, targeted maximum likelihood estimation (TMLE) can provide an asymptotically unbiased and efficient estimate of both the population and sample …


Optimal Dynamic Treatments In Resource-Limited Settings, Alexander R. Luedtke, Mark J. Van Der Laan Jan 2015

A dynamic treatment rule (DTR) is a treatment rule which assigns treatments to individuals based on (a subset of) their measured covariates. An optimal DTR is the DTR which maximizes the population mean outcome. Previous works in this area have assumed that treatment is an unlimited resource so that the entire population can be treated if this strategy maximizes the population mean outcome. We consider optimal DTRs in settings where the treatment resource is limited so that there is a maximum proportion of the population which can be treated. We give a general closed-form expression for an optimal stochastic DTR …


Statistical Inference For The Mean Outcome Under A Possibly Non-Unique Optimal Treatment Strategy, Alexander R. Luedtke, Mark J. Van Der Laan Dec 2014

We consider challenges that arise in the estimation of the value of an optimal individualized treatment strategy defined as the treatment rule that maximizes the population mean outcome, where the candidate treatment rules are restricted to depend on baseline covariates. We prove a necessary and sufficient condition for the pathwise differentiability of the optimal value, a key condition needed to develop a regular asymptotically linear (RAL) estimator of this parameter. The stated condition is slightly more general than the previous condition implied in the literature. We then describe an approach to obtain root-n rate confidence intervals for the optimal value …


Higher-Order Targeted Minimum Loss-Based Estimation, Marco Carone, Iván Díaz, Mark J. Van Der Laan Dec 2014

Common approaches to parametric statistical inference often encounter difficulties in the context of infinite-dimensional models. The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying data-generating distribution with a sufficiently large …


Online Targeted Learning, Mark J. Van Der Laan, Samuel D. Lendle Sep 2014

We consider the case that the data comes in sequentially and can be viewed as a sample of independent and identically distributed observations from a fixed data generating distribution. The goal is to estimate a particular pathwise target parameter of this data generating distribution that is known to be an element of a particular semi-parametric statistical model. We want our estimator to be asymptotically efficient, but we also want it to be computable by updating the current estimator based on the new block of data without having to revisit the past data, so that it is computationally much …


Targeted Learning Of An Optimal Dynamic Treatment, And Statistical Inference For Its Mean Outcome, Mark J. Van Der Laan, Alexander R. Luedtke Sep 2014

Suppose we observe n independent and identically distributed observations of a time-dependent random variable consisting of baseline covariates, initial treatment and censoring indicator, intermediate covariates, subsequent treatment and censoring indicator, and a final outcome. For example, this could be data generated by a sequentially randomized controlled trial, where subjects are sequentially randomized to a first line and second line treatment, possibly assigned in response to an intermediate biomarker, and are subject to right-censoring. In this article we consider estimation of an optimal dynamic multiple time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, …


A Novel Targeted Learning Method For Quantitative Trait Loci Mapping, Hui Wang, Zhongyang Zhang, Sherri Rose, Mark J. Van Der Laan Jul 2014

We present a novel semiparametric method for quantitative trait loci (QTL) mapping in experimental crosses. Conventional genetic mapping methods typically assume parametric models with Gaussian errors and obtain parameter estimates through maximum likelihood estimation. In contrast with univariate regression and interval mapping methods, our model requires fewer assumptions and also accommodates various machine learning algorithms. Estimation is performed with targeted maximum likelihood learning methods. We demonstrate our semiparametric targeted learning approach in a simulation study and a well-studied barley dataset.


Entering The Era Of Data Science: Targeted Learning And The Integration Of Statistics And Computational Data Analysis, Mark J. Van Der Laan, Richard J.C.M. Starmans Jul 2014

This outlook article will appear in Advances in Statistics and it reviews the research of Dr. van der Laan's group on Targeted Learning, a subfield of statistics that is concerned with the construction of data adaptive estimators of user-supplied target parameters of the probability distribution of the data and corresponding confidence intervals, aiming to only rely on realistic statistical assumptions. Targeted Learning fully utilizes the state of the art in machine learning tools, while still preserving the important identity of statistics as a field that is concerned with both accurate estimation of the true target parameter value and assessment of …


Super-Learning Of An Optimal Dynamic Treatment Rule, Alexander R. Luedtke, Mark J. Van Der Laan Jul 2014

We consider the estimation of an optimal dynamic two time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to depend only on a user-supplied subset of the baseline and intermediate covariates. This estimation problem is addressed in a statistical model for the data distribution that is nonparametric, beyond possible knowledge about the treatment and censoring mechanisms. We propose data adaptive estimators of this optimal dynamic regime which are defined by sequential loss-based learning under both the blip function and weighted classification frameworks. Rather than a priori selecting …


Targeted Learning Of The Mean Outcome Under An Optimal Dynamic Treatment Rule, Mark J. Van Der Laan, Alexander R. Luedtke Jul 2014

We consider estimation of and inference for the mean outcome under the optimal dynamic two time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, where the candidate rules are restricted to depend only on a user-supplied subset of the baseline and intermediate covariates. This estimation problem is addressed in a statistical model for the data distribution that is nonparametric beyond possible knowledge about the treatment and censoring mechanism. This contrasts with the current literature that relies on parametric assumptions. We establish that the mean of the counterfactual outcome under the optimal dynamic treatment …