Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Asymptotic linearity (2)
- Causal effect (2)
- Causal inference (2)
- Clustering (2)
- Cross-validation (2)
- Loss-function (2)
- TMLE (2)
- Variable importance (2)
- Air pollution (1)
- Applied Statistics (1)
- Area under the curve (1)
- Average treatment effect (1)
- Balancing score (1)
- Big Data (1)
- CV-TMLE (1)
- Censoring (1)
- Change score (1)
- Community level intervention (1)
- Confounding (1)
- Cross validated targeted minimum loss based estimator (1)
- Cross-Validation (1)
- Data mining (1)
- Difference-in-differences (1)
- Dynamic regime (1)
- Dynamic treatment (1)
- Empirical process (1)
- Ensemble Methods (1)
- Exposure-response (1)
- G-computation (1)
- Influence curve (1)
Articles 1 - 14 of 14
Full-Text Articles in Physical Sciences and Mathematics
Challenges In Estimating The Causal Effect Of An Intervention With Pre-Post Data (Part 1): Definition & Identification Of The Causal Parameter, Ann M. Weber, Mark J. Van Der Laan, Maya L. Petersen
U.C. Berkeley Division of Biostatistics Working Paper Series
There is mixed evidence of the effectiveness of interventions operating on a large scale. Although the lack of consistent results is generally attributed to problems of implementation or governance of the program, the failure to find a statistically significant effect (or the success of finding one) may be due to choices made in the evaluation. To demonstrate the potential limitations and pitfalls of the usual analytic methods used for estimating causal effects, we apply the first half of a roadmap for causal inference to a pre-post evaluation of a community-level, national nutrition program. Selection into the program was non-random and …
Variable Importance And Prediction Methods For Longitudinal Problems With Missing Variables, Ivan Diaz, Alan E. Hubbard, Anna Decker, Mitchell Cohen
U.C. Berkeley Division of Biostatistics Working Paper Series
In this paper we present prediction and variable importance (VIM) methods for longitudinal data sets containing both continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice relies on rules of thumb and scoring methods that use only a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can thus provide a tool to make care decisions informed by the patient's high-dimensional physiological and clinical history. Our VIM parameters can be causally interpreted …
Targeted Learning Of An Optimal Dynamic Treatment, And Statistical Inference For Its Mean Outcome, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose we observe n independent and identically distributed observations of a time-dependent random variable consisting of baseline covariates, initial treatment and censoring indicator, intermediate covariates, subsequent treatment and censoring indicator, and a final outcome. For example, this could be data generated by a sequentially randomized controlled trial, where subjects are sequentially randomized to a first line and second line treatment, possibly assigned in response to an intermediate biomarker, and are subject to right-censoring. In this article we consider estimation of an optimal dynamic multiple time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, …
Adapting Data Adaptive Methods For Small, But High Dimensional Omic Data: Applications To Gwas/Ewas And More, Sara Kherad Pajouh, Alan E. Hubbard, Martyn T. Smith
U.C. Berkeley Division of Biostatistics Working Paper Series
Exploratory analysis of high-dimensional "omics" data has received much attention since the explosion of high-throughput technology allows simultaneous screening of tens of thousands of characteristics (genomics, metabolomics, proteomics, adducts, etc.). Part of this trend has been an increase in the dimension of exposure data in studies of environmental exposure and associated biomarkers. Though some of the general approaches, such as GWAS, are transferable, what has received less focus is (1) how to estimate independent associations in the context of many competing causes without resorting to a misspecified model, and (2) how to derive accurate small-sample inference …
Testing The Relative Performance Of Data Adaptive Prediction Algorithms: A Generalized Test Of Conditional Risk Differences, Benjamin A. Goldstein, Eric Polley, Farren Briggs, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
In statistical medicine, comparing the predictability or fit of two models can help determine whether a set of prognostic variables contains additional information about medical outcomes, or whether one of two model fits (perhaps based on different algorithms, or different sets of variables) should be preferred for clinical use. Clinical medicine has tended to rely on comparisons of clinical metrics like C-statistics and, more recently, reclassification. Such metrics require the outcome to be categorical and rely on a specific and often obscure loss function. In classical statistics one can use likelihood ratio tests and information-based criteria if the …
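The comparison the abstract describes can be illustrated with a paired test on per-observation cross-validated losses. This is a hedged simplification (a plain paired z-test with a normal approximation), not the paper's exact generalized test statistic; the function name and interface are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def cv_risk_difference_test(loss_a, loss_b):
    """Sketch: compare two prediction algorithms via the mean difference
    of their per-observation cross-validated losses, with a
    normal-approximation two-sided p-value (assumed simplification,
    not the paper's exact procedure)."""
    d = np.asarray(loss_a, float) - np.asarray(loss_b, float)
    n = len(d)
    diff = d.mean()                   # estimated risk difference
    se = d.std(ddof=1) / sqrt(n)      # its standard error
    z = diff / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return diff, se, p
```

Because the losses are paired per observation, the variance of the difference accounts for the correlation between the two algorithms' errors on the same data.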
Statistical Inference For Data Adaptive Target Parameters, Mark J. Van Der Laan, Alan E. Hubbard, Sara Kherad Pajouh
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose one observes n i.i.d. copies of a random variable with a probability distribution known to be an element of a particular statistical model. To define our statistical target, we partition the sample into V equal-size subsamples, and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and its complementary parameter-generating sample, which is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample into a target parameter mapping, which represents the statistical target parameter generated by that parameter-generating …
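The sample-splitting scheme described above can be sketched with a toy data-adaptive target (choosing, on each parameter-generating sample, which column's mean to estimate). The specific target and the function interface are illustrative assumptions, not the paper's example.

```python
import numpy as np

def data_adaptive_mean(X, V=5, seed=0):
    """Sketch of the V-fold sample-splitting scheme: on each
    parameter-generating sample, pick the column with the largest
    empirical mean (a toy data-adaptive choice); estimate that column's
    mean on the complementary estimation sample; average the V
    split-specific estimates and form a Wald-type 95% CI."""
    rng = np.random.default_rng(seed)
    n = len(X)
    folds = rng.permutation(n) % V
    ests, variances = [], []
    for v in range(V):
        est_sample = folds == v            # estimation sample
        gen_sample = ~est_sample           # parameter-generating sample
        j = int(np.argmax(X[gen_sample].mean(axis=0)))  # adaptive choice
        vals = X[est_sample, j]
        ests.append(vals.mean())
        variances.append(vals.var(ddof=1) / len(vals))
    psi = float(np.mean(ests))
    # variance of the average of V independent split-specific estimates
    se = float(np.sqrt(np.sum(variances)) / V)
    return psi, (psi - 1.96 * se, psi + 1.96 * se)
```

The key point is that the parameter is *chosen* on one part of the data and *estimated* on the complementary part, which is what makes valid inference possible despite the data-adaptive definition.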
Subsemble: An Ensemble Method For Combining Subset-Specific Algorithm Fits, Stephanie Sapp, Mark J. Van Der Laan, John Canny
U.C. Berkeley Division of Biostatistics Working Paper Series
Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive datasets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be …
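The procedure the abstract outlines (partition, fit per subset, combine via cross-validation) can be sketched as follows. This is an illustrative minimal version, not the authors' implementation: the base learner is ordinary least squares and the combination step is a simple least-squares stack on out-of-fold predictions.

```python
import numpy as np

def subsemble(X, y, n_subsets=3, n_folds=5, seed=0):
    """Minimal Subsemble sketch: fit one base learner (here OLS) per
    disjoint subset, then learn combination weights by regressing y on
    the V-fold cross-validated predictions of the subset-specific fits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    subset_ids = rng.permutation(n) % n_subsets   # disjoint subsets
    fold_ids = rng.permutation(n) % n_folds       # CV folds

    def fit_ols(Xtr, ytr):
        A = np.c_[np.ones(len(ytr)), Xtr]
        beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
        return lambda Xn: np.c_[np.ones(len(Xn)), Xn] @ beta

    # Honest (out-of-fold) predictions of each subset-specific learner.
    Z = np.empty((n, n_subsets))
    for v in range(n_folds):
        held_out = fold_ids == v
        for j in range(n_subsets):
            train = (~held_out) & (subset_ids == j)
            Z[held_out, j] = fit_ols(X[train], y[train])(X[held_out])

    # Combination weights learned on the cross-validated predictions.
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)

    # Final subset-specific fits use all data in each subset.
    fits = [fit_ols(X[subset_ids == j], y[subset_ids == j])
            for j in range(n_subsets)]
    return lambda Xn: np.column_stack([f(Xn) for f in fits]) @ w
```

Because each subset fit trains on only a fraction of the data, the method scales to datasets too large for a single fit, while the cross-validated combination step is what the oracle result guarantees against.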
Targeted Maximum Likelihood Estimation For Dynamic And Static Longitudinal Marginal Structural Working Models, Maya L. Petersen, Joshua Schwab, Susan Gruber, Nello Blaser, Michael Schomaker, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
This paper describes a targeted maximum likelihood estimator (TMLE) for the parameters of longitudinal static and dynamic marginal structural models. We consider a longitudinal data structure consisting of baseline covariates, time-dependent intervention nodes, intermediate time-dependent covariates, and a possibly time-dependent outcome. The intervention nodes at each time point can include a binary treatment as well as a right-censoring indicator. Given a class of dynamic or static interventions, a marginal structural model is used to model the mean of the intervention-specific counterfactual outcome as a function of the intervention, time point, and possibly a subset of baseline covariates. Because …
Balancing Score Adjusted Targeted Minimum Loss-Based Estimation, Samuel D. Lendle, Bruce Fireman, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Adjusting for a balancing score is sufficient for bias reduction when estimating causal effects including the average treatment effect and effect among the treated. Estimators that adjust for the propensity score in a nonparametric way, such as matching on an estimate of the propensity score, can be consistent when the estimated propensity score is not consistent for the true propensity score but converges to some other balancing score. We call this property the balancing score property, and discuss a class of estimators that have this property. We introduce a targeted minimum loss-based estimator (TMLE) for a treatment specific mean with …
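A simple instance of the nonparametric propensity-score adjustment the abstract mentions is 1-nearest-neighbor matching on the estimated score. This sketch (hypothetical function name and interface, not the paper's TMLE) estimates the effect among the treated by matching each treated unit to the control with the closest score.

```python
import numpy as np

def att_by_ps_matching(ps, treated, y):
    """Sketch: 1-nearest-neighbor matching on an estimated propensity
    score to estimate the effect among the treated (ATT). As the
    abstract notes, such matching estimators can remain consistent when
    the estimated score converges to some balancing score rather than
    the true propensity score."""
    ps, treated, y = (np.asarray(a, float) for a in (ps, treated, y))
    t = np.flatnonzero(treated == 1)
    c = np.flatnonzero(treated == 0)
    # index of the closest control for every treated unit
    match = c[np.argmin(np.abs(ps[t, None] - ps[None, c]), axis=1)]
    return float(np.mean(y[t] - y[match]))
```

Matching with replacement, as above, keeps every treated unit and reuses controls; refinements (calipers, multiple matches) trade bias against variance.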
Estimating Effects On Rare Outcomes: Knowledge Is Power, Laura B. Balzer, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides …
An Application Of Machine Learning Methods To The Derivation Of Exposure-Response Curves For Respiratory Outcomes, Ekaterina Eliseeva, Alan E. Hubbard, Ira B. Tager
U.C. Berkeley Division of Biostatistics Working Paper Series
Analyses of epidemiological studies of the association between short-term changes in air pollution and health outcomes have not sufficiently discussed the degree to which the statistical models chosen for these analyses reflect what is actually known about the true data-generating distribution. We present a method to estimate population-level ambient air pollution (NO2) exposure-health (wheeze in children with asthma) response functions that is not dependent on assumptions about the data-generating function that underlies the observed data and which focuses on a specific scientific parameter of interest (the marginal adjusted association of exposure on probability of wheeze, over a grid of possible …
Vertically Shifted Mixture Models For Clustering Longitudinal Data By Shape, Brianna C. Heggeseth, Nicholas P. Jewell
U.C. Berkeley Division of Biostatistics Working Paper Series
Longitudinal studies play a prominent role in health, social and behavioral sciences as well as in the biological sciences, economics, and marketing. By following subjects over time, temporal changes in an outcome of interest can be directly observed and studied. An important question concerns the existence of distinct trajectory patterns. One way to determine these distinct patterns is through cluster analysis, which seeks to separate objects (subjects, patients, observational units) into homogeneous groups. Many methods have been adapted for longitudinal data, but almost all of them fail to explicitly group trajectories according to distinct pattern shapes. To fulfill the need …
Targeted Estimation Of Variable Importance Measures With Interval-Censored Outcomes, Stephanie Sapp, Mark J. Van Der Laan, Kimberly Page
U.C. Berkeley Division of Biostatistics Working Paper Series
In most experimental and observational studies, participants are not followed in continuous time. Instead, data is collected about participants only at certain monitoring times. These monitoring times are random, and often participant specific. As a result, outcomes are only known up to random time intervals, resulting in interval-censored data. In practice, when estimating variable importance measures on interval-censored outcomes, practitioners often ignore the presence of interval-censoring and instead treat the data as continuous or right-censored, applying ad hoc approaches to mask the true interval-censoring. In this paper, we describe Targeted Minimum Loss-based Estimation methods tailored for estimation of variable importance measures …
Targeted Data Adaptive Estimation Of The Causal Dose Response Curve, Iván Díaz, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Estimation of the causal dose-response curve is an old problem in statistics. In a nonparametric model, if the treatment is continuous, the dose-response curve is not a pathwise differentiable parameter, and no root-n-consistent estimator is available. However, the risk of a candidate algorithm for estimation of the dose-response curve is a pathwise differentiable parameter, whose consistent and efficient estimation is possible. In this work, we review the cross-validated augmented inverse probability of treatment weighted estimator (CV A-IPTW) of the risk, and present a cross-validated targeted minimum loss-based estimator (CV-TMLE) counterpart. These estimators are proven consistent …