Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
- Keyword
- Asymptotic linearity (2)
- Causal effect (2)
- Causal inference (2)
- Clustering (2)
- Cross-validation (2)
- Loss-function (2)
- TMLE (2)
- Variable importance (2)
- Air pollution (1)
- Applied Statistics (1)
- Area under the curve (1)
- Average treatment effect (1)
- Balancing score (1)
- Big Data (1)
- CV-TMLE (1)
- Censoring (1)
- Change score (1)
- Community level intervention (1)
- Confounding (1)
- Cross validated targeted minimum loss based estimator (1)
- Cross-Validation (1)
- Data mining (1)
- Difference-in-differences (1)
- Dynamic regime (1)
- Dynamic treatment (1)
- Empirical process (1)
- Ensemble Methods (1)
- Exposure-response (1)
- G-computation (1)
- Influence curve (1)
Articles 1 - 14 of 14
Full-Text Articles in Physical Sciences and Mathematics
Challenges In Estimating The Causal Effect Of An Intervention With Pre-Post Data (Part 1): Definition & Identification Of The Causal Parameter, Ann M. Weber, Mark J. Van Der Laan, Maya L. Petersen
U.C. Berkeley Division of Biostatistics Working Paper Series
There is mixed evidence of the effectiveness of interventions operating on a large scale. Although the lack of consistent results is generally attributed to problems of implementation or governance of the program, the failure to find a statistically significant effect (or the success of finding one) may be due to choices made in the evaluation. To demonstrate the potential limitations and pitfalls of the usual analytic methods used for estimating causal effects, we apply the first half of a roadmap for causal inference to a pre-post evaluation of a community-level, national nutrition program. Selection into the program was non-random and …
Variable Importance And Prediction Methods For Longitudinal Problems With Missing Variables, Ivan Diaz, Alan E. Hubbard, Anna Decker, Mitchell Cohen
U.C. Berkeley Division of Biostatistics Working Paper Series
In this paper we present prediction and variable importance (VIM) methods for longitudinal data sets containing both continuous and binary exposures subject to missingness. We demonstrate the use of these methods for prognosis of medical outcomes of severe trauma patients, a field in which current medical practice relies on rules of thumb and scoring methods that use only a few variables and ignore the dynamic and high-dimensional nature of trauma recovery. Well-principled prediction and VIM methods can thus provide a tool to make care decisions informed by the patient's high-dimensional physiological and clinical history. Our VIM parameters can be causally interpreted …
Targeted Learning Of An Optimal Dynamic Treatment, And Statistical Inference For Its Mean Outcome, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose we observe n independent and identically distributed observations of a time-dependent random variable consisting of baseline covariates, initial treatment and censoring indicator, intermediate covariates, subsequent treatment and censoring indicator, and a final outcome. For example, this could be data generated by a sequentially randomized controlled trial, where subjects are sequentially randomized to a first line and second line treatment, possibly assigned in response to an intermediate biomarker, and are subject to right-censoring. In this article we consider estimation of an optimal dynamic multiple time-point treatment rule defined as the rule that maximizes the mean outcome under the dynamic treatment, …
Adapting Data Adaptive Methods For Small, But High Dimensional Omic Data: Applications To Gwas/Ewas And More, Sara Kherad Pajouh, Alan E. Hubbard, Martyn T. Smith
U.C. Berkeley Division of Biostatistics Working Paper Series
Exploratory analysis of high-dimensional "omics" data has received much attention since the explosion of high-throughput technology allows simultaneous screening of tens of thousands of characteristics (genomics, metabolomics, proteomics, adducts, etc.). Part of this trend has been an increase in the dimension of exposure data in studies of environmental exposure and associated biomarkers. Though some of the general approaches, such as GWAS, are transferable, what has received less focus is (1) how to estimate independent associations in the context of many competing causes without resorting to a misspecified model, and (2) how to derive accurate small-sample inference …
Testing The Relative Performance Of Data Adaptive Prediction Algorithms: A Generalized Test Of Conditional Risk Differences, Benjamin A. Goldstein, Eric Polley, Farren Briggs, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
In statistical medicine, comparing the predictability or fit of two models can help determine whether a set of prognostic variables contains additional information about medical outcomes, or whether one of two model fits (perhaps based on different algorithms, or different sets of variables) should be preferred for clinical use. Clinical medicine has tended to rely on comparisons of clinical metrics like C-statistics and, more recently, reclassification. Such metrics require the outcome to be categorical and rely on a specific and often obscure loss function. In classical statistics one can use likelihood ratio tests and information-based criteria if the …
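The comparison the abstract describes can be illustrated with a paired test on per-observation cross-validated losses. This is a hedged simplification (a plain paired z-test with a normal approximation), not the paper's exact generalized test statistic; the function name and interface are hypothetical.

```python
import numpy as np
from math import erf, sqrt

def cv_risk_difference_test(loss_a, loss_b):
    """Sketch: compare two prediction algorithms via the mean difference
    of their per-observation cross-validated losses, with a
    normal-approximation two-sided p-value (assumed simplification,
    not the paper's exact procedure)."""
    d = np.asarray(loss_a, float) - np.asarray(loss_b, float)
    n = len(d)
    diff = d.mean()                   # estimated risk difference
    se = d.std(ddof=1) / sqrt(n)      # its standard error
    z = diff / se
    p = 2.0 * (1.0 - 0.5 * (1.0 + erf(abs(z) / sqrt(2.0))))
    return diff, se, p
```

Because the losses are paired per observation, the variance of the difference accounts for the correlation between the two algorithms' errors on the same data.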
Statistical Inference For Data Adaptive Target Parameters, Mark J. Van Der Laan, Alan E. Hubbard, Sara Kherad Pajouh
U.C. Berkeley Division of Biostatistics Working Paper Series
Suppose one observes n i.i.d. copies of a random variable with a probability distribution known to be an element of a particular statistical model. To define our statistical target, we partition the sample into V equal-size subsamples, and use this partitioning to define V splits into an estimation sample (one of the V subsamples) and its complementary parameter-generating sample, which is used to generate a target parameter. For each of the V parameter-generating samples, we apply an algorithm that maps the sample into a target parameter mapping, which represents the statistical target parameter generated by that parameter-generating …
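The sample-splitting scheme described above can be sketched with a toy data-adaptive target (choosing, on each parameter-generating sample, which column's mean to estimate). The specific target and the function interface are illustrative assumptions, not the paper's example.

```python
import numpy as np

def data_adaptive_mean(X, V=5, seed=0):
    """Sketch of the V-fold sample-splitting scheme: on each
    parameter-generating sample, pick the column with the largest
    empirical mean (a toy data-adaptive choice); estimate that column's
    mean on the complementary estimation sample; average the V
    split-specific estimates and form a Wald-type 95% CI."""
    rng = np.random.default_rng(seed)
    n = len(X)
    folds = rng.permutation(n) % V
    ests, variances = [], []
    for v in range(V):
        est_sample = folds == v            # estimation sample
        gen_sample = ~est_sample           # parameter-generating sample
        j = int(np.argmax(X[gen_sample].mean(axis=0)))  # adaptive choice
        vals = X[est_sample, j]
        ests.append(vals.mean())
        variances.append(vals.var(ddof=1) / len(vals))
    psi = float(np.mean(ests))
    # variance of the average of V independent split-specific estimates
    se = float(np.sqrt(np.sum(variances)) / V)
    return psi, (psi - 1.96 * se, psi + 1.96 * se)
```

The key point is that the parameter is *chosen* on one part of the data and *estimated* on the complementary part, which is what makes valid inference possible despite the data-adaptive definition.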
Subsemble: An Ensemble Method For Combining Subset-Specific Algorithm Fits, Stephanie Sapp, Mark J. Van Der Laan, John Canny
U.C. Berkeley Division of Biostatistics Working Paper Series
Ensemble methods using the same underlying algorithm trained on different subsets of observations have recently received increased attention as practical prediction tools for massive datasets. We propose Subsemble: a general subset ensemble prediction method, which can be used for small, moderate, or large datasets. Subsemble partitions the full dataset into subsets of observations, fits a specified underlying algorithm on each subset, and uses a clever form of V-fold cross-validation to output a prediction function that combines the subset-specific fits. We give an oracle result that provides a theoretical performance guarantee for Subsemble. Through simulations, we demonstrate that Subsemble can be …
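The procedure the abstract outlines (partition, fit per subset, combine via cross-validation) can be sketched as follows. This is an illustrative minimal version, not the authors' implementation: the base learner is ordinary least squares and the combination step is a simple least-squares stack on out-of-fold predictions.

```python
import numpy as np

def subsemble(X, y, n_subsets=3, n_folds=5, seed=0):
    """Minimal Subsemble sketch: fit one base learner (here OLS) per
    disjoint subset, then learn combination weights by regressing y on
    the V-fold cross-validated predictions of the subset-specific fits."""
    rng = np.random.default_rng(seed)
    n = len(y)
    subset_ids = rng.permutation(n) % n_subsets   # disjoint subsets
    fold_ids = rng.permutation(n) % n_folds       # CV folds

    def fit_ols(Xtr, ytr):
        A = np.c_[np.ones(len(ytr)), Xtr]
        beta, *_ = np.linalg.lstsq(A, ytr, rcond=None)
        return lambda Xn: np.c_[np.ones(len(Xn)), Xn] @ beta

    # Honest (out-of-fold) predictions of each subset-specific learner.
    Z = np.empty((n, n_subsets))
    for v in range(n_folds):
        held_out = fold_ids == v
        for j in range(n_subsets):
            train = (~held_out) & (subset_ids == j)
            Z[held_out, j] = fit_ols(X[train], y[train])(X[held_out])

    # Combination weights learned on the cross-validated predictions.
    w, *_ = np.linalg.lstsq(Z, y, rcond=None)

    # Final subset-specific fits use all data in each subset.
    fits = [fit_ols(X[subset_ids == j], y[subset_ids == j])
            for j in range(n_subsets)]
    return lambda Xn: np.column_stack([f(Xn) for f in fits]) @ w
```

Because each subset fit trains on only a fraction of the data, the method scales to datasets too large for a single fit, while the cross-validated combination step is what the oracle result guarantees against.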
Targeted Maximum Likelihood Estimation For Dynamic And Static Longitudinal Marginal Structural Working Models, Maya L. Petersen, Joshua Schwab, Susan Gruber, Nello Blaser, Michael Schomaker, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
This paper describes a targeted maximum likelihood estimator (TMLE) for the parameters of longitudinal static and dynamic marginal structural models. We consider a longitudinal data structure consisting of baseline covariates, time-dependent intervention nodes, intermediate time-dependent covariates, and a possibly time-dependent outcome. The intervention nodes at each time point can include a binary treatment as well as a right-censoring indicator. Given a class of dynamic or static interventions, a marginal structural model is used to model the mean of the intervention-specific counterfactual outcome as a function of the intervention, time point, and possibly a subset of baseline covariates. Because …
Balancing Score Adjusted Targeted Minimum Loss-Based Estimation, Samuel D. Lendle, Bruce Fireman, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Adjusting for a balancing score is sufficient for bias reduction when estimating causal effects including the average treatment effect and effect among the treated. Estimators that adjust for the propensity score in a nonparametric way, such as matching on an estimate of the propensity score, can be consistent when the estimated propensity score is not consistent for the true propensity score but converges to some other balancing score. We call this property the balancing score property, and discuss a class of estimators that have this property. We introduce a targeted minimum loss-based estimator (TMLE) for a treatment specific mean with …
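A simple instance of the nonparametric propensity-score adjustment the abstract mentions is 1-nearest-neighbor matching on the estimated score. This sketch (hypothetical function name and interface, not the paper's TMLE) estimates the effect among the treated by matching each treated unit to the control with the closest score.

```python
import numpy as np

def att_by_ps_matching(ps, treated, y):
    """Sketch: 1-nearest-neighbor matching on an estimated propensity
    score to estimate the effect among the treated (ATT). As the
    abstract notes, such matching estimators can remain consistent when
    the estimated score converges to some balancing score rather than
    the true propensity score."""
    ps, treated, y = (np.asarray(a, float) for a in (ps, treated, y))
    t = np.flatnonzero(treated == 1)
    c = np.flatnonzero(treated == 0)
    # index of the closest control for every treated unit
    match = c[np.argmin(np.abs(ps[t, None] - ps[None, c]), axis=1)]
    return float(np.mean(y[t] - y[match]))
```

Matching with replacement, as above, keeps every treated unit and reuses controls; refinements (calipers, multiple matches) trade bias against variance.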
Estimating Effects On Rare Outcomes: Knowledge Is Power, Laura B. Balzer, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides …
An Application Of Machine Learning Methods To The Derivation Of Exposure-Response Curves For Respiratory Outcomes, Ekaterina Eliseeva, Alan E. Hubbard, Ira B. Tager
U.C. Berkeley Division of Biostatistics Working Paper Series
Analyses of epidemiological studies of the association between short-term changes in air pollution and health outcomes have not sufficiently discussed the degree to which the statistical models chosen for these analyses reflect what is actually known about the true data-generating distribution. We present a method to estimate population-level ambient air pollution (NO2) exposure-health (wheeze in children with asthma) response functions that is not dependent on assumptions about the data-generating function that underlies the observed data and which focuses on a specific scientific parameter of interest (the marginal adjusted association of exposure on probability of wheeze, over a grid of possible …
Vertically Shifted Mixture Models For Clustering Longitudinal Data By Shape, Brianna C. Heggeseth, Nicholas P. Jewell
U.C. Berkeley Division of Biostatistics Working Paper Series
Longitudinal studies play a prominent role in health, social and behavioral sciences as well as in the biological sciences, economics, and marketing. By following subjects over time, temporal changes in an outcome of interest can be directly observed and studied. An important question concerns the existence of distinct trajectory patterns. One way to determine these distinct patterns is through cluster analysis, which seeks to separate objects (subjects, patients, observational units) into homogeneous groups. Many methods have been adapted for longitudinal data, but almost all of them fail to explicitly group trajectories according to distinct pattern shapes. To fulfill the need …
Targeted Estimation Of Variable Importance Measures With Interval-Censored Outcomes, Stephanie Sapp, Mark J. Van Der Laan, Kimberly Page
U.C. Berkeley Division of Biostatistics Working Paper Series
In most experimental and observational studies, participants are not followed in continuous time. Instead, data is collected about participants only at certain monitoring times. These monitoring times are random, and often participant specific. As a result, outcomes are only known up to random time intervals, resulting in interval-censored data. In practice, when estimating variable importance measures on interval-censored outcomes, practitioners often ignore the presence of interval-censoring and instead treat the data as continuous or right-censored, applying ad hoc approaches to mask the true interval-censoring. In this paper, we describe Targeted Minimum Loss-based Estimation methods tailored for estimation of variable importance measures …
Targeted Data Adaptive Estimation Of The Causal Dose Response Curve, Iván Díaz, Mark J. Van Der Laan
U.C. Berkeley Division of Biostatistics Working Paper Series
Estimation of the causal dose-response curve is an old problem in statistics. In a nonparametric model, if the treatment is continuous, the dose-response curve is not a pathwise differentiable parameter, and no root-n-consistent estimator is available. However, the risk of a candidate algorithm for estimation of the dose-response curve is a pathwise differentiable parameter, whose consistent and efficient estimation is possible. In this work, we review the cross-validated augmented inverse probability of treatment weighted estimator (CV A-IPTW) of the risk, and present a cross-validated targeted minimum loss-based estimator (CV-TMLE) counterpart. These estimators are proven consistent …