Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 12 of 12

Full-Text Articles in Physical Sciences and Mathematics

Optimal Spatial Prediction Using Ensemble Machine Learning, Molly M. Davies, Mark J. Van Der Laan Dec 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Spatial prediction is an important problem in many scientific disciplines. Super Learner is an ensemble prediction approach related to stacked generalization that uses cross-validation to search for the optimal predictor amongst all convex combinations of a heterogeneous candidate set. It has been applied to non-spatial data, where theoretical results demonstrate it will perform asymptotically at least as well as the best candidate under consideration. We review these optimality properties and discuss the assumptions required in order for them to hold for spatial prediction problems. We present results of a simulation study confirming Super Learner works well in practice under a …
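
As a rough illustration of the "optimal convex combination" idea in the abstract above, the sketch below handles only the simplest case: two candidate predictors and squared-error loss. It assumes the out-of-fold (cross-validated) predictions of each candidate are already available. The function names `convex_weight` and `combine` are hypothetical and are not taken from the paper or from any Super Learner software.

```python
def convex_weight(y, pred_a, pred_b):
    """Weight alpha in [0, 1] minimising sum (y - (alpha*a + (1-alpha)*b))**2.

    The residual can be written as (y - b) - alpha * (a - b), so the
    unconstrained minimiser is a ratio of sums; clipping it to [0, 1]
    gives the constrained minimiser because the loss is convex in alpha.
    """
    num = sum((yi - b) * (a - b) for yi, a, b in zip(y, pred_a, pred_b))
    den = sum((a - b) ** 2 for a, b in zip(pred_a, pred_b))
    if den == 0.0:
        return 0.5  # identical candidates: every weight gives the same loss
    return min(1.0, max(0.0, num / den))


def combine(alpha, pred_a, pred_b):
    """Convex combination of two prediction vectors."""
    return [alpha * a + (1.0 - alpha) * b for a, b in zip(pred_a, pred_b)]
```

With more than two candidates the same search runs over a weight vector on the simplex and needs a constrained optimiser; the closed form here is specific to the two-candidate case.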


Sensitivity Analysis For Causal Inference Under Unmeasured Confounding And Measurement Error Problems, Iván Díaz, Mark J. Van Der Laan Dec 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

In this paper we present a sensitivity analysis for drawing inferences about parameters that are not estimable from observed data without additional assumptions. We present the methodology using two different examples: a causal parameter that is not identifiable due to violations of the randomization assumption, and a parameter that is not estimable in the nonparametric model due to measurement error. Existing methods for tackling these problems assume a parametric model for the type of violation of the identifiability assumption, and require the development of new estimators and inference for every new model. The method we present can be used in …


Computationally Efficient Confidence Intervals For Cross-Validated Area Under The ROC Curve Estimates, Erin Ledell, Maya L. Petersen, Mark J. Van Der Laan Dec 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

In binary classification problems, the area under the ROC curve (AUC) is an effective means of measuring the performance of a model. Cross-validation is most often used as well, in order to assess how the results will generalize to an independent data set. In order to evaluate the quality of an estimate for cross-validated AUC, we must obtain an estimate for its variance. For massive data sets, the process of generating a single performance estimate can be computationally expensive. Additionally, when using a complex prediction method, calculating the cross-validated AUC on even a relatively small data set can still require a …
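
To make the quantity discussed above concrete, here is a minimal sketch of a cross-validated AUC point estimate. It does not implement the paper's variance or confidence-interval method, which the truncated abstract does not spell out. The helper names, the stratified fold assignment, and the `fit` callback interface are all assumptions made for illustration.

```python
import random


def auc(scores, labels):
    """Empirical AUC: share of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as one half.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def stratified_folds(labels, n_folds, seed=0):
    """Assign indices to folds so each fold holds both classes when possible."""
    rng = random.Random(seed)
    folds = [[] for _ in range(n_folds)]
    for cls in (0, 1):
        members = [i for i, y in enumerate(labels) if y == cls]
        rng.shuffle(members)
        for position, i in enumerate(members):
            folds[position % n_folds].append(i)
    return folds


def cross_validated_auc(x, y, fit, n_folds=5, seed=0):
    """Average the AUC over validation folds.

    `fit(train_x, train_y)` is a hypothetical callback that returns a
    scoring function mapping one observation to a real-valued score.
    """
    folds = stratified_folds(y, n_folds, seed)
    fold_aucs = []
    for k, validation in enumerate(folds):
        train = [i for j, fold in enumerate(folds) if j != k for i in fold]
        score = fit([x[i] for i in train], [y[i] for i in train])
        fold_aucs.append(
            auc([score(x[i]) for i in validation], [y[i] for i in validation])
        )
    return sum(fold_aucs) / n_folds, fold_aucs
```

The pairwise loop in `auc` is quadratic in the fold size, and the model is refit once per fold. That cost is what makes resampling-based variance estimates expensive on large data sets.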


Statistical Inference When Using Data Adaptive Estimators Of Nuisance Parameters, Mark J. Van Der Laan Nov 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

In order to be concrete we focus on estimation of the treatment specific mean, controlling for all measured baseline covariates, based on observing n independent and identically distributed copies of a random variable consisting of baseline covariates, a subsequently assigned binary treatment, and a final outcome. The statistical model only assumes possible restrictions on the conditional distribution of treatment, given the covariates, the so-called propensity score. Estimators of the treatment specific mean involve estimation of the propensity score and/or estimation of the conditional mean of the outcome, given the treatment and covariates. In order to make these estimators asymptotically …
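
The abstract above refers to estimators built from the propensity score and/or the outcome regression. The sketch below shows three standard textbook forms of such estimators for the mean outcome under treatment: a plug-in (G-computation) estimator, an inverse-probability-weighted estimator, and an augmented version that uses both nuisance functions. It is not the estimator developed in the paper. The data layout `(w, a, y)` and the callables `propensity` and `outcome_regression` are hypothetical, standing in for whatever nuisance estimates an analyst has fitted.

```python
def g_computation(data, outcome_regression):
    """Plug-in estimate: average the fitted E[Y | A=1, W=w] over all units."""
    return sum(outcome_regression(w) for w, _, _ in data) / len(data)


def iptw(data, propensity):
    """Inverse-probability-weighted estimate: weight treated outcomes by 1/g(w)."""
    return sum(a * y / propensity(w) for w, a, y in data) / len(data)


def aipw(data, propensity, outcome_regression):
    """Augmented IPTW: plug-in term plus a weighted residual correction."""
    total = 0.0
    for w, a, y in data:
        q = outcome_regression(w)  # fitted E[Y | A=1, W=w]
        total += a / propensity(w) * (y - q) + q
    return total / len(data)


# Toy data: (covariate w, treatment a, outcome y); values are made up.
toy = [(0, 1, 1.0), (0, 0, 0.0), (1, 1, 3.0), (1, 0, 1.0)]
estimate = aipw(toy, lambda w: 0.5, lambda w: 1.0 + 2.0 * w)
```

In the toy data the augmented estimator returns the same value whether the outcome regression is the correct `1 + 2w` or a deliberately wrong constant zero, as long as the propensity score is correct. This is the double robustness property mentioned elsewhere in this listing.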


The Impact Of Covariance Misspecification In Multivariate Gaussian Mixtures On Estimation And Inference: An Application To Longitudinal Modeling, Brianna C. Heggeseth, Nicholas P. Jewell Oct 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Multivariate Gaussian mixtures are a class of models that provide a flexible parametric approach for the representation of heterogeneous multivariate outcomes. When the outcome is a vector of repeated measurements taken on the same subject, there is often inherent dependence between observations. However, a common covariance assumption is conditional independence; that is, given the mixture component label, the outcomes for subjects are independent. In this paper, we study, through asymptotic bias calculations and simulation, the impact of covariance misspecification in multivariate Gaussian mixtures. Although maximum likelihood estimators of regression and mixing probability parameters are not consistent under misspecification, they have little …


Causal Inference For Networks, Mark J. Van Der Laan Oct 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a population of causally connected units according to a network. On each unit we observe a set of potentially connected units that contains the true connections, and a longitudinal data structure, which includes time-dependent exposure or treatment, time-dependent covariates, and a final outcome of interest. The target quantity of interest is defined as the mean outcome for this group of units if the exposures of the units were probabilistically assigned according to a known specified mechanism, where the latter is called a stochastic intervention. Causal effects of interest are defined as contrasts of the mean of …


Targeted Learning Of The Probability Of Success Of An In Vitro Fertilization Program Controlling For Time-Dependent Confounders, Antoine Chambaz, Sherri Rose, Jean Bouyer, Mark J. Van Der Laan Oct 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Infertility is a global public health issue and various treatments are available. In vitro fertilization (IVF) is an increasingly common treatment method, but accurately assessing the success of IVF programs has proven challenging since they consist of multiple cycles. We present a double robust semiparametric method that incorporates machine learning to estimate the probability of success (i.e., delivery resulting from embryo transfer) of a program of at most four IVF cycles in the French Devenir Après Interruption de la FIV (DAIFI) study and several simulation studies, controlling for time-dependent confounders. We find that the probability of success in the DAIFI …


Assessing The Causal Effect Of Policies: An Approach Based On Stochastic Interventions, Iván Díaz, Mark J. Van Der Laan Oct 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Stochastic interventions are a powerful tool to define parameters that measure the causal effect of a realistic intervention that intends to alter the population distribution of an exposure. In this paper we follow the approach described in Díaz and van der Laan (2011) to define and estimate the effect of an intervention that is expected to cause a truncation in the population distribution of the exposure. The observed data parameter that identifies the causal parameter of interest is established, as well as its efficient influence function under the nonparametric model. Inverse probability of treatment weighted (IPTW), augmented IPTW and …


Targeted Learning For Causality And Statistical Analysis In Medical Research, Sherri Rose, Richard J.C.M. Starmans, Mark J. Van Der Laan Aug 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

The authors present the use of targeted learning methods for medical research, prepared as a chapter for the upcoming book "Statistics: Discovering Your Future Power." The targeted learning framework involves the explicit specification of the data, model, and parameter. The estimators are double robust and efficient, and can incorporate machine learning procedures such as the super learner.


Adaptive Matching In Randomized Trials And Observational Studies, Mark J. Van Der Laan, Laura Balzer, Maya L. Petersen Jul 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

In many randomized and observational studies the allocation of treatment among a sample of n independent and identically distributed units is a function of the covariates of all sampled units. As a result, the treatment labels among the units are possibly dependent, complicating estimation and posing challenges for statistical inference. For example, cluster randomized trials frequently sample communities from some target population, construct matched pairs of communities from those included in the sample based on some metric of similarity in baseline community characteristics, and then randomly allocate a treatment and a control intervention within each matched pair. In this case, …


Causal Mediation In A Survival Setting With Time-Dependent Mediators, Wenjing Zheng, Mark J. Van Der Laan Jun 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

The effect of an exposure on an outcome of interest is often mediated by intermediate variables. The goal of causal mediation analysis is to evaluate the role of these intermediate variables (mediators) in the causal effect of the exposure on the outcome. In this paper, we consider causal mediation of a baseline exposure on a survival (or time-to-event) outcome, when the mediator is time-dependent. The challenge in this setting is that the event process takes place jointly with the mediator process; in particular, the length of the mediator history depends on the survival time. As a result, we argue …


Avoiding Boundary Estimates In Linear Mixed Models Through Weakly Informative Priors, Yeojin Chung, Sophia Rabe-Hesketh, Andrew Gelman, Jingchen Liu, Vincent Dorie Feb 2012

U.C. Berkeley Division of Biostatistics Working Paper Series

Variance parameters in mixed or multilevel models can be difficult to estimate, especially when the number of groups is small. We propose a maximum penalized likelihood approach which is equivalent to estimating variance parameters by their marginal posterior mode, given a weakly informative prior distribution. By choosing the prior from the gamma family with at least 1 degree of freedom, we ensure that the prior density is zero at the boundary and thus the marginal posterior mode of the group-level variance will be positive. The use of a weakly informative prior allows us to stabilize our estimates while remaining faithful …