Open Access. Powered by Scholars. Published by Universities.®

Medicine and Health Sciences Commons

Statistical Methodology

COBRA

U.C. Berkeley Division of Biostatistics Working Paper Series

Articles 1 - 19 of 19

Full-Text Articles in Medicine and Health Sciences

Evaluation Of Progress Towards The UNAIDS 90-90-90 HIV Care Cascade: A Description Of Statistical Methods Used In An Interim Analysis Of The Intervention Communities In The SEARCH Study, Laura Balzer, Joshua Schwab, Mark J. Van Der Laan, Maya L. Petersen Feb 2017

U.C. Berkeley Division of Biostatistics Working Paper Series

WHO guidelines call for universal antiretroviral treatment, and UNAIDS has set a global target to virally suppress most HIV-positive individuals. Accurate estimates of population-level coverage at each step of the HIV care cascade (testing, treatment, and viral suppression) are needed to assess the effectiveness of "test and treat" strategies implemented to achieve this goal. The data available to inform such estimates, however, are susceptible to informative missingness: the number of HIV-positive individuals in a population is unknown; individuals tested for HIV may not be representative of those whom a testing intervention fails to reach, and HIV-positive individuals with a viral …
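For orientation, the UNAIDS 90-90-90 target asks that 90% of HIV-positive individuals be diagnosed, 90% of those diagnosed be on treatment, and 90% of those treated be virally suppressed, which implies roughly 0.9³ ≈ 73% of all HIV-positive individuals suppressed. The sketch below shows only that arithmetic from hypothetical, fully observed counts; the function name and counts are illustrative assumptions, and the naive ratios ignore exactly the informative missingness the abstract describes.

```python
def cascade_coverage(n_hiv_positive: int, n_diagnosed: int,
                     n_on_art: int, n_suppressed: int) -> dict:
    """Stepwise (conditional) and population-level HIV care cascade coverage.

    Hypothetical helper: assumes every count is fully observed, which the
    abstract notes is not true in practice (e.g. the number HIV-positive is unknown).
    """
    return {
        # Step 1: tested and diagnosed, among all HIV-positive individuals.
        "diagnosed_among_hiv_positive": n_diagnosed / n_hiv_positive,
        # Step 2: on antiretroviral treatment, among those diagnosed.
        "on_art_among_diagnosed": n_on_art / n_diagnosed,
        # Step 3: virally suppressed, among those on treatment.
        "suppressed_among_on_art": n_suppressed / n_on_art,
        # Population-level coverage: product of the three conditional steps.
        "suppressed_among_hiv_positive": n_suppressed / n_hiv_positive,
    }
```

With hypothetical counts of 1000, 900, 810, and 729, each conditional step is 0.9 and population-level suppression is 0.729.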


Adaptive Pair-Matching In The SEARCH Trial And Estimation Of The Intervention Effect, Laura Balzer, Maya L. Petersen, Mark J. Van Der Laan Jan 2014

U.C. Berkeley Division of Biostatistics Working Paper Series

In randomized trials, pair-matching is an intuitive design strategy to protect study validity and potentially increase study power. In a common design, candidate units are identified, and their baseline characteristics are used to create the best n/2 matched pairs. Within the resulting pairs, the intervention is randomized, and the outcomes are measured at the end of follow-up. We consider this design to be adaptive, because the construction of the matched pairs depends on the baseline covariates of all candidate units. As a consequence, the observed data cannot be considered as n/2 independent, identically distributed (i.i.d.) pairs of units, as current practice assumes. …
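A minimal sketch of the design described above, under stated assumptions: pairs are formed from all candidates' baseline covariates, treatment is randomized within each pair, and a simple average of within-pair differences is computed. The greedy closest-pair rule is a hypothetical stand-in for an optimal "best n/2 pairs" matching, and the paired-difference average is only an illustrative estimator, not the estimator developed in the paper. Note how every pair depends on the covariates of all units, which is why the pairs are not i.i.d.

```python
import math
import random
from typing import Dict, List, Sequence, Tuple


def greedy_pair_match(covariates: Sequence[Sequence[float]]) -> List[Tuple[int, int]]:
    """Pair units by repeatedly taking the closest remaining pair (Euclidean distance).

    Hypothetical greedy stand-in for an optimal matching; assumes an even number of units.
    """
    n = len(covariates)
    if n % 2:
        raise ValueError("need an even number of candidate units")
    # All pairwise distances on the baseline covariates, closest first.
    candidates = sorted(
        (math.dist(covariates[i], covariates[j]), i, j)
        for i in range(n)
        for j in range(i + 1, n)
    )
    used = set()
    pairs = []
    for _, i, j in candidates:
        if i not in used and j not in used:
            pairs.append((i, j))
            used.update((i, j))
    return pairs


def randomize_within_pairs(pairs: List[Tuple[int, int]], rng: random.Random) -> Dict[int, int]:
    """Flip a fair coin within each pair: one unit gets arm 1, the other arm 0."""
    arm = {}
    for i, j in pairs:
        arm[i] = rng.randint(0, 1)
        arm[j] = 1 - arm[i]
    return arm


def paired_difference(pairs, arm, outcome) -> float:
    """Average over pairs of (treated outcome - control outcome)."""
    diffs = []
    for i, j in pairs:
        treated, control = (i, j) if arm[i] == 1 else (j, i)
        diffs.append(outcome[treated] - outcome[control])
    return sum(diffs) / len(diffs)
```

For example, `greedy_pair_match([[0.0], [0.1], [5.0], [5.2]])` returns `[(0, 1), (2, 3)]`.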


Variable Importance Analysis With The multiPIM R Package, Stephan J. Ritter, Nicholas P. Jewell, Alan E. Hubbard Jul 2011

U.C. Berkeley Division of Biostatistics Working Paper Series

We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including …


Causal Inference For Nested Case-Control Studies Using Targeted Maximum Likelihood Estimation, Sherri Rose, Mark J. Van Der Laan Sep 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

A nested case-control study is conducted within a well-defined cohort arising out of a population of interest. This design is often used in epidemiology to reduce the costs associated with collecting data on the full cohort; however, the case-control sample within the cohort is a biased sample. Methods for analyzing case-control studies have largely focused on logistic regression models that provide conditional and not marginal causal estimates of the odds ratio. We previously developed a Case-Control Weighted Targeted Maximum Likelihood Estimation (TMLE) procedure for case-control study designs, which relies on the prevalence probability q0. We propose the use of …


A Small Sample Correction For Estimating Attributable Risk In Case-Control Studies, Daniel B. Rubin Dec 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

The attributable risk, often called the population attributable risk, is in many epidemiological contexts a more relevant measure of exposure-disease association than the excess risk, relative risk, or odds ratio. When estimating attributable risk with case-control data and a rare disease, we present a simple correction to the standard approach making it essentially unbiased, and also less noisy. As with analogous corrections given in Jewell (1986) for other measures of association, the adjustment often won't make a substantial difference unless the sample size is very small or point estimates are desired within fine strata, but we discuss the possible utility …
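As a reference point for the "standard approach" mentioned above, a common case-control estimate of attributable risk uses AR = p_c (OR − 1)/OR, where p_c is the proportion of cases who are exposed and the odds ratio OR stands in for the relative risk under the rare-disease assumption. The sketch below implements only that uncorrected estimate; it does not implement the paper's small-sample correction, and the function name and example counts are hypothetical.

```python
def attributable_risk_case_control(exposed_cases: int, unexposed_cases: int,
                                   exposed_controls: int, unexposed_controls: int) -> float:
    """Standard (uncorrected) attributable-risk estimate from a 2x2 case-control table.

    AR = p_c * (OR - 1) / OR, with p_c the exposed fraction among cases and the
    odds ratio approximating the relative risk (rare-disease assumption).
    """
    odds_ratio = (exposed_cases * unexposed_controls) / (unexposed_cases * exposed_controls)
    p_exposed_among_cases = exposed_cases / (exposed_cases + unexposed_cases)
    return p_exposed_among_cases * (odds_ratio - 1) / odds_ratio
```

With 30 exposed and 70 unexposed cases and 20 exposed and 80 unexposed controls, OR = 12/7 and the estimate is 0.3 × 5/12 = 0.125.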


Doubly Robust Ecological Inference, Daniel B. Rubin, Mark J. Van Der Laan May 2008

U.C. Berkeley Division of Biostatistics Working Paper Series

The ecological inference problem is a famous longstanding puzzle that arises in many disciplines. The usual formulation in epidemiology is that we would like to quantify an exposure-disease association by obtaining disease rates among the exposed and unexposed, but only have access to exposure rates and disease rates for several regions. The problem is generally intractable, but can be attacked under the assumptions of King's (1997) extended technique if we can correctly specify a model for a certain conditional distribution. We introduce a procedure that is a valid approach if either this original model is correct or if we …


Empirical Efficiency Maximization, Daniel B. Rubin, Mark J. Van Der Laan Jul 2007

U.C. Berkeley Division of Biostatistics Working Paper Series

It has long been recognized that covariate adjustment can increase precision, even when it is not strictly necessary. The phenomenon is particularly emphasized in clinical trials, whether using continuous, categorical, or censored time-to-event outcomes. Adjustment is often straightforward when a discrete covariate partitions the sample into a handful of strata, but becomes more involved when modern studies collect copious amounts of baseline information on each subject.

The dilemma helped motivate locally efficient estimation for coarsened data structures, as surveyed in the books of van der Laan and Robins (2003) and Tsiatis (2006). Here one fits a relatively small working model …
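To illustrate the general phenomenon in the first paragraph (covariate adjustment sharpening a randomized comparison), the sketch below compares the unadjusted difference in means with an ordinary least squares (ANCOVA-style) adjusted estimate in a hypothetical simulated trial with one prognostic baseline covariate. This is a plain regression adjustment chosen for illustration, not the empirical efficiency maximization method of the paper; the data-generating model and all names are assumptions.

```python
import numpy as np


def unadjusted_effect(y: np.ndarray, a: np.ndarray) -> float:
    """Difference in mean outcome between the two randomized arms."""
    return float(y[a == 1].mean() - y[a == 0].mean())


def adjusted_effect(y: np.ndarray, a: np.ndarray, x: np.ndarray) -> float:
    """Treatment coefficient from an OLS regression of y on (1, a, x)."""
    design = np.column_stack([np.ones(len(y)), a, x])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return float(coef[1])


def simulate_trial(seed: int, n: int = 2000, effect: float = 2.0):
    """Hypothetical trial: randomized a, prognostic baseline x, additive treatment effect."""
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    a = rng.integers(0, 2, size=n)
    y = effect * a + 3.0 * x + rng.normal(size=n)
    return unadjusted_effect(y, a), adjusted_effect(y, a, x)


if __name__ == "__main__":
    results = np.array([simulate_trial(seed) for seed in range(200)])
    # Both estimators target the same effect; the adjusted one should vary less.
    print("empirical SD (unadjusted, adjusted):", results.std(axis=0))
```

Because x explains most of the outcome variance in this hypothetical model, the adjusted estimates should scatter far less around the true effect than the unadjusted ones, even though adjustment is not needed for unbiasedness under randomization.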


Extending Marginal Structural Models Through Local, Penalized, And Additive Learning, Daniel Rubin, Mark J. Van Der Laan Sep 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSMs) allow one to form causal inferences from data, by specifying a relationship between a treatment and the marginal distribution of a corresponding counterfactual outcome. Following their introduction in Robins (1997), MSMs have typically been fit after assuming a semiparametric model, and then estimating a finite dimensional parameter. van der Laan and Dudoit (2003) proposed to instead view MSM fitting not as a task of semiparametric parameter estimation, but of nonparametric function approximation. They introduced a class of causal effect estimators based on mapping loss functions suitable for the unavailable counterfactual data to those suitable for the …
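For readers new to MSMs, a common way to fit one for a binary point treatment is inverse probability of treatment weighting (IPTW). The sketch below fits the saturated MSM E[Y_a] = β0 + β1·a by normalized IPTW, estimating the treatment mechanism g(a | w) by empirical frequencies within a discrete covariate w. The discrete covariate, the function name, and the data are illustrative assumptions, and this is the classical semiparametric fit rather than the loss-based learning extension proposed in the paper.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def iptw_treatment_means(w: Sequence[Hashable], a: Sequence[int],
                         y: Sequence[float]) -> Dict[int, float]:
    """Normalized IPTW estimates of E[Y_a], a in {0, 1}, under a saturated MSM.

    g(a | w) is estimated by empirical frequencies within each level of a
    discrete covariate w (an illustrative assumption).
    """
    n_w = Counter(w)
    n_wa = Counter(zip(w, a))
    means = {}
    for level in (0, 1):
        weighted_sum = total_weight = 0.0
        for wi, ai, yi in zip(w, a, y):
            if ai == level:
                weight = n_w[wi] / n_wa[(wi, ai)]  # equals 1 / g_hat(a | w)
                weighted_sum += weight * yi
                total_weight += weight
        means[level] = weighted_sum / total_weight
    return means
```

Under the saturated MSM, β0 is `means[0]` and the causal coefficient β1 is `means[1] - means[0]`. For the hypothetical data w = [0, 0, 0, 0, 1, 1, 1, 1], a = [0, 0, 0, 1, 0, 1, 1, 1], y = [0, 0, 1, 1, 1, 1, 1, 0], this gives E[Y_1] ≈ 5/6 and E[Y_0] ≈ 2/3.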


Population Intervention Models In Causal Inference, Alan E. Hubbard, Mark J. Van Der Laan Oct 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment variable or risk variable on the distribution of a disease in a population. These models, as originally introduced by Robins (e.g., Robins (2000a), Robins (2000b), van der Laan and Robins (2002)), model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates, and its dependence on treatment. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at …


Direct Effect Models, Mark J. Van Der Laan, Maya L. Petersen Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no-unmeasured confounders for treatment and the intermediate variable, Robins & Greenland (1992) define an individual direct effect as the counterfactual effect of …


Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Causal Inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data for the purpose of defining causal parameters which may often be better suited for Public Health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, the latter represents the treatment causal effect of interest based on a treatment history defined by the …


Multiple Testing And Data Adaptive Regression: An Application To HIV-1 Sequence Data, Merrill D. Birkner, Sandra E. Sinisi, Mark J. Van Der Laan Oct 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Analysis of viral strand sequence data and viral replication capacity could potentially lead to biological insights regarding the replication ability of HIV-1. Determining specific target codons on the viral strand will facilitate the manufacturing of target specific antiretrovirals. Various algorithmic and analysis techniques can be applied to this application. We propose using multiple testing to find codons which have significant univariate associations with replication capacity of the virus. We also propose using a data adaptive multiple regression algorithm to obtain multiple predictions of viral replication capacity based on an entire mutant/non-mutant sequence profile. The data set to which these techniques …
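The abstract does not specify which multiple testing procedure is used, so the sketch below swaps in one widely used option for screening many univariate codon-level tests: Benjamini-Hochberg adjusted p-values, which control the false discovery rate under independence (or positive dependence) of the tests. The procedure choice, function name, and p-values are assumptions for illustration, not the authors' method.

```python
from typing import List, Sequence


def benjamini_hochberg(p_values: Sequence[float]) -> List[float]:
    """Benjamini-Hochberg step-up adjusted p-values, returned in input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted
```

For hypothetical per-codon p-values [0.01, 0.04, 0.03, 0.005], the adjusted values are approximately [0.02, 0.04, 0.04, 0.02]; codons with adjusted values below a chosen FDR level (say 0.05) would be flagged.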


Data Adaptive Estimation Of The Treatment Specific Mean, Yue Wang, Oliver Bembom, Mark J. Van Der Laan Oct 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

An important problem in epidemiology and medical research is the estimation of the causal effect of a treatment action at a single point in time on the mean of an outcome, possibly within strata of the target population defined by a subset of the baseline covariates. Current approaches to this problem are based on marginal structural models, i.e., parametric models for the marginal distribution of counterfactual outcomes as a function of treatment and effect modifiers. The various estimators developed in this context furthermore each depend on a high-dimensional nuisance parameter whose estimation currently also relies on parametric models. Since misspecification …
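The treatment-specific mean described above can be written as E_W[E(Y | A = a, W)], assuming no unmeasured confounding by W. The sketch below computes the simple plug-in (G-computation) estimate when W is a single discrete covariate, so that E(Y | A = a, W) can be estimated by stratum means without any parametric model. The discrete-W setting, the function name, and the data are hypothetical; this is not the data-adaptive estimator proposed in the paper, which targets settings where such stratification is infeasible.

```python
from collections import defaultdict
from typing import Hashable, Sequence


def gcomp_treatment_mean(w: Sequence[Hashable], a: Sequence[int],
                         y: Sequence[float], treatment: int) -> float:
    """Plug-in estimate of E_W[E(Y | A=treatment, W)] for a discrete covariate W."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    # Stratum-specific outcome means among units receiving the given treatment.
    for wi, ai, yi in zip(w, a, y):
        if ai == treatment:
            sums[wi] += yi
            counts[wi] += 1
    # Average the stratum means over the empirical distribution of W.
    total = 0.0
    for wi in w:
        if counts[wi] == 0:
            raise ValueError(f"no units with A={treatment} in stratum W={wi!r}")
        total += sums[wi] / counts[wi]
    return total / len(w)
```

For hypothetical data w = [0, 0, 0, 1, 1, 1], a = [1, 0, 0, 1, 1, 0], y = [3, 1, 2, 5, 7, 4], the estimates are 4.5 for treatment 1 and 2.75 for treatment 0.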


History-Adjusted Marginal Structural Models And Statically-Optimal Dynamic Treatment Regimes, Mark J. Van Der Laan, Maya L. Petersen Sep 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment. These models, introduced by Robins, model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at a final time point. However, the utility of these models for some applications has been limited by their inability to incorporate modification of the causal effect of treatment by time-varying covariates. …


Estimation Of Treatment Effects In Randomized Trials With Noncompliance And A Dichotomous Outcome, Mark J. Van Der Laan, Alan E. Hubbard, Nicholas P. Jewell Sep 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a class of estimators of the treatment effect on a dichotomous outcome among the treated subjects within covariate and treatment arm strata in randomized trials with non-compliance. Recent articles by Vansteelandt and Goetghebeur (2003) and Robins and Rotnitzky (2004) have presented consistent and asymptotically linear estimators of a causal odds ratio, which rely, beyond correct specification of a model for the causal odds ratio, on a correctly specified model for a potentially high dimensional nuisance parameter. In this article we propose consistent, asymptotically linear and locally efficient estimators of a causal relative risk and a new parameter -- …


Estimation Of Direct And Indirect Causal Effects In Longitudinal Studies, Mark J. Van Der Laan, Maya L. Petersen Aug 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no-unmeasured confounders, Robins & Greenland (1992) and Pearl (2000) develop two identifiability results for direct and indirect causal effects. They define an …


Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit, Simon E. Cawley Jun 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Cawley et al. (2004) have recently mapped the locations of binding sites for three transcription factors along human chromosomes 21 and 22 using ChIP-Chip experiments. ChIP-Chip experiments are a new approach to the genome-wide identification of transcription factor binding sites and consist of chromatin (Ch) immunoprecipitation (IP) of transcription factor-bound genomic DNA followed by high density oligonucleotide hybridization (Chip) of the IP-enriched DNA. We investigate the ChIP-Chip data structure and propose methods for inferring the location of transcription factor binding sites from these data. The proposed methods involve testing for each probe whether it is part of a bound sequence …


Case-Control Current Status Data, Nicholas P. Jewell, Mark J. Van Der Laan Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Current status observation on survival times has recently been widely studied. An extreme form of interval censoring, this data structure refers to situations where the only available information on a survival random variable, T, is a binary random variable, Y, indicating whether or not T exceeds a random independent monitoring time C. To date, nonparametric analyses of current status data have assumed the availability of i.i.d. random samples of the random variable (Y, C), or a similar random sample at each of a set of fixed monitoring times. In many situations, it is useful to consider a case-control sampling scheme. Here, …
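For the i.i.d. setting that the abstract says existing analyses assume, a classical result is that the nonparametric maximum likelihood estimate of F(t) = P(T ≤ t) at the observed monitoring times is the isotonic (nondecreasing) regression of the event indicators on C. The sketch below computes it with the pool-adjacent-violators algorithm, coding the indicator as 1 when T ≤ C (the event has occurred by the monitoring time). It assumes distinct monitoring times, the function name is hypothetical, and it does not address the case-control sampling scheme studied in the paper.

```python
from typing import List, Sequence


def current_status_npmle(c: Sequence[float], y: Sequence[int]) -> List[float]:
    """NPMLE of F(t) = P(T <= t) at the sorted monitoring times (i.i.d. current status data).

    y[i] = 1 if T_i <= c[i], else 0. The NPMLE is the isotonic regression of y on c,
    computed by pool-adjacent-violators. Assumes distinct monitoring times.
    """
    order = sorted(range(len(c)), key=lambda i: c[i])
    blocks = []  # each block is [sum of y, number of observations]
    for i in order:
        blocks.append([float(y[i]), 1])
        # Merge backwards while adjacent block means violate monotonicity.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, k = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += k
    fitted = []
    for s, k in blocks:
        fitted.extend([s / k] * k)
    return fitted  # F-hat at c sorted in increasing order
```

For monitoring times 1 through 5 with indicators [0, 1, 0, 1, 1], the violation between times 2 and 3 is pooled, giving F-hat = [0, 0.5, 0.5, 1, 1].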


Current Status Data: Review, Recent Developments And Open Problems, Nicholas P. Jewell, Mark J. Van Der Laan Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Researchers working with survival data are by now adept at handling issues associated with incomplete data, particularly those associated with various forms of censoring. An extreme form of interval censoring, known as current status observation, refers to situations where the only available information on a survival random variable T is whether or not T exceeds a random independent monitoring time C. This article contains a brief review of the extensive literature on the analysis of current status data, discussing the implications of response-based sampling on these methods. The majority of the paper introduces some recent extensions of these ideas to …