Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 30 of 98

Full-Text Articles in Physical Sciences and Mathematics

Integrated Multiple Mediation Analysis: A Robustness–Specificity Trade-Off In Causal Structure, An-Shun Tai, Sheng-Hsuan Lin May 2020

Integrated Multiple Mediation Analysis: A Robustness–Specificity Trade-Off In Causal Structure, An-Shun Tai, Sheng-Hsuan Lin

Harvard University Biostatistics Working Paper Series

Recent methodological developments in causal mediation analysis have addressed several issues regarding multiple mediators. However, these developed methods differ in their definitions of causal parameters, assumptions for identification, and interpretations of causal effects, making it unclear which method ought to be selected when investigating a given causal effect. Thus, in this study, we construct an integrated framework, which unifies all existing methodologies, as a standard for mediation analysis with multiple mediators. To clarify the relationship between existing methods, we propose four strategies for effect decomposition: two-way, partially forward, partially backward, and complete decompositions. This study reveals how the direct and …


Technical Considerations In The Use Of The E-Value, Tyler J. Vanderweele, Peng Ding, Maya Mathur Feb 2018

Technical Considerations In The Use Of The E-Value, Tyler J. Vanderweele, Peng Ding, Maya Mathur

Harvard University Biostatistics Working Paper Series

The E-value is defined as the minimum strength of association on the risk ratio scale that an unmeasured confounder would have to have with both the exposure and the outcome, conditional on the measured covariates, to explain away the observed exposure-outcome association. We have elsewhere proposed that the reporting of E-values for estimates and for the limit of the confidence interval closest to the null become routine whenever causal effects are of interest. A number of questions have arisen about the use of E-value including questions concerning the interpretation of the relevant confounding association parameters, the nature of the transformation …


Evaluation Of Progress Towards The Unaids 90-90-90 Hiv Care Cascade: A Description Of Statistical Methods Used In An Interim Analysis Of The Intervention Communities In The Search Study, Laura Balzer, Joshua Schwab, Mark J. Van Der Laan, Maya L. Petersen Feb 2017

Evaluation Of Progress Towards The Unaids 90-90-90 Hiv Care Cascade: A Description Of Statistical Methods Used In An Interim Analysis Of The Intervention Communities In The Search Study, Laura Balzer, Joshua Schwab, Mark J. Van Der Laan, Maya L. Petersen

U.C. Berkeley Division of Biostatistics Working Paper Series

WHO guidelines call for universal antiretroviral treatment, and UNAIDS has set a global target to virally suppress most HIV-positive individuals. Accurate estimates of population-level coverage at each step of the HIV care cascade (testing, treatment, and viral suppression) are needed to assess the effectiveness of "test and treat" strategies implemented to achieve this goal. The data available to inform such estimates, however, are susceptible to informative missingness: the number of HIV-positive individuals in a population is unknown; individuals tested for HIV may not be representative of those whom a testing intervention fails to reach, and HIV-positive individuals with a viral …


Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret Jan 2016

Models For Hsv Shedding Must Account For Two Levels Of Overdispersion, Amalia Magaret

UW Biostatistics Working Paper Series

We have frequently implemented crossover studies to evaluate new therapeutic interventions for genital herpes simplex virus infection. The outcome measured to assess the efficacy of interventions on herpes disease severity is the viral shedding rate, defined as the frequency of detection of HSV on the genital skin and mucosa. We performed a simulation study to ascertain whether our standard model, which we have used previously, was appropriately considering all the necessary features of the shedding data to provide correct inference. We simulated shedding data under our standard, validated assumptions and assessed the ability of 5 different models to reproduce the …


Nested Partially-Latent, Class Models For Dependent Binary Data, Estimating Disease Etiology, Zhenke Wu, Maria Deloria-Knoll, Scott L. Zeger Nov 2015

Nested Partially-Latent, Class Models For Dependent Binary Data, Estimating Disease Etiology, Zhenke Wu, Maria Deloria-Knoll, Scott L. Zeger

Johns Hopkins University, Dept. of Biostatistics Working Papers

The Pneumonia Etiology Research for Child Health (PERCH) study seeks to use modern measurement technology to infer the causes of pneumonia for which gold-standard evidence is unavailable. The paper describes a latent variable model designed to infer from case-control data the etiology distribution for the population of cases, and for an individual case given his or her measurements. We assume each observation is drawn from a mixture model for which each component represents one cause or disease class. The model addresses a major limitation of the traditional latent class approach by taking account of residual dependence among multivariate binary outcome …


A General Framework For Diagnosing Confounding Of Time-Varying And Other Joint Exposures, John W. Jackson May 2015

A General Framework For Diagnosing Confounding Of Time-Varying And Other Joint Exposures, John W. Jackson

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Unification Of Mediation And Interaction: A Four-Way Decomposition, Tyler J. Vanderweele Mar 2014

A Unification Of Mediation And Interaction: A Four-Way Decomposition, Tyler J. Vanderweele

Harvard University Biostatistics Working Paper Series

It is shown that the overall effect of an exposure on an outcome, in the presence of a mediator with which the exposure may interact, can be decomposed into four components: (i) the effect of the exposure in the absence of the mediator, (ii) the interactive effect when the mediator is left to what it would be in the absence of exposure, (iii) a mediated interaction, and (iv) a pure mediated effect. These four components, respectively, correspond to the portion of the effect that is due to neither mediation nor interaction, to just interaction (but not mediation), to both mediation …


Computational Model For Survey And Trend Analysis Of Patients With Endometriosis : A Decision Aid Tool For Ebm, Salvo Reina, Vito Reina, Franco Ameglio, Mauro Costa, Alessandro Fasciani Feb 2014

Computational Model For Survey And Trend Analysis Of Patients With Endometriosis : A Decision Aid Tool For Ebm, Salvo Reina, Vito Reina, Franco Ameglio, Mauro Costa, Alessandro Fasciani

COBRA Preprint Series

Endometriosis is increasingly collecting worldwide attention due to its medical complexity and social impact. The European community has identified this as a “social disease”. A large amount of information comes from scientists, yet several aspects of this pathology and staging criteria need to be clearly defined on a suitable number of individuals. In fact, available studies on endometriosis are not easily comparable due to a lack of standardized criteria to collect patients’ informations and scarce definitions of symptoms. Currently, only retrospective surgical stadiation is used to measure pathology intensity, while the Evidence Based Medicine (EBM) requires shareable methods and correct …


Adaptive Pair-Matching In The Search Trial And Estimation Of The Intervention Effect, Laura Balzer, Maya L. Petersen, Mark J. Van Der Laan Jan 2014

Adaptive Pair-Matching In The Search Trial And Estimation Of The Intervention Effect, Laura Balzer, Maya L. Petersen, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

In randomized trials, pair-matching is an intuitive design strategy to protect study validity and to potentially increase study power. In a common design, candidate units are identified, and their baseline characteristics used to create the best n/2 matched pairs. Within the resulting pairs, the intervention is randomized, and the outcomes measured at the end of follow-up. We consider this design to be adaptive, because the construction of the matched pairs depends on the baseline covariates of all candidate units. As consequence, the observed data cannot be considered as n/2 independent, identically distributed (i.i.d.) pairs of units, as current practice assumes. …


Estimating Population Treatment Effects From A Survey Sub-Sample, Kara E. Rudolph, Ivan Diaz, Michael Rosenblum, Elizabeth A. Stuart Jan 2014

Estimating Population Treatment Effects From A Survey Sub-Sample, Kara E. Rudolph, Ivan Diaz, Michael Rosenblum, Elizabeth A. Stuart

Johns Hopkins University, Dept. of Biostatistics Working Papers

We consider the problem of estimating an average treatment effect for a target population from a survey sub-sample. Our motivating example is generalizing a treatment effect estimated in a sub-sample of the National Comorbidity Survey Replication Adolescent Supplement to the population of U.S. adolescents. To address this problem, we evaluate easy-to-implement methods that account for both non-random treatment assignment and a non-random two-stage selection mechanism. We compare the performance of a Horvitz-Thompson estimator using inverse probability weighting (IPW) and two double robust estimators in a variety of scenarios. We demonstrate that the two double robust estimators generally outperform IPW in …


Attributing Effects To Interactions, Tyler J. Vanderweele, Eric J. Tchetgen Tchetgen Jul 2013

Attributing Effects To Interactions, Tyler J. Vanderweele, Eric J. Tchetgen Tchetgen

Harvard University Biostatistics Working Paper Series

A framework is presented which allows an investigator to estimate the portion of the effect of one exposure that is attributable to an interaction with a second exposure. We show that when the two exposures are independent, the total effect of one exposure can be decomposed into a conditional effect of that exposure and a component due to interaction. The decomposition applies on difference or ratio scales. We discuss how the components can be estimated using standard regression models, and how these components can be used to evaluate the proportion of the total effect of the primary exposure attributable to …


Estimating Effects On Rare Outcomes: Knowledge Is Power, Laura B. Balzer, Mark J. Van Der Laan May 2013

Estimating Effects On Rare Outcomes: Knowledge Is Power, Laura B. Balzer, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Many of the secondary outcomes in observational studies and randomized trials are rare. Methods for estimating causal effects and associations with rare outcomes, however, are limited, and this represents a missed opportunity for investigation. In this article, we construct a new targeted minimum loss-based estimator (TMLE) for the effect of an exposure or treatment on a rare outcome. We focus on the causal risk difference and statistical models incorporating bounds on the conditional risk of the outcome, given the exposure and covariates. By construction, the proposed estimator constrains the predicted outcomes to respect this model knowledge. Theoretically, this bounding provides …


A Regionalized National Universal Kriging Model Using Partial Least Squares Regression For Estimating Annual Pm2.5 Concentrations In Epidemiology, Paul D. Sampson, Mark Richards, Adam A. Szpiro, Silas Bergen, Lianne Sheppard, Timothy V. Larson, Joel Kaufman Dec 2012

A Regionalized National Universal Kriging Model Using Partial Least Squares Regression For Estimating Annual Pm2.5 Concentrations In Epidemiology, Paul D. Sampson, Mark Richards, Adam A. Szpiro, Silas Bergen, Lianne Sheppard, Timothy V. Larson, Joel Kaufman

UW Biostatistics Working Paper Series

Many cohort studies in environmental epidemiology require accurate modeling and prediction of fine scale spatial variation in ambient air quality across the U.S. This modeling requires the use of small spatial scale geographic or “land use” regression covariates and some degree of spatial smoothing. Furthermore, the details of the prediction of air quality by land use regression and the spatial variation in ambient air quality not explained by this regression should be allowed to vary across the continent due to the large scale heterogeneity in topography, climate, and sources of air pollution. This paper introduces a regionalized national universal kriging …


Flexible Distributed Lag Models Using Random Functions With Application To Estimating Mortality Displacement From Heat-Related Deaths, Roger D. Peng Dec 2011

Flexible Distributed Lag Models Using Random Functions With Application To Estimating Mortality Displacement From Heat-Related Deaths, Roger D. Peng

Johns Hopkins University, Dept. of Biostatistics Working Papers

No abstract provided.


Assessing Association For Bivariate Survival Data With Interval Sampling: A Copula Model Approach With Application To Aids Study, Hong Zhu, Mei-Cheng Wang Nov 2011

Assessing Association For Bivariate Survival Data With Interval Sampling: A Copula Model Approach With Application To Aids Study, Hong Zhu, Mei-Cheng Wang

Johns Hopkins University, Dept. of Biostatistics Working Papers

In disease surveillance systems or registries, bivariate survival data are typically collected under interval sampling. It refers to a situation when entry into a registry is at the time of the first failure event (e.g., HIV infection) within a calendar time interval, the time of the initiating event (e.g., birth) is retrospectively identified for all the cases in the registry, and subsequently the second failure event (e.g., death) is observed during the follow-up. Sampling bias is induced due to the selection process that the data are collected conditioning on the first failure event occurs within a time interval. Consequently, the …


A Regularization Corrected Score Method For Nonlinear Regression Models With Covariate Error, David M. Zucker, Malka Gorfine, Yi Li, Donna Spiegelman Sep 2011

A Regularization Corrected Score Method For Nonlinear Regression Models With Covariate Error, David M. Zucker, Malka Gorfine, Yi Li, Donna Spiegelman

Harvard University Biostatistics Working Paper Series

No abstract provided.


Variable Importance Analysis With The Multipim R Package, Stephan J. Ritter, Nicholas P. Jewell, Alan E. Hubbard Jul 2011

Variable Importance Analysis With The Multipim R Package, Stephan J. Ritter, Nicholas P. Jewell, Alan E. Hubbard

U.C. Berkeley Division of Biostatistics Working Paper Series

We describe the R package multiPIM, including statistical background, functionality and user options. The package is for variable importance analysis, and is meant primarily for analyzing data from exploratory epidemiological studies, though it could certainly be applied in other areas as well. The approach taken to variable importance comes from the causal inference field, and is different from approaches taken in other R packages. By default, multiPIM uses a double robust targeted maximum likelihood estimator (TMLE) of a parameter akin to the attributable risk. Several regression methods/machine learning algorithms are available for estimating the nuisance parameters of the models, including …


Reduced Bayesian Hierarchical Models: Estimating Health Effects Of Simultaneous Exposure To Multiple Pollutants, Jennifer F. Bobb, Francesca Dominici, Roger D. Peng Jul 2011

Reduced Bayesian Hierarchical Models: Estimating Health Effects Of Simultaneous Exposure To Multiple Pollutants, Jennifer F. Bobb, Francesca Dominici, Roger D. Peng

Johns Hopkins University, Dept. of Biostatistics Working Papers

Quantifying the health effects associated with simultaneous exposure to many air pollutants is now a research priority of the US EPA. Bayesian hierarchical models (BHM) have been extensively used in multisite time series studies of air pollution and health to estimate health effects of a single pollutant adjusted for potential confounding of other pollutants and other time-varying factors. However, when the scientific goal is to estimate the impacts of many pollutants jointly, a straightforward application of BHM is challenged by the need to specify a random-effect distribution on a high-dimensional vector of nuisance parameters, which often do not have an …


Threshold Regression Models Adapted To Case-Control Studies, And The Risk Of Lung Cancer Due To Occupational Exposure To Asbestos In France, Antoine Chambaz, Dominique Choudat, Catherine Huber, Jean-Claude Pairon, Mark J. Van Der Laan Mar 2011

Threshold Regression Models Adapted To Case-Control Studies, And The Risk Of Lung Cancer Due To Occupational Exposure To Asbestos In France, Antoine Chambaz, Dominique Choudat, Catherine Huber, Jean-Claude Pairon, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Asbestos has been known for many years as a powerful carcinogen. Our purpose is quantify the relationship between an occupational exposure to asbestos and an increase of the risk of lung cancer. Furthermore, we wish to tackle the very delicate question of the evaluation, in subjects suffering from a lung cancer, of how much the amount of exposure to asbestos explains the occurrence of the cancer. For this purpose, we rely on a recent French case-control study. We build a large collection of threshold regression models, data-adaptively select a better model in it by multi-fold likelihood-based cross-validation, then fit the …


Minimum Description Length And Empirical Bayes Methods Of Identifying Snps Associated With Disease, Ye Yang, David R. Bickel Nov 2010

Minimum Description Length And Empirical Bayes Methods Of Identifying Snps Associated With Disease, Ye Yang, David R. Bickel

COBRA Preprint Series

The goal of determining which of hundreds of thousands of SNPs are associated with disease poses one of the most challenging multiple testing problems. Using the empirical Bayes approach, the local false discovery rate (LFDR) estimated using popular semiparametric models has enjoyed success in simultaneous inference. However, the estimated LFDR can be biased because the semiparametric approach tends to overestimate the proportion of the non-associated single nucleotide polymorphisms (SNPs). One of the negative consequences is that, like conventional p-values, such LFDR estimates cannot quantify the amount of information in the data that favors the null hypothesis of no disease-association.

We …


Nonparametric Regression With Missing Outcomes Using Weighted Kernel Estimating Equations, Lu Wang, Andrea Rotnitzky, Xihong Lin Apr 2010

Nonparametric Regression With Missing Outcomes Using Weighted Kernel Estimating Equations, Lu Wang, Andrea Rotnitzky, Xihong Lin

Harvard University Biostatistics Working Paper Series

No abstract provided.


Survival Analysis With Error-Prone Time-Varying Covariates: A Risk Set Calibration Approach, Xiaomei Liao, David M. Zucker, Yi Li, Donna Spiegelman Nov 2009

Survival Analysis With Error-Prone Time-Varying Covariates: A Risk Set Calibration Approach, Xiaomei Liao, David M. Zucker, Yi Li, Donna Spiegelman

Harvard University Biostatistics Working Paper Series

No abstract provided.


Causal Inference In Epidemiological Studies With Strong Confounding, Kelly L. Moore, Romain S. Neugebauer, Mark J. Van Der Laan, Ira B. Tager Oct 2009

Causal Inference In Epidemiological Studies With Strong Confounding, Kelly L. Moore, Romain S. Neugebauer, Mark J. Van Der Laan, Ira B. Tager

U.C. Berkeley Division of Biostatistics Working Paper Series

One of the identifiabilty assumptions of causal effects defined by marginal structural model (MSM) parameters is the experimental treatment assignment (ETA) assumption. Practical violations of this assumption frequently occur in data analysis, when certain exposures are rarely observed within some strata of the population. The inverse probability of treatment weighted (IPTW) estimator is particularly sensitive to violations of this assumption, however, we demonstrate that this is a problem for all estimators of causal effects. This is due to the fact that the ETA assumption is about information (or lack thereof) in the data. A new class of causal models, causal …


Causal Inference For Nested Case-Control Studies Using Targeted Maximum Likelihood Estimation, Sherri Rose, Mark J. Van Der Laan Sep 2009

Causal Inference For Nested Case-Control Studies Using Targeted Maximum Likelihood Estimation, Sherri Rose, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

A nested case-control study is conducted within a well-defined cohort arising out of a population of interest. This design is often used in epidemiology to reduce the costs associated with collecting data on the full cohort; however, the case control sample within the cohort is a biased sample. Methods for analyzing case-control studies have largely focused on logistic regression models that provide conditional and not marginal causal estimates of the odds ratio. We previously developed a Case-Control Weighted Targeted Maximum Likelihood Estimation (TMLE) procedure for case-control study designs, which relies on the prevalence probability q0. We propose the use of …


A Spatio-Temporal Approach For Estimating Chronic Effects Of Air Pollution, Sonja Greven, Francesca Dominici, Scott L. Zeger Jun 2009

A Spatio-Temporal Approach For Estimating Chronic Effects Of Air Pollution, Sonja Greven, Francesca Dominici, Scott L. Zeger

Johns Hopkins University, Dept. of Biostatistics Working Papers

Estimating the health risks associated with air pollution exposure is of great importance in public health. In air pollution epidemiology, two study designs have been used mainly. Time series studies estimate acute risk associated with short-term exposure. They compare day-to-day variation of pollution concentrations and mortality rates, and have been criticized for potential confounding by time-varying covariates. Cohort studies estimate chronic effects associated with long-term exposure. They compare long-term average pollution concentrations and time-to-death across cities, and have been criticized for potential confounding by individual risk factors or city-level characteristics.

We propose a new study design and a statistical model, …


Spatial Cluster Detection For Repeatedly Measured Outcomes While Accounting For Residential History, Andrea J. Cook, Diane Gold, Yi Li Jun 2009

Spatial Cluster Detection For Repeatedly Measured Outcomes While Accounting For Residential History, Andrea J. Cook, Diane Gold, Yi Li

Harvard University Biostatistics Working Paper Series

No abstract provided.


Spatial Cluster Detection For Weighted Outcomes Using Cumulative Geographic Residuals, Andrea J. Cook, Yi Li, David Arterburn, Ram C. Tiwari Jun 2009

Spatial Cluster Detection For Weighted Outcomes Using Cumulative Geographic Residuals, Andrea J. Cook, Yi Li, David Arterburn, Ram C. Tiwari

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Machine-Learning Algorithm For Estimating And Ranking The Impact Of Environmental Risk Factors In Exploratory Epidemiological Studies, Jessica G. Young, Alan E. Hubbard, B Eskenazi, Nicholas P. Jewell Jun 2009

A Machine-Learning Algorithm For Estimating And Ranking The Impact Of Environmental Risk Factors In Exploratory Epidemiological Studies, Jessica G. Young, Alan E. Hubbard, B Eskenazi, Nicholas P. Jewell

U.C. Berkeley Division of Biostatistics Working Paper Series

No abstract provided.


The Importance Of Scale For Spatial-Confounding Bias And Precision Of Spatial Regression Estimators, Christopher J. Paciorek Mar 2009

The Importance Of Scale For Spatial-Confounding Bias And Precision Of Spatial Regression Estimators, Christopher J. Paciorek

Harvard University Biostatistics Working Paper Series

Increasingly, regression models are used when residuals are spatially correlated. Prominent examples include studies in environmental epidemiology to understand the chronic health effects of pollutants. I consider the effects of residual spatial structure on the bias and precision of regression coefficients, developing a simple framework in which to understand the key issues and derive informative analytic results. When the spatial residual is induced by an unmeasured confounder, regression models with spatial random effects and closely-related models such as kriging and penalized splines are biased, even when the residual variance components are known. Analytic and simulation results show how the bias …


Analysis Of Randomized Comparative Clinical Trial Data For Personalized Treatment Selections, Tianxi Cai, Lu Tian, Peggy H. Wong, L. J. Wei Mar 2009

Analysis Of Randomized Comparative Clinical Trial Data For Personalized Treatment Selections, Tianxi Cai, Lu Tian, Peggy H. Wong, L. J. Wei

Harvard University Biostatistics Working Paper Series

No abstract provided.