Physical Sciences and Mathematics Commons

COBRA Series, 2016

Articles 1 - 30 of 49

Full-Text Articles in Physical Sciences and Mathematics

Improving Power In Group Sequential, Randomized Trials By Adjusting For Prognostic Baseline Variables And Short-Term Outcomes, Tianchen Qian, Michael Rosenblum, Huitong Qiu Dec 2016

Johns Hopkins University, Dept. of Biostatistics Working Papers

In group sequential designs, adjusting for baseline variables and short-term outcomes can lead to increased power and reduced sample size. We derive formulas for the precision gain from such variable adjustment using semiparametric estimators for the average treatment effect, and give new results on what conditions lead to substantial power gains and sample size reductions. The formulas reveal how the impact of prognostic variables on the precision gain is modified by the number of pipeline participants, analysis timing, enrollment rate, and treatment effect heterogeneity, when the semiparametric estimator uses correctly specified models. Given set prognostic value of baseline variables and …
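
To make the kind of gain at stake concrete (an illustration, not the paper's derivation), here is a minimal simulation comparing an unadjusted difference in means with a linearly adjusted estimator; with a correctly specified linear working model, the variance shrinks by roughly the factor (1 - R^2), where R^2 is the share of outcome variance explained by the baseline variable:

```python
# Minimal sketch (not the paper's estimator): precision gain from adjusting
# for a prognostic baseline variable in a randomized comparison.
import numpy as np

rng = np.random.default_rng(0)
n, reps, r2 = 200, 2000, 0.5
unadj, adj = [], []
for _ in range(reps):
    a = rng.integers(0, 2, n)                      # 1:1 randomization
    w = rng.normal(size=n)                         # prognostic baseline variable
    y = 0.3 * a + np.sqrt(r2) * w + np.sqrt(1 - r2) * rng.normal(size=n)
    unadj.append(y[a == 1].mean() - y[a == 0].mean())
    # ANCOVA-type adjustment: regress y on (1, a, w), read off the a-coefficient
    x = np.column_stack([np.ones(n), a, w])
    beta = np.linalg.lstsq(x, y, rcond=None)[0]
    adj.append(beta[1])

print("empirical variance ratio (adjusted/unadjusted):",
      np.var(adj) / np.var(unadj))                 # close to 1 - r2 = 0.5
```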


Stochastic Optimization Of Adaptive Enrichment Designs For Two Subpopulations, Aaron Fisher, Michael Rosenblum Dec 2016

Johns Hopkins University, Dept. of Biostatistics Working Papers

An adaptive enrichment design is a randomized trial that allows enrollment criteria to be modified at interim analyses, based on a preset decision rule. When there is prior uncertainty regarding treatment effect heterogeneity, these trial designs can provide improved power for detecting treatment effects in subpopulations. We present a simulated annealing approach to search over the space of decision rules and other parameters for an adaptive enrichment design. The goal is to minimize the expected number enrolled or expected duration, while preserving the appropriate power and Type I error rate. We also explore the benefits of parallel computation in the …
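
As a rough illustration of the search strategy (a generic sketch, not the paper's implementation), the skeleton below minimizes a stand-in objective by simulated annealing; in the paper's setting the objective would be the simulated expected sample size or duration, with penalties for violating the power and Type I error constraints:

```python
# Generic simulated-annealing skeleton; the objective below is a placeholder.
import math
import random

random.seed(1)

def objective(params):
    # Hypothetical stand-in for expected sample size plus constraint penalties.
    x, y = params
    return (x - 2.0) ** 2 + (y + 1.0) ** 2

def neighbor(params, scale=0.5):
    return tuple(p + random.gauss(0.0, scale) for p in params)

def simulated_annealing(start, n_iter=5000, t0=1.0):
    current, f_cur = start, objective(start)
    best, f_best = current, f_cur
    for i in range(n_iter):
        temp = t0 / (1 + i)                    # cooling schedule
        cand = neighbor(current)
        f_cand = objective(cand)
        # Always accept improvements; accept worse moves with Boltzmann prob.
        if f_cand < f_cur or random.random() < math.exp(-(f_cand - f_cur) / temp):
            current, f_cur = cand, f_cand
            if f_cur < f_best:
                best, f_best = current, f_cur
    return best, f_best

print(simulated_annealing((0.0, 0.0)))  # approaches the minimizer (2, -1)
```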


Efficiency Of Two Sample Tests Via The T-Mean Survival Time For Analyzing Event Time Observations, Lu Tian, Haoda Fu, Stephen J. Ruberg, Hajime Uno, L.J. Wei Nov 2016

Harvard University Biostatistics Working Paper Series

In comparing two treatments with event time observations, the hazard ratio (HR) estimate is routinely used to quantify the treatment difference. However, this model-dependent estimate may be difficult to interpret clinically, especially when the proportional hazards (PH) assumption is violated. An alternative estimation procedure for treatment efficacy based on the restricted mean survival time, or t-year mean survival time (t-MST), has been discussed extensively in the statistical and clinical literature. On the other hand, a statistical test via the HR or its asymptotically equivalent counterpart, the logrank test, is asymptotically distribution-free. In this paper, we assess the …
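
For readers unfamiliar with the t-MST, a minimal sketch (toy data, not the paper's procedure): the t-year mean survival time is the area under the Kaplan-Meier curve up to t, and the between-arm difference is a model-free alternative to the HR:

```python
# Sketch: t-MST as the area under the Kaplan-Meier curve on [0, tau].
import numpy as np

def km_rmst(time, event, tau):
    """Area under the Kaplan-Meier estimate on [0, tau]."""
    order = np.argsort(time)
    time, event = time[order], event[order]
    n = len(time)
    s, rmst, prev_t = 1.0, 0.0, 0.0
    for i in range(n):
        t = min(time[i], tau)
        rmst += s * (t - prev_t)          # rectangle under the current level
        prev_t = t
        if time[i] > tau:
            break
        if event[i]:
            s *= 1.0 - 1.0 / (n - i)      # KM step at an observed event
    if prev_t < tau:
        rmst += s * (tau - prev_t)
    return rmst

rng = np.random.default_rng(2)
t0, t1 = rng.exponential(1.0, 150), rng.exponential(1.4, 150)   # two arms
c = rng.exponential(2.0, 150)                                    # censoring
y0, d0 = np.minimum(t0, c), t0 <= c
y1, d1 = np.minimum(t1, c), t1 <= c
tau = 2.0
print("t-MST difference at tau=2:", km_rmst(y1, d1, tau) - km_rmst(y0, d0, tau))
```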


Using Sensitivity Analyses For Unobserved Confounding To Address Covariate Measurement Error In Propensity Score Methods, Kara E. Rudolph, Elizabeth A. Stuart Nov 2016

Johns Hopkins University, Dept. of Biostatistics Working Papers

Propensity score methods are a popular tool to control for confounding in observational data, but their bias-reduction properties are threatened by covariate measurement error. There are few easy-to-implement methods to correct for such bias. We describe and demonstrate how existing sensitivity analyses for unobserved confounding (propensity score calibration, Vanderweele and Arah's bias formulas, and Rosenbaum's sensitivity analysis) can be adapted to address this problem. In a simulation study, we examined the extent to which these sensitivity analyses can correct for several measurement error structures: classical, systematic differential, and heteroscedastic covariate measurement error. We then applied these approaches to address covariate measurement error …


Confidence Intervals For Heritability Via Haseman-Elston Regression, Tamar Sofer Nov 2016

UW Biostatistics Working Paper Series

Heritability is the proportion of phenotypic variance in a population that is attributable to individual genotypes. Heritability is considered an important measure in both evolutionary biology and in medicine, and is routinely estimated and reported in genetic epidemiology studies. In population-based genome-wide association studies (GWAS), mixed models are used to estimate variance components, from which a heritability estimate is obtained. The estimated heritability is the proportion of the model's total variance that is due to the genetic relatedness matrix (kinship measured from genotypes). Current practice is to use bootstrapping, which is slow, or normal asymptotic approximation to estimate the precision …
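
A minimal sketch of the Haseman-Elston idea on simulated data (the kinship matrix below is a toy stand-in, not one computed from real genotypes): for standardized phenotypes, the expected pairwise product y_i * y_j is roughly h2 * K_ij, so the slope of the regression of pairwise phenotype products on pairwise kinship estimates heritability:

```python
# Minimal Haseman-Elston-style sketch on simulated data.
import numpy as np

rng = np.random.default_rng(3)
n, h2_true = 300, 0.4

g = rng.normal(size=(n, 500))                 # pseudo-genotypes
g = (g - g.mean(0)) / g.std(0)
K = g @ g.T / 500.0                           # toy kinship matrix

L = np.linalg.cholesky(K + 1e-8 * np.eye(n))
y = (np.sqrt(h2_true) * (L @ rng.normal(size=n))
     + np.sqrt(1 - h2_true) * rng.normal(size=n))
y = (y - y.mean()) / y.std()                  # standardize the phenotype

iu = np.triu_indices(n, k=1)                  # all pairs i < j
prod = np.outer(y, y)[iu]                     # y_i * y_j for each pair
kin = K[iu]                                   # kinship for each pair
slope = np.polyfit(kin, prod, 1)[0]           # least-squares slope
print("Haseman-Elston heritability estimate:", slope)
```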


Robust Alternatives To Ancova For Estimating The Treatment Effect Via A Randomized Comparative Study, Fei Jiang, Lu Tian, Haoda Fu, Takahiro Hasegawa, Marc Alan Pfeffer, L. J. Wei Nov 2016

Harvard University Biostatistics Working Paper Series

In comparing two treatments via a randomized clinical trial, the analysis of covariance (ANCOVA) technique is often utilized to estimate an overall treatment effect. ANCOVA is generally perceived as a more efficient procedure than its simple two-sample estimation counterpart. Unfortunately, when the ANCOVA model is not correctly specified, the resulting estimator is generally not consistent, especially when the model is nonlinear. Recently, various nonparametric alternatives to ANCOVA, such as the augmentation methods, have been proposed to estimate the treatment effect by adjusting for the covariates. However, the properties of these alternatives have not been studied in the …
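
A minimal sketch of one augmentation-type alternative (a standard construction in this literature, not necessarily the paper's exact proposal): start from the simple difference in means and augment it with arm-specific working regressions of the outcome on baseline covariates:

```python
# Augmentation-type estimator: consistent under randomization even if the
# arm-specific linear working models are misspecified.
import numpy as np

def augmented_estimate(y, a, x):
    x1 = np.column_stack([np.ones(len(y)), x])
    b1 = np.linalg.lstsq(x1[a == 1], y[a == 1], rcond=None)[0]
    b0 = np.linalg.lstsq(x1[a == 0], y[a == 0], rcond=None)[0]
    m1, m0 = x1 @ b1, x1 @ b0           # predicted outcomes under each arm
    p = a.mean()
    return np.mean(a * (y - m1) / p - (1 - a) * (y - m0) / (1 - p) + (m1 - m0))

rng = np.random.default_rng(4)
n = 500
x = rng.normal(size=(n, 3))
a = rng.integers(0, 2, n)
y = 0.5 * a + x @ np.array([1.0, -0.5, 0.2]) + rng.normal(size=n)
print("augmented treatment effect estimate:", augmented_estimate(y, a, x))
```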


Censoring Unbiased Regression Trees And Ensembles, Jon Arni Steingrimsson, Liqun Diao, Robert L. Strawderman Oct 2016

Johns Hopkins University, Dept. of Biostatistics Working Papers

This paper proposes a novel approach to building regression trees and ensemble learning in survival analysis. By first extending the theory of censoring unbiased transformations, we construct observed data estimators of full data loss functions in cases where responses can be right censored. This theory is used to construct two specific classes of methods for building regression trees and regression ensembles that respectively make use of Buckley-James and doubly robust estimating equations for a given full data risk function. For the particular case of squared error loss, we further show how to implement these algorithms using existing software (e.g., CART, …
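
A simplified sketch of the observed-data-loss idea using the inverse probability of censoring weighted (IPCW) squared-error loss, a cruder member of the same family as the Buckley-James and doubly robust constructions studied in the paper: keep uncensored subjects and reweight them by one over the censoring survival function, then fit an off-the-shelf tree:

```python
# IPCW-weighted regression tree on right-censored data (sketch only).
import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(5)
n = 1000
x = rng.normal(size=(n, 2))
t = np.exp(0.5 * x[:, 0] + 0.2 * rng.normal(size=n))   # true event times
c = rng.exponential(3.0, n)                            # censoring times
y, d = np.minimum(t, c), (t <= c).astype(int)

def censor_km(y, d, grid):
    """Kaplan-Meier survival of the censoring distribution at points in grid."""
    order = np.argsort(y)
    ys, ds = y[order], d[order]
    s, steps = 1.0, []
    for i in range(len(ys)):
        if ds[i] == 0:                       # a censoring "event"
            s *= 1.0 - 1.0 / (len(ys) - i)
        steps.append((ys[i], s))
    out = np.ones(len(grid))
    for k, g in enumerate(grid):
        vals = [sv for tv, sv in steps if tv <= g]
        out[k] = vals[-1] if vals else 1.0
    return out

G = censor_km(y, d, y)
w = d / np.clip(G, 0.05, None)              # IPCW weights (0 for censored)
tree = DecisionTreeRegressor(max_depth=3).fit(x, np.log(y), sample_weight=w)
print("tree trained on IPCW-weighted log event times")
```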


Online Cross-Validation-Based Ensemble Learning, David Benkeser, Samuel D. Lendle, Cheng Ju, Mark J. Van Der Laan Oct 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

Online estimators update a current estimate with a new incoming batch of data without having to revisit past data, thereby providing streaming estimates that are scalable to big data. We develop flexible, ensemble-based online estimators of an infinite-dimensional target parameter, such as a regression function, in the setting where data are generated sequentially by a common conditional data distribution given summary measures of the past. This setting encompasses a wide range of time-series models and, as a special case, models for independent and identically distributed data. Our estimator considers a large library of candidate online estimators and uses online cross-validation to …
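
A minimal sketch of the online cross-validation scheme (generic, not the paper's estimator): each candidate is scored on an incoming batch before being updated on it, so the cumulative losses are honest out-of-sample estimates, and the ensemble weights candidates accordingly:

```python
# Online cross-validation for ensemble weighting over streaming batches.
import numpy as np
from sklearn.linear_model import SGDRegressor

rng = np.random.default_rng(6)
cands = [SGDRegressor(alpha=a, random_state=0) for a in (1e-4, 1e-2, 1.0)]
cum_loss = np.zeros(len(cands))
started = False

for batch in range(200):                     # stream of data batches
    x = rng.normal(size=(20, 5))
    y = x @ np.array([1, -1, 0.5, 0, 0]) + 0.3 * rng.normal(size=20)
    if started:
        for k, m in enumerate(cands):        # score BEFORE updating
            cum_loss[k] += np.mean((m.predict(x) - y) ** 2)
    for m in cands:                          # then update on the batch
        m.partial_fit(x, y)
    started = True

w = np.exp(-0.1 * (cum_loss - cum_loss.min()))   # exponential weighting
w /= w.sum()
print("online-CV ensemble weights:", np.round(w, 3))
```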


Doubly-Robust Nonparametric Inference On The Average Treatment Effect, David Benkeser, Marco Carone, Mark J. Van Der Laan, Peter Gilbert Oct 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

Doubly-robust estimators are widely used to draw inference about the average effect of a treatment. Such estimators are consistent for the effect of interest if either one of two nuisance parameters is consistently estimated. However, if flexible, data-adaptive estimators of these nuisance parameters are used, double-robustness does not readily extend to inference. We present a general theoretical study of the behavior of doubly-robust estimators of an average treatment effect when one of the nuisance parameters is inconsistently estimated. We contrast different approaches for constructing such estimators and investigate the extent to which they may be modified to also allow doubly-robust …
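
For context, a sketch of the standard doubly-robust (AIPW) point estimator; the paper's contribution concerns valid inference when one nuisance estimator is inconsistent, which this snippet does not address:

```python
# Standard AIPW estimator of the average treatment effect (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

def aipw_ate(y, a, x):
    ps = LogisticRegression(max_iter=1000).fit(x, a).predict_proba(x)[:, 1]
    ps = np.clip(ps, 0.01, 0.99)                 # truncate extreme scores
    m1 = LinearRegression().fit(x[a == 1], y[a == 1]).predict(x)
    m0 = LinearRegression().fit(x[a == 0], y[a == 0]).predict(x)
    return np.mean(a * (y - m1) / ps - (1 - a) * (y - m0) / (1 - ps) + m1 - m0)

rng = np.random.default_rng(7)
n = 2000
x = rng.normal(size=(n, 3))
p = 1 / (1 + np.exp(-x[:, 0]))              # confounded treatment assignment
a = rng.binomial(1, p)
y = 1.0 * a + x[:, 0] + rng.normal(size=n)  # true ATE = 1
print("AIPW ATE estimate:", aipw_ate(y, a, x))
```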


Performance-Constrained Binary Classification Using Ensemble Learning: An Application To Cost-Efficient Targeted Prep Strategies, Wenjing Zheng, Laura Balzer, Maya L. Petersen, Mark J. Van Der Laan Oct 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

Binary classification problems are ubiquitous in health and social science applications. In many cases, one wishes to balance two conflicting criteria for an optimal binary classifier. For instance, in resource-limited settings, an HIV prevention program based on offering Pre-Exposure Prophylaxis (PrEP) to select high-risk individuals must balance the sensitivity of the binary classifier in detecting future seroconverters (and hence offering them PrEP regimens) with the total number of PrEP regimens that is financially and logistically feasible for the program to deliver. In this article, we consider a general class of performance-constrained binary classification problems wherein the objective function and the …
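
A minimal sketch of the constrained trade-off (illustrative only): choose the risk score cutoff that maximizes sensitivity subject to a cap on the number of individuals flagged for the intervention:

```python
# Highest-sensitivity cutoff under a resource constraint (sketch).
import numpy as np

def best_threshold(score, label, max_offered):
    cutoffs = np.unique(score)[::-1]          # from strictest to loosest
    best = (None, -1.0)
    for c in cutoffs:
        flagged = score >= c
        if flagged.sum() > max_offered:
            break                             # constraint violated; stop
        sens = label[flagged].sum() / max(label.sum(), 1)
        if sens > best[1]:
            best = (c, sens)
    return best

rng = np.random.default_rng(8)
n = 1000
risk = rng.uniform(size=n)
outcome = rng.binomial(1, 0.05 + 0.3 * risk)   # higher risk, higher incidence
cut, sens = best_threshold(risk, outcome, max_offered=100)
print(f"cutoff {cut:.3f} flags <=100 people with sensitivity {sens:.2f}")
```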


Matching The Efficiency Gains Of The Logistic Regression Estimator While Avoiding Its Interpretability Problems, In Randomized Trials, Michael Rosenblum, Jon Arni Steingrimsson Oct 2016

Johns Hopkins University, Dept. of Biostatistics Working Papers

Adjusting for prognostic baseline variables can lead to improved power in randomized trials. For binary outcomes, a logistic regression estimator is commonly used for such adjustment. This has resulted in substantial efficiency gains in practice, e.g., gains equivalent to reducing the required sample size by 20-28% were observed in a recent survey of traumatic brain injury trials. Robinson and Jewell (1991) proved that the logistic regression estimator is guaranteed to have equal or better asymptotic efficiency compared to the unadjusted estimator (which ignores baseline variables). Unfortunately, the logistic regression estimator has the following dangerous vulnerabilities: it is only interpretable when …
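
A sketch of the standardization (g-computation) estimator that is the usual interpretable alternative (a standard construction; the truncated abstract does not confirm it is the paper's exact proposal): fit the logistic working model, then average predicted risks over the sample with treatment set to 1 and to 0:

```python
# Standardized (marginal) risk difference from a logistic working model.
import numpy as np
from sklearn.linear_model import LogisticRegression

def standardized_risk_difference(y, a, x):
    design = np.column_stack([a, x])
    fit = LogisticRegression(max_iter=1000).fit(design, y)
    r1 = fit.predict_proba(np.column_stack([np.ones(len(y)), x]))[:, 1]
    r0 = fit.predict_proba(np.column_stack([np.zeros(len(y)), x]))[:, 1]
    return r1.mean() - r0.mean()     # marginal, model-free target

rng = np.random.default_rng(9)
n = 1500
x = rng.normal(size=(n, 2))
a = rng.integers(0, 2, n)                      # randomized treatment
logit = -0.5 + 1.0 * a + 0.8 * x[:, 0]
y = rng.binomial(1, 1 / (1 + np.exp(-logit)))
print("standardized risk difference:", standardized_risk_difference(y, a, x))
```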


Model Averaged Double Robust Estimation, Matthew Cefalu, Francesca Dominici, Nils D. Arvold Md, Giovanni Parmigiani Sep 2016

Harvard University Biostatistics Working Paper Series

Existing methods in causal inference do not account for the uncertainty in the selection of confounders. We propose a new class of estimators for the average causal effect, the model averaged double robust estimators, that formally account for model uncertainty in both the propensity score and outcome model through the use of Bayesian model averaging. These estimators build on the desirable double robustness property by requiring only that the true propensity score model or the true outcome model be within a specified class of models to maintain consistency. We provide asymptotic results and conduct a large scale simulation study that indicates …


Distance-Based Analysis Of Variance For Brain Connectivity, Russell T. Shinohara, Haochang Shou, Marco Carone, Robert Schultz, Birkan Tunc, Drew Parker, Ragini Verma Aug 2016

UPenn Biostatistics Working Papers

The field of neuroimaging dedicated to mapping connections in the brain is increasingly being recognized as key for understanding neurodevelopment and pathology. Networks of these connections are quantitatively represented using complex structures including matrices, functions, and graphs, which require specialized statistical techniques for estimation and inference about developmental and disorder-related changes. Unfortunately, classical statistical testing procedures are not well suited to high-dimensional testing problems. In the context of global or regional tests for differences in neuroimaging data, traditional analysis of variance (ANOVA) is not directly applicable without first summarizing the data into univariate or low-dimensional features, a process that may …
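
The generic core of distance-based analysis of variance, sketched on toy data (the paper tailors this to brain connectivity objects): a pseudo-F statistic computed from a subject-by-subject distance matrix, calibrated by permutation of group labels:

```python
# PERMANOVA-style pseudo-F test from a distance matrix (sketch).
import numpy as np

def pseudo_f(D, groups):
    n = len(groups)
    total = (D ** 2).sum() / (2 * n)           # total sum of squares
    within = 0.0
    for g in np.unique(groups):
        idx = np.where(groups == g)[0]
        within += (D[np.ix_(idx, idx)] ** 2).sum() / (2 * len(idx))
    k = len(np.unique(groups))
    return ((total - within) / (k - 1)) / (within / (n - k))

rng = np.random.default_rng(10)
xa, xb = rng.normal(0, 1, (20, 50)), rng.normal(0.5, 1, (20, 50))
x = np.vstack([xa, xb])
groups = np.repeat([0, 1], 20)
D = np.linalg.norm(x[:, None, :] - x[None, :, :], axis=2)  # pairwise distances

obs = pseudo_f(D, groups)
perm = [pseudo_f(D, rng.permutation(groups)) for _ in range(499)]
print("permutation p-value:", (1 + sum(p >= obs for p in perm)) / 500)
```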


The Use Of Permutation Tests For The Analysis Of Parallel And Stepped-Wedge Cluster Randomized Trials, Rui Wang, Victor Degruttola Aug 2016

Harvard University Biostatistics Working Paper Series

We investigate the use of permutation tests for the analysis of parallel and stepped-wedge cluster randomized trials. Permutation tests for parallel designs with exponential family endpoints have been extensively studied. The optimal permutation tests developed for exponential family alternatives require information on intraclass correlation, a quantity not yet defined for time-to-event endpoints. Therefore, it is unclear how efficient permutation tests can be constructed for cluster-randomized trials with such endpoints. We consider a class of test statistics formed by a weighted average of pair-specific treatment effect estimates and offer practical guidance on the choice of weights to improve efficiency. We apply …
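
A minimal sketch of a cluster-level permutation test for a parallel design, with a weighted average of cluster summaries as the statistic (the weights here are cluster sizes, one of the choices the paper evaluates):

```python
# Cluster-level permutation test for a parallel cluster randomized trial.
import numpy as np

rng = np.random.default_rng(11)
n_clusters = 16
sizes = rng.integers(20, 60, n_clusters)
arm = rng.permutation(np.repeat([0, 1], n_clusters // 2))  # 8 clusters per arm
# Cluster means: treatment effect + cluster random effect + sampling noise.
means = np.array([0.2 * arm[i] + rng.normal(0, 0.3)
                  + rng.normal(0, 1, sizes[i]).mean()
                  for i in range(n_clusters)])

def stat(arm, means, w):
    """Weighted difference in cluster-level means between arms."""
    t, c = arm == 1, arm == 0
    return (np.sum(w[t] * means[t]) / np.sum(w[t])
            - np.sum(w[c] * means[c]) / np.sum(w[c]))

w = sizes.astype(float)                        # size-proportional weights
obs = stat(arm, means, w)
perms = [stat(rng.permutation(arm), means, w) for _ in range(999)]
pval = (1 + sum(abs(p) >= abs(obs) for p in perms)) / 1000
print("two-sided permutation p-value:", pval)
```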


Improving Precision By Adjusting For Baseline Variables In Randomized Trials With Binary Outcomes, Without Regression Model Assumptions, Jon Arni Steingrimsson, Daniel F. Hanley, Michael Rosenblum Aug 2016

Johns Hopkins University, Dept. of Biostatistics Working Papers

In randomized clinical trials with baseline variables that are prognostic for the primary outcome, there is potential to improve precision and reduce sample size by appropriately adjusting for these variables. A major challenge is that there are multiple statistical methods to adjust for baseline variables, but little guidance on which is best to use in a given context. The choice of method can have important consequences. For example, one commonly used method leads to uninterpretable estimates if there is any treatment effect heterogeneity, which would jeopardize the validity of trial conclusions. We give practical guidance on how to avoid this …


Mediation Analysis For A Survival Outcome With Time-Varying Exposures, Mediators, And Confounders, Sheng-Hsuan Lin, Jessica G. Young, Roger Logan, Tyler J. Vanderweele Aug 2016

Harvard University Biostatistics Working Paper Series

We propose an approach to conduct mediation analysis for survival data with time-varying exposures, mediators, and confounders. We identify certain interventional direct and indirect effects through a survival mediational g-formula and describe the required assumptions. We also provide a feasible parametric approach along with an algorithm and software to estimate these effects. We apply this method to analyze the Framingham Heart Study data to investigate the causal mechanism of smoking on mortality through coronary artery disease. The risk ratio of smoking 30 cigarettes per day for ten years compared with no smoking on mortality is 2.34 (95 % CI = …


Sensitivity Of Trial Performance To Delayed Outcomes, Accrual Rates, And Prognostic Variables Based On A Simulated Randomized Trial With Adaptive Enrichment, Tianchen Qian, Elizabeth Colantuoni, Aaron Fisher, Michael Rosenblum Aug 2016

Johns Hopkins University, Dept. of Biostatistics Working Papers

Adaptive enrichment designs involve rules for restricting enrollment to a subset of the population during the course of an ongoing trial. This can be used to target those who benefit from the experimental treatment. To leverage prognostic information in baseline variables and short-term outcomes, we use a semiparametric, locally efficient estimator, and investigate its strengths and limitations compared to standard estimators. Through simulation studies, we assess how sensitive the trial performance (Type I error, power, expected sample size, trial duration) is to different design characteristics. Our simulation distributions mimic features of data from the Alzheimer’s Disease Neuroimaging Initiative, and involve …


Variable Selection For Estimating The Optimal Treatment Regimes In The Presence Of A Large Number Of Covariates, Baqun Zhang, Min Zhang Jul 2016

The University of Michigan Department of Biostatistics Working Paper Series

Most existing methods for optimal treatment regimes, with few exceptions, focus on estimation and are not designed for variable selection with the objective of optimizing treatment decisions. In clinical trials and observational studies, numerous baseline variables are often collected, and variable selection is essential for deriving reliable optimal treatment regimes. Although many variable selection methods exist, they mostly focus on selecting variables that are important for prediction (predictive variables) instead of variables that have a qualitative interaction with treatment (prescriptive variables) and hence are important for making treatment decisions. We propose a variable selection method within a general classification …


Practical Targeted Learning From Large Data Sets By Survey Sampling, Patrice Bertail, Antoine Chambaz, Emilien Joly Jul 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

We address the practical construction of asymptotic confidence intervals for smooth (i.e., pathwise differentiable), real-valued statistical parameters by targeted learning from independent and identically distributed data in contexts where the sample size is so large that it poses computational challenges. We observe some summary measure of all data and select a sub-sample from the complete data set by Poisson rejective sampling with unequal inclusion probabilities based on the summary measures. Targeted learning is carried out from the easier-to-handle sub-sample. We derive a central limit theorem for the targeted minimum loss estimator (TMLE) which enables the construction of …
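
A minimal sketch of the subsampling-and-reweighting idea (simplified: independent Poisson sampling rather than the paper's fixed-size rejective sampling, and a Horvitz-Thompson mean rather than TMLE):

```python
# Unequal-probability Poisson subsampling with Horvitz-Thompson reweighting.
import numpy as np

rng = np.random.default_rng(12)
N = 1_000_000
y = rng.normal(size=N)
summary = np.abs(y) + 0.1                     # cheap summary of each record

target = 5_000                                # expected sub-sample size
pi = np.clip(target * summary / summary.sum(), 0, 1)   # inclusion probs
keep = rng.uniform(size=N) < pi               # independent Poisson sampling

# Horvitz-Thompson estimate of the population mean from the sub-sample.
ht_mean = (y[keep] / pi[keep]).sum() / N
print(f"sub-sample size {keep.sum()}, HT mean {ht_mean:.4f} (truth ~ 0)")
```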


Scalable Collaborative Targeted Learning For High-Dimensional Data, Cheng Ju, Susan Gruber, Samuel D. Lendle, Antoine Chambaz, Jessica M. Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. Van Der Laan Jun 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

Robust inference of a low-dimensional parameter in a large semi-parametric model relies on external estimators of infinite-dimensional features of the distribution of the data. Typically, only one of the latter is optimized for the sake of constructing a well behaved estimator of the low-dimensional parameter of interest. Optimizing more than one of them for the sake of achieving a better bias-variance trade-off in the estimation of the parameter of interest is the core idea driving the C-TMLE procedure.

The original C-TMLE procedure can be presented as a greedy forward stepwise algorithm. It does not scale well when the number $p$ …


Propensity Score Prediction For Electronic Healthcare Databases Using Super Learner And High-Dimensional Propensity Score Methods, Cheng Ju, Mary Combs, Samuel D. Lendle, Jessica M. Franklin, Richard Wyss, Sebastian Schneeweiss, Mark J. Van Der Laan Jun 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

The optimal learner for prediction modeling varies depending on the underlying data-generating distribution. Super Learner (SL) is a generic ensemble learning algorithm that uses cross-validation to select among a "library" of candidate prediction models. The SL is not restricted to a single prediction model, but uses the strengths of a variety of learning algorithms to adapt to different databases. While the SL has been shown to perform well in a number of settings, it has not been thoroughly evaluated in large electronic healthcare databases that are common in pharmacoepidemiology and comparative effectiveness research. In this study, we applied and evaluated …
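
A minimal sketch of the SL recipe (plain least-squares combination; real implementations constrain the weights to a simplex and use much richer libraries): get cross-validated predictions from each library member, then fit a meta-regression of the outcome on those predictions:

```python
# Super Learner-style stacking on cross-validated predictions (sketch).
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_predict

rng = np.random.default_rng(13)
n = 2000
x = rng.normal(size=(n, 5))
p = 1 / (1 + np.exp(-(x[:, 0] + x[:, 1] * x[:, 2])))
y = rng.binomial(1, p)

library = [LogisticRegression(max_iter=1000),
           RandomForestClassifier(n_estimators=100, random_state=0)]
z = np.column_stack([cross_val_predict(m, x, y, cv=5, method="predict_proba")[:, 1]
                     for m in library])

weights, *_ = np.linalg.lstsq(z, y, rcond=None)   # meta-learner on CV preds
weights = np.clip(weights, 0, None)
weights /= weights.sum()
print("Super Learner-style weights:", np.round(weights, 3))
```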


Tmle For Marginal Structural Models Based On An Instrument, Boriska Toth, Mark J. Van Der Laan Jun 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

We consider estimation of a causal effect of a possibly continuous treatment when treatment assignment is potentially subject to unmeasured confounding, but an instrumental variable is available. Our focus is on estimating heterogeneous treatment effects, so that the treatment effect can be a function of an arbitrary subset of the observed covariates. One setting where this framework is especially useful is with clinical outcomes. Allowing the causal dose-response curve to depend on a subset of the covariates, we define our parameter of interest to be the projection of the true dose-response curve onto a user-supplied working marginal structural model. We …


A Powerful Statistical Framework For Generalization Testing In Gwas, With Application To The Hchs/Sol, Tamar Sofer, Ruth Heller, Marina Bogomolov, Christy L. Avery, Mariaelisa Graff, Kari E. North, Alex Reiner, Timothy A. Thornton, Kenneth Rice, Yoav Benjamini, Cathy C. Laurie, Kathleen F. Kerr Jun 2016

UW Biostatistics Working Paper Series

In GWAS, “generalization” is the replication of genotype-phenotype association in a population with different ancestry than the population in which it was first identified. The standard for reporting findings from a GWAS requires a two-stage design, in which discovered associations are replicated in an independent follow-up study. Current practices for declaring generalizations rely on testing associations while controlling the Family Wise Error Rate (FWER) in the discovery study, then separately controlling error measures in the follow-up study. While this approach limits false generalizations, we show that it does not guarantee control over the FWER or False Discovery Rate (FDR) of …


Facets: Allele-Specific Copy Number And Clonal Heterogeneity Analysis Tool Estimates For High-Throughput Dna Sequencing, Ronglai Shen, Venkatraman Seshan May 2016

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Allele-specific copy number analysis (ASCN) from next generation sequencing (NGS) data can greatly extend the utility of NGS beyond the identification of mutations to precisely annotate the genome for the detection of homozygous/heterozygous deletions, copy-neutral loss-of-heterozygosity (LOH), and allele-specific gains/amplifications. In addition, as targeted gene panels are increasingly used in clinical sequencing studies for the detection of “actionable” mutations and copy number alterations to guide treatment decisions, accurate, tumor purity-, ploidy-, and clonal heterogeneity-adjusted integer copy number calls are greatly needed to more reliably interpret NGS-based cancer gene copy number data in the context of clinical sequencing. We …


Interpretable High-Dimensional Inference Via Score Maximization With An Application In Neuroimaging, Simon N. Vandekar, Philip T. Reiss, Russell T. Shinohara May 2016

UPenn Biostatistics Working Papers

In the fields of neuroimaging and genetics a key goal is testing the association of a single outcome with a very high-dimensional imaging or genetic variable. Oftentimes summary measures of the high-dimensional variable are created to sequentially test and localize the association with the outcome. In some cases, the results for summary measures are significant, but subsequent tests used to localize differences are underpowered and do not identify regions associated with the outcome. We propose a generalization of Rao's score test based on maximizing the score statistic in a linear subspace of the parameter space. If the test rejects the …


Methods For Dealing With Death And Missing Data, And For Standardizing Different Health Variables In Longitudinal Datasets: The Cardiovascular Health Study, Paula Diehr Apr 2016

UW Biostatistics Working Paper Series

Longitudinal studies of older adults usually need to account for deaths and missing data. The study databases often include multiple health-related variables, whose trends over time are hard to compare because they were measured on different scales. Here we present a unified approach to these three problems that was developed and used in the Cardiovascular Health Study. Data were first transformed to a new scale that had integer/ratio properties, and on which “dead” logically takes the value zero. Missing data were then imputed on this new scale, using each person’s own data over time. Imputation could thus be informed by …


Data-Adaptive Inference Of The Optimal Treatment Rule And Its Mean Reward. The Masked Bandit, Antoine Chambaz, Wenjing Zheng, Mark J. Van Der Laan Apr 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

This article studies the data-adaptive inference of an optimal treatment rule. A treatment rule is an individualized treatment strategy in which treatment assignment for a patient is based on her measured baseline covariates. Eventually, a reward is measured on the patient. We also infer the mean reward under the optimal treatment rule. We do so in the so-called non-exceptional case, i.e., assuming that there is no stratum of the baseline covariates where treatment is neither beneficial nor harmful, and under a companion margin assumption.

Our pivotal estimator, whose definition hinges on the targeted minimum loss estimation (TMLE) principle, actually …


A Weighted Instrumental Variable Estimator To Control For Instrument-Outcome Confounders, Douglas Lehmann, Yun Li, Rajiv Saran, Yi Li Apr 2016

The University of Michigan Department of Biostatistics Working Paper Series

No abstract provided.


Recommendation To Use Exact P-Values In Biomarker Discovery Research, Margaret Sullivan Pepe, Matthew F. Buas, Christopher I. Li, Garnet L. Anderson Apr 2016

UW Biostatistics Working Paper Series

Background: In biomarker discovery studies, markers are ranked for validation using P-values. Standard P-value calculations use normal approximations that may not be valid for small P-values and small sample sizes common in discovery research.

Methods: We compared exact P-values, valid by definition, with normal and logit-normal approximations in a simulated study of 40 cases and 160 controls. The key measure of biomarker performance was sensitivity at 90% specificity. Data for 3000 uninformative markers and 30 true markers were generated randomly, with 10 replications of the simulation. We also analyzed real data on 2371 antibody array markers …
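
A small numeric illustration of the phenomenon (generic, not the paper's simulation): exact binomial tail probabilities and their normal approximations diverge badly in the far tail, which is exactly where biomarker-ranking P-values live:

```python
# Exact vs normal-approximation tail probabilities for a small-sample count.
from scipy.stats import binom, norm

n, p0 = 40, 0.10            # e.g., 40 cases, null sensitivity 10% at 90% spec.
for k in range(8, 16):      # observed number of cases detected
    exact = binom.sf(k - 1, n, p0)                       # P(X >= k), exact
    mu, sd = n * p0, (n * p0 * (1 - p0)) ** 0.5
    approx = norm.sf((k - mu) / sd)                      # normal approximation
    print(f"k={k:2d}  exact={exact:.2e}  normal={approx:.2e}")
```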


One-Step Targeted Minimum Loss-Based Estimation Based On Universal Least Favorable One-Dimensional Submodels, Mark J. Van Der Laan, Susan Gruber Mar 2016

U.C. Berkeley Division of Biostatistics Working Paper Series

Consider a study in which one observes n independent and identically distributed random variables whose probability distribution is known to be an element of a particular statistical model, and one is concerned with estimation of a particular real-valued pathwise differentiable target parameter of this data probability distribution. The targeted maximum likelihood estimator (TMLE) is an asymptotically efficient substitution estimator obtained by constructing a so-called least favorable parametric submodel through an initial estimator with score, at zero fluctuation of the initial estimator, that spans the efficient influence curve, and iteratively maximizing the corresponding parametric likelihood until no more updates …