Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Statistical Theory

COBRA

2006

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Properties Of Monotonic Effects, Tyler J. VanderWeele, James M. Robins Nov 2006

COBRA Preprint Series

Various relationships are shown to hold between monotonic effects and weak monotonic effects and the monotonicity of certain conditional expectations. These relationships are considered for both binary and non-binary variables. Counterexamples are provided to show that the results do not hold under less restrictive conditions. The ideas of monotonic effects are furthermore used to relate signed edges on a directed acyclic graph to qualitative effect modification.


Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch Nov 2006

Harvard University Biostatistics Working Paper Series

An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …


Large Cluster Asymptotics For GEE: Working Correlation Models, Hyoju Chung, Thomas Lumley Oct 2006

UW Biostatistics Working Paper Series

This paper presents large cluster asymptotic results for generalized estimating equations. The complexity of the working correlation model is characterized in terms of the number of working correlation components to be estimated. When the cluster size is relatively large, we may encounter a situation where a high-dimensional working correlation matrix is modeled and estimated from the data. In the present asymptotic setting, the cluster size and the complexity of the working correlation model grow with the number of independent clusters. We show the existence, weak consistency and asymptotic normality of marginal regression parameter estimators using the results of empirical process theory and …


Bayesian Hidden Markov Modeling Of Array CGH Data, Subharup Guha, Yi Li, Donna Neuberg Oct 2006

Harvard University Biostatistics Working Paper Series

Genomic alterations have been linked to the development and progression of cancer. The technique of Comparative Genomic Hybridization (CGH) yields data consisting of fluorescence intensity ratios of test and reference DNA samples. The intensity ratios provide information about the number of copies in DNA. Practical issues such as the contamination of tumor cells in tissue specimens and normalization errors necessitate the use of statistics for learning about the genomic alterations from array-CGH data. As increasing amounts of array CGH data become available, there is a growing need for automated algorithms for characterizing genomic profiles. Specifically, there is a need for …


Targeted Maximum Likelihood Learning, Mark J. Van Der Laan, Daniel Rubin Oct 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose one observes a sample of independent and identically distributed observations from a particular data generating distribution. Suppose that one has available an estimate of the density of the data generating distribution such as a maximum likelihood estimator according to a given or data adaptively selected model. Suppose that one is concerned with estimation of a particular pathwise differentiable Euclidean parameter. A substitution estimator evaluating the parameter of the density estimator is typically too biased and might not even converge at the parametric rate: that is, the density estimator was targeted to be a good estimator of the density and …


Spatial Cluster Detection For Censored Outcome Data, Andrea J. Cook, Diane Gold, Yi Li Sep 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Diagnosing Bias In The Inverse Probability Of Treatment Weighted Estimator Resulting From Violation Of Experimental Treatment Assignment, Yue Wang, Maya L. Petersen, David Bangsberg, Mark J. Van Der Laan Sep 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Inverse probability of treatment weighting (IPTW) is frequently used to estimate the causal effects of treatments and interventions. The consistency of the IPTW estimator relies not only on the well-recognized assumption of no unmeasured confounders (Sequential Randomization Assumption or SRA), but also on the assumption of experimentation in the assignment of treatment (Experimental Treatment Assignment or ETA). In finite samples, violations in the ETA assumption can occur due simply to chance; certain treatments become rare or non-existent for certain strata of the population. Such practical violations of the ETA assumption occur frequently in real data, and can result in significant …
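As a rough illustration of the weighting idea described in this abstract, the sketch below computes an IPTW estimate of a mean counterfactual outcome for a single time point and flags strata in which the treatment of interest never occurs. It is not the authors' diagnostic; the function names `iptw_mean` and `eta_violations`, the toy records, and the choice to estimate the treatment mechanism by stratum frequencies are all assumptions made for this example.

```python
from collections import Counter


def iptw_mean(records, treatment):
    """IPTW estimate of the mean outcome under `treatment`.

    `records` is a list of (stratum, treatment_received, outcome) tuples.
    The treatment probability within each stratum is estimated by the
    observed frequency (an assumption made for this sketch).
    """
    n = len(records)
    stratum_n = Counter(w for w, _, _ in records)
    treated_n = Counter(w for w, a, _ in records if a == treatment)
    total = 0.0
    for w, a, y in records:
        if a == treatment:
            g = treated_n[w] / stratum_n[w]  # estimated P(A = treatment | W = w)
            total += y / g                   # weight each outcome by 1 / g
    return total / n


def eta_violations(records, treatment):
    """Strata in which nobody received `treatment` (practical ETA violation)."""
    stratum_n = Counter(w for w, _, _ in records)
    treated_n = Counter(w for w, a, _ in records if a == treatment)
    return sorted(w for w in stratum_n if treated_n[w] == 0)


# Hypothetical toy data: both strata contain treated subjects.
balanced = [
    (0, 1, 1.0), (0, 1, 3.0), (0, 0, 0.0), (0, 0, 2.0),
    (1, 1, 5.0), (1, 0, 4.0), (1, 0, 4.0), (1, 0, 2.0),
]

# Same data, except that nobody in stratum 1 is treated.
sparse = [
    (0, 1, 1.0), (0, 1, 3.0), (0, 0, 0.0), (0, 0, 2.0),
    (1, 0, 5.0), (1, 0, 4.0), (1, 0, 4.0), (1, 0, 2.0),
]

print(iptw_mean(balanced, 1), eta_violations(balanced, 1))
print(iptw_mean(sparse, 1), eta_violations(sparse, 1))
```

In the second data set the untreated stratum contributes nothing to the weighted sum, so the estimate silently reflects only stratum 0. This is the kind of bias that arises when a treatment is non-existent in some stratum.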


Extending Marginal Structural Models Through Local, Penalized, And Additive Learning, Daniel Rubin, Mark J. Van Der Laan Sep 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSMs) allow one to form causal inferences from data, by specifying a relationship between a treatment and the marginal distribution of a corresponding counterfactual outcome. Following their introduction in Robins (1997), MSMs have typically been fit after assuming a semiparametric model, and then estimating a finite dimensional parameter. van der Laan and Dudoit (2003) proposed to instead view MSM fitting not as a task of semiparametric parameter estimation, but of nonparametric function approximation. They introduced a class of causal effect estimators based on mapping loss functions suitable for the unavailable counterfactual data to those suitable for the …


Statistical Learning Of Origin-Specific Statically Optimal Individualized Treatment Rules, Mark J. Van Der Laan, Maya L. Petersen Sep 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Consider a longitudinal observational or controlled study in which one collects chronological data over time on n randomly sampled subjects. The time-dependent process one observes on each randomly sampled subject contains time-dependent covariates, time-dependent treatment actions, and an outcome process or single final outcome of interest. A statically optimal individualized treatment rule (as introduced in van der Laan, Petersen & Joffe (2005), Petersen & van der Laan (2006)) is an (unknown) treatment rule which at any point in time conditions on a user-supplied subset of the past, computes the future static treatment regimen that maximizes a (conditional) mean future outcome …


Predicting Future Responses Based On Possibly Misspecified Working Models, Tianxi Cai, Lu Tian, Scott D. Solomon, L.J. Wei Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


The Combination Of Ecological And Case-Control Data, Sebastien Haneuse, Jon Wakefield Jul 2006

UW Biostatistics Working Paper Series

Ecological studies, in which data are available at the level of the group, rather than at the level of the individual, are susceptible to a range of biases due to their inability to characterize within-group variability in exposures and confounders. In order to overcome these biases, we propose a hybrid design in which ecological data are supplemented with a sample of individual-level case-control data. We develop the likelihood for this design and illustrate its benefits via simulation, both in bias reduction when compared to an ecological study, and in efficiency gains relative to a conventional case-control study. An interesting special …


Doubly Robust Censoring Unbiased Transformations, Daniel Rubin, Mark J. Van Der Laan Jun 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

We consider random design nonparametric regression when the response variable is subject to right censoring. Following the work of Fan and Gijbels (1994), a common approach to this problem is to apply what has been termed a censoring unbiased transformation to the data to obtain surrogate responses, and then enter these surrogate responses with covariate data into standard smoothing algorithms. Existing censoring unbiased transformations generally depend on either the conditional survival function of the response of interest, or that of the censoring variable. We show that a mapping introduced in another statistical context is in fact a censoring unbiased transformation …


A Method To Increase The Power Of Multiple Testing Procedures Through Sample Splitting, Daniel Rubin, Sandrine Dudoit, Mark J. Van Der Laan Jun 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

Consider the standard multiple testing problem where many hypotheses are to be tested, each hypothesis is associated with a test statistic, and large test statistics provide evidence against the null hypotheses. One proposal to provide probabilistic control of Type-I errors is the use of procedures ensuring that the expected number of false positives does not exceed a user-supplied threshold. Among such multiple testing procedures, we derive the "most powerful" method, meaning the test statistic cutoffs that maximize the expected number of true positives. Unfortunately, these optimal cutoffs depend on the true unknown data generating distribution, so could never be used …


Estimating The Integrated Likelihood Via Posterior Simulation Using The Harmonic Mean Identity, Adrian E. Raftery, Michael A. Newton, Jaya M. Satagopan, Pavel N. Krivitsky Apr 2006

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

The integrated likelihood (also called the marginal likelihood or the normalizing constant) is a central quantity in Bayesian model selection and model averaging. It is defined as the integral over the parameter space of the likelihood times the prior density. The Bayes factor for model comparison and Bayesian testing is a ratio of integrated likelihoods, and the model weights in Bayesian model averaging are proportional to the integrated likelihoods. We consider the estimation of the integrated likelihood from posterior simulation output, aiming at a generic method that uses only the likelihoods from the posterior simulation iterations. The key is the …
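For orientation, the quantities defined in words in this abstract can be written out as follows. The notation is assumed for illustration and is not taken from the paper: y is the data, L(θ) the likelihood, π(θ) the prior density, and θ^(1), …, θ^(T) are draws from the posterior. The last two lines give the harmonic mean identity named in the title in its commonly stated form, together with the simple estimator it suggests.

```latex
\begin{align*}
  m(y) &= \int L(\theta)\,\pi(\theta)\,d\theta
       && \text{(integrated likelihood)} \\
  B_{12} &= \frac{m_1(y)}{m_2(y)}
       && \text{(Bayes factor for model 1 against model 2)} \\
  \frac{1}{m(y)} &= \mathrm{E}\!\left[\frac{1}{L(\theta)} \,\middle|\, y\right]
       && \text{(harmonic mean identity)} \\
  \hat{m}(y) &= \left[\frac{1}{T}\sum_{t=1}^{T}\frac{1}{L(\theta^{(t)})}\right]^{-1}
       && \text{(estimator from posterior draws)}
\end{align*}
```

The identity follows by writing the posterior density as L(θ)π(θ)/m(y), so that the expectation of 1/L(θ) under the posterior equals the integral of π(θ)/m(y), which is 1/m(y) when the prior is proper.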


The Two-Sample Problem For Failure Rates Depending On A Continuous Mark: An Application To Vaccine Efficacy, Peter B. Gilbert, Ian W. McKeague, Yanqing Sun Mar 2006

UW Biostatistics Working Paper Series

The efficacy of an HIV vaccine to prevent infection is likely to depend on the genetic variation of the exposing virus. This paper addresses the problem of using data on the HIV sequences that infect vaccine efficacy trial participants to 1) test for vaccine efficacy more powerfully than procedures that ignore the sequence data; and 2) evaluate the dependence of vaccine efficacy on the divergence of infecting HIV strains from the HIV strain that is contained in the vaccine. Because hundreds of amino acid sites in each HIV genome are sequenced, it is natural to treat the divergence (defined in …


Evaluating Prediction Rules For T-Year Survivors With Censored Regression Models, Hajime Uno, Tianxi Cai, Lu Tian, L.J. Wei Mar 2006

Harvard University Biostatistics Working Paper Series

Suppose that we are interested in establishing simple, but reliable rules for predicting future t-year survivors via censored regression models. In this article, we present inference procedures for evaluating such binary classification rules based on various prediction precision measures quantified by the overall misclassification rate, sensitivity and specificity, and positive and negative predictive values. Specifically, under various working models we derive consistent estimators for the above measures via substitution and cross validation estimation procedures. Furthermore, we provide large sample approximations to the distributions of these nonsmooth estimators without assuming that the working model is correctly specified. Confidence intervals, for example, …


Multiple Tests Of Association With Biological Annotation Metadata, Sandrine Dudoit, Sunduz Keles, Mark J. Van Der Laan Mar 2006

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a general and formal statistical framework for the multiple tests of associations between known fixed features of a genome and unknown parameters of the distribution of variable features of this genome in a population of interest. The known fixed gene-annotation profiles, corresponding to the fixed features of the genome, may concern Gene Ontology (GO) annotation, pathway membership, regulation by particular transcription factors, nucleotide sequences, or protein sequences. The unknown gene-parameter profiles, corresponding to the variable features of the genome, may be, for example, regression coefficients relating genome-wide transcript levels or DNA copy numbers to possibly censored biological and …


Regression Analysis For The Partial Area Under The ROC Curve, Tianxi Cai, Lori E. Dodd Feb 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.