Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 24 of 24

Full-Text Articles in Physical Sciences and Mathematics

Multiple Testing Procedures For Controlling Tail Probability Error Rates, Sandrine Dudoit, Mark J. Van Der Laan, Merrill D. Birkner Dec 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The present article discusses and compares multiple testing procedures (MTP) for controlling Type I error rates defined as tail probabilities for the number (gFWER) and proportion (TPPFP) of false positives among the rejected hypotheses. Specifically, we consider the gFWER- and TPPFP-controlling MTPs proposed recently by Lehmann & Romano (2004) and in a series of four articles by Dudoit et al. (2004), van der Laan et al. (2004b,a), and Pollard & van der Laan (2004). The former Lehmann & Romano (2004) procedures are marginal, in the sense that they are based solely on the marginal distributions of the test statistics, i.e., …
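For reference, the two tail-probability error rates named in this abstract are commonly written as below. This is a sketch of the standard definitions: the symbols V_n (number of false positives), R_n (number of rejected hypotheses), k, and q are notation assumed here, not taken from the abstract, and the ratio V_n/R_n is taken to be 0 when R_n = 0.

```latex
\mathrm{gFWER}(k) = \Pr(V_n > k), \qquad
\mathrm{TPPFP}(q) = \Pr\!\left(\frac{V_n}{R_n} > q\right), \quad q \in (0, 1).
```

Under this notation, setting k = 0 recovers the usual family-wise error rate, Pr(V_n > 0).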


Multiple Testing Procedures: R Multtest Package And Applications To Genomics, Katherine S. Pollard, Sandrine Dudoit, Mark J. Van Der Laan Dec 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The Bioconductor R package multtest implements widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics. The current version of multtest provides MTPs for tests concerning means, differences in means, and regression parameters in linear and Cox proportional hazards models. Procedures are provided to control Type I error rates defined as tail probabilities for arbitrary functions of the numbers of false positives and rejected hypotheses. These error rates include tail probabilities …


Choice Of Monitoring Mechanism For Optimal Nonparametric Functional Estimation For Binary Data, Nicholas P. Jewell, Mark J. Van Der Laan, Stephen Shiboski Nov 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Optimal designs of dose levels in order to estimate parameters from a model for binary response data have a long and rich history. These designs are based on parametric models. Here we consider fully nonparametric models with interest focused on estimation of smooth functionals using plug-in estimators based on the nonparametric maximum likelihood estimator. An important application of the results is the derivation of the optimal choice of the monitoring time distribution function for current status observation of a survival distribution. The optimal choice depends in a simple way on the dose response function and the form of the functional. …


Deletion/Substitution/Addition Algorithm For Partitioning The Covariate Space In Prediction, Annette Molinaro, Mark J. Van Der Laan Nov 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a new method for predicting censored (and non-censored) clinical outcomes from a highly-complex covariate space. Previously we suggested a unified strategy for predictor construction, selection, and performance assessment. Here we introduce a new algorithm which generates a piecewise constant estimation sieve of candidate predictors based on an intensive and comprehensive search over the entire covariate space. This algorithm allows us to elucidate interactions and correlation patterns in addition to main effects.


Multiple Testing And Data Adaptive Regression: An Application To HIV-1 Sequence Data, Merrill D. Birkner, Sandra E. Sinisi, Mark J. Van Der Laan Oct 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Analysis of viral strand sequence data and viral replication capacity could potentially lead to biological insights regarding the replication ability of HIV-1. Determining specific target codons on the viral strand will facilitate the manufacturing of target-specific antiretrovirals. Various algorithmic and analysis techniques can be applied to this problem. We propose using multiple testing to find codons which have significant univariate associations with replication capacity of the virus. We also propose using a data adaptive multiple regression algorithm to obtain multiple predictions of viral replication capacity based on an entire mutant/non-mutant sequence profile. The data set to which these techniques …


GLLAMM Manual, Sophia Rabe-Hesketh, Anders Skrondal, Andrew Pickles Oct 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

This manual describes a Stata program gllamm that can estimate Generalized Linear Latent and Mixed Models (GLLAMMs). GLLAMMs are a class of multilevel latent variable models for (multivariate) responses of mixed type including continuous responses, counts, duration/survival data, dichotomous, ordered and unordered categorical responses and rankings. The latent variables (common factors or random effects) can be assumed to be discrete or to have a multivariate normal distribution. Examples of models in this class are multilevel generalized linear models or generalized linear mixed models, multilevel factor or latent trait models, item response models, latent class models and multilevel structural equation models. …


Data Adaptive Estimation Of The Treatment Specific Mean, Yue Wang, Oliver Bembom, Mark J. Van Der Laan Oct 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

An important problem in epidemiology and medical research is the estimation of the causal effect of a treatment action at a single point in time on the mean of an outcome, possibly within strata of the target population defined by a subset of the baseline covariates. Current approaches to this problem are based on marginal structural models, i.e., parametric models for the marginal distribution of counterfactual outcomes as a function of treatment and effect modifiers. The various estimators developed in this context furthermore each depend on a high-dimensional nuisance parameter whose estimation currently also relies on parametric models. Since misspecification …


History-Adjusted Marginal Structural Models And Statically-Optimal Dynamic Treatment Regimes, Mark J. Van Der Laan, Maya L. Petersen Sep 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment. These models, introduced by Robins, model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at a final time point. However, the utility of these models for some applications has been limited by their inability to incorporate modification of the causal effect of treatment by time-varying covariates. …


Estimating A Survival Distribution With Current Status Data And High-Dimensional Covariates, Mark J. Van Der Laan, Aad Van Der Vaart Sep 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

We consider the inverse problem of estimating a survival distribution when the survival times are only observed to be in one of the intervals of a random bisection of the time axis. We are particularly interested in the case that high-dimensional and/or time-dependent covariates are available, and/or the survival events and censoring times are only conditionally independent given the covariate process. The method of estimation consists of regularizing the survival distribution by taking the primitive function or smoothing, estimating the regularized parameter by using estimating equations, and finally recovering an estimator for the parameter of interest.


Estimation Of Treatment Effects In Randomized Trials With Noncompliance And A Dichotomous Outcome, Mark J. Van Der Laan, Alan E. Hubbard, Nicholas P. Jewell Sep 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a class of estimators of the treatment effect on a dichotomous outcome among the treated subjects within covariate and treatment arm strata in randomized trials with non-compliance. Recent articles by Vansteelandt and Goetghebeur (2003) and Robins and Rotnitzky (2004) have presented consistent and asymptotically linear estimators of a causal odds ratio, which rely, beyond correct specification of a model for the causal odds ratio, on a correctly specified model for a potentially high dimensional nuisance parameter. In this article we propose consistent, asymptotically linear and locally efficient estimators of a causal relative risk and a new parameter -- …


Estimation Of Direct And Indirect Causal Effects In Longitudinal Studies, Mark J. Van Der Laan, Maya L. Petersen Aug 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no unmeasured confounders, Robins & Greenland (1992) and Pearl (2000) develop two identifiability results for direct and indirect causal effects. They define an …


Linear Life Expectancy Regression With Censored Data, Ying Qing Chen, Su-Chun Cheng Aug 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Life expectancy, i.e., the mean residual life function, has been of considerable practical and scientific interest for characterising the distribution of residual life. Regression models are often needed to model the association between life expectancy and its covariates. In this article, we consider a linear mean residual life model and further develop some inference procedures in the presence of censoring. The new model and proposed inference procedures are demonstrated by numerical examples and an application to the well-known Stanford heart transplant data. Additional semiparametric efficiency calculations and information bounds are also considered.


A Note On Empirical Likelihood Inference Of Residual Life Regression, Ying Qing Chen, Yichuan Zhao Jul 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The mean residual life function, or life expectancy, is an important function for characterizing the distribution of residual life. The proportional mean residual life model by Oakes and Dasu (1990) is a regression tool to study the association between life expectancy and its associated covariates. Although semiparametric inference procedures have been proposed in the literature, the accuracy of such procedures may be low when the censoring proportion is relatively large. In this paper, the semiparametric inference procedures are studied with an empirical likelihood ratio method. An empirical likelihood confidence region is constructed for the regression parameters. The proposed method is further compared …


Semiparametric Quantitative-Trait-Locus Mapping: I. On Functional Growth Curves, Ying Qing Chen, Rongling Wu Jul 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The genetic study of certain quantitative traits in growth curves as a function of time has recently been of major scientific interest to explore the developmental evolution processes of biological subjects. Various parametric approaches in the statistical literature have been proposed to study the quantitative-trait-loci (QTL) mapping of the growth curves as multivariate outcomes. In this article, we view the growth curves as functional quantitative traits and propose some semiparametric models to relax the strong parametric assumptions, which may not always be practical in reality. Appropriate inference procedures are developed to estimate the parameters of interest which characterise the possible …


Semiparametric Quantitative-Trait-Locus Mapping: II. On Censored Age-At-Onset, Ying Qing Chen, Chengcheng Hu, Rongling Wu Jul 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

In genetic studies, the variation in genotypes may not only affect different inheritance patterns in qualitative traits, but may also affect the age-at-onset as a quantitative trait. In this article, we use standard cross designs, such as backcross or F2, to propose some hazard regression models, namely, the additive hazards model in quantitative trait loci mapping for age-at-onset, although the developed method can be extended to more complex designs. With additive invariance of the additive hazards models in mixture probabilities, we develop flexible semiparametric methodologies in interval regression mapping without heavy computing burden. A recently developed multiple comparison procedure is adapted …


Quantification And Visualization Of LD Patterns And Identification Of Haplotype Blocks, Yan Wang, Sandrine Dudoit Jun 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Classical measures of linkage disequilibrium (LD) between two loci, based only on the joint distribution of alleles at these loci, present noisy patterns. In this paper, we propose a new distance-based LD measure, R, which takes into account multilocus haplotypes around the two loci in order to exploit information from neighboring loci. The LD measure R yields a matrix of pairwise distances between markers, based on the correlation between the lengths of shared haplotypes among chromosomes around these markers. Data analysis demonstrates that visualization of LD patterns through the R matrix reveals more deterministic patterns, with much less noise, than …


Mean Response Models Of Repeated Measurements In Presence Of Varying Effectiveness Onset, Ying Qing Chen, Su-Chun Cheng Jun 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Repeated measurements are often collected over time to evaluate treatment efficacy in clinical trials. Most statistical models for repeated measurements have focused on the mean response as a function of time. These models usually assume that the treatment has a persistent effect of constant additivity or multiplicity on the mean response functions throughout the observation period. In reality, however, such an assumption may be confounded by the potential existence of the so-called effectiveness action onset, although such onsets are often unobserved or difficult to obtain. Instead of including nonparametric time-varying coefficients in the mean response models, we propose …


Multiple Testing Methods For ChIP-Chip High Density Oligonucleotide Array Data, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit, Simon E. Cawley Jun 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Cawley et al. (2004) have recently mapped the locations of binding sites for three transcription factors along human chromosomes 21 and 22 using ChIP-Chip experiments. ChIP-Chip experiments are a new approach to the genome-wide identification of transcription factor binding sites and consist of chromatin (Ch) immunoprecipitation (IP) of transcription factor-bound genomic DNA followed by high density oligonucleotide hybridization (Chip) of the IP-enriched DNA. We investigate the ChIP-Chip data structure and propose methods for inferring the location of transcription factor binding sites from these data. The proposed methods involve testing for each probe whether it is part of a bound sequence …


Semiparametric Regression Analysis Of Mean Residual Life With Censored Survival Data, Ying Qing Chen, Su-Chun Cheng May 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

As a function of time t, mean residual life is the remaining life expectancy of a subject given survival up to t. The proportional mean residual life model, proposed by Oakes & Dasu (1990), provides an alternative to the Cox proportional hazards model to study the association between survival times and covariates. In the presence of censoring, we develop semiparametric inference procedures for the regression coefficients of the Oakes-Dasu model using martingale theory for counting processes. We also present simulation studies and an application to the Veterans' Administration lung cancer data.
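The definition in the first sentence of this abstract, m(t) = E[T - t | T > t], can be illustrated with a minimal sketch for the simplest case of fully observed (uncensored) survival times. The function name `empirical_mrl` and the sample values are hypothetical, chosen for illustration only; this is not the censored-data or regression procedure developed in the paper.

```python
from statistics import fmean


def empirical_mrl(times, t):
    """Empirical mean residual life at time t for UNCENSORED survival times.

    Averages the remaining lifetimes (x - t) over subjects still alive at t.
    Returns NaN when no subject survives beyond t.
    """
    residuals = [x - t for x in times if x > t]
    return fmean(residuals) if residuals else float("nan")


# Hypothetical sample of four observed survival times.
sample = [2.0, 4.0, 6.0, 8.0]

# Subjects alive after t = 3 have remaining lifetimes 1, 3 and 5.
mrl_at_3 = empirical_mrl(sample, 3.0)
print(mrl_at_3)
```

With censored observations this naive average is biased, which is one reason dedicated inference procedures such as those described in the abstract are needed.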


Regulatory Motif Finding By Logic Regression, Sunduz Keles, Mark J. Van Der Laan, Chris Vulpe Mar 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple transcription factors coordinately control transcriptional regulation of genes in eukaryotes. Although multiple computational methods consider the identification of individual transcription factor binding sites (TFBSs), very few focus on the interactions between these sites. We consider finding transcription factor binding sites and their context specific interactions using microarray gene expression data. We devise a hybrid approach called LogicMotif composed of a TFBS identification method combined with the new regression methodology logic regression of Ruczinski et al. (2003). LogicMotif has two steps: First, potential binding sites are identified from transcription control regions of genes of interest. Various available methods can be …


A Statistical Method For Constructing Transcriptional Regulatory Networks Using Gene Expression And Sequence Data, Biao Xing, Mark J. Van Der Laan Mar 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Transcriptional regulation is one of the most important means of gene regulation. Uncovering the transcriptional regulatory network helps us to understand the complex cellular process. In this paper, we describe a comprehensive statistical approach for constructing the transcriptional regulatory network using data of gene expression, promoter sequence, and transcription factor binding sites. Our simulation studies show that the overall and false positive error rates in the estimated transcriptional regulatory network are expected to be small if the systematic noise in the constructed feature matrix is small. Our analysis based on 658 microarray experiments on yeast gene expression programs and 46 transcription …


Loss-Based Cross-Validated Deletion/Substitution/Addition Algorithms In Estimation, Sandra E. Sinisi, Mark J. Van Der Laan Mar 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

In van der Laan and Dudoit (2003) we propose and theoretically study a unified loss function based statistical methodology, which provides a road map for estimation and performance assessment. Given a parameter of interest which can be described as the minimizer of the population mean of a loss function, the road map involves as important ingredients cross-validation for estimator selection and minimizing over subsets of basis functions the empirical risk of the subset-specific estimator of the parameter of interest, where the basis functions correspond to a parameterization of a specified subspace of the complete parameter space. In this article we …


The Cross-Validated Adaptive Epsilon-Net Estimator, Mark J. Van Der Laan, Sandrine Dudoit, Aad W. Van Der Vaart Feb 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Assume that the parameter of interest can be defined as the minimizer, over a suitably defined parameter space, of the expectation (with respect to the distribution of the random variable) of a particular (loss) function of a candidate parameter value and the random variable. Examples of commonly used loss functions are the squared error loss function in regression and the negative log-density loss function in density estimation. Minimizing the empirical risk (i.e., the empirical mean of the loss function) over the entire parameter space …
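The setup described in this abstract can be written compactly as follows. This is a sketch in assumed notation: X_1, ..., X_n for the i.i.d. sample with distribution P_0, \Psi for the parameter space, L for the loss function, and X = (W, Y) for the regression case are symbols introduced here, not taken from the abstract.

```latex
\psi_0 = \arg\min_{\psi \in \Psi} \mathrm{E}_{P_0}\, L(X, \psi), \qquad
\hat{\psi}_n = \arg\min_{\psi \in \Psi} \frac{1}{n} \sum_{i=1}^{n} L(X_i, \psi),

\text{with, for example,}\quad
L(X, \psi) = \bigl(Y - \psi(W)\bigr)^2 \ \ (\text{regression}), \qquad
L(X, \psi) = -\log \psi(X) \ \ (\text{density estimation}).
```

Here \psi_0 is the parameter of interest and \hat{\psi}_n is the minimizer of the empirical risk over the entire parameter space.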


Multiple Testing. Part III. Procedures For Control Of The Generalized Family-Wise Error Rate And Proportion Of False Positives, Mark J. Van Der Laan, Sandrine Dudoit, Katherine S. Pollard Jan 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

The accompanying articles by Dudoit et al. (2003b) and van der Laan et al. (2003) provide single-step and step-down resampling-based multiple testing procedures that asymptotically control the family-wise error rate (FWER) for general null hypotheses and test statistics. The proposed procedures fundamentally differ from existing approaches in the choice of null distribution for deriving cut-offs for the test statistics and are shown to provide asymptotic control of the FWER under general data generating distributions, without the need for conditions such as subset pivotality. In this article, we show that any multiple testing procedure (asymptotically) controlling the FWER at level alpha …