COBRA

U.C. Berkeley Division of Biostatistics Working Paper Series

2005

Articles 1 - 29 of 29

Full-Text Articles in Physical Sciences and Mathematics

Issues Of Processing And Multiple Testing Of Seldi-Tof Ms Proteomic Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan, Christine F. Skibola, Christine M. Hegedus, Martyn T. Smith Dec 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

A new data filtering method for SELDI-TOF MS proteomic spectra data is described. We examined technical repeats (2 per subject) of intensity versus m/z (mass/charge) of bone marrow cell lysate for two groups of childhood leukemia patients: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). As others have noted, the type of data processing as well as experimental variability can have a disproportionate impact on the list of "interesting" proteins (see Baggerly et al. (2004)). We propose a list of processing and multiple testing techniques to correct for 1) background drift; 2) filtering using smooth regression and cross-validated bandwidth …
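
A minimal R sketch of the kind of processing step this abstract describes: baseline (background drift) removal followed by smoothing of a single spectrum. The loess-based correct_spectrum function and the toy spectrum are illustrative assumptions, not the authors' pipeline; the abstract's approach selects bandwidths by cross-validation, whereas fixed spans are used here.

    # Illustrative sketch only: remove slowly varying background drift with a
    # coarse loess fit, then smooth the residual signal with a narrower loess
    # fit.  Spans are fixed here; a real analysis would choose them by
    # cross-validation.
    correct_spectrum <- function(mz, intensity,
                                 baseline_span = 0.75, smooth_span = 0.05) {
      baseline  <- predict(loess(intensity ~ mz, span = baseline_span, degree = 1))
      corrected <- intensity - baseline
      smoothed  <- predict(loess(corrected ~ mz, span = smooth_span, degree = 2))
      data.frame(mz = mz, corrected = corrected, smoothed = smoothed)
    }

    # toy spectrum: drifting baseline plus one peak plus noise
    set.seed(1)
    mz <- seq(2000, 20000, length.out = 500)
    intensity <- 50 - 0.001 * mz + 40000 * dnorm(mz, 9000, 200) + rnorm(500, sd = 2)
    spec <- correct_spectrum(mz, intensity)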


Quantile-Function Based Null Distribution In Resampling Based Multiple Testing, Mark J. Van Der Laan, Alan E. Hubbard Nov 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing a collection of null hypotheses about a data generating distribution, based on a sample of independent and identically distributed observations, is a fundamental and important statistical problem with many applications. Methods based on marginal null distributions (i.e., marginal p-values) are attractive because the marginal p-values can be based on a user-supplied choice of marginal null distributions and are computationally trivial; by necessity, however, they are known either to be conservative or to rely on assumptions about the dependence structure between the test statistics. Resampling based multiple testing (Westfall and Young, 1993) involves sampling from a joint null …
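
A minimal R sketch of the quantile-function idea, assuming a bootstrap matrix of test statistics is already available: each column is mapped through its empirical distribution and then through a user-supplied marginal null quantile function, so the marginals follow the chosen null while the resampled dependence structure is retained. The function quantile_null and the toy data are illustrative, not the paper's algorithm.

    # Illustrative sketch only: map each column of a bootstrap test-statistic
    # matrix (B draws x m hypotheses) through its empirical distribution and
    # then through a user-supplied marginal null quantile function.
    quantile_null <- function(Tboot, qnull = qnorm) {
      apply(Tboot, 2, function(x) {
        u <- rank(x, ties.method = "average") / (length(x) + 1)  # empirical cdf in (0, 1)
        qnull(u)
      })
    }

    # toy example: a shared component induces dependence across 5 hypotheses
    set.seed(2)
    B <- 1000; m <- 5
    Tboot <- matrix(rnorm(B * m), B, m) + rnorm(B)
    Zboot <- quantile_null(Tboot)   # marginally N(0, 1), dependence preserved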


Data Adaptive Pathway Testing, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan Nov 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

A majority of diseases are caused by a combination of factors; for example, composite genetic mutation profiles have in many cases been found to predict a deleterious outcome. Several statistical techniques have been used to analyze these types of biological data. This article implements a general strategy that uses data-adaptive regression methods to build a specific pathway model, predicting a disease outcome from a combination of biological factors, and assesses the significance of this model, or pathway, with a permutation-based null distribution. We also provide several simulation comparisons with other techniques. In addition, …
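
A minimal R sketch of the general strategy, assuming simple forward stepwise linear regression as the data-adaptive model builder (the paper's own regression methods are not reproduced here): the whole model-building step is repeated on permuted outcomes to form the permutation null distribution of a fit statistic.

    # Illustrative sketch only: build a model data-adaptively (forward stepwise
    # linear regression here), summarize it with one fit statistic (R^2), and
    # assess significance by re-running the whole model-building step on
    # outcomes permuted relative to the covariates.
    pathway_pvalue <- function(y, X, n_perm = 200) {
      fit_stat <- function(y, X) {
        dat   <- data.frame(y = y, X)
        scope <- as.formula(paste("~", paste(colnames(X), collapse = " + ")))
        fit   <- step(lm(y ~ 1, data = dat), scope = scope,
                      direction = "forward", trace = 0)
        summary(fit)$r.squared
      }
      observed   <- fit_stat(y, X)
      null_stats <- replicate(n_perm, fit_stat(sample(y), X))   # permutation null
      (1 + sum(null_stats >= observed)) / (1 + n_perm)          # permutation p-value
    }

    # toy example: 3 of 10 candidate factors are related to the outcome
    set.seed(3)
    X <- as.data.frame(matrix(rnorm(100 * 10), 100, 10))
    y <- X$V1 + X$V2 - X$V3 + rnorm(100)
    pathway_pvalue(y, X)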


Correspondences Between Regression Models For Complex Binary Outcomes And Those For Structured Multivariate Survival Analyses, Nicholas P. Jewell Nov 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Doksum and Gasko [5] described a one-to-one correspondence between regression models for binary outcomes and those for continuous time survival analyses. This correspondence has been exploited heavily in the analysis of current status data (Jewell and van der Laan [11], Shiboski [18]). Here, we explore similar correspondences for complex survival models and categorical regression models for polytomous data. We include discussion of competing risks and progressive multi-state survival random variables.


Application Of A Variable Importance Measure Method To Hiv-1 Sequence Data, Merrill D. Birkner, Mark J. Van Der Laan Nov 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

van der Laan (2005) proposed a method for constructing variable importance measures, together with the corresponding statistical inference. The technique measures the importance of a variable in predicting an outcome and can be implemented as an inverse probability of treatment weighted (IPTW) or double robust inverse probability of treatment weighted (DR-IPTW) estimator. The significance of the estimator is determined by estimating its influence curve and thereby the corresponding variance and p-value. This article applies the van der Laan (2005) variable importance measures and corresponding inference to HIV-1 sequence data. In this data application, protease and reverse …
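
A minimal R sketch of a generic point-treatment IPTW estimate of the kind of variable importance contrast described here (E[Y_1] - E[Y_0] for a binary variable A, adjusting for covariates W). The function iptw_importance and the toy data are illustrative; the paper's DR-IPTW estimator and influence-curve-based inference are not shown.

    # Illustrative sketch only: IPTW estimate of E[Y_1] - E[Y_0] for a binary
    # variable A, with the treatment mechanism estimated by logistic regression
    # on the covariates W.
    iptw_importance <- function(Y, A, W) {
      g  <- glm(A ~ ., data = data.frame(A = A, W), family = binomial)
      gA <- ifelse(A == 1, fitted(g), 1 - fitted(g))     # estimated P(A = a_i | W_i)
      mean(Y * (A == 1) / gA) - mean(Y * (A == 0) / gA)
    }

    # toy example: true importance of A is 1
    set.seed(4)
    W <- data.frame(w1 = rnorm(500), w2 = rnorm(500))
    A <- rbinom(500, 1, plogis(0.5 * W$w1))
    Y <- 1 + A + W$w1 + rnorm(500)
    iptw_importance(Y, A, W)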


A General Imputation Methodology For Nonparametric Regression With Censored Data, Dan Rubin, Mark J. Van Der Laan Nov 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

We consider the random design nonparametric regression problem when the response variable is subject to a general mode of missingness or censoring. A traditional approach to such problems is imputation, in which the missing or censored responses are replaced by well-chosen values, and then the resulting covariate/response data are plugged into algorithms designed for the uncensored setting. We present a general methodology for imputation with the property of double robustness, in that the method works well if either a parameter of the full data distribution (covariate and response distribution) or a parameter of the censoring mechanism is well approximated. These …


Efficacy Studies Of Malaria Treatments In Africa: Efficient Estimation With Missing Indicators Of Failure, Rhoderick N. Machekano, Grant Dorsey, Alan E. Hubbard Nov 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Efficacy studies of malaria treatments can be plagued by indeterminate outcomes for some patients. The study motivating this paper defines the outcome of interest (treatment failure) as recrudescence, and for some subjects it is unclear whether a recurrence of malaria is due to recrudescence or to a new infection. This results in a specific kind of missing data. The effect of missing data in causal inference problems is widely recognized. Methods that adjust for possible bias from missing data include a variety of imputation procedures (extreme case analysis, hot-deck, single and multiple imputation), inverse weighting methods, and likelihood based methods (data augmentation, …


A Fine-Scale Linkage Disequilibrium Measure Based On Length Of Haplotype Sharing, Yan Wang, Lue Ping Zhao, Sandrine Dudoit Oct 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

High-throughput genotyping technologies for single nucleotide polymorphisms (SNP) have enabled the recent completion of the International HapMap Project (Phase I), which has stimulated much interest in studying genome-wide linkage disequilibrium (LD) patterns. Conventional LD measures, such as D' and r-square, are two-point measurements, and their relationship with physical distance is highly noisy. We propose a new LD measure, defined in terms of the correlation coefficient for shared haplotype lengths around two loci, thereby borrowing information from multiple loci. A U-statistic-based estimator of the new LD measure, which takes into consideration the dependence structure of the observed data, is developed and …
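
A rough R sketch of the shared-haplotype-length idea only, under the simplifying assumption that sharing is measured by the number of consecutive matching markers around a focal locus for each pair of haplotypes; the paper's U-statistic estimator and its treatment of the dependence structure of the data are not reproduced, and all function names and toy data are illustrative.

    # Rough sketch only: for each pair of haplotypes, count the consecutive
    # markers around a focal locus on which the pair agrees; the LD measure is
    # approximated by the correlation of these shared lengths at two loci.
    shared_length <- function(h1, h2, locus) {
      if (h1[locus] != h2[locus]) return(0)
      m <- length(h1)
      left <- right <- locus
      while (left > 1  && h1[left - 1]  == h2[left - 1])  left  <- left - 1
      while (right < m && h1[right + 1] == h2[right + 1]) right <- right + 1
      right - left + 1
    }

    ld_shared_length <- function(H, locusA, locusB) {
      pairs <- combn(nrow(H), 2)
      lenA  <- apply(pairs, 2, function(p) shared_length(H[p[1], ], H[p[2], ], locusA))
      lenB  <- apply(pairs, 2, function(p) shared_length(H[p[1], ], H[p[2], ], locusB))
      cor(lenA, lenB)
    }

    # toy example: 40 haplotypes, 50 SNPs coded 0/1
    set.seed(5)
    H <- matrix(rbinom(40 * 50, 1, 0.5), nrow = 40)
    ld_shared_length(H, locusA = 20, locusB = 25)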


Population Intervention Models In Causal Inference, Alan E. Hubbard, Mark J. Van Der Laan Oct 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a treatment variable or risk variable on the distribution of a disease in a population. These models, as originally introduced by Robins (e.g., Robins (2000a), Robins (2000b), van der Laan and Robins (2002)), model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates, and their dependence on treatment. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate histories are measured over time, and an outcome is recorded at …
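
A minimal R sketch of a point-treatment population intervention contrast, E[Y_0] - E[Y] (the change in the mean outcome if exposure were removed from the population), with E[Y_0] estimated by simple IPTW. The function pop_intervention and the toy data are illustrative; the longitudinal MSM machinery discussed in the paper is not shown.

    # Illustrative sketch only: psi = E[Y_0] - E[Y], the change in the mean
    # outcome if exposure were removed from the population, with E[Y_0]
    # estimated by simple IPTW.
    pop_intervention <- function(Y, A, W) {
      g   <- glm(A ~ ., data = data.frame(A = A, W), family = binomial)
      g0  <- 1 - fitted(g)                   # estimated P(A = 0 | W)
      EY0 <- mean((A == 0) * Y / g0)         # IPTW estimate of E[Y_0]
      EY0 - mean(Y)
    }

    # toy example: exposure raises the outcome by 1 and about half are exposed,
    # so the population intervention effect is roughly -0.5
    set.seed(6)
    W <- data.frame(w = rnorm(1000))
    A <- rbinom(1000, 1, plogis(W$w))
    Y <- 2 + A + 0.5 * W$w + rnorm(1000)
    pop_intervention(Y, A, W)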


Cross-Validated Bagged Prediction Of Survival, Sandra E. Sinisi, Romain Neugebauer, Mark J. Van Der Laan Sep 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

In this article, we show how to apply our previously proposed Deletion/Substitution/Addition algorithm in the context of right-censoring for the prediction of survival. Furthermore, we introduce how to incorporate bagging into the algorithm to obtain a cross-validated bagged estimator. The method is used for predicting the survival time of patients with diffuse large B-cell lymphoma based on gene expression variables.


Direct Effect Models, Mark J. Van Der Laan, Maya L. Petersen Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no-unmeasured confounders for treatment and the intermediate variable, Robins & Greenland (1992) define an individual direct effect as the counterfactual effect of …


Statistical Inference For Variable Importance, Mark J. Van Der Laan Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Many statistical problems involve learning the importance/effect of a variable for predicting an outcome of interest, based on a sample of n independent and identically distributed observations of a list of input variables and an outcome. For example, though prediction/machine learning is, in principle, concerned with learning from the data the optimal unknown mapping from input variables to an outcome, the typical reported output is a list of importance measures, one for each input variable. The typical approach in prediction has been to learn the unknown optimal predictor from the data and derive, for each of the input …


Survival Point Estimate Prediction In Matched And Non-Matched Case-Control Subsample Designed Studies, Annette M. Molinaro, Mark J. Van Der Laan, Dan H. Moore, Karla Kerlikowske Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Providing information about the risk of disease and clinical factors that may increase or decrease a patient's risk of disease is standard medical practice. Although case-control studies can provide evidence of strong associations between diseases and risk factors, clinicians need to be able to communicate to patients the age-specific risks of disease over a defined time interval for a set of risk factors.

An estimate of absolute risk cannot be determined from case-control studies because cases are generally chosen from a population whose size is not known (necessary for calculation of absolute risk) and where duration of follow-up is not …


Application Of A Multiple Testing Procedure Controlling The Proportion Of False Positives To Protein And Bacterial Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing multiple hypotheses is important in high-dimensional biological studies. In these situations, one is often interested in controlling a Type I error rate such as the tail probability that the proportion of false positives among the rejections exceeds a specified level (TPPFP), at level alpha. This article presents an application of the E-Bayes/Bootstrap TPPFP procedure of van der Laan et al. (2005), which controls the tail probability of the proportion of false positives (TPPFP), to two biological datasets. The first application is to a mass-spectrometry dataset of two leukemia subtypes, AML and ALL. The protein data measurements include intensity and …


Cross-Validating And Bagging Partitioning Algorithms With Variable Importance, Annette M. Molinaro, Mark J. Van Der Laan Aug 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

We present a cross-validated bagging scheme in the context of partitioning algorithms. To explore the benefits of the various bagging schemes, we compare via simulations the predictive ability of a single Classification and Regression Tree (CART) with several previously suggested bagging schemes and with our proposed approach. Additionally, a variable importance measure is explained and illustrated.
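
A minimal R sketch of ordinary bagging of regression trees with rpart, for orientation only; the cross-validated bagging scheme and variable importance measure proposed in the paper are not reproduced here, and the toy data are illustrative.

    # Illustrative sketch only: ordinary bagging of rpart regression trees and
    # prediction by averaging over the bootstrap trees.
    library(rpart)

    bagged_trees <- function(formula, data, B = 50) {
      lapply(seq_len(B), function(b) {
        boot <- data[sample(nrow(data), replace = TRUE), ]
        rpart(formula, data = boot, method = "anova")
      })
    }

    predict_bagged <- function(trees, newdata) {
      preds <- sapply(trees, predict, newdata = newdata)  # n x B matrix
      rowMeans(preds)
    }

    # toy example
    set.seed(7)
    dat   <- data.frame(x1 = runif(200), x2 = runif(200))
    dat$y <- sin(2 * pi * dat$x1) + dat$x2 + rnorm(200, sd = 0.3)
    trees <- bagged_trees(y ~ x1 + x2, dat, B = 50)
    head(predict_bagged(trees, dat))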


Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …
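
A minimal R sketch of a resampling-based single-step maxT procedure, assuming a matrix of null test statistics (e.g., from a bootstrap null distribution) is already available. The function maxT_adjp and the toy data are illustrative, not the authors' software.

    # Illustrative sketch only: single-step maxT adjusted p-values.  The
    # adjusted p-value of hypothesis j is the proportion of resampled draws
    # whose maximum absolute statistic exceeds |t_j|.
    maxT_adjp <- function(t_obs, T_null) {
      max_null <- apply(abs(T_null), 1, max)              # max statistic per draw
      sapply(abs(t_obs), function(tj) mean(max_null >= tj))
    }

    # toy example: 5 of 100 hypotheses are false; null statistics drawn under
    # the complete null
    set.seed(8)
    m <- 100; B <- 1000
    t_obs  <- c(rnorm(5, mean = 4), rnorm(95))
    T_null <- matrix(rnorm(B * m), B, m)
    sum(maxT_adjp(t_obs, T_null) < 0.05)   # typically recovers the 5 shifted statistics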


G-Computation Estimation Of Nonparametric Causal Effects On Time-Dependent Mean Outcomes In Longitudinal Studies, Romain Neugebauer, Mark J. Van Der Laan Jul 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Two approaches to causal inference based on Marginal Structural Models (MSM) have been proposed. They provide different representations of causal effects, with distinct causal parameters. Initially, a parametric MSM approach was developed; it relies on correct specification of a parametric MSM. More recently, an approach based on nonparametric MSMs was introduced. This latter approach does not require the assumption of a correctly specified MSM and is thus more realistic if one believes that correct specification of a parametric MSM is unlikely in practice. However, it was described only for investigating causal effects on mean outcomes collected …
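
A minimal R sketch of G-computation in the simplest point-treatment case: fit an outcome regression, then average predictions with treatment set to each value of interest. The longitudinal, nonparametric-MSM setting treated in the paper is not shown; gcomp and the toy data are illustrative.

    # Illustrative sketch only: point-treatment G-computation.  Fit an outcome
    # regression, then average predictions with treatment set to 1 and to 0.
    gcomp <- function(Y, A, W) {
      dat <- data.frame(Y = Y, A = A, W)
      fit <- glm(Y ~ ., data = dat)                      # outcome regression
      EY1 <- mean(predict(fit, transform(dat, A = 1)))   # everyone treated
      EY0 <- mean(predict(fit, transform(dat, A = 0)))   # no one treated
      c(EY0 = EY0, EY1 = EY1, diff = EY1 - EY0)
    }

    # toy example: true treatment effect is 2
    set.seed(9)
    W <- data.frame(w = rnorm(500))
    A <- rbinom(500, 1, plogis(W$w))
    Y <- 1 + 2 * A + W$w + rnorm(500)
    gcomp(Y, A, W)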


A Note On The Construction Of Counterfactuals And The G-Computation Formula, Zhuo Yu, Mark J. Van Der Laan Jun 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Robins' causal inference theory assumes the existence of treatment-specific counterfactual variables, so that the observed data augmented by the counterfactual data satisfy a consistency and a randomization assumption. In this paper we provide an explicit function that maps the observed data into a counterfactual variable satisfying the consistency and randomization assumptions, which offers a practically useful imputation method for counterfactuals. The construction of counterfactuals by Gill & Robins [2001] can in principle be used as an imputation method, but it is very hard to implement in practice. Robins [1987] shows that the counterfactual distribution can be identified from the observed …


Cross-Validated Bagged Learning, Mark J. Van Der Laan, Sandra E. Sinisi, Maya L. Petersen Jun 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Many applications aim to learn a high dimensional parameter of a data generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, Breiman (1996a) introduced bootstrap aggregating (bagging) as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying the estimator to multiple bootstrap samples, and averaging the result across bootstrap samples. In order to deal with the curse of dimensionality, typical practice has been to …


Estimating Function Based Cross-Validation And Learning, Mark J. Van Der Laan, Daniel Rubin May 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Suppose that we observe a sample of independent and identically distributed realizations of a random variable. Given a model for the data generating distribution, assume that the parameter of interest can be characterized as the parameter value that makes the population mean of a possibly infinite-dimensional estimating function equal to zero. Given a collection of candidate estimators of this parameter and a specification of the vector estimating function, we propose a cross-validation criterion for selecting among these estimators. The cross-validation criterion is defined as the Euclidean norm of the empirical mean over the validation sample of the estimating function at the …
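
A minimal R sketch of the proposal in a toy special case, assuming the estimating function D(O; theta) = O - theta (so the parameter is the mean): candidate estimators are fit on training folds and scored by the norm of the empirical mean of the estimating function over the validation fold. The function ef_cv_select and the toy data are illustrative, not the paper's general framework.

    # Illustrative sketch only: candidates are fit on training folds and scored
    # by |P_val D(.; theta_hat)|, the absolute empirical mean of the estimating
    # function D(O; theta) = O - theta over the validation fold.
    ef_cv_select <- function(x, candidates, V = 5) {
      folds <- sample(rep(seq_len(V), length.out = length(x)))
      crit <- sapply(candidates, function(est) {
        mean(sapply(seq_len(V), function(v) {
          theta_hat <- est(x[folds != v])        # fit on training fold
          abs(mean(x[folds == v] - theta_hat))   # score on validation fold
        }))
      })
      names(which.min(crit))
    }

    # toy example: contaminated data, candidate estimators of the mean
    set.seed(10)
    x <- c(rnorm(200), rnorm(20, mean = 10))
    candidates <- list(mean   = mean,
                       trim10 = function(z) mean(z, trim = 0.1),
                       median = median)
    ef_cv_select(x, candidates)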


Prognosis Of Stage Ii Colon Cancer By Non-Neoplastic Mucosa Gene Expression Profiling, Alain Barrier, Sandrine Dudoit, Et Al. May 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Aims. This study assessed the possibility of building a prognosis predictor, based on non-neoplastic mucosa microarray gene expression measures, for stage II colon cancer patients. Materials and Methods. Non-neoplastic colonic mucosa mRNA samples from 24 patients (10 with a metachronous metastasis, 14 with no recurrence) were profiled using the Affymetrix HGU133A GeneChip. The k-nearest neighbor method was used for prognosis prediction from the microarray gene expression measures. Leave-one-out cross-validation was used to select the number of neighbors and the number of informative genes to include in the predictor. Based on this information, a prognosis predictor was proposed and its accuracy estimated by …
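
A minimal R sketch of the k-nearest-neighbor/leave-one-out machinery, using class::knn.cv and a simple t-statistic gene ranking. For brevity, genes are ranked once on the full data, which is optimistic; a faithful analysis would re-rank genes within each cross-validation step. None of the study's data or its actual predictor is reproduced.

    # Illustrative sketch only: for a grid of k and of numbers of top-ranked
    # genes (ranked by a simple t-statistic), estimate the misclassification
    # rate by leave-one-out cross-validation with class::knn.cv.
    library(class)

    knn_loocv <- function(expr, outcome, ks = c(1, 3, 5), ngenes = c(10, 25, 50)) {
      tstat <- apply(expr, 2, function(g) abs(t.test(g ~ outcome)$statistic))
      grid  <- expand.grid(k = ks, p = ngenes)
      grid$error <- apply(grid, 1, function(row) {
        top  <- order(tstat, decreasing = TRUE)[seq_len(row["p"])]
        pred <- knn.cv(expr[, top, drop = FALSE], outcome, k = row["k"])
        mean(pred != outcome)
      })
      grid[which.min(grid$error), ]
    }

    # toy example: 24 samples, 200 genes, the first 20 genes carry signal
    set.seed(11)
    outcome <- factor(rep(c("recurrence", "none"), c(10, 14)))
    expr <- matrix(rnorm(24 * 200), 24, 200)
    expr[outcome == "recurrence", 1:20] <- expr[outcome == "recurrence", 1:20] + 1.5
    knn_loocv(expr, outcome)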


Colon Cancer Prognosis Prediction By Gene Expression Profiling, Alain Barrier, Sandrine Dudoit, Et Al. May 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Aims. This study assessed the possibility of building a prognosis predictor, based on microarray gene expression measures, for stage II and III colon cancer patients. Materials and Methods. Tumour (T) and non-neoplastic mucosa (NM) mRNA samples from 18 patients (9 with a recurrence, 9 with no recurrence) were profiled using the Affymetrix HGU133A GeneChip. The k-nearest neighbour method was used for prognosis prediction from T and NM gene expression measures. Six-fold cross-validation was applied to select the number of neighbours and the number of informative genes to include in the predictors. Based on this information, one T-based and one NM-based …


Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Causal inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data, for the purpose of defining causal parameters that may often be better suited for public health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter, user-specified history of exposure, in contrast to standard MSMs. By default, the latter represent the treatment causal effect of interest based on a treatment history defined by the …


Survival Ensembles, Torsten Hothorn, Peter Buhlmann, Sandrine Dudoit, Annette M. Molinaro, Mark J. Van Der Laan Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a unified and flexible framework for ensemble learning in the presence of censoring. For right-censored data, we introduce a random forest algorithm and a generic gradient boosting algorithm for the construction of prognostic models. The methodology is utilized for predicting the survival time of patients suffering from acute myeloid leukemia based on clinical and genetic covariates. Furthermore, we compare the diagnostic capabilities of the proposed censored data random forest and boosting methods applied to the recurrence free survival time of node positive breast cancer patients with previously published findings.
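
A minimal R sketch of the inverse-probability-of-censoring (IPC) weighting ingredient that such censored-data ensembles commonly build on, applied here to a single rpart regression tree rather than the random forest or boosting algorithms developed in the paper; ipcw_tree and the toy data are illustrative.

    # Illustrative sketch only: Kaplan-Meier estimate of the censoring survivor
    # function G, IPC weights 1/G(T_i) for uncensored subjects, and a weighted
    # rpart regression tree for log survival time.
    library(survival)
    library(rpart)

    ipcw_tree <- function(time, status, X) {
      cens_fit <- survfit(Surv(time, 1 - status) ~ 1)      # KM of the censoring distribution
      G <- approxfun(cens_fit$time, cens_fit$surv,
                     method = "constant", yleft = 1, rule = 2)
      keep <- status == 1                                  # uncensored subjects only
      w    <- 1 / pmax(G(time[keep]), 0.05)                # truncated IPC weights
      dat  <- data.frame(logtime = log(time), X)[keep, ]
      rpart(logtime ~ ., data = dat, weights = w, method = "anova")
    }

    # toy example
    set.seed(12)
    X <- data.frame(x1 = rnorm(300), x2 = rnorm(300))
    T_true <- rexp(300, rate = exp(-X$x1))
    C <- rexp(300, rate = 0.2)
    time <- pmin(T_true, C); status <- as.numeric(T_true <= C)
    fit <- ipcw_tree(time, status, X)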


Nonparametric Estimation Of The Case Fatality Ratio With Competing Risks Data: An Application To Severe Acute Respiratory Syndrome (Sars), Nicholas P. Jewell, Xiudong Lei, A. C. Ghani, C. A. Donnelly, G. M. Leung, L. M. Ho, B. Cowling, A. J. Hedley Apr 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

For diseases with some level of associated mortality, the case fatality ratio measures the proportion of diseased individuals who die from the disease. In principle, it is straightforward to estimate this quantity from individual follow-up data that provide times from onset to death or recovery. In particular, in a competing risks context, the case fatality ratio is defined by the limiting value of the sub-distribution function, associated with death, at infinity. When censoring is present, however, estimation of this quantity is complicated by the possibility of little information in the right tail of the sub-distribution function, requiring use of …


Resampling Based Multiple Testing Procedure Controlling Tail Probability Of The Proportion Of False Positives, Mark J. Van Der Laan, Merrill D. Birkner, Alan E. Hubbard Mar 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. In this article we propose a new resampling based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which asymptotically is degenerate at the true set of …
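
For orientation, a minimal R sketch of a simpler, related way to control this error rate: the augmentation of an FWER-controlling procedure (Holm here) by the next most significant hypotheses. This is explicitly not the resampling-based procedure proposed in this paper, and tppfp_augmentation and the toy p-values are illustrative.

    # Illustrative sketch of the augmentation idea only (not this paper's
    # procedure): reject the r0 hypotheses found by an FWER-controlling
    # procedure, then additionally reject the next a most significant ones,
    # where a is the largest integer with a / (a + r0) <= q.
    tppfp_augmentation <- function(pvals, q = 0.1, alpha = 0.05) {
      ord <- order(pvals)
      r0  <- sum(p.adjust(pvals, method = "holm") <= alpha)  # FWER rejections
      a   <- floor(q * r0 / (1 - q))                         # augmentation size
      ord[seq_len(min(r0 + a, length(pvals)))]               # indices rejected
    }

    # toy example: 10 very small p-values among 100
    set.seed(13)
    pvals <- c(runif(10, 0, 1e-4), runif(90))
    length(tppfp_augmentation(pvals, q = 0.1, alpha = 0.05))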


A Causal Inference Approach For Constructing Transcriptional Regulatory Networks, Biao Xing, Mark J. Van Der Laan Mar 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

Transcriptional regulatory networks specify the interactions among regulatory genes and between regulatory genes and their target genes. Discovering transcriptional regulatory networks helps us to understand the underlying mechanism of complex cellular processes and responses. In this paper, we describe a causal inference approach for constructing transcriptional regulatory networks using gene expression data, promoter sequences and information on transcription factor binding sites. The method first identifies active transcription factors under each individual experiment using a feature selection approach similar to Bussemaker et al. (2001), Keles et al. (2002) and Conlon et al. (2003). Transcription factors are viewed as 'treatments' and gene …


Cluster Analysis Of Genomic Data With Applications In R, Katherine S. Pollard, Mark J. Van Der Laan Jan 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

In this paper, we provide an overview of existing partitioning and hierarchical clustering algorithms in R. We discuss statistical issues and methods in choosing the number of clusters, the choice of clustering algorithm, and the choice of dissimilarity matrix. In particular, we illustrate how the bootstrap can be employed as a statistical method in cluster analysis to establish the reproducibility of the clusters and the overall variability of the followed procedure. We also show how to visualize a clustering result by plotting ordered dissimilarity matrices in R. We present a new R package, hopach, which implements the hybrid clustering method, …
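
A minimal R sketch, in base R only (the hopach package is not used here), of two ingredients the paper discusses: hierarchical clustering of genes with a correlation-based dissimilarity, and a bootstrap over arrays to gauge the reproducibility of the resulting cluster labels. Function names and toy data are illustrative.

    # Illustrative sketch only, in base R: hierarchical clustering of genes
    # with a correlation-based dissimilarity, and a bootstrap over arrays to
    # gauge how reproducible the cluster labels are.
    cluster_genes <- function(expr, k) {
      d <- as.dist(1 - cor(t(expr)))        # genes in rows, arrays in columns
      cutree(hclust(d, method = "average"), k = k)
    }

    bootstrap_reproducibility <- function(expr, k, B = 50) {
      ref <- cluster_genes(expr, k)
      agree <- replicate(B, {
        boot <- expr[, sample(ncol(expr), replace = TRUE)]
        lab  <- cluster_genes(boot, k)
        # proportion of gene pairs on which the two clusterings agree
        mean(outer(ref, ref, "==") == outer(lab, lab, "=="))
      })
      mean(agree)
    }

    # toy example: 60 genes in 3 groups, 20 arrays
    set.seed(14)
    expr <- matrix(rnorm(60 * 20), 60, 20)
    expr[1:20, 1:10]   <- expr[1:20, 1:10]   + 3
    expr[21:40, 11:20] <- expr[21:40, 11:20] + 3
    bootstrap_reproducibility(expr, k = 3, B = 20)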


Multiple Testing Procedures And Applications To Genomics, Merrill D. Birkner, Katherine S. Pollard, Mark J. Van Der Laan, Sandrine Dudoit Jan 2005

U.C. Berkeley Division of Biostatistics Working Paper Series

This chapter proposes widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; van der Laan et al., 2004a,b; Pollard and van der Laan, 2004; Pollard et al., 2005). Procedures are provided to control Type I error rates defined as tail probabilities for arbitrary functions of the numbers of Type I errors, V_n, and rejected hypotheses, R_n. These error rates include: …