Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 14 of 14

Full-Text Articles in Physical Sciences and Mathematics

Issues Of Processing And Multiple Testing Of Seldi-Tof Ms Proteomic Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan, Christine F. Skibola, Christine M. Hegedus, Martyn T. Smith Dec 2005

Issues Of Processing And Multiple Testing Of Seldi-Tof Ms Proteomic Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan, Christine F. Skibola, Christine M. Hegedus, Martyn T. Smith

U.C. Berkeley Division of Biostatistics Working Paper Series

A new data filtering method for SELDI-TOF MS proteomic spectra data is described. We examined technical repeats (2 per subject) of intensity versus m/z (mass/charge) of bone marrow cell lysate for two groups of childhood leukemia patients: acute myeloid leukemia (AML) and acute lymphoblastic leukemia (ALL). As others have noted, the type of data processing as well as experimental variability can have a disproportionate impact on the list of "interesting" proteins (see Baggerly et al. (2004)). We propose a list of processing and multiple testing techniques to correct for 1) background drift; 2) filtering using smooth regression and cross-validated bandwidth …


Quantile-Function Based Null Distribution In Resampling Based Multiple Testing, Mark J. Van Der Laan, Alan E. Hubbard Nov 2005

Quantile-Function Based Null Distribution In Resampling Based Multiple Testing, Mark J. Van Der Laan, Alan E. Hubbard

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. Methods based on marginal null distributions (i.e., marginal p-values) are attractive since the marginal p-values can be based on a user supplied choice of marginal null distributions and they are computationally trivial, but they, by necessity, are known to either be conservative or to rely on assumptions about the dependence structure between the test-statistics. Resampling based multiple testing (Westfall and Young, 1993) involves sampling from a joint null …


A General Imputation Methodology For Nonparametric Regression With Censored Data, Dan Rubin, Mark J. Van Der Laan Nov 2005

A General Imputation Methodology For Nonparametric Regression With Censored Data, Dan Rubin, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

We consider the random design nonparametric regression problem when the response variable is subject to a general mode of missingness or censoring. A traditional approach to such problems is imputation, in which the missing or censored responses are replaced by well-chosen values, and then the resulting covariate/response data are plugged into algorithms designed for the uncensored setting. We present a general methodology for imputation with the property of double robustness, in that the method works well if either a parameter of the full data distribution (covariate and response distribution) or a parameter of the censoring mechanism is well approximated. These …


A Fine-Scale Linkage Disequilibrium Measure Based On Length Of Haplotype Sharing, Yan Wang, Lue Ping Zhao, Sandrine Dudoit Oct 2005

A Fine-Scale Linkage Disequilibrium Measure Based On Length Of Haplotype Sharing, Yan Wang, Lue Ping Zhao, Sandrine Dudoit

U.C. Berkeley Division of Biostatistics Working Paper Series

High-throughput genotyping technologies for single nucleotide polymorphisms (SNP) have enabled the recent completion of the International HapMap Project (Phase I), which has stimulated much interest in studying genome-wide linkage disequilibrium (LD) patterns. Conventional LD measures, such as D' and r-square, are two-point measurements, and their relationship with physical distance is highly noisy. We propose a new LD measure, defined in terms of the correlation coefficient for shared haplotype lengths around two loci, thereby borrowing information from multiple loci. A U-statistic-based estimator of the new LD measure, which takes into consideration the dependence structure of the observed data, is developed and …


Population Intervention Models In Causal Inference, Alan E. Hubbard, Mark J. Van Der Laan Oct 2005

Population Intervention Models In Causal Inference, Alan E. Hubbard, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Marginal structural models (MSM) provide a powerful tool for estimating the causal effect of a] treatment variable or risk variable on the distribution of a disease in a population. These models, as originally introduced by Robins (e.g., Robins (2000a), Robins (2000b), van der Laan and Robins (2002)), model the marginal distributions of treatment-specific counterfactual outcomes, possibly conditional on a subset of the baseline covariates, and its dependence on treatment. Marginal structural models are particularly useful in the context of longitudinal data structures, in which each subject's treatment and covariate history are measured over time, and an outcome is recorded at …


Direct Effect Models, Mark J. Van Der Laan, Maya L. Petersen Aug 2005

Direct Effect Models, Mark J. Van Der Laan, Maya L. Petersen

U.C. Berkeley Division of Biostatistics Working Paper Series

The causal effect of a treatment on an outcome is generally mediated by several intermediate variables. Estimation of the component of the causal effect of a treatment that is mediated by a given intermediate variable (the indirect effect of the treatment), and the component that is not mediated by that intermediate variable (the direct effect of the treatment) is often relevant to mechanistic understanding and to the design of clinical and public health interventions. Under the assumption of no-unmeasured confounders for treatment and the intermediate variable, Robins & Greenland (1992) define an individual direct effect as the counterfactual effect of …


Application Of A Multiple Testing Procedure Controlling The Proportion Of False Positives To Protein And Bacterial Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan Aug 2005

Application Of A Multiple Testing Procedure Controlling The Proportion Of False Positives To Protein And Bacterial Data, Merrill D. Birkner, Alan E. Hubbard, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing multiple hypotheses is important in high-dimensional biological studies. In these situations, one is often interested in controlling the Type-I error rate, such as the proportion of false positives to total rejections (TPPFP) at a specific level, alpha. This article will present an application of the E-Bayes/Bootstrap TPPFP procedure, presented in van der Laan et al. (2005), which controls the tail probability of the proportion of false positives (TPPFP), on two biological datasets. The two data applications include firstly, the application to a mass-spectrometry dataset of two leukemia subtypes, AML and ALL. The protein data measurements include intensity and …


Cross-Validating And Bagging Partitioning Algorithms With Variable Importance, Annette M. Molinaro, Mark J. Van Der Laan Aug 2005

Cross-Validating And Bagging Partitioning Algorithms With Variable Importance, Annette M. Molinaro, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

We present a cross-validated bagging scheme in the context of partitioning algorithms. To explore the benefits of the various bagging scheme, we compare via simulations the predictive ability of single Classification and Regression (CART) Tree with several previously suggested bagging schemes and with our proposed approach. Additionally, a variable importance measure is explained and illustrated.


Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit Jul 2005

Test Statistics Null Distributions In Multiple Testing: Simulation Studies And Applications To Genomics, Katherine S. Pollard, Merrill D. Birkner, Mark J. Van Der Laan, Sandrine Dudoit

U.C. Berkeley Division of Biostatistics Working Paper Series

Multiple hypothesis testing problems arise frequently in biomedical and genomic research, for instance, when identifying differentially expressed or co-expressed genes in microarray experiments. We have developed generally applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for control of a broad class of Type I error rates, defined as tail probabilities and expected values for arbitrary functions of the numbers of false positives and rejected hypotheses (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; Pollard and van der Laan, 2004; van der Laan et al., 2005, 2004a,b). As argued in the early article of Pollard and van der …


A Note On The Construction Of Counterfactuals And The G-Computation Formula, Zhuo Yu, Mark J. Van Der Laan Jun 2005

A Note On The Construction Of Counterfactuals And The G-Computation Formula, Zhuo Yu, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Robins' causal inference theory assumes existence of treatment specific counterfactual variables so that the observed data augmented by the counterfactual data will satisfy a consistency and a randomization assumption. In this paper we provide an explicit function that maps the observed data into a counterfactual variable which satisfies the consistency and randomization assumptions. This offers a practically useful imputation method for counterfactuals. Gill & Robins [2001]'s construction of counterfactuals can be used as an imputation method in principle, but it is very hard to implement in practice. Robins [1987] shows that the counterfactual distribution can be identified from the observed …


Cross-Validated Bagged Learning, Mark J. Van Der Laan, Sandra E. Sinisi, Maya L. Petersen Jun 2005

Cross-Validated Bagged Learning, Mark J. Van Der Laan, Sandra E. Sinisi, Maya L. Petersen

U.C. Berkeley Division of Biostatistics Working Paper Series

Many applications aim to learn a high dimensional parameter of a data generating distribution based on a sample of independent and identically distributed observations. For example, the goal might be to estimate the conditional mean of an outcome given a list of input variables. In this prediction context, Breiman (1996a) introduced bootstrap aggregating (bagging) as a method to reduce the variance of a given estimator at little cost to bias. Bagging involves applying the estimator to multiple bootstrap samples, and averaging the result across bootstrap samples. In order to deal with the curse of dimensionality, typical practice has been to …


Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager Apr 2005

Causal Inference In Longitudinal Studies With History-Restricted Marginal Structural Models, Romain Neugebauer, Mark J. Van Der Laan, Ira B. Tager

U.C. Berkeley Division of Biostatistics Working Paper Series

Causal Inference based on Marginal Structural Models (MSMs) is particularly attractive to subject-matter investigators because MSM parameters provide explicit representations of causal effects. We introduce History-Restricted Marginal Structural Models (HRMSMs) for longitudinal data for the purpose of defining causal parameters which may often be better suited for Public Health research. This new class of MSMs allows investigators to analyze the causal effect of a treatment on an outcome based on a fixed, shorter and user-specified history of exposure compared to MSMs. By default, the latter represents the treatment causal effect of interest based on a treatment history defined by the …


Resampling Based Multiple Testing Procedure Controlling Tail Probability Of The Proportion Of False Positives, Mark J. Van Der Laan, Merrill D. Birkner, Alan E. Hubbard Mar 2005

Resampling Based Multiple Testing Procedure Controlling Tail Probability Of The Proportion Of False Positives, Mark J. Van Der Laan, Merrill D. Birkner, Alan E. Hubbard

U.C. Berkeley Division of Biostatistics Working Paper Series

Simultaneously testing a collection of null hypotheses about a data generating distribution based on a sample of independent and identically distributed observations is a fundamental and important statistical problem involving many applications. In this article we propose a new resampling based multiple testing procedure asymptotically controlling the probability that the proportion of false positives among the set of rejections exceeds q at level alpha, where q and alpha are user supplied numbers. The procedure involves 1) specifying a conditional distribution for a guessed set of true null hypotheses, given the data, which asymptotically is degenerate at the true set of …


Multiple Testing Procedures And Applications To Genomics, Merrill D. Birkner, Katherine S. Pollard, Mark J. Van Der Laan, Sandrine Dudoit Jan 2005

Multiple Testing Procedures And Applications To Genomics, Merrill D. Birkner, Katherine S. Pollard, Mark J. Van Der Laan, Sandrine Dudoit

U.C. Berkeley Division of Biostatistics Working Paper Series

This chapter proposes widely applicable resampling-based single-step and stepwise multiple testing procedures (MTP) for controlling a broad class of Type I error rates, in testing problems involving general data generating distributions (with arbitrary dependence structures among variables), null hypotheses, and test statistics (Dudoit and van der Laan, 2005; Dudoit et al., 2004a,b; van der Laan et al., 2004a,b; Pollard and van der Laan, 2004; Pollard et al., 2005). Procedures are provided to control Type I error rates defined as tail probabilities for arbitrary functions of the numbers of Type I errors, V_n, and rejected hypotheses, R_n. These error rates include: …