Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 91 - 114 of 114

Full-Text Articles in Physical Sciences and Mathematics

Double Robust Estimation In Longitudinal Marginal Structural Models, Zhuo Yu, Mark J. Van Der Laan Jun 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Consider estimation of causal parameters in a marginal structural model for the discrete intensity of the treatment specific counting process (e.g. hazard of a treatment specific survival time) based on longitudinal observational data on treatment, covariates and survival. We assume the sequential randomization assumption (SRA) on the treatment assignment mechanism and the so-called experimental treatment assignment assumption, which is needed to identify the causal parameters from the observed data distribution. Under SRA, the likelihood of the observed data structure factorizes into the auxiliary treatment mechanism and the partial likelihood consisting of the product over time of conditional distributions of …


Supervised Detection Of Regulatory Motifs In DNA Sequences, Sunduz Keles, Mark J. Van Der Laan, Sandrine Dudoit, Biao Xing, Michael B. Eisen May 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Identification of transcription factor binding sites (regulatory motifs) is a major interest in contemporary biology. We propose a new likelihood based method, COMODE, for identifying structural motifs in DNA sequences. Commonly used methods (e.g. MEME, Gibbs sampler) model binding sites as families of sequences described by a position weight matrix (PWM) and identify PWMs that maximize the likelihood of observed sequence data under a simple multinomial mixture model. This model assumes that the positions of the PWM correspond to independent multinomial distributions with four cell probabilities. We address supervising the search for DNA binding sites using the information derived from …


A Semiparametric Model Selection Criterion With Applications To The Marginal Structural Model, M. Alan Brookhart, Mark J. Van Der Laan Mar 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Estimators for the parameter of interest in semiparametric models often depend on a guessed model for the nuisance parameter. The choice of the model for the nuisance parameter can affect both the finite sample bias and efficiency of the resulting estimator of the parameter of interest. In this paper we propose a finite sample criterion based on cross validation that can be used to select a nuisance parameter model from a list of candidate models. We show that the expected value of this criterion is minimized by the nuisance parameter model that yields the estimator of the parameter of interest with …


IBD Configuration Transition Matrices And Linkage Score Tests For Unilineal Relative Pairs, Sandrine Dudoit Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Properties of transition matrices between IBD configurations are derived for four general classes of unilineal relative pairs obtained from the grandparent/grandchild, half-sib, avuncular, and cousin relationships. In this setting, IBD configurations are defined as orbits of groups acting on a set of inheritance vectors. Properties of the transition matrix between IBD configurations at two linked loci are derived by relating its infinitesimal generator to the adjacency matrix of a quotient graph. The second largest eigenvalue of the infinitesimal generator and its multiplicity are key in determining the form of the transition matrix and of likelihood-based linkage tests such as …


Asymptotic Optimality Of Likelihood Based Cross-Validation, Mark J. Van Der Laan, Sandrine Dudoit, Sunduz Keles Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Likelihood-based cross-validation is a statistical tool for selecting a density estimate based on n i.i.d. observations from the true density among a collection of candidate density estimators. General examples are the selection of a model indexing a maximum likelihood estimator, and the selection of a bandwidth indexing a nonparametric (e.g. kernel) density estimator. In this article, we establish asymptotic optimality of a general class of likelihood based cross-validation procedures (as indexed by the type of sample splitting used, e.g. V-fold cross-validation), in the sense that the cross-validation selector performs asymptotically as well (w.r.t. the Kullback-Leibler distance to the true …
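
As a rough illustration of the bandwidth-selection example mentioned in this abstract, the sketch below scores a Gaussian kernel density estimator by its V-fold cross-validated log-likelihood and picks the best-scoring candidate. The Gaussian kernel, the function names (`gaussian_kde`, `cv_log_likelihood`, `select_bandwidth`), and the defaults `V=5` and `seed=0` are assumptions made for illustration; they are not taken from the paper.

```python
import numpy as np

def gaussian_kde(train, points, h):
    """Gaussian kernel density estimate built on `train`, evaluated at `points`."""
    z = (points[:, None] - train[None, :]) / h
    return np.exp(-0.5 * z**2).sum(axis=1) / (len(train) * h * np.sqrt(2 * np.pi))

def cv_log_likelihood(x, h, V=5, seed=0):
    """Average held-out log-density over V folds for bandwidth h."""
    x = np.asarray(x, dtype=float)
    # Balanced random assignment of observations to folds 0..V-1.
    folds = np.random.default_rng(seed).permutation(len(x)) % V
    total = 0.0
    for v in range(V):
        held_out = folds == v
        # Fit on the training folds, evaluate on the held-out fold.
        density = gaussian_kde(x[~held_out], x[held_out], h)
        total += np.log(density).sum()
    return total / len(x)

def select_bandwidth(x, candidates, V=5, seed=0):
    """Return the candidate bandwidth with the largest cross-validated log-likelihood."""
    scores = [cv_log_likelihood(x, h, V, seed) for h in candidates]
    return candidates[int(np.argmax(scores))]
```

Maximizing the held-out log-likelihood corresponds to minimizing an estimate of the Kullback-Leibler distance to the true density up to a constant, which is why the log-density is used as the score here.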


Asymptotics Of Cross-Validated Risk Estimation In Estimator Selection And Performance Assessment, Sandrine Dudoit, Mark J. Van Der Laan Feb 2003

U.C. Berkeley Division of Biostatistics Working Paper Series

Risk estimation is an important statistical question for the purposes of selecting a good estimator (i.e., model selection) and assessing its performance (i.e., estimating generalization error). This article introduces a general framework for cross-validation and derives distributional properties of cross-validated risk estimators in the context of estimator selection and performance assessment. Arbitrary classes of estimators are considered, including density estimators and predictors for both continuous and polychotomous outcomes. Results are provided for general full data loss functions (e.g., absolute and squared error, indicator, negative log density). A broad definition of cross-validation is used in order to cover leave-one-out cross-validation, V-fold …


Recurrent Events Analysis In The Presence Of Time Dependent Covariates And Dependent Censoring, Maja Miloslavsky, Sunduz Keles, Mark J. Van Der Laan, Steve Butler Dec 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Recurrent events models have lately received a lot of attention in the literature. The majority of approaches discussed show the consistency of parameter estimates under the assumption that censoring is independent of the recurrent events process of interest conditional on the covariates included in the model. We provide an overview of available recurrent events analysis methods, and present an inverse probability of censoring weighted estimator for the regression parameters in the Andersen-Gill model that is commonly used for recurrent event analysis. This estimator remains consistent under informative censoring if the censoring mechanism is estimated consistently, and generally improves on the …


Construction Of Counterfactuals And The G-Computation Formula, Zhuo Yu, Mark J. Van Der Laan Dec 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Robins' causal inference theory assumes existence of treatment specific counterfactual variables so that the observed data augmented by the counterfactual data will satisfy a consistency and a randomization assumption. Gill and Robins [2001] show that the consistency and randomization assumptions do not add any restrictions to the observed data distribution. In particular, they provide a construction of counterfactuals as a function of the observed data distribution. In this paper we provide a construction of counterfactuals as a function of the observed data itself. Our construction provides a new statistical tool for estimation of counterfactual distributions. Robins [1987b] shows that the …


Locally Efficient Estimation With Bivariate Right Censored Data, Christopher M. Quale, Mark J. Van Der Laan, James M. Robins Oct 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Estimation for bivariate right censored data is a problem that has been studied extensively over the past 15 years. In this paper we propose a new class of estimators for the bivariate survivor function based on locally efficient estimation. The locally efficient estimator takes bivariate estimators Fn and Gn of the distributions of the time variables T1,T2 and the censoring variables C1,C2, respectively, and maps them to the resulting estimator. If Fn and Gn are consistent estimators of F and G, respectively, then the resulting estimator will be nonparametrically efficient (thus the term "locally efficient"). However, if either Fn or …


Accelerated Hazards Model: Method, Theory And Applications, Ying Qing Chen, Nicholas P. Jewell, Jingrong Yang Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In an accelerated hazards model, the hazard functions of a failure time are related through a time scale-change, which is often a function of covariates and associated parameters. When the hazard functions have special properties, such as monotonicity in time, the parameters may be clinically meaningful in measuring a treatment effect. This paper reviews the methodological and theoretical development of this model. Applications of the accelerated hazards model, including sample size calculation in clinical trials, are also explored.


Locally Efficient Estimation Of Regression Parameters Using Current Status Data, Chris Andrews, Mark J. Van Der Laan, James M. Robins Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In biostatistics applications interest often focuses on the estimation of the distribution of a time-variable T. If one only observes whether or not T exceeds an observed monitoring time C, then the data structure is called current status data, also known as interval censored data, case I. We consider this data structure extended to allow the presence of both time-independent covariates and time-dependent covariate processes that are observed until the monitoring time. We assume that the monitoring process satisfies coarsening at random.

Our goal is to estimate the regression parameter beta of the regression model T = Z*beta+epsilon where the …


Case-Control Current Status Data, Nicholas P. Jewell, Mark J. Van Der Laan Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Current status observation on survival times has recently been widely studied. An extreme form of interval censoring, this data structure refers to situations where the only available information on a survival random variable, T, is whether or not T exceeds a random independent monitoring time C, a binary random variable, Y. To date, nonparametric analyses of current status data have assumed the availability of i.i.d. random samples of the random variable (Y, C), or a similar random sample at each of a set of fixed monitoring times. In many situations, it is useful to consider a case-control sampling scheme. Here, …


Why Prefer Double Robust Estimates? Illustration With Causal Point Treatment Studies, Romain Neugebauer, Mark J. Van Der Laan Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In point treatment marginal structural models with treatment A, outcome Y and covariates W, causal parameters can be estimated under the assumption of no unobserved confounders. Three estimates can be used: the G-computation, Inverse Probability of Treatment Weighted (IPTW) or Double Robust (DR) estimates. The properties of the IPTW and DR estimates are known under an assumption on the treatment mechanism that we name "Experimental Treatment Assignment" (ETA) assumption. We show that the DR estimating function is unbiased when the ETA assumption is violated if the model used to regress Y on A and W is correctly specified. The practical …
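
To make the three estimates named in this abstract concrete, here is a minimal sketch for a simplified target, the mean counterfactual outcome under treatment, E[Y_1], with a discrete covariate W and saturated (stratum-wise) fits for both the outcome regression and the treatment mechanism. The target parameter, the function name `point_treatment_estimates`, and the saturated fits are assumptions chosen for illustration; the paper itself concerns general marginal structural models. The code requires at least one treated subject in every stratum of W, an empirical version of the ETA assumption.

```python
import numpy as np

def point_treatment_estimates(W, A, Y):
    """Estimate E[Y_1] by G-computation, IPTW and DR, using saturated fits."""
    W, A, Y = map(np.asarray, (W, A, Y))
    n = len(Y)
    Q1 = np.empty(n)  # fitted E[Y | A=1, W_i]
    g1 = np.empty(n)  # fitted P(A=1 | W_i)
    for w in np.unique(W):
        stratum = W == w
        treated = stratum & (A == 1)
        Q1[stratum] = Y[treated].mean()  # needs a treated subject in the stratum
        g1[stratum] = A[stratum].mean()
    # G-computation: average the outcome regression over the covariates.
    gcomp = Q1.mean()
    # IPTW: weight treated outcomes by the inverse treatment probability.
    iptw = np.mean((A == 1) * Y / g1)
    # DR: outcome regression plus an inverse-weighted residual correction.
    dr = np.mean((A == 1) / g1 * (Y - Q1) + Q1)
    return gcomp, iptw, dr
```

With saturated fits the three numbers coincide algebraically. They start to differ once Q1 or g1 comes from a restricted, possibly misspecified model, which is the setting in which the double robustness property matters.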


Bivariate Current Status Data, Mark J. Van Der Laan, Nicholas P. Jewell Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In many applications, it is of interest to estimate a bivariate distribution of two survival random variables. Observation of such random variables is often incomplete. If one only observes whether or not each of the individual survival times exceeds a common observed monitoring time C, then the data structure is referred to as bivariate current status data (Wang and Ding, 2000). For such data, we show that the identifiable part of the joint distribution is represented by three univariate cumulative distribution functions, namely the two marginal cumulative distribution functions, and the bivariate cumulative distribution function evaluated on the …


Current Status Data: Review, Recent Developments And Open Problems, Nicholas P. Jewell, Mark J. Van Der Laan Sep 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Researchers working with survival data are by now adept at handling issues associated with incomplete data, particularly those associated with various forms of censoring. An extreme form of interval censoring, known as current status observation, refers to situations where the only available information on a survival random variable T is whether or not T exceeds a random independent monitoring time C. This article contains a brief review of the extensive literature on the analysis of current status data, discussing the implications of response-based sampling on these methods. The majority of the paper introduces some recent extensions of these ideas to …


Semiparametric Regression Analysis On Longitudinal Pattern Of Recurrent Gap Times, Ying Qing Chen, Mei-Cheng Wang, Yijian Huang Aug 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

In longitudinal studies, individual subjects may experience recurrent events of the same type over a relatively long period of time. The longitudinal pattern of the gaps between the successive recurrent events is often of great research interest. In this article, the probability structure of the recurrent gap times is first explored in the presence of censoring. According to the discovered structure, we introduce the proportional reverse-time hazards models with unspecified baseline functions to accommodate heterogeneous individual underlying distributions, when the longitudinal pattern parameter is of main interest. Inference procedures are proposed and studied by way of proper riskset construction. The …


Multiple Hypothesis Testing In Microarray Experiments, Sandrine Dudoit, Juliet Popper Shaffer, Jennifer C. Boldrick Aug 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

DNA microarrays are a new and promising biotechnology which allows the monitoring of expression levels in cells for thousands of genes simultaneously. An important and common question in microarray experiments is the identification of differentially expressed genes, i.e., genes whose expression levels are associated with a response or covariate of interest. The biological question of differential expression can be restated as a problem in multiple hypothesis testing: the simultaneous test for each gene of the null hypothesis of no association between the expression levels and the responses or covariates. As a typical microarray experiment measures expression levels for thousands of …


Estimation Of The Bivariate Survival Function With Generalized Bivariate Right Censored Data Structures, Sunduz Keles, Mark J. Van Der Laan, James M. Robins Aug 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

We propose a bivariate survival function estimator for a general right censored data structure that includes a time dependent covariate process. Firstly, an initial estimator that generalizes Dabrowska's (1988) estimator is introduced. We obtain this estimator by a general methodology of constructing estimating functions in censored data models. The initial estimator is guaranteed to improve on Dabrowska's estimator and remains consistent and asymptotically linear under informative censoring schemes if the censoring mechanism is estimated consistently. We then construct an orthogonalized estimating function which results in a more robust and efficient estimator than our initial estimator. A simulation study demonstrates the …


Inference For Proportional Mean Residual Life Model In The Presence Of Censoring, Ying Q. Chen, Nicholas P. Jewell May 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

As a function of time t, mean residual life is defined as the remaining life expectancy of a subject given its survival to t. It plays an important role in many research areas in characterising the stochastic behavior of survival over time. Similar to the Cox proportional hazards model, the proportional mean residual life model was proposed in the statistical literature to study the association between the mean residual life and an individual subject's explanatory covariates. In this article, we study this model and develop appropriate inference procedures in the presence of censoring. Numerical studies including simulation and real data analysis are presented as well.


A Method To Identify Significant Clusters In Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan Apr 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Clustering algorithms have been widely applied to gene expression data. For both hierarchical and partitioning clustering algorithms, selecting the number of significant clusters is an important problem and many methods have been proposed. Existing methods for selecting the number of clusters tend to find only the global patterns in the data (e.g., the over- and under-expressed genes). We have noted the need for a better method in the gene expression context, where small, biologically meaningful clusters can be difficult to identify. In this paper, we define a new criterion, Mean Split Silhouette (MSS), which is a measure of cluster …


A New Partitioning Around Medoids Algorithm, Mark J. Van Der Laan, Katherine S. Pollard, Jennifer Bryan Feb 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Kaufman & Rousseeuw (1990) proposed a clustering algorithm Partitioning Around Medoids (PAM) which maps a distance matrix into a specified number of clusters. A particularly nice property is that PAM allows clustering with respect to any specified distance metric. In addition, the medoids are robust representations of the cluster centers, which is particularly important in the common context that many elements do not belong well to any cluster. Based on our experience in clustering gene expression data, we have noticed that PAM does have problems recognizing relatively small clusters in situations where good partitions around medoids clearly exist. In this …
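
For orientation, the following is a simplified sketch of the classical PAM idea that this abstract starts from: map a distance matrix to k medoids by a greedy build step followed by swaps that reduce the total distance of elements to their nearest medoid. It is not the new algorithm the paper proposes. The function names (`total_cost`, `pam`) and the first-improvement swap rule are assumptions made to keep the example short.

```python
import numpy as np

def total_cost(D, medoids):
    """Sum over elements of the distance to the nearest medoid."""
    return D[:, medoids].min(axis=1).sum()

def pam(D, k):
    """Simplified Partitioning Around Medoids on an n x n distance matrix D."""
    D = np.asarray(D, dtype=float)
    n = len(D)
    # BUILD: greedily add the element that most reduces the total cost.
    medoids = []
    for _ in range(k):
        candidates = [j for j in range(n) if j not in medoids]
        best = min(candidates, key=lambda j: total_cost(D, medoids + [j]))
        medoids.append(best)
    # SWAP: exchange a medoid with a non-medoid whenever the cost decreases.
    improved = True
    while improved:
        improved = False
        for i in range(k):
            for j in range(n):
                if j in medoids:
                    continue
                trial = medoids[:i] + [j] + medoids[i + 1:]
                if total_cost(D, trial) < total_cost(D, medoids):
                    medoids, improved = trial, True
    # Assign every element to its nearest medoid.
    labels = D[:, medoids].argmin(axis=1)
    return medoids, labels
```

Because only the distance matrix enters the computation, any dissimilarity can be plugged in, which is the property the abstract highlights.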


Marginal Regression Of Gaps Between Recurrent Events, Yijian Huang, Ying Qing Chen Nov 2001

U.C. Berkeley Division of Biostatistics Working Paper Series

Recurrent event data typically exhibit the phenomenon of intra-individual correlation, owing to not only observed covariates but also random effects. In many applications, the population can be reasonably postulated as a heterogeneous mixture of individual renewal processes, and the inference of interest is the effect of individual-level covariates. In this article, we suggest and investigate a marginal proportional hazards model for gaps between recurrent events. A connection is established between observed gap times and clustered survival data, however, with informative cluster size. We then derive a novel and general inference procedure for the latter, based on a functional formulation of …


Maximum Likelihood Estimation Of Ordered Multinomial Parameters, Nicholas P. Jewell, John D. Kalbfleisch Oct 2001

U.C. Berkeley Division of Biostatistics Working Paper Series

The pool adjacent violator algorithm (Ayer et al., 1955) has long been known to give the maximum likelihood estimator of a series of ordered binomial parameters, based on an independent observation from each distribution (see Barlow et al., 1972). This result has immediate application to estimation of a survival distribution based on current survival status at a set of monitoring times. This paper considers an extended problem of maximum likelihood estimation of a series of ‘ordered’ multinomial parameters. By making use of variants of the pool adjacent violator algorithm, we obtain a simple algorithm to compute the maximum likelihood estimator and demonstrate …
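
The binomial case mentioned at the start of this abstract can be sketched as follows: given success counts and trial counts at ordered monitoring times, the pool adjacent violator algorithm returns nondecreasing fitted probabilities by repeatedly merging neighbouring groups whose raw proportions violate the ordering. The function name `pava` and the count-based interface are assumptions for illustration, and the multinomial extension developed in the paper is not covered. All trial counts are assumed to be positive.

```python
def pava(successes, trials):
    """Nondecreasing MLE of ordered binomial probabilities via pool adjacent violators."""
    # Each block stores [pooled successes, pooled trials, number of groups pooled].
    blocks = []
    for s, n in zip(successes, trials):
        blocks.append([s, n, 1])
        # Pool while the previous block's proportion exceeds the last block's.
        # Cross-multiplication avoids dividing before it is needed.
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            s2, n2, k2 = blocks.pop()
            blocks[-1][0] += s2
            blocks[-1][1] += n2
            blocks[-1][2] += k2
    # Expand each pooled block back to one fitted value per original group.
    fitted = []
    for s, n, k in blocks:
        fitted.extend([s / n] * k)
    return fitted
```

In the current status setting, `successes[j]` would count subjects whose event had already occurred by the j-th monitoring time and `trials[j]` the subjects monitored at that time, so the fitted values estimate the distribution function at the monitoring times.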


Statistical Inference For Simultaneous Clustering Of Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan Jul 2001

U.C. Berkeley Division of Biostatistics Working Paper Series

Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as …