Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability Commons

Full-Text Articles in Statistics and Probability

A Method To Identify Significant Clusters In Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan (Apr 2002)

U.C. Berkeley Division of Biostatistics Working Paper Series

Clustering algorithms have been widely applied to gene expression data. For both hierarchical and partitioning clustering algorithms, selecting the number of significant clusters is an important problem, and many methods have been proposed. Existing methods for selecting the number of clusters tend to find only the global patterns in the data (e.g., the over- and under-expressed genes). We have noted the need for a better method in the gene expression context, where small, biologically meaningful clusters can be difficult to identify. In this paper, we define a new criterion, Mean Split Silhouette (MSS), which is a measure of cluster …


A New Partitioning Around Medoids Algorithm, Mark J. Van Der Laan, Katherine S. Pollard, Jennifer Bryan (Feb 2002)

U.C. Berkeley Division of Biostatistics Working Paper Series

Kaufman & Rousseeuw (1990) proposed a clustering algorithm, Partitioning Around Medoids (PAM), which maps a distance matrix into a specified number of clusters. A particularly nice property is that PAM allows clustering with respect to any specified distance metric. In addition, the medoids are robust representations of the cluster centers, which is particularly important in the common situation where many elements do not belong well to any cluster. Based on our experience in clustering gene expression data, we have noticed that PAM has problems recognizing relatively small clusters in situations where good partitions around medoids clearly exist. In this …
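
As a rough illustration of what "mapping a distance matrix into a specified number of clusters" can look like, here is a minimal Python sketch of a classic build-and-swap partitioning around medoids. It is not the new algorithm proposed in the paper, whose details are not given in the abstract. The function names `total_cost` and `pam` are hypothetical, and the sketch assumes a symmetric distance matrix supplied as a list of lists.

```python
def total_cost(dist, medoids):
    """Sum over all elements of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Illustrative k-medoids clustering on a precomputed distance matrix.

    dist: n x n list of lists of pairwise distances (any metric).
    k:    number of clusters requested.
    Returns (medoids, labels), where labels[i] indexes into medoids.
    """
    n = len(dist)

    # BUILD step: greedily add the element that lowers the total cost most.
    medoids = []
    for _ in range(k):
        best = min(
            (i for i in range(n) if i not in medoids),
            key=lambda i: total_cost(dist, medoids + [i]),
        )
        medoids.append(best)

    # SWAP step: replace a medoid with a non-medoid whenever that strictly
    # lowers the total cost; stop when no single swap helps.
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for h in range(n):
                if h in medoids:
                    continue
                candidate = medoids[:pos] + [h] + medoids[pos + 1:]
                if total_cost(dist, candidate) < total_cost(dist, medoids):
                    medoids = candidate
                    improved = True

    # Assign every element to its nearest medoid.
    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Usage: six points on a line, clustered with absolute difference as distance.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = pam(dist, 2)
```

Because only the distance matrix is used, any distance metric can be plugged in, which is the property the abstract highlights.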


Marginal Regression Of Gaps Between Recurrent Events, Yijian Huang, Ying Qing Chen (Nov 2001)

U.C. Berkeley Division of Biostatistics Working Paper Series

Recurrent event data typically exhibit the phenomenon of intra-individual correlation, owing not only to observed covariates but also to random effects. In many applications, the population can be reasonably postulated as a heterogeneous mixture of individual renewal processes, and the inference of interest is the effect of individual-level covariates. In this article, we suggest and investigate a marginal proportional hazards model for gaps between recurrent events. A connection is established between observed gap times and clustered survival data, albeit with informative cluster size. We then derive a novel and general inference procedure for the latter, based on a functional formulation of …


Maximum Likelihood Estimation Of Ordered Multinomial Parameters, Nicholas P. Jewell, John D. Kalbfleisch (Oct 2001)

U.C. Berkeley Division of Biostatistics Working Paper Series

The pool adjacent violator algorithm (Ayer et al., 1955) has long been known to give the maximum likelihood estimator of a series of ordered binomial parameters, based on an independent observation from each distribution (see Barlow et al., 1972). This result has immediate application to estimation of a survival distribution based on current survival status at a set of monitoring times. This paper considers an extended problem of maximum likelihood estimation of a series of ‘ordered’ multinomial parameters. By making use of variants of the pool adjacent violator algorithm, we obtain a simple algorithm to compute the maximum likelihood estimator and demonstrate …
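
For readers unfamiliar with the binomial case mentioned above, the following Python sketch shows the standard pool adjacent violator idea for nondecreasing binomial parameters. It does not reproduce the multinomial variants developed in the paper, which the abstract does not spell out. The function name `pava_binomial` and the example counts are illustrative assumptions.

```python
def pava_binomial(successes, trials):
    """Estimate nondecreasing binomial probabilities by pooling adjacent violators.

    successes[i] and trials[i] are the event count and sample size at the
    i-th ordered monitoring time; every trials[i] is assumed to be positive.
    Returns one fitted probability per monitoring time.
    """
    # Each block stores [pooled successes, pooled trials, number of times pooled].
    blocks = []
    for x, n in zip(successes, trials):
        blocks.append([x, n, 1])
        # While the previous block's proportion exceeds the latest one,
        # the ordering is violated, so merge the two blocks.
        # (Cross-multiplication avoids dividing before the comparison.)
        while (len(blocks) > 1
               and blocks[-2][0] * blocks[-1][1] > blocks[-1][0] * blocks[-2][1]):
            x2, n2, c2 = blocks.pop()
            blocks[-1][0] += x2
            blocks[-1][1] += n2
            blocks[-1][2] += c2

    # Expand each pooled block back to one estimate per monitoring time.
    fitted = []
    for x, n, count in blocks:
        fitted.extend([x / n] * count)
    return fitted


# Usage: raw proportions 0.2, 0.6, 0.4, 0.8 violate the ordering at the
# second and third times, so those two are pooled to 5/10.
estimates = pava_binomial([1, 3, 2, 4], [5, 5, 5, 5])
```

Pooling replaces each run of order-violating proportions with their trial-weighted average, so the fitted values are nondecreasing across monitoring times.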


Statistical Inference For Simultaneous Clustering Of Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan (Jul 2001)

U.C. Berkeley Division of Biostatistics Working Paper Series

Current methods for analysis of gene expression data are mostly based on clustering and classification of either genes or samples. We offer support for the idea that more complex patterns can be identified in the data if genes and samples are considered simultaneously. We formalize the approach and propose a statistical framework for two-way clustering. A simultaneous clustering parameter is defined as a function of the true data-generating distribution, and an estimate is obtained by applying this function to the empirical distribution. We illustrate that a wide range of clustering procedures, including generalized hierarchical methods, can be defined as …


Assessing The Accuracy Of A New Diagnostic Test When A Gold Standard Does Not Exist, Todd A. Alonzo, Margaret S. Pepe (Oct 1998)

UW Biostatistics Working Paper Series

Often the accuracy of a new diagnostic test must be assessed when a perfect gold standard does not exist. Use of an imperfect reference test biases the accuracy estimates of the new test. This paper reviews existing approaches to this problem, including discrepant resolution and latent class analysis. Deficiencies of these approaches are identified. A new approach is proposed that combines the results of several imperfect reference tests to define a better reference standard. We call this the composite reference standard (CRS). Using the CRS, accuracy can be assessed using multistage sampling designs. Maximum likelihood estimates of accuracy and expressions for …