Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Statistics and Probability

2002

Gene expression

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

A Method To Identify Significant Clusters In Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan Apr 2002

A Method To Identify Significant Clusters In Gene Expression Data, Katherine S. Pollard, Mark J. Van Der Laan

U.C. Berkeley Division of Biostatistics Working Paper Series

Clustering algorithms have been widely applied to gene expression data. For both hierarchical and partitioning clustering algorithms, selecting the number of significant clusters is an important problem and many methods have been proposed. Existing methods for selecting the number of clusters tend to find only the global patterns in the data (e.g.: the over and under expressed genes). We have noted the need for a better method in the gene expression context, where small, biologically meaningful clusters can be difficult to identify. In this paper, we define a new criteria, Mean Split Silhouette (MSS), which is a measure of cluster …


A New Partitioning Around Medoids Algorithm, Mark J. Van Der Laan, Katherine S. Pollard, Jennifer Bryan Feb 2002

A New Partitioning Around Medoids Algorithm, Mark J. Van Der Laan, Katherine S. Pollard, Jennifer Bryan

U.C. Berkeley Division of Biostatistics Working Paper Series

Kaufman & Rousseeuw (1990) proposed a clustering algorithm Partitioning Around Medoids (PAM) which maps a distance matrix into a specified number of clusters. A particularly nice property is that PAM allows clustering with respect to any specified distance metric. In addition, the medoids are robust representations of the cluster centers, which is particularly important in the common context that many elements do not belong well to any cluster. Based on our experience in clustering gene expression data, we have noticed that PAM does have problems recognizing relatively small clusters in situations where good partitions around medoids clearly exist. In this …