Genetics and Genomics | Open Access Articles

The Clustering Of Regression Models Method With Applications In Gene Expression Data, Li-Xuan Qin, Steven G. Self Jan 2005

UW Biostatistics Working Paper Series

Identification of differentially expressed genes and clustering of genes are two important and complementary objectives addressed with gene expression data. For the differential expression question, many "per-gene" analytic methods have been proposed. These methods can generally be characterized as using a regression function to independently model the observations for each gene; various adjustments for multiplicity are then used to interpret the statistical significance of these per-gene regression models over the collection of genes analyzed. Motivated by this common structure of per-gene models, we propose a new model-based clustering method -- the clustering of regression models method, which groups genes that …
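The "per-gene" approach described in this abstract (one regression per gene, followed by a multiplicity adjustment) can be sketched as below. This is not the authors' clustering of regression models method; it only illustrates the common structure the abstract refers to. The function names, the use of a simple one-covariate linear model, the normal approximation to the t statistic, and the choice of the Benjamini-Hochberg adjustment are all assumptions made for illustration.

```python
from statistics import NormalDist, mean


def ols_slope_test(x, y):
    """Fit y = a + b*x by least squares for one gene.

    Returns (slope, two-sided p-value). The p-value uses a normal
    approximation to the t statistic, which is an assumption that is
    only reasonable for larger sample sizes.
    """
    n = len(x)
    xbar, ybar = mean(x), mean(y)
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = ybar - slope * xbar
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    se = (rss / (n - 2) / sxx) ** 0.5
    if se == 0:
        # Perfect fit: treat a nonzero slope as maximally significant.
        return slope, (0.0 if slope != 0 else 1.0)
    z = slope / se
    return slope, 2 * (1 - NormalDist().cdf(abs(z)))


def benjamini_hochberg(pvals):
    """Return Benjamini-Hochberg adjusted p-values, in the input order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


def per_gene_analysis(x, expression):
    """expression: dict mapping gene name -> list of values aligned with x."""
    genes = list(expression)
    fits = [ols_slope_test(x, expression[g]) for g in genes]
    adj = benjamini_hochberg([p for _, p in fits])
    return {g: (slope, q) for g, (slope, _), q in zip(genes, fits, adj)}
```

Each gene is modeled independently here, so nothing is shared across genes; grouping genes with similar fitted relationships is the step that a model-based clustering method would add.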

Quantification And Visualization Of LD Patterns And Identification Of Haplotype Blocks, Yan Wang, Sandrine Dudoit Jun 2004

U.C. Berkeley Division of Biostatistics Working Paper Series

Classical measures of linkage disequilibrium (LD) between two loci, based only on the joint distribution of alleles at these loci, present noisy patterns. In this paper, we propose a new distance-based LD measure, R, which takes into account multilocus haplotypes around the two loci in order to exploit information from neighboring loci. The LD measure R yields a matrix of pairwise distances between markers, based on the correlation between the lengths of shared haplotypes among chromosomes around these markers. Data analysis demonstrates that visualization of LD patterns through the R matrix reveals more deterministic patterns, with much less noise, than …
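For context, the classical two-locus measures that this abstract contrasts with can be sketched as follows. This is not the distance-based measure R proposed in the paper; it computes the standard D, D' and r² from phased haplotypes at two biallelic loci. The function name and the 0/1 allele coding are assumptions made for illustration.

```python
def classical_ld(haplotypes):
    """Compute classical pairwise LD measures for two biallelic loci.

    haplotypes: list of (allele_at_locus_1, allele_at_locus_2) tuples,
    with alleles coded 0 or 1 (an illustrative coding, not from the paper).
    Returns a dict with D, D' and r^2.
    """
    n = len(haplotypes)
    p_a = sum(h[0] for h in haplotypes) / n   # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n   # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if tuple(h) == (1, 1)) / n

    # D: deviation of the joint frequency from independence.
    d = p_ab - p_a * p_b

    # r^2: squared correlation between the two allele indicators.
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    r2 = d * d / denom if denom > 0 else float("nan")

    # D': D scaled by its maximum attainable magnitude given the allele frequencies.
    if d >= 0:
        d_max = min(p_a * (1 - p_b), (1 - p_a) * p_b)
    else:
        d_max = min(p_a * p_b, (1 - p_a) * (1 - p_b))
    d_prime = d / d_max if d_max > 0 else float("nan")

    return {"D": d, "D_prime": d_prime, "r2": r2}
```

These quantities depend only on the joint distribution of alleles at the two loci, which is the limitation the abstract points to: information from neighboring loci is not used.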

A New Partitioning Around Medoids Algorithm, Mark J. van der Laan, Katherine S. Pollard, Jennifer Bryan Feb 2002

U.C. Berkeley Division of Biostatistics Working Paper Series

Kaufman & Rousseeuw (1990) proposed a clustering algorithm, Partitioning Around Medoids (PAM), which maps a distance matrix into a specified number of clusters. A particularly nice property is that PAM allows clustering with respect to any specified distance metric. In addition, the medoids are robust representations of the cluster centers, which is particularly important in the common situation where many elements do not belong well to any cluster. Based on our experience in clustering gene expression data, we have noticed that PAM has problems recognizing relatively small clusters in situations where good partitions around medoids clearly exist. In this …
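The baseline PAM idea described here (a distance matrix in, a fixed number of medoid-centered clusters out) can be sketched roughly as below. This is a simplified illustration of the Kaufman & Rousseeuw style of algorithm, not the new algorithm this paper proposes. The function names are hypothetical, the greedy initialization is a simplified stand-in for the BUILD phase, and the swap phase accepts the first improving swap rather than searching for the best one.

```python
def total_cost(dist, medoids):
    """Sum over all elements of the distance to the nearest medoid."""
    return sum(min(dist[i][m] for m in medoids) for i in range(len(dist)))


def pam(dist, k):
    """Partition n elements into k clusters given an n x n distance matrix.

    Returns (medoids, labels): medoids is a list of element indices, and
    labels[i] is the position in that list of element i's nearest medoid.
    """
    n = len(dist)

    # Greedy initialization: repeatedly add the element that lowers the cost most.
    medoids = []
    for _ in range(k):
        candidates = [c for c in range(n) if c not in medoids]
        best = min(candidates, key=lambda c: total_cost(dist, medoids + [c]))
        medoids.append(best)

    # Swap phase: replace a medoid with a non-medoid whenever that lowers the cost.
    cost = total_cost(dist, medoids)
    improved = True
    while improved:
        improved = False
        for pos in range(k):
            for c in range(n):
                if c in medoids:
                    continue
                trial = medoids[:pos] + [c] + medoids[pos + 1:]
                trial_cost = total_cost(dist, trial)
                if trial_cost < cost:
                    medoids, cost, improved = trial, trial_cost, True

    labels = [min(range(k), key=lambda j: dist[i][medoids[j]]) for i in range(n)]
    return medoids, labels


# Usage: six points on a line, clustered with absolute difference as the distance.
points = [0, 1, 2, 10, 11, 12]
dist = [[abs(a - b) for b in points] for a in points]
medoids, labels = pam(dist, 2)
```

Because the algorithm only reads entries of `dist`, any dissimilarity can be substituted, which is the property the abstract highlights. The objective is a plain sum of distances to the nearest medoid, so a small cluster contributes little to it.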
