Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Optimal Feature Selection For Nearest Centroid Classifiers, With Applications To Gene Expression Microarrays, Alan R. Dabney, John D. Storey Nov 2005

UW Biostatistics Working Paper Series

Nearest centroid classifiers have recently been successfully employed in high-dimensional applications. A necessary step when building a classifier for high-dimensional data is feature selection. Feature selection is typically carried out by computing univariate statistics for each feature individually, without consideration for how a subset of features performs as a whole. For subsets of a given size, we characterize the optimal choice of features, corresponding to those yielding the smallest misclassification rate. Furthermore, we propose an algorithm for estimating this optimal subset in practice. Finally, we investigate the applicability of shrinkage ideas to nearest centroid classifiers. We use gene-expression microarrays for …
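The abstract contrasts its optimal subset selection with the "typical" approach, which ranks features one at a time by a univariate statistic. The sketch below shows only that conventional baseline and is not the authors' optimal-subset algorithm or their shrinkage method. It assumes two classes labelled 0 and 1, ranks features by the absolute value of a Welch-style two-sample t-statistic, keeps the top k, and assigns a sample to the class whose centroid is nearest in squared Euclidean distance. All function names are hypothetical.

```python
import math
from statistics import mean, variance


def t_statistics(X, y):
    """Welch-style two-sample t-statistic per feature (hypothetical helper; labels assumed 0/1)."""
    stats = []
    for j in range(len(X[0])):
        a = [row[j] for row, lab in zip(X, y) if lab == 0]
        b = [row[j] for row, lab in zip(X, y) if lab == 1]
        diff = mean(a) - mean(b)
        se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
        if se == 0:
            # Zero within-class spread: perfect separation if the means differ.
            stats.append(0.0 if diff == 0 else math.copysign(math.inf, diff))
        else:
            stats.append(diff / se)
    return stats


def select_top_k(stats, k):
    """Univariate selection: indices of the k features with the largest |t|."""
    return sorted(range(len(stats)), key=lambda j: abs(stats[j]), reverse=True)[:k]


def fit_centroids(X, y, features):
    """Per-class mean vector, restricted to the selected features."""
    centroids = {}
    for label in sorted(set(y)):
        rows = [row for row, lab in zip(X, y) if lab == label]
        centroids[label] = [mean(r[j] for r in rows) for j in features]
    return centroids


def predict(x, centroids, features):
    """Assign x to the class whose centroid is closest in squared Euclidean distance."""
    def sq_dist(c):
        return sum((x[j] - cj) ** 2 for j, cj in zip(features, c))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))


if __name__ == "__main__":
    # Toy data: only feature 0 separates the two classes.
    X = [[1.0, 5.0, 0.1], [1.2, 5.1, 0.3], [0.9, 4.9, 0.2],
         [3.0, 5.0, 0.2], [3.1, 4.8, 0.1], [2.9, 5.2, 0.3]]
    y = [0, 0, 0, 1, 1, 1]
    features = select_top_k(t_statistics(X, y), k=1)
    centroids = fit_centroids(X, y, features)
    print(features, predict([1.1, 5.0, 0.2], centroids, features))
```

Because each feature is scored in isolation, this baseline never considers how a chosen subset performs jointly. That limitation is exactly what the paper's characterization of the optimal subset is meant to address.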


A Platform-Independent Software Suite For Statistical Analysis Of High Dimensional Biology Data, David B. Allison, Jacob P. L. Brand, Jode W. Edwards, Gary L. Gadbury, Kyoungmi Kim, Tapan Mehta, Grier P. Page, Amit Patki, Vinodh Srinivasasainagendra, Prinal Trivedi, Jelai Wang, Stanislav O. Zakharkin Jan 2005

Mathematics and Statistics Faculty Research & Creative Works

Many efforts in microarray data analysis are focused on providing tools and methods for the qualitative analysis of microarray data. HDBStat! (High-Dimensional Biology-Statistics) is a software package designed for the analysis of high-dimensional biology data such as microarray data. It was initially developed for the analysis of microarray gene expression data, but it can also be used for some applications in proteomics and other areas of genomics. HDBStat! provides statisticians and biologists with a flexible, easy-to-use interface for analyzing complex microarray data using a variety of methods for data preprocessing, quality control analysis and hypothesis testing.