Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Full-Text Articles in Life Sciences

Fizzy: Feature Subset Selection For Metagenomics., Gregory Ditzler, J Calvin Morrison, Yemin Lan, Gail L Rosen Nov 2015

Henry M. Rowan College of Engineering Faculty Scholarship

BACKGROUND: Some current software tools for comparative metagenomics let ecologists investigate and explore bacterial communities using α- and β-diversity. Feature subset selection, a sub-field of machine learning, can also provide unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can identify the operational taxonomic units (OTUs), or functional features, that have a high level of influence on the condition being studied. For example, in a previous study we used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the …


Spectral Gene Set Enrichment (SGSE), H Robert Frost, Zhigang Li, Jason H. Moore Mar 2015

Dartmouth Scholarship

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables, and their performance depends strongly on the clustering algorithm and the number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …