Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Full-Text Articles in Life Sciences

Error Correction In Next Generation DNA Sequencing Data, Michael Z. Molnar Dec 2012

Electronic Thesis and Dissertation Repository

Motivation: High-throughput Next Generation Sequencing (NGS) technologies can sequence the genome of a species quickly and cheaply. Errors introduced by NGS technologies limit the potential of the applications that rely on their data. Current error correction techniques are not sufficient, and a more efficient and accurate program is needed.

Results: We have designed and implemented RACER (Rapid Accurate Correction of Errors in Reads), an error correction program that targets the Illumina genome sequencer, which is currently the dominant NGS technology. RACER combines advanced data structures with an intricate analysis of …


Optimal Clustering Techniques For Metagenomic Sequencing Data, Erik T. Cameron Aug 2012

Electronic Thesis and Dissertation Repository

Metagenomic sequencing techniques have made it possible to determine the composition of the bacterial microbiota of the human body. Clustering algorithms have been used to search for core microbiota types in the vagina, but results have been inconsistent, possibly due to methodological differences. We performed an extensive comparison of six commonly used clustering algorithms and four distance metrics, using clinical data from 777 vaginal samples across 5 studies, and 36,000 synthetic datasets based on these clinical data. We found that centroid-based clustering algorithms (K-means and Partitioning Around Medoids), with Euclidean or Manhattan distance metrics, performed well. They were best at correctly clustering …
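
To illustrate the two distance metrics named in this abstract, the following is a minimal sketch in Python. It is not the thesis's code: the sample profiles, the choice of two centers, and the helper names (`euclidean`, `manhattan`, `assign_to_nearest`) are all hypothetical. It shows only the assignment step that centroid-based methods such as K-means and Partitioning Around Medoids share, where each sample is assigned to its nearest center under a chosen metric.

```python
from math import sqrt


def euclidean(a, b):
    """Straight-line (L2) distance between two equal-length vectors."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a, b):
    """City-block (L1) distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def assign_to_nearest(samples, centers, distance):
    """Return, for each sample, the index of its closest center."""
    return [
        min(range(len(centers)), key=lambda i: distance(sample, centers[i]))
        for sample in samples
    ]


# Hypothetical relative-abundance profiles (invented for illustration only).
samples = [
    [0.90, 0.05, 0.05],
    [0.80, 0.10, 0.10],
    [0.10, 0.60, 0.30],
    [0.05, 0.70, 0.25],
]

# Assumption: two centers, picked by hand from the samples (as medoids would be).
centers = [samples[0], samples[2]]

labels_euclidean = assign_to_nearest(samples, centers, euclidean)
labels_manhattan = assign_to_nearest(samples, centers, manhattan)
```

A full K-means or medoid-based run would alternate this assignment step with an update of the centers until the labels stop changing; that loop is omitted here.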