Full-Text Articles in Life Sciences
DNA Sequence Classification: It’s Easier Than You Think: An Open-Source k-mer Based Machine Learning Tool for Fast and Accurate Classification of a Variety of Genomic Datasets, Stephen Solis-Reyes
Electronic Thesis and Dissertation Repository
Supervised classification of genomic sequences is a challenging, well-studied problem with a variety of important applications. We propose an open-source, supervised, alignment-free, highly general method for sequence classification that operates on k-mer proportions of DNA sequences. This method was implemented in a fully standalone general-purpose software package called Kameris, publicly available under a permissive open-source license. Compared to competing software, ours provides key advantages in terms of data security and privacy, transparency, and reproducibility. We perform a detailed study of its accuracy and performance on a wide variety of classification tasks, including virus subtyping, taxonomic classification, and human haplogroup assignment. …
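To make the idea of classifying on k-mer proportions concrete, here is a minimal sketch in Python. It is not the Kameris implementation or its interface: the function names `kmer_proportions` and `nearest_centroid` are hypothetical, the choice of k = 2 or 3 is arbitrary, and the nearest-centroid rule is only a stand-in for whatever supervised classifier is actually trained on the feature vectors.

```python
from collections import Counter
from itertools import product


def kmer_proportions(seq, k=3):
    """Return the proportion of each of the 4**k possible k-mers in seq.

    Hypothetical helper for illustration; windows containing characters
    other than A, C, G, T (for example N) are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[km] for km in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[km] / total for km in kmers]


def nearest_centroid(train, query, k=3):
    """Assign query to the label whose mean k-mer vector is closest.

    train maps each label to a list of example sequences. This simple
    rule is an assumption used here in place of a real trained model.
    """
    query_vec = kmer_proportions(query, k)
    best_label, best_dist = None, float("inf")
    for label, seqs in train.items():
        vecs = [kmer_proportions(s, k) for s in seqs]
        centroid = [sum(col) / len(vecs) for col in zip(*vecs)]
        # Squared Euclidean distance between query and class centroid.
        dist = sum((a - b) ** 2 for a, b in zip(query_vec, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label


if __name__ == "__main__":
    # Toy, made-up sequences purely to exercise the functions.
    train = {
        "at_rich": ["ATATATATAT", "AATTAATTAA"],
        "gc_rich": ["GCGCGCGCGC", "GGCCGGCCGG"],
    }
    print(nearest_centroid(train, "ATTATATAAT", k=2))
```

Because each sequence is reduced to a fixed-length vector of 4^k proportions, no alignment is needed and sequences of different lengths can be compared directly, which is the property the abstract refers to as alignment-free.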