Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Boise State University

Theses/Dissertations

2012

Machine learning

Full-Text Articles in Physical Sciences and Mathematics

On The K-Mer Frequency Spectra Of Organism Genome And Proteome Sequences With A Preliminary Machine Learning Assessment Of Prime Predictability, Nathan O. Schmidt Aug 2012

Boise State University Theses and Dissertations

A regular expression and region-specific filtering system for biological records in the National Center for Biotechnology Information (NCBI) database is integrated into an object-oriented sequence counting application, and a statistical software suite is designed and deployed to interpret the resulting k-mer frequencies, with a priority focus on nullomers (k-mers that never occur in a given sequence). The proteome k-mer frequency spectra of ten model organisms and the genome k-mer frequency spectra of two bacteria and virus strains for the coding and non-coding regions are comparatively scrutinized. We observe that the naturally evolved (NCBI/organism) and the artificially biased (randomly generated) sequences exhibit a clear deviation from the artificially unbiased (randomly generated) histogram distributions. …
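
For readers unfamiliar with the terms in the abstract, the following is a minimal Python sketch of k-mer counting, nullomer enumeration, and a k-mer frequency spectrum. It is not the thesis's application: the function names (kmer_counts, nullomers, frequency_spectrum), the DNA alphabet "ACGT", and the toy sequence are all assumptions chosen for illustration, and the sketch ignores the record filtering and coding/non-coding region handling the abstract describes.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k, alphabet="ACGT"):
    """Count overlapping k-mers, skipping any that contain symbols
    outside the alphabet (for example an ambiguous base such as N)."""
    allowed = set(alphabet)
    counts = Counter()
    for i in range(len(sequence) - k + 1):
        kmer = sequence[i:i + k]
        if set(kmer) <= allowed:
            counts[kmer] += 1
    return counts


def nullomers(sequence, k, alphabet="ACGT"):
    """Return, in sorted order, every k-mer over the alphabet that
    never occurs in the sequence."""
    counts = kmer_counts(sequence, k, alphabet)
    every_kmer = ("".join(p) for p in product(alphabet, repeat=k))
    return sorted(kmer for kmer in every_kmer if kmer not in counts)


def frequency_spectrum(sequence, k, alphabet="ACGT"):
    """Map each occurrence count to the number of distinct k-mers seen
    that many times; the entry for 0 is the number of nullomers."""
    counts = kmer_counts(sequence, k, alphabet)
    spectrum = Counter(counts.values())
    spectrum[0] = len(alphabet) ** k - len(counts)
    return dict(sorted(spectrum.items()))


# Hypothetical toy input, not data from the thesis.
toy_sequence = "ACGTACGTAA"
print(kmer_counts(toy_sequence, 2))
print(nullomers(toy_sequence, 2))
print(frequency_spectrum(toy_sequence, 2))
```

A proteome version of the same sketch would only need a different alphabet argument (the twenty amino-acid letters). Enumerating every possible k-mer grows as the alphabet size raised to the power k, so this direct approach is practical only for small k.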