Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
On The K-Mer Frequency Spectra Of Organism Genome And Proteome Sequences With A Preliminary Machine Learning Assessment Of Prime Predictability, Nathan O. Schmidt
Boise State University Theses and Dissertations
A regular expression and region-specific filtering system for biological records at the National Center for Biotechnology Information (NCBI) database is integrated into an object-oriented sequence counting application, and a statistical software suite is designed and deployed to interpret the resulting k-mer frequencies, with a priority focus on nullomers. The proteome k-mer frequency spectra of ten model organisms and the genome k-mer frequency spectra of two bacteria and virus strains for the coding and non-coding regions are comparatively scrutinized. We observe that the naturally-evolved (NCBI/organism) and the artificially-biased (randomly-generated) sequences exhibit a clear deviation from the artificially-unbiased (randomly-generated) histogram distributions. …
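
For readers unfamiliar with the terms in the abstract, the following is a minimal sketch of k-mer counting, nullomer detection (k-mers that never occur in a sequence), and a frequency spectrum. It is not the thesis's application: the function names, the toy sequence, and the default DNA alphabet are assumptions made for illustration only.

```python
from collections import Counter
from itertools import product


def kmer_counts(sequence, k):
    """Count every overlapping substring of length k in the sequence."""
    return Counter(sequence[i:i + k] for i in range(len(sequence) - k + 1))


def all_kmers(k, alphabet="ACGT"):
    """Enumerate every possible k-mer over the alphabet (alphabet is an assumption)."""
    return ["".join(letters) for letters in product(alphabet, repeat=k)]


def nullomers(sequence, k, alphabet="ACGT"):
    """Return the k-mers over the alphabet that are absent from the sequence."""
    observed = kmer_counts(sequence, k)
    return [kmer for kmer in all_kmers(k, alphabet) if observed[kmer] == 0]


def frequency_spectrum(sequence, k, alphabet="ACGT"):
    """Map an occurrence count to the number of distinct k-mers with that count.

    The entry for count 0 is the number of nullomers.
    """
    observed = kmer_counts(sequence, k)
    return dict(Counter(observed[kmer] for kmer in all_kmers(k, alphabet)))


if __name__ == "__main__":
    toy = "ACGTAC"  # hypothetical toy sequence, not data from the thesis
    print(kmer_counts(toy, 2))
    print(len(nullomers(toy, 2)))
    print(frequency_spectrum(toy, 2))
```

For a proteome rather than a genome, the same sketch would presumably be called with a 20-letter amino-acid alphabet in place of `"ACGT"`. The number of possible k-mers then grows as 20^k, so full enumeration is only practical for small k.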