Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Full-Text Articles in Life Sciences

Selection Pressure On Surface Exposed Virus Proteins, Sareh Bagherichimeh Dec 2022

Electronic Thesis and Dissertation Repository

Viral infection requires the interaction between virus surface-exposed (SE) proteins and host cell receptors. This can result in an “arms race” that is assumed to drive accelerated rates of evolution, and some well-known examples of diversifying selection involve surface proteins (HIV-1 env, influenza hemagglutinin). We conducted a systematic analysis to determine whether this is truly a distinctive feature of SE virus proteins, in comparison to non-SE proteins encoded by the same genomes.

We obtained reference and all neighbour genomes of 52 human viruses from the NCBI Viral Genomes database. The coding sequences (CDS) of each genome extracted by …


DNA Sequence Classification: It’s Easier Than You Think: An Open-Source K-Mer Based Machine Learning Tool For Fast And Accurate Classification Of A Variety Of Genomic Datasets, Stephen Solis-Reyes Oct 2018

Electronic Thesis and Dissertation Repository

Supervised classification of genomic sequences is a challenging, well-studied problem with a variety of important applications. We propose an open-source, supervised, alignment-free, highly general method for sequence classification that operates on k-mer proportions of DNA sequences. This method was implemented in a fully standalone general-purpose software package called Kameris, publicly available under a permissive open-source license. Compared to competing software, ours provides key advantages in terms of data security and privacy, transparency, and reproducibility. We perform a detailed study of its accuracy and performance on a wide variety of classification tasks, including virus subtyping, taxonomic classification, and human haplogroup assignment. …
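
As a rough illustration of what "k-mer proportions" means, the sketch below turns a DNA string into a fixed-length vector of relative k-mer frequencies that a standard classifier could consume. It is not taken from Kameris; the function name `kmer_proportions`, the default `k=3`, and the choice to skip k-mers containing ambiguous bases are all assumptions made for this example.

```python
from collections import Counter
from itertools import product


def kmer_proportions(seq, k=3):
    """Return the proportion of each possible k-mer over the alphabet ACGT.

    Hypothetical helper for illustration only; not the Kameris API.
    The output has 4**k entries, ordered lexicographically (AAA, AAC, ...).
    """
    seq = seq.upper()
    alphabet = "ACGT"
    all_kmers = ["".join(p) for p in product(alphabet, repeat=k)]

    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        # Assumption: windows containing ambiguous bases (e.g. N) are skipped.
        if set(kmer) <= set(alphabet):
            counts[kmer] += 1

    total = sum(counts.values())
    # Avoid division by zero for sequences shorter than k.
    return [counts[m] / total if total else 0.0 for m in all_kmers]
```

Because the vector length depends only on k and not on sequence length, sequences of very different sizes can be compared or fed to the same model without alignment.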


A Quantitative Method For Measuring And Visualizing Species' Relatedness In A Two-Dimensional Euclidean Space., Abu Sadat Md. Sayem Apr 2013

Electronic Thesis and Dissertation Repository

Representing DNA sequences graphically and evaluating, as well as displaying, species’ relationships have been considered to be an important aspect of molecular biology research. A novel approach is proposed in this thesis that combines three methods: a) Chaos Game Representation (CGR), to portray quantitative characteristics of a DNA sequence as a black-and-white image, b) Structural Similarity (SSIM) index, an image comparison method, to compute pair-wise distances between these images, and c) Multidimensional Scaling (MDS), to visually display each sequence as a point in a two-dimensional Euclidean space. The proposed method produces a visual representation called Genome Distance Map (GDM) …
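
To make the first of the three steps more concrete, here is a minimal sketch of how a Chaos Game Representation can be rasterized into a black-and-white image. It uses a common CGR convention (corners A, C, G, T on the unit square; each base moves the current point halfway toward its corner) and is not the thesis's own code. The name `cgr_image`, the corner assignment, the `size` parameter, and the decision to skip non-ACGT characters are assumptions for illustration.

```python
# Assumed corner assignment on the unit square (one common convention).
CORNERS = {"A": (0.0, 0.0), "C": (0.0, 1.0), "G": (1.0, 1.0), "T": (1.0, 0.0)}


def cgr_image(seq, size=8):
    """Return a size x size grid of 0/1 values marking visited CGR cells.

    Hypothetical helper for illustration only. Rows index y, columns index x.
    """
    image = [[0] * size for _ in range(size)]
    x, y = 0.5, 0.5  # start at the centre of the unit square
    for base in seq.upper():
        if base not in CORNERS:
            continue  # assumption: ambiguous bases are ignored
        cx, cy = CORNERS[base]
        # Move halfway from the current point toward the base's corner.
        x, y = (x + cx) / 2, (y + cy) / 2
        col = min(int(x * size), size - 1)
        row = min(int(y * size), size - 1)
        image[row][col] = 1
    return image
```

Images produced this way for different sequences could then be compared pairwise with an image-similarity measure such as SSIM, and the resulting distance matrix passed to an MDS routine, as the abstract describes.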