Full-Text Articles in Computer Engineering
Clustering And Classification Of Multi-Domain Proteins, Neethu Shah
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
The rapid development of next-generation sequencing technology has led to unprecedented growth in protein sequence data repositories over the last decade. The majority of these proteins lack structural and functional characterization. This necessitates the design and development of fast, efficient, and sensitive computational tools and algorithms that can classify these proteins into functionally coherent groups.
Domains are fundamental units of protein structure and function. Multi-domain proteins are considerably more complex than proteins that have a single domain or none. They undergo complex, network-like evolutionary events such as domain shuffling, domain loss, and domain gain. These events, therefore, cannot be represented in the …
Biological Sequence Simulation For Testing Complex Evolutionary Hypotheses: Indel-Seq-Gen Version 2.0, Cory L. Strope
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
Reconstructing the evolutionary history of biological sequences will provide a better understanding of the mechanisms of sequence divergence and functional evolution. Long-term sequence evolution includes not only substitutions of residues but also more dynamic changes such as insertions, deletions, and long-range rearrangements. Such dynamic changes make reconstructing the history of sequence evolution difficult and affect the accuracy of molecular evolutionary methods, such as multiple sequence alignments (MSAs) and phylogenetic methods. To test the accuracy of these methods, benchmark datasets are required. However, currently available benchmark datasets are limited in size, and the evolutionary histories of the included sequences are unknown. These …