Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 30 of 70

Full-Text Articles in Entire DC Network

State-Of-The-Art Approaches For Sequencing, Assembling And Annotating Naphthenic Acid Degrading Bacterial Metagenomes, Henry H. Say Aug 2023

Electronic Thesis and Dissertation Repository

Naphthenic acids (NAs) are the main toxic component of oil refinery wastewater and require special processes to be removed. Harnessing bacterial biodegradation for NA removal has the potential to be effective, yet NA-degrading bacteria and pathways are poorly understood and uncharacterized. To improve our understanding of NA degradation, I characterize the metagenomes of novel NA-degrading bacterial communities seeded in NA-enriched granulated activated carbon (GAC) filters. I demonstrate methods that maximize the throughput of extraction, sequencing, and annotation of novel metagenomes, producing 72 MAGs and 5432 other circular contigs, 226 of which were putative phages. I also include state-of-the-art …


Evolution Of Overlapping Reading Frames In Virus Genomes, Laura Muñoz Baena Aug 2023

Electronic Thesis and Dissertation Repository

Viruses are formidable pathogens that represent the majority of biological entities on our planet, and their genomes are a source of interesting enigmas. One feature in which virus genomes are usually rich is the presence of overlapping reading frames (OvRFs): portions of the genome where the same nucleotide sequence encodes more than one protein. OvRFs are hypothesized to be used by viruses to encode proteins more compactly and to regulate transcription. In addition, OvRFs might be a source of gene novelty, facilitating the creation of new open reading frames (ORFs) within the transcriptional context of existing ones.

To characterize …


Decoy-Target Database Strategy And False Discovery Rate Analysis For Glycan Identification, Xiaoou Li Jul 2023

Electronic Thesis and Dissertation Repository

In recent years, the technology of glycopeptide sequencing through MS/MS mass spectrometry data has achieved remarkable progress. Various software tools have been developed and widely used for protein identification. Estimation of false discovery rate (FDR) has become an essential method for evaluating the performance of glycopeptide scoring algorithms. The target-decoy strategy, which involves constructing decoy databases, is currently the most widely used method for FDR calculation. In this study, we applied various decoy construction algorithms to generate decoy glycan databases and proposed a novel approach to calculate the FDR by using the EM algorithm and mixture model.
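
For readers unfamiliar with the target-decoy strategy mentioned above, the conventional estimate can be sketched in a few lines of Python. This is only a minimal illustration of the standard ratio (decoy hits divided by target hits above a score threshold); it is not the EM and mixture-model approach the thesis proposes, and the function name, the `(score, is_decoy)` input layout, and the example scores are assumptions made for illustration.

```python
def target_decoy_fdr(matches, threshold):
    """Estimate FDR at a score threshold with the conventional target-decoy ratio.

    matches: iterable of (score, is_decoy) pairs (hypothetical layout).
    Returns decoys / targets among matches scoring at or above the threshold,
    capped at 1.0, or 0.0 when no target match passes the threshold.
    """
    matches = list(matches)  # allow generators, since we scan twice
    targets = sum(1 for score, is_decoy in matches
                  if score >= threshold and not is_decoy)
    decoys = sum(1 for score, is_decoy in matches
                 if score >= threshold and is_decoy)
    if targets == 0:
        return 0.0
    return min(1.0, decoys / targets)


# Illustrative, made-up scores: three target hits and two decoy hits.
example = [(0.9, False), (0.8, False), (0.7, True), (0.6, False), (0.5, True)]
print(target_decoy_fdr(example, 0.6))
```

Raising the threshold generally lowers the estimate, because fewer decoy matches survive the cut; the quality of the estimate depends on how the decoy database is constructed, which is the question the abstract addresses.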


Exploration Of The Immune Landscape Of Ebv-Associated Gastric Cancers, Mikhail Salnikov Jun 2023

Electronic Thesis and Dissertation Repository

Epstein–Barr virus (EBV) is a gammaherpesvirus associated with 9% of all gastric cancers (GCs). EBV-associated GCs (EBVaGCs) are pathologically and clinically distinct entities from EBV-negative GCs (EBVnGCs), with EBVaGCs exhibiting differential molecular pathology and patient prognosis. The purpose of this thesis is to investigate the tumor microenvironment (TME) of EBVaGCs, which has not been explored in-depth. We hypothesize that EBVaGCs and EBVnGCs are also distinct in terms of the molecular immune landscape. We employed over 400 stomach adenocarcinoma (STAD) samples from The Cancer Genome Atlas (TCGA), as well as a single cell dataset, for the construction of a web suite …


De Novo Sequencing Of Multiple Tandem Mass Spectra Of Peptide Containing Silac Labeling, Fang Han Mar 2023

Electronic Thesis and Dissertation Repository

The systematic study of proteins has gradually become fundamental in research related to molecular biology. Shotgun proteomics uses bottom-up proteomics techniques to identify proteins contained in complex mixtures using a combination of high performance liquid chromatography coupled with mass spectrometry technology. Current mass spectrometers equipped with high sensitivity and accuracy can produce thousands of tandem mass spectrometry (MS/MS) spectra in a single run. The large amount of data collected in a single LC-MS/MS run requires effective computational approaches to automate the process of spectra interpretation. De novo peptide sequencing from tandem mass spectrometry (MS/MS) has emerged as an important …


Characterizing The Function Of B Cells That Accumulate In The Inflamed Central Nervous System In Anti-Myelin Autoimmunity, Lika Chowdhury Dec 2022

Electronic Thesis and Dissertation Repository

While the role of autoimmune T cells has been extensively studied in anti-myelin autoimmunity, little is known about the function of B cells in multiple sclerosis (MS), a chronic inflammatory disease of the central nervous system (CNS). B cells form clusters with T cells in the meninges directly adjacent to demyelinating lesions. Previous studies have shown that disease progression is dependent on the depletion of specific populations of B cells, but it is not clear which contributes to pathology or how. The purpose of this thesis is to characterize the population of meningeal B cells to determine how they differ …


Selection Pressure On Surface Exposed Virus Proteins, Sareh Bagherichimeh Dec 2022

Electronic Thesis and Dissertation Repository

Viral infection requires the interaction between virus surface-exposed (SE) proteins and host cell receptors. This can result in an “arms race” that is assumed to drive accelerated rates of evolution, and some well-known examples of diversifying selection involve surface proteins (HIV-1 env, influenza hemagglutinin). We conducted a systematic analysis to determine whether this is truly a distinctive feature of SE virus proteins, in comparison to non-SE proteins encoded by the same genomes.

We obtained reference and all neighbour genomes of 52 human viruses from the NCBI Viral Genomes database. The coding sequences (CDS) of each genome extracted by …


Gene Regulatory Context Of Honey Bee Worker Sterility, Rahul Choorakkat Unnikrishnan Dec 2022

Electronic Thesis and Dissertation Repository

Honey bee workers deactivate their ovaries and are functionally sterile when a queen is present in the colony. I adopt a bioinformatics approach to update a model transcriptional regulatory network (TRN) to study gene-regulatory processes that regulate fecundity in workers. On splitting the network, I obtained nine clusters and each cluster conformed to properties associated with real-world networks. Two of the nine clusters are enriched for 'sterility genes' and contained single well-connected hub genes (GB44769, ftz-f1). The genes in the two clusters were functionally enriched for nucleic acid binding (GO:0003676) and nucleotide binding (GO:0000166). I identified homologous genes for …


Capturing Within Host Hiv-1 Evolution Dynamics Using Simulation Methods, Emmanuel Wong Aug 2022

Electronic Thesis and Dissertation Repository

The persistent latent reservoir of long-lived cells carrying integrated HIV DNA is the source of reinfection upon treatment interruption, and a primary focus for cure research. The reservoir is difficult to study because these cells are relatively rare or located in tissues that are difficult to sample. Sequencing proviral DNA in the latent reservoir is an important source of information about reservoir establishment and persistence, especially from the presence of identical (clonal) sequences. I evaluated the relationship between select measures of these clonal sequences and drivers of reservoir persistence, e.g., clonal expansion, by implementing a simulation model of within-host HIV …


Towards More Complete Metagenomic Analyses Through Circularized Genomes And Conjugative Elements, Benjamin R. Joris Aug 2022

Electronic Thesis and Dissertation Repository

Advancements in sequencing technologies have revolutionized biological sciences and led to the emergence of a number of fields of research. One such field of research is metagenomics, which is the study of the genomic content of complex communities of bacteria. The goal of this thesis was to contribute computational methodology that can maximize the data generated in these studies and to apply these protocols to human and environmental metagenomic samples.

Standard metagenomic analyses include a step for binning of assembled contigs, which has previously been shown to exclude mobile genetic elements, and I demonstrated that this phenomenon extends to all conjugative …


Manipulating The Root Mycobiome To Improve Plant Performance And Reduce Pathogen Pressure In Corn (Zea Mays), Noor F. Saeed Cheema Jun 2022

Electronic Thesis and Dissertation Repository

Crop yield often varies within a field of a single genetically uniform crop plant, with the causes presumed to be a mix of both biotic and abiotic factors. Manipulating crop root mycobiomes could potentially increase yield by reducing pathogen impacts and improving access to soil water and nutrients. This study aimed to identify different fungal inoculation treatments that could increase the growth of corn seedlings sown in low productivity soils to that in high productivity soils and shift the root mycobiome composition. Fungal inoculation treatments did not have significantly different root mycobiome composition than seedlings grown in low yield control …


Identification Of Dna Methylation Episignatures For Classification And Phenotype/Genotype Correlation In Mendelian Neurodevelopmental Disorders, John Reilly Apr 2022

Electronic Thesis and Dissertation Repository

Diagnosis of neurodevelopmental disorders poses numerous challenges, related to the lack of specific findings and limited understanding of the clinical impact of the majority of genetic variation. Epigenomic mechanisms involve chemical modifications in DNA that involve a range of cellular mechanisms. DNA methylation is an epigenetic mechanism involving addition and removal of methyl groups to cytosine residues. These methylation signals form episignatures: patterns of methylation that can be used as biomarkers capable of differentiating neurodevelopmental disorders. EpiSigns have enabled molecular diagnosis of a number of genetic conditions, classification of variants of unknown significance, and provided insights into the pathophysiology of …


Visualization And Interpretation Of Protein Interactions, Dipanjan Chatterjee Apr 2021

Electronic Thesis and Dissertation Repository

Visualization and interpretation of deep learning models' predictions is a very important area of research in machine learning nowadays. Researchers are focused not only on generating a model with good performance, but also on being able to trust the model. Our aim in this thesis is to adapt existing interpretation methods to a protein-protein binding site prediction problem to visualize and understand the model's prediction and learning pattern.

We present three deep learning-based interpretation methods: sensitivity analysis, saliency map and integrated gradients to analyze the amino acid residues which create positive and negative relevance to the deep learning models' prediction. As …


Sequencing And Assembling The Nuclear Genome Of The Antarctic Psychrophilic Green Alga Chlamydomonas Sp. Uwo241: Unravelling The Evolution Of Cold Adaptation, Xi Zhang Jan 2021

Electronic Thesis and Dissertation Repository

DNA sequencing technologies have undergone tremendous advancements in recent years, but assembling, annotating, and analyzing a nuclear genome is still a huge undertaking, especially for small laboratory groups, partly because many eukaryotic genomes are repeat-rich and contain thousands of genes and introns. The Antarctic harbors a variety of algae that can withstand extreme cold but do not grow at warmer temperatures (psychrophiles), including the unicellular green alga Chlamydomonas sp. UWO241 (a.k.a. UWO241). Little is known, however, about how psychrophilic algae evolved from their respective mesophilic ancestors by adapting to particular cold environments. To present insights into this issue, I critically determined …


Deciphering The Ck2-Dependent Phosphoproteome And Its Integration With Regulatory Ptm Networks, Teresa Nunez De Villavicencio Diaz Nov 2020

Electronic Thesis and Dissertation Repository

Protein functions are regulated by the post-translational addition of covalent modifications on certain amino acids. Depending on their distance within the 3-dimensional structure, addition/removal of individual post-translational modifications (PTMs) can be impacted by others. This PTM interplay constitutes an essential regulatory mechanism that interconnects the molecular networks in the cell. Protein CK2, a clinically relevant acidophilic Ser/Thr kinase, may be responsible for 10-20% of the human phosphoproteome. Such estimates agree with the number of known substrates, which continues to expand. Furthermore, the demonstration that CK2 participates in hierarchical phosphorylation and has similar sequence determinants to caspases suggests extensive PTM …


Multiple Roles Of Nup1 In Arabidopsis Growth And Development, Raj K. Thapa Nov 2020

Electronic Thesis and Dissertation Repository

The nuclear pore complex (NPC) is the gateway between the nucleus and cytoplasm, which provides the passage for transport of RNA, protein, and other molecules into and out of the nucleus. NPC is conserved across all eukaryotes and plays a vital role in various cellular processes. However, compared to other organisms, the study of NPC in plants is limited. Although more than 30 different types of nucleoporin proteins in the model plant Arabidopsis thaliana have been identified, none of those proteins has been studied in detail. In this thesis, I focused on one such protein named NUCLEOPORIN1 (NUP1) and investigated …


Pan-Cancer Analysis Of Telomerase Reverse Transcriptase (Tert) Isoforms, Mathushan Subasri Oct 2020

Electronic Thesis and Dissertation Repository

Reactivation of the multi-subunit ribonucleoprotein telomerase is the primary telomere maintenance mechanism in cancer, but it is rate-limited by the enzymatic component, telomerase reverse transcriptase (TERT). While regulatory in nature, TERT alternative splice variant/isoform regulation and functions are not fully elucidated and are further complicated by their highly diverse expression. In this thesis, I characterized TERT expression across normal and neoplastic tissues using TCGA and GTEx RNA-sequencing data. In doing so, I demonstrated the global overexpression and splicing shift towards full-length TERT in neoplastic tissue. Furthermore, my studies identified tumour subtype expression differences possibly regulated by subtype-specific characteristics, detailed heterogeneity …


Computational Methods For Predicting Protein-Protein Interactions And Binding Sites, Yiwei Li Aug 2020

Electronic Thesis and Dissertation Repository

Proteins are essential to organisms and participate in virtually every process within cells. Quite often, they keep the cells functioning by interacting with other proteins. This process is called protein-protein interaction (PPI). The bonding amino acid residues during the process of protein-protein interactions are called PPI binding sites. Identifying PPIs and PPI binding sites are fundamental problems in systems biology.

Experimental methods for solving these two problems are slow and expensive. Therefore, great efforts are being made towards increasing the performance of computational methods.

We present DELPHI, a deep learning based program for PPI site prediction and SPRINT, an algorithmic …


Regulators Of Ectopic Calcification In A Mouse Model Of Dish: A Multi-Omics Perspective, Matthew A. Veras Jun 2020

Electronic Thesis and Dissertation Repository

Diffuse idiopathic skeletal hyperostosis (DISH) is a non-inflammatory spondyloarthropathy and the second most common form of arthritis characterized by formation of ectopic mineral along the spine. Pathological findings in DISH include regional calcification of the anterior longitudinal ligament, paraspinal connective tissues, and annulus fibrosus (AF) of the intervertebral disc (IVD). Clinical symptoms of DISH include increased spine stiffness, decreased spinal range of motion, and in severe cases dysphagia and spinal cord/nerve root compression. The molecular pathways responsible for DISH have not been delineated and as such, there are no disease-modifying treatments. Clinical treatment for DISH is limited to surgical resection …


B Cell Acute Lymphoblastic Leukemia Is Driven By Activating Janus Kinase Mutations Cooperating With Spi1 And Spib Deletions In A Murine Model, Michelle Lim Jun 2020

Electronic Thesis and Dissertation Repository

B cell acute lymphoblastic leukemia (B-ALL) is caused by genetic lesions in developing B cells that function as drivers for accumulation of additional mutations in an evolutionary selection process. We investigated secondary drivers of leukemogenesis and their mechanism(s) of arising in a mouse model of B-ALL driven by PU.1/Spi-B deletion (Mb1-CreDPB). Whole exome sequencing revealed recurrent mutations in Jak3 (encoding Janus Kinase 3) and Jak1. Mutations with high variant allele frequency (VAF) were dominated by C->T transition mutations that were compatible with AID, whereas the majority of mutations, with low VAF, were dominated by C->A transversions associated with …


Machine Learning With Digital Signal Processing For Rapid And Accurate Alignment-Free Genome Analysis: From Methodological Design To A Covid-19 Case Study, Gurjit Singh Randhawa Jun 2020

Electronic Thesis and Dissertation Repository

In the field of bioinformatics, taxonomic classification is the scientific practice of identifying, naming, and grouping of organisms based on their similarities and differences. The problem of taxonomic classification is of immense importance considering that nearly 86% of existing species on Earth and 91% of marine species remain unclassified. Due to the magnitude of the datasets, the need exists for an approach and software tool that is scalable enough to handle large datasets and can be used for rapid sequence comparison and analysis. We propose ML-DSP, a stand-alone alignment-free software tool that uses Machine Learning and Digital Signal Processing to …


Designing A Novel Hiv-1 Candidate Vaccine, Rahul Pawa Apr 2020

Electronic Thesis and Dissertation Repository

Currently no vaccine has been developed that can prevent the spread of HIV-1. During sexual transmission, a single viral variant called the Transmitted/Founder (T/F), purportedly with unique physical properties, establishes infection in 70-80% of individuals. Unlike previous studies that have tried to identify T/F viruses based on their structure, glycan composition, and amino acid sequence, we have analyzed the RNA sequences of HIV-1 to help identify T/F variants. Using a combination of both in silico data analysis and in vitro assays, we have identified that T/F viruses have higher numbers of immunostimulatory motifs than HIV virions that fail to infect. …


Mhcherrypan, A Novel Model To Predict The Binding Affinity Of Pan-Specific Class I Hla-Peptide, Xuezhi Xie Apr 2020

Electronic Thesis and Dissertation Repository

The human leukocyte antigen (HLA) system or complex plays an essential role in regulating the immune system in humans. Accurate prediction of peptide binding with HLA can efficiently help to identify those neoantigens, which potentially make a big difference in immune drug development. HLA is one of the most polymorphic genetic systems in humans, and thousands of HLA allelic versions exist. Due to the high polymorphism of HLA complex, it is still pretty difficult to accurately predict the binding affinity. In this thesis, we presented a new algorithm to combine convolutional neural network and long short-term memory to solve this …


Mushroom Body-Specific Gene Regulation By The Swi/Snf Chromatin Remodeling Complex, Kevin Cj Nixon Feb 2020

Electronic Thesis and Dissertation Repository

Over the lifetime of an organism, neurons must establish, remodel, and maintain precise connections in order to form neural circuits that are required for proper nervous system functioning. Disruptions in these processes can lead to neurodevelopmental disorders such as intellectual disability (ID) and autism spectrum disorder. Mutations in genes encoding subunits of the SWI/SNF chromatin remodeling complex have been implicated in ID, yet the role of this complex in neurons is poorly understood. In this project, I established cell-type specific methods to examine the effect of SWI/SNF subunit knockdowns on gene transcription and chromatin structure in the memory-forming neurons of …


Cross-Species Utility Of The Mouse Diversity Genotyping Array In Assaying Single Nucleotide Polymorphisms, Rachel Kelly Aug 2019

Electronic Thesis and Dissertation Repository

In the study of genetic diversity in non-model species there is a notable lack of the low-cost, high resolution tools that are readily available for model organisms. Genotyping microarray technology for model organisms is well-developed, affordable, and potentially adaptable for cross-species hybridization. The Mouse Diversity Genotyping Array (MDGA), a single nucleotide polymorphism (SNP) genotyping tool designed for M. musculus, was tested as a tool to survey genomic diversity of wild species for inter-order, inter-family, inter-genus, and intra-genus comparisons. Application of the MDGA cross-species provides genetic distance information that reflects known taxonomic relationships reported previously between non-model species, but there …


Novel Insights Into The Genomic Integration Site Landscape Of Hiv-1 And Other Retrovirus Genera, Hinissan P. Kohio Jan 2019

Electronic Thesis and Dissertation Repository

An important event during infection by retroviruses such as human immunodeficiency virus type 1 (HIV-1) is the permanent integration of the viral genome into the host genome. This event leads to life-long infection and is accompanied by a period of quiescence/latency ranging from a few years to >10 years where HIV-1 expression is barely detectable or undetectable. Despite the use of combination antiretroviral therapy (cART) which controls HIV-1 infection, quiescent/latent virus presents a major obstacle towards a functional cure. Integration site location in the genome is thought to contribute to latent infections and has the potential to confound anti-latency treatments, …


The Role Of H3k4 Methyltransferases In Drosophila Memory, Nicholas Raun Jan 2019

Electronic Thesis and Dissertation Repository

Gene transcription required for long-term memory requires the modification of histones. However, there are still many uncertainties about the identity and spatial expression of genes regulated by histone modifications during memory related processes. In this project I examined the role of Drosophila melanogaster methyltransferases Set1 and trx in courtship memory. Genetic knockdown of Set1 and trx in the mushroom body (MB) revealed that Set1 was necessary for short- and long-term memory, while trx was only required for long-term memory. Transcriptional profiling of MBs following trx-knockdown revealed expression changes in MB-enriched genes and genes involved in RNA processing. Among the …


Investigating The Role Of Brachypodium Distachyon Cellulose Synthase 8 In Gluconacetobacter Diazotrophicus Colonization, Xuan Yang Dec 2018

Electronic Thesis and Dissertation Repository

Nitrogen is an essential nutrient for plant growth. Significant amounts of nitrogen fertilizer are applied to crop fields to maintain high yield. Alternatives to chemical nitrogen fertilizer are needed to reduce the costs of crop production and offset environmental damage. Gluconacetobacter diazotrophicus is a nitrogen-fixing bacterium that was originally isolated from sugarcane and has been proposed as a possible biofertilizer for monocot crop production. However, the colonization of G. diazotrophicus in most monocot crops is limited and a deep understanding of the response of the host plants to G. diazotrophicus colonization is still lacking. In this study, research was conducted …


Spatial And Temporal Patterns Of Neutral And Adaptive Genetic Variation In The Alpine Butterfly, Parnassius Smintheus, Maryam Jangjoo Nov 2018

Electronic Thesis and Dissertation Repository

Understanding how much genetic diversity exists in populations, and the processes that maintain that diversity, has been a central focus of population genetics. The evolutionary processes that determine patterns of genetic diversity depend on underlying ecological processes such as dispersal and changes in population size. In this thesis, I examine the influence of dispersal and population dynamics on neutral and adaptive genetic variation in a naturally occurring network of populations of the alpine butterfly, Parnassius smintheus.

My first objective was to determine the combined consequences of demographic bottlenecks and dispersal on neutral genetic variation within and among populations. Using …


Dna Sequence Classification: It’s Easier Than You Think: An Open-Source K-Mer Based Machine Learning Tool For Fast And Accurate Classification Of A Variety Of Genomic Datasets, Stephen Solis-Reyes Oct 2018

Electronic Thesis and Dissertation Repository

Supervised classification of genomic sequences is a challenging, well-studied problem with a variety of important applications. We propose an open-source, supervised, alignment-free, highly general method for sequence classification that operates on k-mer proportions of DNA sequences. This method was implemented in a fully standalone general-purpose software package called Kameris, publicly available under a permissive open-source license. Compared to competing software, ours provides key advantages in terms of data security and privacy, transparency, and reproducibility. We perform a detailed study of its accuracy and performance on a wide variety of classification tasks, including virus subtyping, taxonomic classification, and human haplogroup assignment. …
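
To make the phrase "k-mer proportions of DNA sequences" concrete, the following Python sketch computes such a feature vector for a single sequence. It is a generic illustration, not Kameris's actual implementation: the function name, the fixed lexicographic ordering of k-mers, and the choice to skip windows containing characters other than A, C, G, or T are all assumptions.

```python
from collections import Counter
from itertools import product


def kmer_proportions(seq, k):
    """Return the proportion of each possible DNA k-mer in seq.

    The vector has 4**k entries, ordered lexicographically over "ACGT".
    Windows containing any other character (e.g. N) are skipped; this
    handling of ambiguous bases is an assumption of the sketch.
    """
    seq = seq.upper()
    alphabet = set("ACGT")
    counts = Counter()
    for i in range(len(seq) - k + 1):
        kmer = seq[i:i + k]
        if set(kmer) <= alphabet:
            counts[kmer] += 1
    total = sum(counts.values())
    all_kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    if total == 0:
        return [0.0] * len(all_kmers)
    return [counts[kmer] / total for kmer in all_kmers]


# Each of the four 1-mers occurs once in "ACGT".
print(kmer_proportions("ACGT", 1))
```

Because every sequence maps to a vector of the same length regardless of its own length, vectors like this can be passed to an ordinary supervised classifier without aligning the sequences first.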