Open Access. Powered by Scholars. Published by Universities.®

Computational Biology Commons

Bioinformatics

Theses/Dissertations

Articles 1 - 30 of 37

Full-Text Articles in Computational Biology

Predicting Marine Teleost Responses To Ocean Warming And Pollution, Akila Harishchandra Aug 2023

Electronic Theses and Dissertations

Ocean warming and pollution are two detrimental anthropogenic factors that have caused rapid marine ecosystem degradation in the past decades. These factors render the marine environment intolerable for many marine species, forcing them to either adapt or shift their contemporary habitat ranges to reduce the extinction risk associated with environmental degradation. Estimating marine species’ habitat range shifts and their potential for developing adaptive mechanisms is critical for ecosystem conservation and management, human health risk assessment, and climate change vulnerability assessments. Given that, for the first chapter of this thesis, we focused on developing a species distribution model (SDM) integrating marine species …


Exploration Of The Immune Landscape Of EBV-Associated Gastric Cancers, Mikhail Salnikov Jun 2023

Electronic Thesis and Dissertation Repository

Epstein–Barr virus (EBV) is a gammaherpesvirus associated with 9% of all gastric cancers (GCs). EBV-associated GCs (EBVaGCs) are pathologically and clinically distinct entities from EBV-negative GCs (EBVnGCs), with EBVaGCs exhibiting differential molecular pathology and patient prognosis. The purpose of this thesis is to investigate the tumor microenvironment (TME) of EBVaGCs, which has not been explored in depth. We hypothesize that EBVaGCs and EBVnGCs are also distinct in terms of the molecular immune landscape. We employed over 400 stomach adenocarcinoma (STAD) samples from The Cancer Genome Atlas (TCGA), as well as a single-cell dataset, for the construction of a web suite …


Mining SARS-CoV-2 Phylogenetic Trees To Estimate Circulating Infections And Patterns Of Migration, Erin V. Brintnell Jun 2023

Electronic Thesis and Dissertation Repository

The SARS-CoV-2 pandemic led to the formation of very large databases of genomic viral data. These databases contain information on transmission dynamics, emergence and evolution of SARS-CoV-2. However, extracting this information from sequences is difficult, as most methods of analyzing viral genomes were developed for smaller data sets. Therefore, my objective was to develop new fast estimators of the number of infections (I) and the rate of migration based on simple features of SARS-CoV-2 phylogenies.

I simulated pathogen evolution using a susceptible-exposed-infectious-recovered (SEIR) model of pathogen spread, reconstructing evolution using CoVizu. For simulations of I, I varied the total number …
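
As a rough illustration of the compartmental model named in this abstract, the following is a minimal deterministic SEIR sketch using forward-Euler steps. It is not the simulation framework used in the thesis (which simulates pathogen evolution on top of the epidemic); the function name `simulate_seir` and all parameter values are hypothetical and chosen only for illustration.

```python
def simulate_seir(beta, sigma, gamma, s0, e0, i0, r0, days, dt=0.1):
    """Integrate a deterministic SEIR model with forward-Euler steps.

    beta  : transmission rate (per day), assumed value supplied by caller
    sigma : rate of leaving the exposed class (1 / latent period)
    gamma : recovery rate (1 / infectious period)
    Returns a list of (time, S, E, I, R) tuples.
    """
    s, e, i, r = float(s0), float(e0), float(i0), float(r0)
    n = s + e + i + r  # closed population: no births, deaths, or migration
    history = [(0.0, s, e, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_exposed = beta * s * i / n * dt   # S -> E
        new_infectious = sigma * e * dt       # E -> I
        new_recovered = gamma * i * dt        # I -> R
        s -= new_exposed
        e += new_exposed - new_infectious
        i += new_infectious - new_recovered
        r += new_recovered
        history.append((k * dt, s, e, i, r))
    return history


if __name__ == "__main__":
    # Hypothetical parameters, for illustration only.
    run = simulate_seir(beta=0.5, sigma=0.2, gamma=0.1,
                        s0=990, e0=0, i0=10, r0=0, days=100)
    peak_time, _, _, peak_i, _ = max(run, key=lambda row: row[3])
    print(f"peak infectious: {peak_i:.1f} at day {peak_time:.1f}")
```

A stochastic or tree-generating simulator would be needed to produce phylogenies; this sketch only shows how the four compartments exchange individuals.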


Integrating OMIM And IntAct Data For The Analysis Of Gene-Phenotype Interactions In Complex Diseases: A Linux-Based Computational Tool For Network Analysis, Devin Keane May 2023

All Theses

The field of genetics is constantly evolving. New advances in bioinformatics and computational approaches are leading to exciting new developments in our ability to treat and prevent diseases. Computational genetics provides valuable insights into the complex mechanisms and layers of biological communication that shape an organism's phenotype. Understanding these mechanisms is critical to advancing human health.

The study of diseases in genetics requires a comprehensive understanding of the interactions between various biological processes, including gene expression, protein synthesis, RNA, metabolism, and cell-cell communication. To effectively address the root causes of such diseases, multi-disciplinary approaches that integrate information from different levels …


The Genomics Of Autism-Related Genes IL1RAPL1 And IL1RAPL2: Insights Into Their Cortical Distribution, Cell-Type Specificity, And Developmental Trajectories, Jacob Weaver Apr 2023

MUSC Theses and Dissertations

Neuropsychiatric disorders have a significant impact on modern society. These disorders affect a large percentage of the population: schizophrenia has a worldwide prevalence of 1% and autism spectrum disorders (ASD) affect 1 in 59 school-aged children in the US. There is substantial evidence that most neuropsychiatric disorders have a genetic component. Thus, with the advent of high-throughput sequencing, much effort has gone into identifying genetic variants associated with these disorders. The emerging picture from these studies is a complex one where hundreds of genes with small effects interact with a varied landscape of common variants to result in disease. …


Towards More Complete Metagenomic Analyses Through Circularized Genomes And Conjugative Elements, Benjamin R. Joris Aug 2022

Electronic Thesis and Dissertation Repository

Advancements in sequencing technologies have revolutionized biological sciences and led to the emergence of a number of fields of research. One such field is metagenomics, the study of the genomic content of complex communities of bacteria. The goal of this thesis was to contribute computational methodology that can maximize the data generated in these studies and to apply these protocols to human and environmental metagenomic samples.

Standard metagenomic analyses include a step for binning of assembled contigs, which has previously been shown to exclude mobile genetic elements, and I demonstrated that this phenomenon extends to all conjugative …


Methods And Tools To Improve Performance Of Plant Genome Analysis, Drew Ferrell Aug 2022

Theses and Dissertations

Multi-omics data analysis and integration facilitate hypothesis building toward an understanding of genes and pathway responses driven by environments. Methods designed to estimate and analyze gene expression, with regard to treatments or conditions, can be leveraged to understand gene-level responses in the cell. However, genes often interact and signal within larger structures such as pathways and networks. Complex studies guided toward describing dynamic genetic pathways and networks require algorithms or methods designed for inference based on gene interactions and related topologies. Classes of algorithms and methods may be integrated into generalized workflows for comparative genomics studies, as multi-omics …


Modeling Electrostatics In Molecular Biology And Its Relevance With Molecular Mechanisms Of Diseases, Mahesh Koirala Aug 2022

All Dissertations

Electrostatics plays an essential role in molecular biology. Modeling electrostatics in molecular biology is complicated due to the water phase, mobile ions, and irregularly shaped inhomogeneous biological macromolecules. This dissertation presents the popular DelPhi package, which solves the Poisson-Boltzmann equation (PBE) and delivers the electrostatic potential distribution of biomolecules. We used the newly developed DelPhiForce steered Molecular Dynamics (DFMD) approach to model the binding of barstar to barnase and demonstrated that the first-principles method could also model the binding. This dissertation also reflects the use of existing computational approaches to model the effects of Single Amino Acid Variations (SAVs) to reveal molecular mechanisms …


Characterizing Endogenous Dicer Products To Unravel Novel RNAi Biogenesis Pathways, Jacob Oche Peter Jun 2022

Dissertations

RNA interference (RNAi) is a pervasive gene regulatory mechanism in eukaryotes based on the action of multiple classes of small RNA (sRNA). Exploiting RNAi pathways in non-model systems has great potential for creating potent RNAi technologies. Here, we accessed RNAi-mediated control of gene expression in the two-spotted spider mite, Tetranychus urticae (T. urticae), using engineered dsRNA designed to modulate the host RNAi pathway and increase RNAi efficacy. Analysis of Dicer (Dcr)-generated fragments revealed how exogenous RNAs access the host RNAi pathway in this animal, opening avenues for designing RNAi technology for their control. Further, some organisms …


An Investigation Of Epigenetic Mechanisms Driving The Biology Of Head And Neck Squamous Cell Carcinoma, Scot Carson Callahan May 2022

Dissertations & Theses (Open Access)

Head and neck squamous cell carcinoma (HNSCC) is the 6th most common cancer worldwide and is associated with significant morbidity and mortality. To date, the majority of work in the field has focused on genomic alterations such as mutations and copy number alterations. However, the clinical success of targeted therapies that exploit known genomic alterations, such as EGFR mutations, has remained mixed. Over the past decade, the importance of epigenetic regulators has come to the forefront, with the realization that many of these genes are mutated in cancer. Despite this realization, the role of epigenetics in regulating tumorigenesis, progression and …


Unveiling Global Roles Of G-Quadruplexes And G4-22 In Human Genetics, Ruth Barros De Paula Aug 2021

Dissertations & Theses (Open Access)

G-quadruplexes (G4s) are non-B DNA structures formed by four or more runs of repeated guanines that confer unique features to the genomes of living organisms. These sequences are enriched in regulatory regions, such as promoters and 5’ UTRs, and have distinct regulatory roles in both health and disease states. Even though previous studies showed the impact of G4s on gene expression, none of them summarized the location-specific effect of G4s. Also, there is no broad understanding of the most common G4 repeat in the human genome, named here G4-22, and how it links to the evolution of mammals and their biology. In …
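
To make the "four or more runs of repeated guanines" description concrete, here is a minimal sketch that scans a sequence for putative G-quadruplex motifs. It assumes a commonly used folding rule from the literature (four runs of at least three guanines separated by loops of one to seven nucleotides); the abstract does not state which definition the thesis uses, and the function name `find_g4_motifs` is hypothetical.

```python
import re

# Assumed pattern: G-run, then three repeats of (loop of 1-7 nt + G-run).
# This is one common convention, not necessarily the thesis's definition.
G4_PATTERN = re.compile(r"G{3,}(?:[ACGT]{1,7}G{3,}){3}")


def find_g4_motifs(sequence):
    """Return (start, end, motif) for non-overlapping putative G4 motifs.

    Coordinates are 0-based, end-exclusive, on the given strand only.
    Motifs on the opposite strand appear as C-rich runs and would need
    a scan of the reverse complement.
    """
    seq = sequence.upper()
    return [(m.start(), m.end(), m.group(0)) for m in G4_PATTERN.finditer(seq)]


if __name__ == "__main__":
    for start, end, motif in find_g4_motifs("TTGGGAGGGTGGGAGGGTT"):
        print(start, end, motif)
```

Because `re.finditer` reports non-overlapping matches, densely packed or overlapping candidates are reported once; a genome-wide survey would typically handle overlaps and both strands explicitly.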


Comparative Genomics Methods And Applications, Emily N. Alden Jul 2021

Biomedical Sciences ETDs

Virtually all fields of biology have benefited from the advancements in comparative genomics technologies, specifically in the study of evolution. In this dissertation I develop and use comparative genomic technologies to investigate the novel SARS-CoV-2 virus, assemble the first genome of the black lace domestic angelfish, and identify germline genetic variants associated with altered breast cancer-specific survival. Our genome tiling array for the novel coronavirus presents a rapid and cost-effective method to sequence the entire viral genome and can be used to track the rapid evolution of viral variants in the population. The domestic angelfish is a member of the …


Ensemble Protein Inference Evaluation, Kyle Lee Lucke Jan 2021

Graduate Student Theses, Dissertations, & Professional Papers

Protein inference is becoming an increasingly important tool that aids in the characterization of complex proteomes and the analysis of complex protein samples. In bottom-up shotgun proteomics experiments, the metrics for evaluation (like AUC and calibration error) are based on an often imperfect target-decoy database. These metrics make the inherent assumption that all of the proteins in the target set are present in the sample being analyzed. In general, this is not the case; the target set is typically a mix of present and absent proteins. To objectively evaluate inference methods, protein standard datasets are used. These datasets are special in …


Composition And Homology In The Taxonomic Classification Of Escherichia Coli, Tanya Irani Jan 2021

Theses and Dissertations (Comprehensive)

As new techniques have been introduced, specifically the possibility of complete genome sequencing, better methods of defining bacterial species have also been proposed. One of the most recently proposed methods, using bioinformatic techniques, is to calculate the average nucleotide identity (ANI) between the homologous genome segments of different isolates. Another method for species discrimination that has been tested successfully is the similarity of DNA compositional signatures. However, in a recent update, DNA signatures split the available Escherichia coli complete genomes into three groups. To check if this result was consistent with such genomes belonging to different species, we tested methods …
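
The average nucleotide identity mentioned in this abstract can be sketched as follows, assuming the homologous segments have already been found and pairwise aligned by some external tool. The unweighted mean over segments and the skipping of gap columns are simplifying assumptions; published ANI tools differ in how they fragment genomes, filter alignments, and weight segments. The function names are hypothetical.

```python
def segment_identity(aligned_a, aligned_b):
    """Fraction of identical bases over the ungapped columns of one alignment."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned segments must have equal length")
    compared = 0
    matches = 0
    for x, y in zip(aligned_a.upper(), aligned_b.upper()):
        if x == "-" or y == "-":
            continue  # assumption: gap columns are ignored
        compared += 1
        if x == y:
            matches += 1
    return matches / compared if compared else 0.0


def average_nucleotide_identity(aligned_pairs):
    """Unweighted mean identity (as a percentage) over aligned segment pairs."""
    identities = [segment_identity(a, b) for a, b in aligned_pairs]
    if not identities:
        raise ValueError("no aligned segments supplied")
    return 100.0 * sum(identities) / len(identities)


if __name__ == "__main__":
    # Toy alignments, for illustration only.
    pairs = [("ACGT", "ACGT"), ("ACGT", "ACGA")]
    print(average_nucleotide_identity(pairs))
```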


Distribution And Diversity Of Heliothine And Other Lepidopteran Nudiviruses, Emrah Ozel Jan 2021

Theses and Dissertations--Entomology

Helicoverpa zea nudivirus 2 (HzNV-2) is the only known sterilizing and sexually transmitted insect virus and causes pathological symptoms in H. zea reproductive tissues. HzNV-2 has features that make it a candidate as a H. zea (corn earworm) control agent, such as the ability to cause asymptomatic (latent) and symptomatic (lytic) infections and the ability to influence mating behavior of its host to favor virus spread. HzNV pathology has been studied and its genome sequenced; however, its prevalence in natural populations is largely unknown. In this study, we developed and used a low-cost PCR-based molecular survey to investigate HzNV-2 prevalence and …


Investigation Of Proliferation Suppressors In Genetic Fitness Screens, Walter Frank Lenoir IV Dec 2020

Dissertations & Theses (Open Access)

Innovation of CRISPR gene-editing technology has provided scientists with genome manipulation tools that allowed rapid advancement of scientific capabilities and thus improved our ability to systematically study mammalian genetic functional profiles. Genome-wide CRISPR knockout screens conducted in collections of human cell lines can knock out genes at multiple loci, and have provided new insights into functional roles for independent genes. This method has launched massive efforts in looking across genetic backgrounds for context-specific genetic vulnerabilities within cancer. Much of the research effort thus far has been spent on optimizing phenotype distinctions between essential, genes required for cell fitness, and non-essential, …


Decoding The Evolutionary Response To Prostate Cancer Therapy Using Plasma Genome Sequencing, Naveen Ramesh Dec 2020

Dissertations & Theses (Open Access)

Investigating genome evolution in response to therapy is challenging in human tissue samples due to the difficulty of accessing metastatic tumor sites and the logistical challenges of collecting longitudinal samples. To overcome these issues, we developed an unbiased whole-genome plasma DNA sequencing approach called PEGASUS that concurrently measures genomic copy number and exome mutations from archival cryostored plasma samples. This approach was applied to study longitudinal blood plasma samples from prostate cancer patients. A molecular characterization of archival plasma DNA from 233 patients and genomic profiling of 101 patients identified clinical correlations of aneuploid plasma DNA profiles with poor survival, increased …


Deciphering The CK2-Dependent Phosphoproteome And Its Integration With Regulatory PTM Networks, Teresa Nunez De Villavicencio Diaz Nov 2020

Electronic Thesis and Dissertation Repository

Protein functions are regulated by the post-translational addition of covalent modifications on certain amino acids. Depending on their distance within the 3-dimensional structure, addition/removal of individual post-translational modifications (PTMs) can be impacted by others. This PTM interplay constitutes an essential regulatory mechanism that interconnects the molecular networks in the cell. Protein CK2, a clinically relevant acidophilic Ser/Thr kinase, may be responsible for 10-20% of the human phosphoproteome. Such estimates agree with the number of known substrates, which continues to expand. Furthermore, the demonstration that CK2 participates in hierarchical phosphorylation and has similar sequence determinants to caspases suggests extensive PTM …


Machine Learning With Digital Signal Processing For Rapid And Accurate Alignment-Free Genome Analysis: From Methodological Design To A COVID-19 Case Study, Gurjit Singh Randhawa Jun 2020

Electronic Thesis and Dissertation Repository

In the field of bioinformatics, taxonomic classification is the scientific practice of identifying, naming, and grouping organisms based on their similarities and differences. The problem of taxonomic classification is of immense importance considering that nearly 86% of existing species on Earth and 91% of marine species remain unclassified. Due to the magnitude of the datasets, the need exists for an approach and software tool that is scalable enough to handle large datasets and can be used for rapid sequence comparison and analysis. We propose ML-DSP, a stand-alone alignment-free software tool that uses Machine Learning and Digital Signal Processing to …


The Evolution Of Bivalve Shell Matrix Proteins, Mark Ira Duhon II May 2020

LSU Doctoral Dissertations

This dissertation focuses on the molecular underpinnings surrounding the evolution of the biomineralized shells of marine bivalves. Bivalve molluscs synthesize remarkably complex shells from calcium carbonate and an organic matrix of proteins secreted from the dorsal edge of the mantle. Molecular analyses of shell matrix proteins (SMPs) have suggested high rates of gene turnover despite the conserved nature of the shell itself. Here, I used proteomic and transcriptomic data to identify the SMPs and other biomineralization proteins from seven bivalve species that diverged 3-513 Mya. Contrary to previous studies that identified only a few shared biomineralization transcripts across the Bivalvia, …


Computational Genomic Models For Spatio-Temporal Investigation Of Early Lung Cancer Pathology, Smruthy Sivakumar May 2019

Dissertations & Theses (Open Access)

Lung cancer, of which non-small cell lung cancer (NSCLC) is the most common form, is the second most prevalent cancer and the leading cause of cancer-related deaths. NSCLCs primarily comprise adenocarcinomas (LUAD) and squamous cell carcinomas (LUSC). Advances in early detection and prevention have been limited by the lack of early-stage biomarkers and targets. A comprehensive molecular characterization of premalignant lesions and tumor-adjacent normal tissue can aid in better understanding NSCLC pathogenesis. However, these investigations are further challenged by limited tissue availability and low cellular fractions of detectable somatic mutations.

Therefore, there is a dearth of knowledge about the pathogenesis …


Microbial Ecology Of South Florida Surface Waters: Examining The Potential For Anthropogenic Influences, Chase P. Donnelly Aug 2018

HCNSO Student Theses and Dissertations

South Florida contains one of the largest subtropical wetlands in the world, and yet not much is known about the microbes that live in these surface waters. These microbes play an important role in chemical cycling and maintaining good water quality for both human and ecosystem health. The hydrology of Florida’s surface waters is tightly regulated with the use of canal and levee systems run by the US Army Corps of Engineers and The South Florida Water Management District. These canals run through the Everglades, agriculture, and urban environments to control water levels in Lake Okeechobee, the Water Conservation Areas, …


Efficient Alignment Algorithms For DNA Sequencing Data, Nilesh Vinod Khiste Jan 2018

Electronic Thesis and Dissertation Repository

DNA Next Generation Sequencing (NGS) technologies produce data at a low cost, enabling their application to many ambitious fields such as cancer research, disease control, and personalized medicine. However, even after a decade of research, modern aligners and assemblers are far from providing efficient and error-free genome alignments and assemblies, respectively. This is due to the inherent nature of the genome alignment and assembly problem, which involves many complexities. Many algorithms to address this problem have been proposed over the years, but there is still considerable scope for improvement in this research space.

Many new genome …


Machine Learning Based Protein Sequence To (Un)Structure Mapping And Interaction Prediction, Sumaiya Iqbal Aug 2017

University of New Orleans Theses and Dissertations

Proteins are the fundamental macromolecules within a cell that carry out most of the biological functions. The computational study of protein structure and its functions, using machine learning and data analytics, is elemental in advancing life-science research due to the fast-growing biological data and the extensive complexities involved in their analyses towards discovering meaningful insights. Mapping of a protein’s primary sequence is not limited to its structure; we extend it to its disordered component, known as Intrinsically Disordered Proteins or Regions in proteins (IDPs/IDRs), and hence to the involved dynamics, which help us explain complex interaction within a cell that …


Identification Of Novel Sleep Related Genes From Large Scale Phenotyping Experiments In Mice, Shreyas Joshi Jan 2017

Theses and Dissertations--Biology

Humans spend a third of their lives sleeping but very little is known about the physiological and genetic mechanisms controlling sleep. Increased data from sleep phenotyping studies in mouse and other species, genetic crosses, and gene expression databases can all help improve our understanding of the process. Here, we present analysis of our own sleep data from the large-scale phenotyping program at The Jackson Laboratory (JAX), to identify the best gene candidates and phenotype predictors for influencing sleep traits.

The original knockout mouse project (KOMP) was a worldwide collaborative effort to produce embryonic stem (ES) cell lines with one of …


Network Analytics For The miRNA Regulome And miRNA-Disease Interactions, Joseph Jayakar Nalluri Jan 2017

Theses and Dissertations

miRNAs are non-coding RNAs of approx. 22 nucleotides in length that inhibit gene expression at the post-transcriptional level. By virtue of this gene regulation mechanism, miRNAs play a critical role in several biological processes and patho-physiological conditions, including cancers. miRNA behavior is a result of a multi-level complex interaction network involving miRNA-mRNA, TF-miRNA-gene, and miRNA-chemical interactions; hence the precise patterns through which a miRNA regulates a certain disease(s) are still elusive. Herein, I have developed an integrative genomics method/pipeline to (i) build a miRNA regulomics and data analytics repository, (ii) create/model these interactions into networks and use optimization techniques, motif …


Punctuated Evolution Within A Eurythermic Genus (Mesenchytraeus) Of Segmented Worms: Genetic Modification Of The Glacier Ice Worm F1F0 ATP Synthase, Shirley A. Lang Dec 2016

Graduate School of Biomedical Sciences Theses and Dissertations

Segmented worms (Annelida) are among the most successful animal inhabitants of extreme environments worldwide. An unusual group of Mesenchytraeus worms endemic to the Pacific Northwest of North America occupy geographically proximal ecozones ranging from low elevation temperate rainforests to high altitude glaciers. Along this altitudinal transect, Mesenchytraeus representatives from disparate habitat types were collected and subjected to deep mitochondrial and nuclear phylogenetic analyses. Evidence presented here employing modern bioinformatic analyses (i.e., maximum likelihood, Bayesian inference, multi-species coalescent) supports a Mesenchytraeus “explosion” in the upper Miocene (5-10 million years ago) that gave rise to ice, snow and terrestrial worms, derived from …


Development Of An In Silico KIR Genotyping Algorithm And Its Application To Population And Cancer Immunogenetic Analyses, Howard Rosoff Aug 2016

Dissertations & Theses (Open Access)

Gene content determination and variant calling in the complex KIR genomic region are useful for immune system function analysis, pathogenesis and disease risk factor elucidation, immunotherapy development, evolutionary investigations, and human migration modeling. Sequence-specific oligonucleotide and sequence-specific primer PCR methods are the de facto standards for KIR presence/absence identification, but the current platforms are unsuitable for SNP calling, impractical for KIR typing large cohorts of DNA samples, and inapplicable for typing repositories in which sequence data, but not cells or cell analytes, are available. Alternative typing methods, such as in silico sequence-based typing, can address the problems associated with amplicon-based …


Computational Identification Of Terpene Synthase Genes And Their Evolutionary Analysis, Qidong Jia May 2016

Doctoral Dissertations

Terpenoids, the largest and most structurally and functionally diverse class of natural compounds on earth, are mostly synthesized by plants and are involved in various plant-environment interactions. Some terpenoids are classified as primary metabolites essential for plant growth and development. Terpene synthases (TPSs), the key enzymes for terpenoid biosynthesis, are the major determinant of the tremendous diversity of terpenoid carbon skeletons. The TPS genes represent a mid-size family of about 30-100 functional genes in almost all major sequenced plant genomes. TPSs are also found in fungi and bacteria, but microbial TPS genes share low levels of sequence similarity and …


A Pipeline For Creation Of Genome-Scale Metabolic Reconstructions, Shaun W. Norris Jan 2016

Theses and Dissertations

The decreasing costs of next-generation sequencing technologies and the increasing speeds at which they work have led to an abundance of 'omic datasets. The need for tools and methods to analyze, annotate, and model these datasets to better understand biological systems is growing. Here we present a novel software pipeline to reconstruct the metabolic model of an organism in silico starting from its genome sequence and a novel compilation of biological databases to better serve the generation of metabolic models. We validate these methods using five Gardnerella vaginalis strains and compare the gene annotation results to NCBI and the …