Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

Genetics

Articles 1 - 30 of 57

Full-Text Articles in Bioinformatics

MMAPPR2: An Improved Bioinformatics Approach To Find Novel Genes, Aiden Cardall, Jonathon T. Hill, Kyle Johnsen, Connor Ward, Maliha Tasnim, Jared Taylor Mar 2024

Library/Life Sciences Undergraduate Poster Competition 2024

Introduction

• New genes are commonly found by randomly inducing mutations in model organisms.

• Mapping the mutations to the genome to find novel genes is difficult, time-consuming, and expensive.

• We created a bioinformatics program, MMAPPR, to automate this process.

• Here, we introduce a new algorithm, MMAPPR2, which requires little to no bioinformatics knowledge to use.

• MMAPPR2 makes several improvements that allow it to identify genes more rapidly and precisely.

• MMAPPR2 will aid the rapid identification of genes in a wide range of species and developmental systems.
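The poster summary does not spell out the mapping statistic. As a rough illustration, assuming the scoring described for the original MMAPPR carries over, each SNP can be scored by the Euclidean distance between the A/C/G/T allele frequencies of a pooled mutant sample and a pooled wild-type sample, raised to a power (4 here) so that small, noisy differences shrink. A full pipeline would then smooth these scores along each chromosome to locate the peak. The function names and read counts below are hypothetical.

```python
import math

BASES = ("A", "C", "G", "T")


def allele_frequencies(counts):
    """Turn per-base read counts at one SNP into A/C/G/T frequencies."""
    total = sum(counts.get(base, 0) for base in BASES)
    if total == 0:
        raise ValueError("no reads cover this position")
    return {base: counts.get(base, 0) / total for base in BASES}


def mapping_score(mutant_counts, wildtype_counts, power=4):
    """Euclidean distance between the two pools' frequencies, raised to `power`."""
    fm = allele_frequencies(mutant_counts)
    fw = allele_frequencies(wildtype_counts)
    distance = math.sqrt(sum((fm[b] - fw[b]) ** 2 for b in BASES))
    return distance ** power


# Hypothetical SNP near the causal mutation: mutants are homozygous for T,
# while the wild-type pool is a C/T mixture.
print(mapping_score({"T": 20}, {"C": 15, "T": 5}))
```

Positions far from the mutation, where both pools carry similar allele mixtures, score near zero, so the raised power makes the linked region stand out.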


The Detection Of Putative Recessive Lethal Haplotypes In Irish Sheep Populations, Rory Mcauley Nov 2023

ORBioM (Open Research BioSciences Meeting)

In livestock populations, recessive lethal alleles are a known contributor to poor reproductive performance due to embryonic death in homozygous individuals. Despite their lethal effect in the homozygous state, these alleles may be maintained at high frequencies among carrier animals because of their positive pleiotropic effects on economically important traits. Although several such recessive alleles have been identified in cattle and pig populations, few studies have been completed in sheep, and none within Irish sheep populations. Genotype data for 69,034 animals from five major Irish sheep breeds, genotyped on a variety of panels, were available for this study. Only animals …
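The abstract is cut off before the method. Screens of this kind in cattle and pigs commonly flag haplotypes that are frequent in the population but seen in the homozygous state far less often than expected. A minimal sketch of that comparison, assuming Hardy-Weinberg proportions and a Poisson approximation for the homozygote count, is below; the function name and the counts are hypothetical and are not results from this study.

```python
import math


def homozygote_deficit(n_animals, n_heterozygous, n_homozygous):
    """Compare observed homozygotes for one haplotype with a Hardy-Weinberg expectation.

    Returns (haplotype frequency, expected homozygotes, P(observed or fewer)).
    """
    copies = n_heterozygous + 2 * n_homozygous
    freq = copies / (2 * n_animals)
    expected = n_animals * freq ** 2
    # Poisson approximation: chance of seeing this few homozygotes if the
    # haplotype were not lethal in double copy.
    prob = sum(math.exp(-expected) * expected ** k / math.factorial(k)
               for k in range(n_homozygous + 1))
    return freq, expected, prob


# Hypothetical haplotype: 600 carriers among 10,000 animals, no homozygotes.
freq, expected, prob = homozygote_deficit(10_000, 600, 0)
print(f"freq={freq:.3f} expected={expected:.1f} P={prob:.2e}")
```

A very small probability for a common haplotype with no homozygous animals marks it as a candidate lethal; a real screen would also correct for the number of haplotypes tested.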


The Use Of Prognostic Markers To Predict Disease Progression And Clinical Outcome In Monoclonal Gammopathy Of Undetermined Significance, Smouldering Multiple Myeloma And Multiple Myeloma, Róisín C. Mcmonagle Sep 2023

International Undergraduate Journal of Health Sciences

Multiple Myeloma (MM) is an incurable plasma cell malignancy with a complex and incompletely understood molecular pathogenesis. Monoclonal Gammopathy of Undetermined Significance (MGUS) and Smouldering Multiple Myeloma (SMM) precede MM, with variable risks and rates of disease progression. The continuing high relapse and death rate in MM cases has prompted research into more accurate prognostic markers to predict progression from MGUS and SMM to MM, as well as to identify MM cases with aggressive disease, so that early, targeted and effective therapeutic intervention can begin. Many studies have focused on utilising current markers more effectively, including M-protein, serum-free light chain ratio, …


Annotation Of Non-Model Species’ Genomes, Taiya Jarva Jul 2023

Master's Theses

Innovations in high-throughput sequencing technologies in recent decades have allowed unprecedented examination and characterization of the genetic make-up of both model and non-model species, leading to a surge in the use of genomics in fields where it was previously considered unfeasible. These advances have greatly expanded the realm of possibilities in ecology and conservation. It is now possible to identify large cohorts of genetic markers, including single nucleotide polymorphisms (SNPs) and larger structural variants, as well as signatures of selection and local adaptation. Markers can be used to identify species, define population …


Understanding Host-Microbe Interactions In Maize Kernel And Sweetpotato Leaf Metagenomic Profiles, Alison K. Adams May 2023

Doctoral Dissertations

Functional and quantitative metagenomic profiling remains challenging, which limits our understanding of host-microbe interactions. This body of work aims to address these challenges by using a novel quantitative reduced representation sequencing strategy (OmeSeq-qRRS), developing fully automated software for quantitative metagenomic/microbiome profiling (Qmatey: quantitative metagenomic alignment and taxonomic identification using exact-matching), and applying these tools to understand plant-microbe-pathogen interactions in maize and sweetpotato. The next-generation sequencing-based OmeSeq-qRRS leverages the strengths of shotgun whole genome sequencing and costs less than the more affordable amplicon sequencing method. The novel FASTQ data compression/indexing and enhanced multithreading of MegaBLAST in Qmatey allow …


The Genomics Of Autism-Related Genes Il1rapl1 And Il1rapl2: Insights Into Their Cortical Distribution, Cell-Type Specificity, And Developmental Trajectories, Jacob Weaver Apr 2023

MUSC Theses and Dissertations

Neuropsychiatric disorders have a significant impact on modern society. These disorders affect a large percentage of the population: schizophrenia has a worldwide prevalence of 1% and autism spectrum disorders (ASD) affect 1 in 59 school-aged children in the US. There is substantial evidence that most neuropsychiatric disorders have a genetic component. Thus, with the advent of high-throughput sequencing, much effort has gone into identifying genetic variants associated with these disorders. The emerging picture from these studies is a complex one, in which hundreds of genes with small effects interact with a varied landscape of common variants to result in disease. …


Incorporating Sex Chromosomes In Transcriptome Prediction Models And Improving Cross-Population Prediction Performance, Daniel S. Araujo Jan 2023

Master's Theses

Transcriptome prediction models built with data from European-descent individuals are less accurate when applied to different populations because of differences in linkage disequilibrium patterns and allele frequencies. We hypothesized that multivariate adaptive shrinkage might improve cross-population transcriptome prediction, as it leverages effect size estimates across different conditions (in this case, different populations). To test this hypothesis, we made transcriptome prediction models for use in transcriptome-wide association studies (TWAS) using different methods (Elastic Net, Matrix eQTL and Multivariate Adaptive Shrinkage in R (MASHR)) and tested their out-of-sample transcriptome prediction accuracy in population-matched and cross-population scenarios. Additionally, to evaluate model applicability in …
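Elastic Net, Matrix eQTL and MASHR differ in how they estimate SNP weights, but applying a finished model is the same linear step: predicted expression is the weighted sum of an individual's allele dosages. The sketch below shows that step and scores out-of-sample accuracy as the squared Pearson correlation between predicted and measured expression, a common choice that the abstract does not confirm. The weights, genotypes and function names are hypothetical.

```python
import math


def predict_expression(weights, dosages):
    """Predicted expression: sum of SNP weight times allele dosage (0, 1 or 2)."""
    return sum(w * d for w, d in zip(weights, dosages))


def pearson_r(x, y):
    """Pearson correlation of two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def prediction_r2(weights, genotypes, observed):
    """Squared correlation between predicted and observed expression."""
    predicted = [predict_expression(weights, row) for row in genotypes]
    return pearson_r(predicted, observed) ** 2


# Hypothetical two-SNP model applied to four held-out individuals.
weights = [0.5, -0.2]
genotypes = [[0, 1], [1, 0], [2, 2], [1, 1]]
observed = [-0.1, 0.6, 0.5, 0.3]
print(round(prediction_r2(weights, genotypes, observed), 3))
```

Cross-population testing then amounts to training the weights in one population and computing this score on genotypes and expression from another.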


Ngly1 Deficiency Affects Glycosaminoglycan Biosynthesis And Wnt Signaling Pathway In Mice, Amy Batten Oct 2022

PANDION: The Osprey Journal of Research and Ideas

Individuals affected by NGLY1 Deficiency cannot properly deglycosylate and recycle certain proteins. Even though fewer than 100 people worldwide have been diagnosed with this rare autosomal recessive condition, thousands are affected by similar glycosylation disorders. Common phenotypic manifestations of NGLY1 Deficiency include severe neural and intellectual delay, impaired muscle and liver function, and seizures that may become intractable. Very little is currently known about the mechanisms through which NGLY1 Deficiency affects the body, which has led to a lack of viable treatment options for those afflicted. This experiment uses a loss-of-function (LOF) mouse model of NGLY1 Deficiency homologous …


The Genetics Of Skin Cancer: What Genes Drive The Development Of Basal Cell Carcinoma, Squamous Cell Carcinoma, And Melanoma?, Cassandra Poole, Abagail Pack, Elizabeth Whitehead, Virginia Marshall Oct 2022

Spring Showcase for Research and Creative Inquiry

Skin cancer is one of the most common forms of cancer worldwide. The American Academy of Dermatology estimates that 9,500 people in the United States are diagnosed with skin cancer every day, and that 1 in 5 Americans will be diagnosed with skin cancer by age 70. Given this high prevalence, understanding how skin cancer develops and how it can be treated is extremely important. This project aims to analyze the genes involved in the development of the three most common forms of skin cancer: basal cell carcinoma, squamous cell carcinoma, and melanoma.


Identification Of DNA Methylation Episignatures For Classification And Phenotype/Genotype Correlation In Mendelian Neurodevelopmental Disorders, John Reilly Apr 2022

Electronic Thesis and Dissertation Repository

Diagnosis of neurodevelopmental disorders poses numerous challenges, related to the lack of specific findings and limited understanding of the clinical impact of the majority of genetic variation. Epigenomic mechanisms involve chemical modifications of DNA that are involved in a range of cellular processes. DNA methylation is an epigenetic mechanism involving the addition of methyl groups to, and their removal from, cytosine residues. These methylation signals form episignatures: patterns of methylation that can be used as biomarkers capable of differentiating neurodevelopmental disorders. EpiSigns have enabled molecular diagnosis of a number of genetic conditions, classification of variants of unknown significance, and provided insights into the pathophysiology of …
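Methylation-array studies usually summarise each CpG probe as a beta value: the methylated signal divided by the total signal plus a small offset (100 by Illumina convention). The sketch computes beta values and then assigns a sample to the closest reference profile by Euclidean distance. That nearest-profile rule is a deliberately simple stand-in for whatever classifier the thesis used, and the profile names and intensities are hypothetical.

```python
import math


def beta_value(methylated, unmethylated, offset=100):
    """Fraction of signal that is methylated at one CpG probe."""
    return methylated / (methylated + unmethylated + offset)


def nearest_profile(sample, profiles):
    """Name of the reference profile closest to `sample` (Euclidean distance)."""
    def distance(profile):
        return math.sqrt(sum((s - p) ** 2 for s, p in zip(sample, profile)))
    return min(profiles, key=lambda name: distance(profiles[name]))


# Hypothetical intensities at two signature CpGs and two hypothetical profiles.
sample = [beta_value(4000, 900), beta_value(300, 3600)]
profiles = {"control": [0.5, 0.5], "disorder_X": [0.8, 0.1]}
print(nearest_profile(sample, profiles))
```

Real episignatures span many CpGs selected for a specific disorder; the two-probe example only shows the mechanics.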


The Low Abundance Of CpG In The SARS-CoV-2 Genome Is Not An Evolutionarily Signature Of ZAP, Ali Afrasiabi, Hamid Alinejad-Rokny, Azad Khosh, Mostafa Rahnama, Nigel Lovell, Zhenming Xu, Diako Ebrahimi Feb 2022

Plant Pathology Faculty Publications

The zinc finger antiviral protein (ZAP) is known to restrict viral replication by binding to CpG-rich regions of viral RNA and subsequently inducing viral RNA degradation. This enzyme has recently been shown to be capable of restricting SARS-CoV-2. These data have led to the hypothesis that the low abundance of CpG in the SARS-CoV-2 genome is due to an evolutionary pressure exerted by the host ZAP. To investigate this hypothesis, we performed a detailed analysis of many coronavirus sequences and ZAP RNA binding preference data. Our analyses showed neither evidence for an evolutionary pressure acting specifically on CpG …
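The abstract does not state which CpG measure was used. A standard way to quantify "low abundance" is the CpG observed/expected ratio, sketched below: the count of CG dinucleotides times sequence length, divided by the product of the C and G counts. Values well below 1 indicate CpG depletion relative to base composition. The function name and example sequence are hypothetical.

```python
def cpg_observed_expected(seq):
    """CpG observed/expected ratio: (CG count * length) / (C count * G count)."""
    seq = seq.upper().replace("U", "T")  # accept RNA as well as DNA
    c, g = seq.count("C"), seq.count("G")
    if c == 0 or g == 0:
        return float("nan")
    cg = seq.count("CG")  # "CG" cannot overlap itself, so str.count is exact
    return cg * len(seq) / (c * g)


print(cpg_observed_expected("AUGCGAUCCAGGUAACG"))
```

Normalising by C and G counts separates a genuine dinucleotide deficit from a genome that is simply poor in C or G.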


Identifying The Cell Composition And Clonal Diversity Of Supratentorial Ependymoma Using Single Cell RNA-Sequencing, James He May 2021

University Scholar Projects

Ependymoma is a primary solid tumor of the central nervous system. Supratentorial ependymoma (ST-EPN), a subtype of ependymoma, is driven by an oncogenic fusion between the ZFTA and RELA genes in 70% of cases. We introduced this fusion into neural progenitor cells of mouse embryos via in utero electroporation of a non-viral binary piggyBac transposon system containing ZFTA-RELA. Preliminary data from the LoTurco lab indicate that inducing ZFTA-RELA expression in different neural progenitor cells produces tumors of varying lethality and cellular composition. To define the cellular composition and subclonal diversity of ST-EPN tumors, we used single cell RNA-sequencing to …


Identifying The Cell Composition And Clonal Diversity Of Supratentorial Ependymoma Using Single Cell RNA-Sequencing, James He May 2021

Honors Scholar Theses

Ependymoma is a primary solid tumor of the central nervous system. Supratentorial ependymoma (ST-EPN), a subtype of ependymoma, is driven by an oncogenic fusion between the ZFTA and RELA genes in 70% of cases. We introduced this fusion into neural progenitor cells of mouse embryos via in utero electroporation of a non-viral binary piggyBac transposon system containing ZFTA-RELA. Preliminary data from the LoTurco lab indicate that inducing ZFTA-RELA expression in different neural progenitor cells produces tumors of varying lethality and cellular composition. To define the cellular composition and subclonal diversity of ST-EPN tumors, we used single cell RNA-sequencing …


Population-Matched Transcriptome Prediction Increases Discovery And Replication Rate In TWAS, Elyse Geoffroy Jan 2021

Master's Theses

Most genome-wide and transcriptome-wide association studies (GWAS, TWAS) focus on European populations; however, these results cannot always be accurately applied to non-European populations due to differences in genetic architecture. Using summary statistics from GWAS in the Population Architecture using Genomics and Epidemiology (PAGE) study, which comprises ~50,000 Hispanic/Latinos, African Americans, Asians, Native Hawaiians, and Native Americans, we perform transcriptome-wide association studies to determine gene-trait associations. Initially, we compared results using two transcriptome prediction models derived from the Multi-Ethnic Study of Atherosclerosis (MESA) populations: the African American (AFA) model and the Hispanic/Latino (HIS) model. We identified 141 unique genome-wide significant trait-associated …
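Because PAGE contributes GWAS summary statistics rather than individual genotypes, gene-level TWAS tests in this setting are typically computed with the S-PrediXcan approximation, Z_g = Σ_l w_l (σ_l / σ_g) Z_l, where w_l are the prediction model's SNP weights, Z_l the GWAS z-scores, σ_l the SNP dosage standard deviations, and σ_g the standard deviation of predicted expression from a reference LD covariance. The abstract does not name the software used, so treat this as an assumption; the function name and numbers are hypothetical.

```python
import math


def gene_z_score(weights, snp_z, snp_cov):
    """S-PrediXcan-style gene z-score from GWAS summary statistics.

    weights: prediction-model weight for each SNP
    snp_z:   GWAS z-score for each SNP (same order)
    snp_cov: covariance matrix of SNP dosages from a reference panel
    """
    k = len(weights)
    # Variance of predicted expression, w' * Cov * w.
    var_gene = sum(weights[i] * weights[j] * snp_cov[i][j]
                   for i in range(k) for j in range(k))
    sd_gene = math.sqrt(var_gene)
    return sum(weights[i] * math.sqrt(snp_cov[i][i]) / sd_gene * snp_z[i]
               for i in range(k))


# Hypothetical gene with two correlated SNPs.
print(gene_z_score([0.4, 0.3], [3.1, 2.5], [[0.5, 0.2], [0.2, 0.4]]))
```

Swapping the AFA model weights for the HIS model weights while keeping the same summary statistics is exactly the kind of comparison the abstract describes.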


Composition And Homology In The Taxonomic Classification Of Escherichia coli, Tanya Irani Jan 2021

Theses and Dissertations (Comprehensive)

As new techniques have been introduced, specifically the possibility of complete genome sequencing, better methods of defining bacterial species have also been proposed. One of the most recently proposed methods, using bioinformatic techniques, is to calculate the average nucleotide identity (ANI) between the homologous genome segments of different isolates. Another method for species discrimination that has been tested successfully is the similarity of DNA compositional signatures. However, in a recent update, DNA signatures split the available Escherichia coli complete genomes into three groups. To check if this result was consistent with such genomes belonging to different species, we tested methods …
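ANI is usually computed by cutting one genome into fragments (about 1 kb in the original BLAST-based ANIb method), aligning each fragment to the other genome, and averaging the identity of the fragments that align well. The sketch below assumes the alignments already exist and keeps only the filtering and averaging; the 30% identity cut-off follows common ANIb practice, while the coverage filter and the thesis's exact settings are omitted. Function names and sequences are hypothetical.

```python
def percent_identity(query, subject):
    """Percent identity of two aligned, equal-length fragments ('-' = gap)."""
    if len(query) != len(subject):
        raise ValueError("aligned fragments must have equal length")
    matches = sum(1 for q, s in zip(query, subject) if q == s and q != "-")
    return 100.0 * matches / len(query)


def average_nucleotide_identity(aligned_pairs, min_identity=30.0):
    """Mean identity over fragment pairs that pass an identity cut-off."""
    kept = [pid for pid in (percent_identity(q, s) for q, s in aligned_pairs)
            if pid >= min_identity]
    if not kept:
        raise ValueError("no homologous fragments passed the filter")
    return sum(kept) / len(kept)


# Hypothetical aligned fragments from two genomes.
pairs = [("ACGTACGTAC", "ACGTACGTAC"),
         ("ACGTACGTAC", "ACGTACGTTT"),
         ("AAAAAAAAAA", "CCCCCCCCCC")]  # non-homologous, filtered out
print(average_nucleotide_identity(pairs))
```

Genome pairs at roughly 95% ANI or above are conventionally treated as the same species, which is the boundary against which the three compositional groups would be checked.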


Connections In The Underworld: A Morphological And Molecular Study Of Diversity And Connectivity Among Anchialine Shrimp, Robert Eugene Ditter Nov 2020

FIU Electronic Theses and Dissertations

This research investigates the distribution and population structure of crustaceans endemic to anchialine systems in the tropical western Atlantic, focusing on cave-dwelling shrimp from the family Barbouriidae. Taxonomic and molecular tools (genetic and genomic) are used to examine population dynamics and the presence of phenotypic hypervariation (PhyV) in the critically endangered species Barbouria cubensis (von Martens, 1872). The presence of PhyV and its geographic distribution are investigated among anchialine populations of B. cubensis from 34 sites on Abaco, Eleuthera, and San Salvador, Bahamas. Examination of 54 informative morphological characters revealed PhyV present in nearly 90% (n=463) of specimens with no …


Using Active Learning To Build A Foundation For Bioinformatics Training, Stacey E. Wahl Ph.D., Amy L. Olex Ms Mar 2020

Transforming Libraries for Graduate Students

As Health Sciences Libraries evolve, the support they offer graduate students has expanded to incorporate many aspects of the research life cycle. At Tompkins-McCaw Library for the Health Sciences, we have partnered with the Wright Center for Clinical and Translational Research to offer training workshops for graduate students who are interested in using bioinformatics to plan, analyze, or execute scientific experiments. We offer two series: 1) an 8-week seminar series, meeting one hour per week, that provides a general overview of available techniques, and 2) a week-long intensive series of two-hour sessions on utilizing free databases from the National Center for Biotechnology …


Complex Systems Analysis In Selected Domains: Animal Biosecurity & Genetic Expression, Luke Trinity Jan 2020

Graduate College Dissertations and Theses

I first broadly define the study of complex systems, identifying language to describe and characterize mechanisms of such systems which is applicable across disciplines. An overview of methods is provided, including the description of a software development methodology which defines how a combination of computer science, statistics, and mathematics are applied to specified domains. This work describes strategies to facilitate timely completion of robust and adaptable projects which vary in complexity and scope. A biosecurity informatics pipeline is outlined, which is an abstraction useful in organizing the analysis of biological data from cells. This is followed by specific applications of …


Surveying Apicomplexan Diversity And Dynamics In Narragansett Bay, Evelyn Spencer May 2019

Senior Honors Projects

Parasites play an important role in marine ecosystems, yet their diversity is generally understudied. Apicomplexans, a phylum of parasitic protists within the Alveolata, infect a wide variety of animal hosts and are abundant in ecosystems ranging from polar regions to Neotropical rainforests. Previous data generated from marine sediments in Antarctica, Naples Bay, and off the coast of Oslo exhibit high diversity and numbers of apicomplexans. The abundance and diversity of these protists in Narragansett Bay are unknown, despite the fact that they infect many commercially important species. The aim of my study was to obtain abundance data and understand genetic …
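The abstract does not say how diversity was quantified. One common summary for sequence-based surveys is the Shannon index, H' = -Σ p_i ln p_i, where p_i is the share of reads assigned to taxon i. The sketch below assumes per-taxon read counts are already available; the counts are hypothetical.

```python
import math


def shannon_index(counts):
    """Shannon diversity H' = -sum(p * ln p) over taxa with non-zero counts."""
    total = sum(counts)
    return -sum((c / total) * math.log(c / total) for c in counts if c > 0)


# Hypothetical read counts for four apicomplexan taxa in one sample.
print(round(shannon_index([120, 80, 40, 10]), 3))
```

H' rises with both the number of taxa and the evenness of their abundances, so it complements raw abundance counts when comparing sites or seasons.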


Phylogenetic History Of The AMY Gene Cluster In Catarrhines, Christian M. Gagnon Feb 2019

Theses and Dissertations

This study phylogenetically analyzed 30 AMY-related genes from 11 primates. The results show the gradual expansion of the AMY gene family which could have allowed primates to adapt to various ecological landscapes and maximize energy intake from starch-rich foods in periods of food scarcity.


DNA Methylation And Genetic Divergence Associated With An Inducible Defensive Response In Mimulus guttatus, David Farr Jan 2019

All Master's Theses

Phenotypic plasticity allows many organisms to respond to their environment by changing their phenotype, but the mechanisms to do so are not well understood. Yellow Monkeyflower (formerly Mimulus guttatus; now Erythranthe guttata) is one such organism that can serve as a model to promote our understanding of these mechanisms due to its striking response to insect herbivory. Monkeyflower responds to leaf damage by increasing the number of hair-like glandular trichomes, a putative defensive trait that reduces the magnitude of damage by insects. This plastic response is transgenerationally inherited in a way that is sensitive to genome-wide demethylation when …


Saccharomyces Genome Database & Uniprot Bioinformatics Analysis, Ray A. Enke Dec 2018

Ray Enke Ph.D.

This in-class activity introduces basic bioinformatics analysis using the Saccharomyces Genome Database (SGD) and the UniProt database. The yeast URA3 gene is studied in this activity; however, any other yeast gene can be substituted. This activity is designed for novice instructors and students for implementation into core biology lecture or lab courses.


Confirming World-Wide Distribution Of An Agriculturally Important Lacewing, Chrysoperla zastrowi sillemi, Using Songs, Morphology, Mitochondrial Gene Sequencing, And Phylogenetic Reconstruction, Zoe Mandese Aug 2018

Honors Scholar Theses

The Chrysoperla carnea-group of green lacewings is a cryptic species complex. Species within the group are morphologically similar, yet isolated from one another via reproductive mating song. Chrysoperla zastrowi, a species within the carnea-group, is currently described with a distribution ranging from South Africa to the Middle East and India. However, recent collections of carnea-group lacewings from Guatemala and California were preliminarily identified as Chrysoperla zastrowi based upon similarities in their vibrational courtship songs. This analysis aims to place six specimens, collected by collaborators in Guatemala, Armenia, Iran, and California, into a pre-existing phylogeny of the …
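Placing new specimens into an existing tree starts from aligned sequences and a model of nucleotide substitution. The abstract does not say which model or tree-building method was used, so the sketch below shows only the simplest case: a Jukes-Cantor (JC69) corrected distance between two aligned sequences, d = -(3/4) ln(1 - 4p/3), where p is the proportion of differing sites. Distance-based methods such as neighbour joining take a matrix of such values as input. Function names and sequences are hypothetical.

```python
import math


def p_distance(seq1, seq2):
    """Proportion of differing sites, skipping gaps and ambiguous bases."""
    sites = [(a, b) for a, b in zip(seq1.upper(), seq2.upper())
             if a in "ACGT" and b in "ACGT"]
    if not sites:
        raise ValueError("no comparable sites")
    return sum(a != b for a, b in sites) / len(sites)


def jukes_cantor_distance(seq1, seq2):
    """JC69 distance: corrects the p-distance for multiple hits at a site."""
    p = p_distance(seq1, seq2)
    if p >= 0.75:
        return math.inf  # too divergent for the correction to apply
    return -0.75 * math.log(1 - 4 * p / 3)


# Hypothetical aligned fragments of a mitochondrial gene.
print(jukes_cantor_distance("ACGTACGTAC", "ACGTACGTAA"))
```

The corrected distance is always at least the raw p-distance, and the gap widens as sequences diverge and repeated substitutions at one site become likely.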


Genome Analysis Of Multiple Mycobacteriophage, Emily Kerstiens, Kari Clase, Yi Li, Gillian Smith, Sarah Bell Aug 2018

The Summer Undergraduate Research Fellowship (SURF) Symposium

Bacteriophage are viruses that infect and kill bacteria. They can be used as treatments for antibiotic-resistant bacterial infections, but more knowledge is needed about phage and how they interact with bacteria in order to develop safe and effective phage therapy treatments. This study examines the genomes of eighteen mycobacteriophage that were isolated from the environment on and surrounding Purdue University. Phage genomes were annotated using several bioinformatics tools, including DNA Master, GeneMark, and PECAAN. Evidence was examined to determine each gene's correct location within the genome and its potential function. Approximately two thousand genes were annotated in this study. A …
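Gene callers such as GeneMark rely on statistical models of coding sequence, but the starting point of annotation is finding open reading frames (ORFs). The sketch below scans only the forward strand and only ATG starts, whereas phage genes also begin at GTG or TTG and occur on both strands; it is an illustration of the idea, not how DNA Master, GeneMark or PECAAN work internally. The function name and the default minimum length are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(seq, min_codons=100):
    """Forward-strand ORFs from an ATG to the first in-frame stop codon.

    Returns (start, end) pairs, 0-based with end exclusive (stop included).
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        i = frame
        while i + 3 <= len(seq):
            if seq[i:i + 3] != "ATG":
                i += 3
                continue
            j = i
            while j + 3 <= len(seq) and seq[j:j + 3] not in STOP_CODONS:
                j += 3
            if j + 3 > len(seq):
                break  # ran off the end without a stop codon
            if (j - i) // 3 >= min_codons:
                orfs.append((i, j + 3))
            i = j + 3  # resume scanning after this stop codon
    return sorted(orfs)


print(find_orfs("CCATGAAATTTTAGCC", min_codons=2))
```

Each ORF found this way is only a candidate; annotators then weigh coding potential, start-site evidence and similarity to known genes before accepting it.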


Assessment Of A Metaviromic Dataset Generated From Nearshore Lake Michigan, Siobhan C. Watkins, Neil Kuehnle, C Anthony Ruggeri, Kema Malki, Katherine Bruder, Jinan Elayyan, Kristina Damisch, Naushin Vahora, Paul O'Malley, Brianne Ruggles-Sage, Zachary Romer, Catherine Putonti Sep 2017

Catherine Putonti

Bacteriophages are powerful ecosystem engineers. They drive bacterial mortality rates and genetic diversity and affect microbially mediated biogeochemical processes on a global scale. This has been demonstrated in marine environments; however, phage communities have been less studied in freshwaters, despite representing a potentially more diverse environment. Lake Michigan is one of the largest bodies of freshwater on the planet, yet the diversity of its phages has not been examined to date. Here, we present a composite survey of viral ecology in the nearshore waters of Lake Michigan. Sequence analysis was performed using a web server previously used to analyse …


Genome Wide Association And Next Generation Sequencing Approaches To Map Determinants Of Ascites In Broiler Chickens, Shatovisha Dey Aug 2017

Graduate Theses and Dissertations

These studies have investigated different candidate genomic regions for their contributions to ascites in broilers. Ascites syndrome is a manifestation of idiopathic pulmonary arteriole hypertension that concerns the poultry industry worldwide. Investigations have demonstrated that the disease is genetically regulated and exhibits moderate to high heritabilities. Although previous studies have implicated a few chromosomes in ascites, no genes with direct links to the disease have been identified to date. This dissertation presents a collection of studies that determine the genomic and genetic interactions for regions on chromosomes 2 and 9 for ascites phenotypes in broiler chickens. …
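The statistical models used in the dissertation are not described in the visible abstract. As a baseline illustration of single-marker association for a binary trait such as ascites susceptibility versus resistance, the sketch runs an allelic chi-square test on one SNP. Genome-wide analyses in related broiler lines would normally also account for population structure and relatedness, which this sketch ignores; the function name and counts are hypothetical.

```python
import math


def allelic_chi_square(case_alleles, control_alleles):
    """1-df allelic chi-square test for one biallelic SNP.

    Each argument is (copies of allele A, copies of allele B) in that group.
    Returns (chi-square statistic, p-value).
    """
    table = [list(case_alleles), list(control_alleles)]
    row_totals = [sum(row) for row in table]
    col_totals = [table[0][j] + table[1][j] for j in range(2)]
    n = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n
            chi2 += (table[i][j] - expected) ** 2 / expected
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Hypothetical allele counts: 50 susceptible and 50 resistant birds.
print(allelic_chi_square((60, 40), (40, 60)))
```

Repeating this test at every marker and plotting -log10(p) along the chromosomes gives the familiar Manhattan plot used to spot candidate regions.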


Isolation And Comparative Genomic Analysis Of Final Third Of Satis Genome, Kelly Hartigan, Nicole Curnutt, Matthew Mcdermut May 2017

Undergraduate Research Symposium Posters

A highly novel Streptomyces phage, Satis, was isolated from a direct environmental sample collected outside Danforth House on the Washington University campus. Satis infects the bacterial species Streptomyces lividans, producing pinpoint, cloudy plaques less than 1 mm in diameter. Electron microscopy data show rare, atypical physical features. Rather than the common octahedral capsid shape, Satis has a prolate head with visible cross-linked hexagonal protein structure, with average measurements of 285 nm by 47 nm, and a long, flexible tail measuring 268 nm. Upon sequencing, Satis was found to contain the longest phage genome discovered to date through the SEA-PHAGE program …


Software Development For Genome Sequence Analysis, David Farr May 2017

Symposium Of University Research and Creative Expression (SOURCE)

The cost of genome sequencing has decreased rapidly, expanding availability for many biological applications (Muir 2016). For example, researchers can now obtain genome sequences from multiple populations under different types of selection. Comparison of these sequences allows for identification of chromosome regions and specific genes associated with adaptive evolution (Kelly 2013). As an increasing number of researchers engage in this type of inquiry, many have created in-house computer scripts to analyze the raw sequence data (e.g., Kelly 2013), creating a gap in both continuity and standardization.

Using a test dataset and preliminary results from an ongoing artificial selection experiment in …


Gene Set Enrichment And Projection: A Computational Tool For Knowledge Discovery In Transcriptomes, Karl Douglas Stamm Jul 2016

Dissertations (1934 -)

Explaining the mechanism behind a genetic disease involves two phases: collecting and analyzing data associated with the disease, then interpreting those data in the context of biological systems. The objective of this dissertation was to develop a method of integrating complementary datasets surrounding any single biological process, with the goal of presenting the response to a signal in terms of a set of downstream biological effects. This dissertation specifically tests the hypothesis that computational projection methods overlaid with domain expertise can direct research towards relevant systems-level signals underlying complex genetic disease. To this end, I developed a software algorithm named …
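The abstract is truncated before the algorithm is described, so the sketch below is not the author's projection method. It shows the standard over-representation test that gene set analyses commonly start from: a one-sided hypergeometric p-value for seeing at least the observed overlap between a list of selected genes and an annotated gene set. The function name and all counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_genes, n_in_set, n_selected, n_overlap):
    """One-sided hypergeometric p-value, P(overlap >= n_overlap).

    n_genes:    genes in the background (e.g. all measured genes)
    n_in_set:   background genes annotated to the gene set
    n_selected: genes in the list of interest (e.g. differentially expressed)
    n_overlap:  genes in both the list and the set
    """
    upper = min(n_in_set, n_selected)
    hits = sum(comb(n_in_set, k) * comb(n_genes - n_in_set, n_selected - k)
               for k in range(n_overlap, upper + 1))
    return hits / comb(n_genes, n_selected)


# Hypothetical: 20,000 genes, a 200-gene set, 500 selected genes, 15 shared.
print(enrichment_p_value(20_000, 200, 500, 15))
```

Under random selection the expected overlap here is 500 × 200 / 20,000 = 5 genes, so an overlap of 15 yields a small p-value.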


Spectral Gene Set Enrichment (SGSE), H Robert Frost, Zhigang Li, Jason H. Moore Mar 2015

Dartmouth Scholarship

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …
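The abstract stops before SGSE's combination step. One standard way to turn a gene set's per-component p-values into a single score is Stouffer's weighted Z-method, sketched below under the assumption that each principal component is weighted by the variance it explains; the exact weighting used by SGSE may differ, and the p-values and weights shown are hypothetical.

```python
from statistics import NormalDist


def weighted_z_combine(p_values, weights):
    """Combine one-sided p-values with Stouffer's weighted Z-method."""
    normal = NormalDist()
    z_scores = [normal.inv_cdf(1 - p) for p in p_values]
    combined = (sum(w * z for w, z in zip(weights, z_scores))
                / sum(w * w for w in weights) ** 0.5)
    return 1 - normal.cdf(combined)


# Hypothetical gene set tested against three PCs, weighted by PC variance.
print(weighted_z_combine([0.001, 0.2, 0.6], [5.0, 2.0, 1.0]))
```

Weighting lets a strong association with a high-variance component dominate the combined score, while weak associations with minor components contribute little.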