Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

Articles 1 - 30 of 74

Full-Text Articles in Bioinformatics

Saccharomyces Genome Database & UniProt Bioinformatics Analysis, Ray A. Enke Dec 2018

Ray Enke Ph.D.

This in-class activity introduces basic bioinformatics analysis using the Saccharomyces Genome Database (SGD) and the UniProt database. The activity studies the yeast URA3 gene, but any other yeast gene can be substituted. It is designed for novice instructors and students and can be implemented in core biology lecture or lab courses.


DNA Subway Purple Line Metagenome Analysis, Ray A. Enke Nov 2018

Ray Enke Ph.D.

This in-class walkthrough demonstrates how to use the DNA Subway Purple Line for metagenomic analysis of 16S microbiome Illumina sequencing reads.


FastQC Analysis & HISAT Alignments Using CyVerse (Part 2), Ray A. Enke Oct 2018

Ray Enke Ph.D.

Part 2 of this in-class exercise uses the CyVerse Discovery Environment (DE) to:
  • view the output files of FastQC analysis
  • create custom data tracks from HISAT alignment files for visualization in the UCSC Genome Browser


FastQC Analysis & HISAT Alignments Using CyVerse (Part 1), Ray A. Enke Oct 2018

Ray Enke Ph.D.

This in-class exercise demonstrates the basic features of the CyVerse Discovery Environment (DE) cyberinfrastructure and provides a tutorial for setting up FastQC analysis of next-generation sequencing reads and HISAT alignment of eukaryotic RNA-seq FASTQ files.


Intro To Command Line Coding (FASTQE & FASTP), Ray A. Enke Oct 2018

Ray Enke Ph.D.

This in-class activity is designed for novices as an introduction to command line coding. The activity uses the programs FASTQE and FASTP to assess the quality of Illumina FASTQ sequencing data and to trim the reads.
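As a rough illustration of what "assessing quality" and "trimming" mean for FASTQ data, the following minimal Python sketch parses FASTQ records, computes a mean Phred score per read, and removes low-quality bases from the 3' end. It is not how FASTQE or FASTP work internally; the function names, the Phred+33 encoding, and the cutoff of 20 are assumptions made for this example only.

```python
def parse_fastq(text):
    """Yield (name, sequence, quality) tuples from FASTQ text with 4-line records."""
    lines = [ln.strip() for ln in text.strip().splitlines()]
    for i in range(0, len(lines) - 3, 4):
        # Line 1 starts with '@', line 2 is the sequence, line 4 is the quality string.
        yield lines[i][1:], lines[i + 1], lines[i + 3]


def phred_scores(quality, offset=33):
    """Convert a quality string to integer Phred scores (Phred+33 assumed)."""
    return [ord(ch) - offset for ch in quality]


def mean_quality(quality):
    """Average Phred score of one read; 0.0 for an empty read."""
    scores = phred_scores(quality)
    return sum(scores) / len(scores) if scores else 0.0


def trim_tail(sequence, quality, min_q=20):
    """Drop trailing bases whose Phred score is below min_q (hypothetical cutoff)."""
    scores = phred_scores(quality)
    end = len(sequence)
    while end > 0 and scores[end - 1] < min_q:
        end -= 1
    return sequence[:end], quality[:end]


# A single made-up read: six high-quality bases followed by two poor ones.
EXAMPLE = """@read1
ACGTACGT
+
IIIIII##
"""

if __name__ == "__main__":
    for name, seq, qual in parse_fastq(EXAMPLE):
        trimmed_seq, trimmed_qual = trim_tail(seq, qual)
        print(name, mean_quality(qual), trimmed_seq)
```

Real tools add adapter detection, sliding-window trimming, and per-position summaries; this sketch only shows the underlying idea of decoding quality characters into scores.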


De-Identified Interviews For The Study: Data Challenges Of Biomedical Researchers In The Age Of Omics, Rolando Garcia-Milian, Denise Hersey, Milica Vukmirovic Jan 2018

Rolando Garcia-Milian


Background: High-throughput technologies are rapidly generating large amounts of diverse omics data. Although this offers a great opportunity, it also poses great challenges as data analysis becomes more complex. The purpose of this study was to identify the main challenges researchers face in analyzing data, and how academic libraries can support them in this endeavor.
Methods: A multimodal needs assessment combined an online survey of 860 Yale-affiliated researchers with 15 in-depth, one-on-one, semi-structured interviews. Interviews were recorded, transcribed, and analyzed using NVivo 10® software according to the thematic analysis approach.
Results: The survey response rate was …


Landscape Genomics: Natural Selection Drives The Evolution Of Mitogenome In Penguins, Barbara Ramos, Daniel González-Acuña, David E. Loyola, Warren E. Johnson, Patricia G. Parker, Melanie Massaro, Gisele P. M. Dantas, Marcelo D. Miranda, Juliana A. Vianna Jan 2018

Patricia Parker

Background
Mitochondria play a key role in the balance of energy and heat production, and therefore the mitochondrial genome is under natural selection by environmental temperature and food availability, since starvation can generate more efficient coupling of energy production. However, selection over mitochondrial DNA (mtDNA) genes has usually been evaluated at the population level. We sequenced 12 mitogenomes by NGS and, together with four published genomes, assessed genetic variation in ten penguin species distributed from the equator to Antarctica. Signatures of selection of 13 mitochondrial protein-coding genes were evaluated by comparing among species within and among genera (Spheniscus, Pygoscelis, Eudyptula, Eudyptes …


Transcripity Split: Course-Based RNA-Seq Analysis Using The Ultrafast Kallisto-Sleuth Pipeline, Ray A. Enke Dec 2017

Ray Enke Ph.D.

No abstract provided.


A Polyglot Approach To Bioinformatics Data Integration: A Phylogenetic Analysis Of HIV-1, Steven Reisman, Thomas Hatzopoulos, Konstantin Laufer, George K. Thiruvathukal, Catherine Putonti Oct 2017

Konstantin Läufer

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 …


A Polyglot Approach To Bioinformatics Data Integration: Phylogenetic Analysis Of HIV-1, Steven Reisman, Catherine Putonti, George K. Thiruvathukal, Konstantin Läufer Oct 2017

Konstantin Läufer

RNA-interference has potential therapeutic use against HIV-1 by targeting highly-functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, inherent in this treatment is the fact that RNAi sequences must be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus’ fitness in the host. Taking a phylogenetic approach, we …


PhagePhisher: A Pipeline For The Discovery Of Covert Viral Sequences In Complex Genomic Datasets, Thomas Hatzopoulos, Siobhan C. Watkins, Catherine Putonti Sep 2017

Catherine Putonti

Obtaining meaningful viral information from large sequencing datasets presents unique challenges distinct from prokaryotic and eukaryotic sequencing efforts. The difficulties surrounding this issue can be ascribed in part to the genomic plasticity of viruses themselves as well as the scarcity of existing information in genomic databases. The open-source software PhagePhisher (http://www.putonti-lab.com/phagephisher) has been designed as a simple pipeline to extract relevant information from complex and mixed datasets, and will improve the examination of bacteriophages, viruses, and virally related sequences, in a range of environments. Key aspects of the software include speed and ease of use; PhagePhisher can be used with …


Hash-Map-Eradicator: Filtering Non-Target Sequences From Next Generation Sequencing Reads, Jonathon Brenner, Catherine Putonti Sep 2017

Catherine Putonti

Contemporary DNA sequencing technologies are continuously increasing throughput at ever-decreasing costs. Moreover, due to recent advances in sequencing technology, new platforms are emerging. As such, computational challenges persist. The average read length possible has taken a giant leap forward with the PacBio and Nanopore solutions. Regardless of the platform used, impurities within the DNA preparation of the sample, be they from unintentional contaminants or pervasive symbionts, remain an issue. We have developed a new tool, HAsh-MaP-ERadicator (HAMPER), for the detection and removal of non-target, contaminating DNA sequences. Integrating hash-based and mapping-based strategies, HAMPER is both memory and …


A Polyglot Approach To Bioinformatics Data Integration: A Phylogenetic Analysis Of HIV-1, Steven Reisman, Thomas Hatzopoulos, Konstantin Laufer, George K. Thiruvathukal, Catherine Putonti Sep 2017

Catherine Putonti

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 …


Finding Function In The Unknown, Kelly Boyd, Emma Highland, Amanda Misch, Amber Hu, Sushma Reddy, Catherine Putonti Sep 2017

Catherine Putonti

Through high-throughput RNA sequencing (RNAseq), transcriptomes for a single cell, tissue, or organism(s) can be ascertained at a high resolution. While a number of bioinformatic tools have been developed for transcriptome analyses, significant challenges exist for studies of non-model organisms. Without a reference sequence available, raw reads must first be assembled de novo followed by the tedious task of BLAST searches and data mining for functional information. We have created a pipeline, PyRanger, to automate this process. The pipeline includes functionality to assess a single transcriptome and also facilitate comparative transcriptomic studies.


Genomes Of Gardnerella Strains Reveal An Abundance Of Prophages Within The Bladder Microbiome, Kema Malki, Jason W. Shapiro, Travis Kyle Price, Evann Elizabeth Hilt, Krystal Thomas-White, Trina Sircar, Amy B. Rosenfeld, Michael J. Zilliox, Alan J. Wolfe, Catherine Putonti Sep 2017

Catherine Putonti

Bacterial surveys of the vaginal and bladder human microbiota have revealed an abundance of many similar bacterial taxa. As the bladder was once thought to be sterile, the complex interactions between microbes within the bladder have yet to be characterized. To initiate this process, we have begun sequencing isolates, including the clinically relevant genus Gardnerella. Herein, we present the genomic sequences of four Gardnerella strains isolated from the bladders of women with symptoms of urgency urinary incontinence; these are the first Gardnerella genomes produced from this niche. Congruent to genomic characterization of Gardnerella isolates from the reproductive tract, isolates from …


Bacteriophages Isolated From Lake Michigan Demonstrate Broad Host-Range Across Several Bacterial Phyla, Kema Malki, Alex Kula, Katherine Bruder, Emily Sible, Thomas Hatzopoulos, Stephanie Steidel, Siobhan C. Watkins, Catherine Putonti Sep 2017

Catherine Putonti

BACKGROUND:

The study of bacteriophages continues to generate key information about microbial interactions in the environment. Many phenotypic characteristics of bacteriophages cannot be examined by sequencing alone, further highlighting the necessity for isolation and examination of phages from environmental samples. While much of our current knowledge base has been generated by the study of marine phages, freshwater viruses are understudied in comparison. Our group has previously conducted metagenomics-based studies of samples collected from Lake Michigan; the data presented in this study relate to four phages that were extracted from the same samples.

FINDINGS:

Four phages were extracted from Lake Michigan …


A Polyglot Approach To Bioinformatics Data Integration: Phylogenetic Analysis Of HIV-1, Steven Reisman, Catherine Putonti, George K. Thiruvathukal, Konstantin Läufer Sep 2017

Catherine Putonti

RNA-interference has potential therapeutic use against HIV-1 by targeting highly-functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, inherent in this treatment is the fact that RNAi sequences must be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus’ fitness in the host. Taking a phylogenetic approach, we …


Uploading Data To The NCBI SRA Database, Ray A. Enke Jun 2017

Ray Enke Ph.D.

This in-class exercise focuses on uploading FASTQ sequencing data files to the NCBI SRA database.


A Polyglot Approach To Bioinformatics Data Integration: A Phylogenetic Analysis Of HIV-1, Steven Reisman, Thomas Hatzopoulos, Konstantin Laufer, George K. Thiruvathukal, Catherine Putonti Jan 2017

George K. Thiruvathukal

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 …


Genomics RNA-Seq Analysis Part 2_ Kallisto Indexing And Quantification (Updated 11/17), Ray A. Enke, Melika Rahmani-Mofrad Dec 2016

Ray Enke Ph.D.

This in-class exercise is a hands-on activity designed to teach students how to run Kallisto indexing and quantification using CyVerse DE apps as part of a eukaryotic RNA-seq analysis pipeline.


Genomics RNA-Seq Analysis Part 3-Sleuth Data Visualization (Updated 11/17), Ray A. Enke, Scott Schumacker Dec 2016

Ray Enke Ph.D.

This in-class exercise is a hands-on activity designed to teach students how to run Sleuth statistical modeling and data visualization in RStudio, using Kallisto pseudoalignment output files, as part of a eukaryotic RNA-seq analysis pipeline.


Heat Map Analysis Of RNA-Seq Data Using RStudio, Ray A. Enke, Ashton Holub Nov 2016

Ray Enke Ph.D.

This in-class exercise focuses on using the CummeRbund package in RStudio to create heat maps for analyzing the differential gene expression output generated by Cuffdiff in the DNA Subway Green Line.


RNA Sequencing Analysis Of The Developing Chicken Retina, Christophe Langouet-Astrie*, Annamarie Meinsen*, Emily R. Grunwald*, Stephen Turner, Raymond A. Enke Nov 2016

Ray Enke Ph.D.

RNA sequencing transcriptome analysis using massively parallel next-generation sequencing technology provides the capability to understand global changes in gene expression throughout a range of tissue samples. Development of the vertebrate retina requires complex temporal orchestration of transcriptional activation and repression. The chicken embryo (Gallus gallus) is a classic model system for studying developmental biology and retinogenesis. Existing retinal transcriptome projects have been critical to the vision research community for studying aspects of murine and human retinogenesis; however, there are currently no publicly available data sets describing the developing chicken retinal transcriptome. Here we used Illumina RNA sequencing …


qPCR Primer Standard Curve Assay (Wet Lab) + KEGG Pathway Analysis (Computational), Ray A. Enke Oct 2016

Ray Enke Ph.D.

This class-tested protocol guides students through the following activities:
  • analyzing qPCR standard curve data to determine primer efficiency
  • analyzing differential gene expression experimental qPCR data
  • applying KEGG pathway analysis to selected candidate genes
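
The first step in the list above, estimating primer efficiency from a standard curve, can be sketched as follows. This is a minimal example and not the protocol's own worksheet: it assumes the widely used relation efficiency = 10^(-1/slope) - 1, where the slope comes from a least-squares fit of Ct against log10 of the template quantity. The dilution series and Ct values are made up for illustration.

```python
import math


def fit_line(xs, ys):
    """Ordinary least-squares fit; returns (slope, intercept)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def primer_efficiency(quantities, ct_values):
    """Return (slope, efficiency) for a qPCR standard curve.

    quantities: template amounts of the dilution series (any consistent unit)
    ct_values:  measured threshold cycles, one per dilution
    """
    log_quantities = [math.log10(q) for q in quantities]
    slope, _intercept = fit_line(log_quantities, ct_values)
    # A slope near -3.32 corresponds to doubling every cycle (about 100% efficiency).
    efficiency = 10 ** (-1.0 / slope) - 1.0
    return slope, efficiency


if __name__ == "__main__":
    # Hypothetical 10-fold dilution series and Ct values.
    quantities = [1e5, 1e4, 1e3, 1e2, 1e1]
    ct_values = [15.0, 18.3, 21.6, 24.9, 28.2]
    slope, efficiency = primer_efficiency(quantities, ct_values)
    print(f"slope = {slope:.2f}, efficiency = {efficiency:.1%}")
```

Efficiencies well below or above roughly 100% usually point to a problem with the primers or the dilution series, which is why the standard curve is checked before any differential expression analysis.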


A Gene-Based Association Method For Mapping Traits Using Reference Transcriptome Data, Eric R. Gamazon, Heather Wheeler, Kaanan P. Shah, Sahar V. Mozaffari, Keston Aquino-Michaels, Robert J. Carroll, Anne E. Eyler, Joshua C. Denny, Dan L. Nicolae, Nancy J. Cox, Hae Kyung Im Aug 2016

Heather Wheeler

Genome-wide association studies (GWAS) have identified thousands of variants robustly associated with complex traits. However, the biological mechanisms underlying these associations are, in general, not well understood. We propose a gene-based association method called PrediXcan that directly tests the molecular mechanisms through which genetic variation affects phenotype. The approach estimates the component of gene expression determined by an individual’s genetic profile and correlates ‘imputed’ gene expression with the phenotype under investigation to identify genes involved in the etiology of the phenotype. Genetically regulated gene expression is estimated using whole-genome tissue-dependent prediction models trained with reference transcriptome data sets. PrediXcan enjoys …
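
The core idea described in this abstract, predicting the genetically regulated component of expression and then correlating it with a phenotype, can be sketched in a few lines. This is an illustration only and not the PrediXcan software: the per-SNP weights, genotype dosages, and phenotype values below are made up, and in practice the weights would come from prediction models trained on reference transcriptome data.

```python
import math


def predict_expression(genotypes, weights):
    """Imputed expression per individual: weighted sum of SNP dosages (0, 1, or 2)."""
    return [sum(w * g for w, g in zip(weights, dosages)) for dosages in genotypes]


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Hypothetical weights for three SNPs near one gene.
    weights = [0.5, -0.2, 0.1]
    # Hypothetical dosages for four individuals (rows) at those SNPs (columns).
    genotypes = [[0, 1, 2], [1, 0, 0], [2, 2, 1], [1, 1, 1]]
    # Hypothetical phenotype, constructed here to track the imputed expression.
    phenotype = [1.0, 2.0, 2.4, 1.8]

    imputed = predict_expression(genotypes, weights)
    print("imputed expression:", imputed)
    print("correlation with phenotype:", pearson(imputed, phenotype))
```

A real analysis would repeat this test for every gene and tissue model and correct for multiple testing.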


Making Sense Of Genomic Variation: Part 1 SNP Annotation, Rolando Garcia-Milian Mar 2016

Rolando Garcia-Milian

The specific combination of genetic variation in an individual defines not only external appearance but also susceptibility to diseases, cancer, genetic disorders, drug response, etc. This explains the great interest in discovering and cataloging these variations and using them for disease association and functional studies, among others. In this session we will review the most popular databases and tools to annotate, analyze and visualize genetic variations. Some of the databases and tools that will be discussed are:
- dbSNP
- Online Mendelian Inheritance in Man (OMIM), a comprehensive, authoritative compendium of human genes and genetic phenotypes
- GWAS Catalog
- EBI's …


An Efficient And Sensitive Method For Preparing cDNA Libraries From Scarce Biological Samples, Catherine H. Sterling, Isana Veksler-Lublinsky, Victor R. Ambros Oct 2015

Victor R. Ambros

The preparation and high-throughput sequencing of cDNA libraries from samples of small RNA is a powerful tool to quantify known small RNAs (such as microRNAs) and to discover novel RNA species. Interest in identifying the small RNA repertoire present in tissues and in biofluids has grown substantially with the findings that small RNAs can serve as indicators of biological conditions and disease states. Here we describe a novel and straightforward method to clone cDNA libraries from small quantities of input RNA. This method permits the generation of cDNA libraries from sub-picogram quantities of RNA robustly, efficiently and reproducibly. We demonstrate …


Proteomic Characterization Of HER-2/Neu-Overexpressing Breast Cancer Cells, Hexin Chen, G. Pimienta, Y. Gu, X. Sun, Jianjun Hu, M.-S. Kim, R. Chaerkady, M. Gucek, R. Cole, S. Sukumar, A. Pandey Jun 2015

Jianjun Hu

No abstract provided.


Isolation And Comparative Genomic Analysis Of Mycobacteriophage Enkatz, Thomas Van Horn, Micah Rickles-Young, Shaarada Srivasta, Tina Zudock Apr 2015

Thomas Van Horn

Phage Enkatz is a temperate mycobacteriophage isolated from an un-enriched soil sample collected from the South Forty housing area of the Washington University in St. Louis campus. Enkatz displays unequally sized plaques with a clear center that become cloudier with radial distance from the center. Genome analysis indicates that Enkatz is a cluster A1 mycobacteriophage with a genome size of 49,738 bases and 82 identified genes, 33 of which have been assigned functions. This analysis reveals that the majority of the genes in the positive strand code for structural proteins, while the majority of the genes in the negative strand …


Ordinal Probit Wavelet-Based Functional Models For eQTL Analysis, Mark J. Meyer, Jeffrey S. Morris, Craig P. Hersh, Jarret D. Morrow, Christoph Lange, Brent A. Coull Jan 2015

Jeffrey S. Morris

Current methods for conducting expression Quantitative Trait Loci (eQTL) analysis are limited in scope to pairwise association testing between a single nucleotide polymorphism (SNP) and an expression probe set in a region around a gene of interest, thus ignoring the inherent between-SNP correlation. To determine association, p-values are then typically adjusted using Plug-in False Discovery Rate. As many SNPs are interrogated in the region and multiple probe sets taken, the current approach requires the fitting of a large number of models. We propose to remedy this by introducing a flexible function-on-scalar regression that models the genome as a functional outcome. The …