Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Articles 1 - 30 of 30

Full-Text Articles in Life Sciences

Saccharomyces Genome Database & UniProt Bioinformatics Analysis, Ray A. Enke Dec 2018

Ray Enke Ph.D.

This in-class activity introduces basic bioinformatics analysis using the Saccharomyces Genome Database (SGD) and the UniProt Database. The yeast URA3 gene is studied in this activity; however, any other yeast gene can be substituted. The activity is designed for novice instructors and students, for use in core biology lecture or lab courses.


Fast And Space-Efficient Location Of Heavy Or Dense Segments In Run-Length Encoded Sequences, Ronald I. Greenberg Jan 2018

Ronald Greenberg

This paper considers several variations of an optimization problem with potential applications in such areas as biomolecular sequence analysis and image processing. Given a sequence of items, each with a weight and a length, the goal is to find a subsequence of consecutive items of optimal value, where value is either total weight or total weight divided by total length. There may also be a specified lower and/or upper bound on the acceptable length of subsequences. This paper shows that all the variations of the problem are solvable in linear time and space even with non-uniform item lengths and divisible …
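To make the problem statement in this abstract concrete, the following is a minimal brute-force sketch, not the linear-time method the paper describes. It assumes that "optimal" means maximal, that items are taken whole (the divisible-item variants are not modeled), and that all lengths are positive. The function name `best_segment` and its parameters are hypothetical.

```python
from itertools import accumulate


def best_segment(items, min_len=0.0, max_len=float("inf"), dense=False):
    """Brute-force search over all runs of consecutive items.

    items: list of (weight, length) pairs; lengths are assumed positive.
    dense=False maximizes total weight ("heavy" segment);
    dense=True maximizes total weight / total length ("dense" segment).
    Returns (value, start, end) with end exclusive, or None if no
    segment satisfies the length bounds.
    """
    # Prefix sums let each segment's weight and length be read in O(1).
    w = [0.0] + list(accumulate(wt for wt, _ in items))
    l = [0.0] + list(accumulate(ln for _, ln in items))
    best = None
    for i in range(len(items)):
        for j in range(i + 1, len(items) + 1):
            seg_len = l[j] - l[i]
            if not (min_len <= seg_len <= max_len):
                continue  # outside the acceptable length range
            seg_w = w[j] - w[i]
            value = seg_w / seg_len if dense else seg_w
            if best is None or value > best[0]:
                best = (value, i, j)
    return best


# Each run of a run-length encoded sequence is one (weight, length) item.
runs = [(2, 1), (-3, 2), (4, 1), (1, 3), (-5, 1)]
heavy = best_segment(runs)
dense = best_segment(runs, min_len=2, dense=True)
```

This sketch takes quadratic time in the number of items; it is useful only as a reference against which a faster implementation could be checked.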


A Polyglot Approach To Bioinformatics Data Integration: A Phylogenetic Analysis Of HIV-1, Steven Reisman, Thomas Hatzopoulous, Konstantin Läufer, George K. Thiruvathukal, Catherine Putonti Oct 2017

Konstantin Läufer

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 …


A Polyglot Approach To Bioinformatics Data Integration: Phylogenetic Analysis Of HIV-1, Steven Reisman, Catherine Putonti, George K. Thiruvathukal, Konstantin Läufer Oct 2017

Konstantin Läufer

RNA-interference has potential therapeutic use against HIV-1 by targeting highly-functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, inherent in this treatment is the fact that RNAi sequences must be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus’ fitness in the host. Taking a phylogenetic approach, we …


A Polyglot Approach To Bioinformatics Data Integration: A Phylogenetic Analysis Of HIV-1, Steven Reisman, Thomas Hatzopoulous, Konstantin Läufer, George K. Thiruvathukal, Catherine Putonti Sep 2017

Catherine Putonti

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 …


A Polyglot Approach To Bioinformatics Data Integration: Phylogenetic Analysis Of HIV-1, Steven Reisman, Catherine Putonti, George K. Thiruvathukal, Konstantin Läufer Sep 2017

Catherine Putonti

RNA-interference has potential therapeutic use against HIV-1 by targeting highly-functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, inherent in this treatment is the fact that RNAi sequences must be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus’ fitness in the host. Taking a phylogenetic approach, we …


Using Phylogenetically-Informed Annotation (PIA) To Search For Light-Interacting Genes In Transcriptomes From Non-Model Organisms, Daniel I. Speiser, M. Sabrina Pankey, Alexander K. Zaharoff, Barbara A. Battelle, Heather D. Bracken-Grissom, Jesse W. Breinholt, Seth M. Bybee, Thomas W. Cronin, Anders Garm, Annie R. Lindgren, Nipam H. Patel, Megan L. Porter, Meredith E. Protas, Anja S. Rivera, Jeanne M. Serb, Kirk S. Zigler, Keith A. Crandall, Todd H. Oakley Jan 2017

Meredith Protas

Background: Tools for high throughput sequencing and de novo assembly make the analysis of transcriptomes (i.e. the suite of genes expressed in a tissue) feasible for almost any organism. Yet a challenge for biologists is that it can be difficult to assign identities to gene sequences, especially from non-model organisms. Phylogenetic analyses are one useful method for assigning identities to these sequences, but such methods tend to be time-consuming because of the need to re-calculate trees for every gene of interest and each time a new data set is analyzed. In response, we employed existing tools for phylogenetic analysis to …


A Polyglot Approach To Bioinformatics Data Integration: A Phylogenetic Analysis Of HIV-1, Steven Reisman, Thomas Hatzopoulous, Konstantin Läufer, George K. Thiruvathukal, Catherine Putonti Jan 2017

George K. Thiruvathukal

As sequencing technologies continue to drop in price and increase in throughput, new challenges emerge for the management and accessibility of genomic sequence data. We have developed a pipeline for facilitating the storage, retrieval, and subsequent analysis of molecular data, integrating both sequence and metadata. Taking a polyglot approach involving multiple languages, libraries, and persistence mechanisms, sequence data can be aggregated from publicly available and local repositories. Data are exposed in the form of a RESTful web service, formatted for easy querying, and retrieved for downstream analyses. As a proof of concept, we have developed a resource for annotated HIV-1 …


Statistical Contributions To Bioinformatics: Design, Modeling, Structure Learning, And Integration, Jeffrey S. Morris, Veera Baladandayuthapani Dec 2016

Jeffrey S. Morris

The advent of high-throughput multi-platform genomics technologies providing whole-genome molecular summaries of biological samples has revolutionized biomedical research. These technologies yield highly structured big data, whose analysis poses significant quantitative challenges. The field of Bioinformatics has emerged to deal with these challenges, and is comprised of many quantitative and biological scientists working together to effectively process these data and extract the treasure trove of information they contain. Statisticians, with their deep understanding of variability and uncertainty quantification, play a key role in these efforts. In this article, we attempt to summarize some of the key contributions of statisticians to bioinformatics, …


Using RStudio For Manipulating And Visualizing Data (Updated 11/17), Ray A. Enke, Bejan A. Rasoul Dec 2016

Ray Enke Ph.D.

This in-class exercise is designed to teach novices the basic features of R and RStudio using a non-biological data set called Gapminder. It is a modified version of a Data Carpentry Workshop that I use to teach programming to beginners.


Genomics RNA-Seq Analysis Part 2_ Kallisto Indexing And Quantification (Updated 11/17), Ray A. Enke, Melika Rahmani-Mofrad Dec 2016

Ray Enke Ph.D.

This in-class exercise is a hands-on activity designed to teach students how to run Kallisto indexing and quantification using CyVerse DE apps as part of a eukaryotic RNA-seq analysis pipeline.


Genomics RNA-Seq Analysis Part 3-Sleuth Data Visualization (Updated 11/17), Ray A. Enke, Scott Schumacker Dec 2016

Ray Enke Ph.D.

This in-class exercise is a hands-on activity designed to teach students how to run Sleuth statistical modeling and the RStudio data visualization package on Kallisto pseudoalignment output files as part of a eukaryotic RNA-seq analysis pipeline.


Supporting Biomedical Research In The Era Of Omics And Precision Medicine, Rolando Garcia-Milian, Denise Hersey, Nathan Rupp Aug 2016

Rolando Garcia-Milian


This annual report (2015-2016) provides a continuing view on the position of the Cushing/Whitney Medical Library End-user Bioinformatics Program. Besides the report on the three main areas of training, resources and tools, and consultations, it contains the results of the recent assessment “Information and Needs Assessment for Biomedical Research in the Omics Era.” During this period, 741 Yale affiliates attended (out of 1240 registered) the end-user bioinformatics training and presentations organized by the Medical Library. This year, the number of Ingenuity Pathway Analysis and MetaCore accounts continued to grow. Consequently, the number and length of research support consultations (130 researchers …


Bringing Toxicology Into The 21st Century: A Global Call To Action, Troy Seidle, Martin Stephens Jul 2016

Martin Stephens, PhD

Conventional toxicological testing methods are often decades old, costly and low-throughput, with questionable relevance to the human condition. Several of these factors have contributed to a backlog of chemicals that have been inadequately assessed for toxicity. Some authorities have responded to this challenge by implementing large-scale testing programmes. Others have concluded that a paradigm shift in toxicology is warranted. One such call came in 2007 from the United States National Research Council (NRC), which articulated a vision of “21st century toxicology” based predominantly on non-animal techniques. Potential advantages of such an approach include the capacity to examine a far greater …


Analysis Of RNA-Seq Alignments Using DNA Subway Green Line (Computational), Raymond A. Enke May 2016

Ray Enke Ph.D.

This class-tested protocol guides students through the following activities:
  • Review basic steps of RNA-Seq bioinformatics analysis in DNA Subway Green Line
  • View and run basic analytics of RNA-Seq data set in DNA Subway Green Line


BayesMotif: De Novo Protein Sorting Motif Discovery From Impure Datasets, Jianjun Hu, F. Zhang Jun 2015

Jianjun Hu

Background

Protein sorting is the process by which newly synthesized proteins are transported to their target locations within or outside of the cell. This process is precisely regulated by protein sorting signals in different forms. A major category of sorting signals is amino acid sub-sequences, usually located at the N-terminals or C-terminals of protein sequences. Genome-wide experimental identification of protein sorting signals is extremely time-consuming and costly. Effective computational algorithms for de novo discovery of protein sorting signals are needed to improve the understanding of protein sorting mechanisms.

Methods

We formulated the protein sorting motif discovery problem as a classification problem …


HemeBIND: A Novel Method For Heme Binding Residue Prediction By Combining Structural And Sequence Information, R. Liu, Jianjun Hu Jun 2015

Jianjun Hu

Background: Accurate prediction of binding residues involved in the interactions between proteins and small ligands is one of the major challenges in structural bioinformatics. Heme is an essential and commonly used ligand that plays critical roles in electron transfer, catalysis, signal transduction and gene expression. Although much effort has been devoted to the development of various generic algorithms for ligand binding site prediction over the last decade, no algorithm has been specifically designed to complement experimental techniques for identification of heme binding residues. Consequently, an urgent need is to develop a computational method for recognizing these important residues. Results: Here …


Integrative Disease Classification Based On Cross-Platform Microarray Data, C.-C. Liu, Jianjun Hu, M. Kalakrishnan, H. Huang, X. Zhou Jun 2015

Jianjun Hu

Background: Disease classification has been an important application of microarray technology. However, most microarray-based classifiers can only handle data generated within the same study, since microarray data generated by different laboratories or with different platforms cannot be compared directly due to systematic variations. This issue has severely limited the practical use of microarray-based disease classification. Results: In this study, we tested the feasibility of disease classification by integrating the large amount of heterogeneous microarray datasets from the public microarray repositories. Cross-platform data compatibility is created by deriving expression log-rank ratios within datasets. One may then compare vectors of log-rank …


Integrative Missing Value Estimation For Microarray Data, Jianjun Hu, H. Li, M. Waterman, X. Zhou Jun 2015

Jianjun Hu

Background: Missing value estimation is an important preprocessing step in microarray analysis. Although several methods have been developed to solve this problem, their performance is unsatisfactory for datasets with high rates of missing data, high measurement noise, or limited numbers of samples. In fact, more than 80% of the time-series datasets in the Stanford Microarray Database contain fewer than eight samples. Results: We present the integrative Missing Value Estimation method (iMISS) by incorporating information from multiple reference microarray datasets to improve missing value estimation. For each gene with missing data, we derive a consistent neighbor-gene list by taking reference data sets …


Library Support For Biomedical Research In The Omics Era: 2014- 2015 Report, Rolando Garcia-Milian May 2015

Rolando Garcia-Milian

The decreased cost of high-throughput technologies has enabled their use as the main research methods to study biological processes and disorders. In order to understand the relevance of the data generated by these methods, the researcher needs to mine and integrate the enormous amount of biomedical information and knowledge contained in the text of the scientific literature and biomedical databases. Accordingly, the ability to access and examine molecular data should not be restricted to bioinformaticians or those with exceptional computer skills. In May 2014, the Cushing/Whitney Medical Library began to provide end-user bioinformatics support to the biomedical researchers of the Yale …


Deciphering The Associations Between Gene Expression And Copy Number Alteration Using A Sparse Double Laplacian Shrinkage Approach, Shuangge Ma Dec 2014

Shuangge Ma

Both gene expression levels (GEs) and copy number alterations (CNAs) have important implications in the development of complex diseases. GEs are partly regulated by CNAs, and much effort has been devoted to understanding their relations. The expression of a gene can be regulated by multiple CNAs, and one CNA can regulate the expression of multiple genes. In addition, multiple GEs (CNAs) can be correlated with each other. The existing methods for associating GEs with CNAs have limitations in deciphering the complex data structures. In this study, we develop a sparse double Laplacian shrinkage approach. It jointly models the effects of …


A Penalized Robust Semiparametric Approach For Gene-Environment Interactions, Shuangge Ma Dec 2014

Shuangge Ma

In genetic and genomic studies, gene-environment (G×E) interactions have important implications. Some of the existing G×E interaction methods are limited by analyzing a small number of G factors at a time, by assuming linear effects of E factors, by assuming no data contamination, and by adopting ineffective selection techniques. In this study, we propose a new approach for identifying important G×E interactions. It jointly models the effects of all E and G factors and their interactions. A partially linear varying coefficient model (PLVCM) is adopted to accommodate possible nonlinear effects of E factors. A rank-based loss function is used to …


Bringing Toxicology Into The 21st Century: A Global Call To Action, Troy Seidle, Martin Stephens Dec 2014

Troy Seidle, PhD

Conventional toxicological testing methods are often decades old, costly and low-throughput, with questionable relevance to the human condition. Several of these factors have contributed to a backlog of chemicals that have been inadequately assessed for toxicity. Some authorities have responded to this challenge by implementing large-scale testing programmes. Others have concluded that a paradigm shift in toxicology is warranted. One such call came in 2007 from the United States National Research Council (NRC), which articulated a vision of “21st century toxicology” based predominantly on non-animal techniques. Potential advantages of such an approach include the capacity to examine a far greater …


Penalized Integrative Analysis Of High-Dimensional Omics Data, Shuangge Ma Dec 2013

Shuangge Ma

No abstract provided.


A Polyglot Approach To Bioinformatics Data Integration: Phylogenetic Analysis Of HIV-1, Steven Reisman, Catherine Putonti, George K. Thiruvathukal, Konstantin Läufer Jul 2013

George K. Thiruvathukal

RNA-interference has potential therapeutic use against HIV-1 by targeting highly-functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, inherent in this treatment is the fact that RNAi sequences must be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus’ fitness in the host. Taking a phylogenetic approach, we …


The Evolution Of Cooperation: How Patience Matters, Atin Basu Choudhary, Raja Mazumder, Vahan Simoyan Jan 2012

Atin Basu Choudhary

The evolution of cooperation has been the focus of much attention from evolutionary game theorists. Conventional game theorists often cite the Folk Theorem to suggest that cooperation is very likely as long as people are patient. However, experimental and real-world evidence for the Folk Theorem has been sparse. We investigate whether cooperation can evolve endogenously in a population where people have different patience levels. We motivate our model by asking the following question: why don’t biologists cooperate with each other by contributing to biological databases? We apply the Folk Theorem from conventional game theory and assume that …


Word Sense Disambiguation In Biomedical Ontologies With Term Co-Occurrence Analysis And Document Clustering, Bill Andreopoulos, Dimitra Alexopoulou, Michael Schroeder Sep 2008

William B. Andreopoulos

With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the GeneOntology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as “development” can refer to developmental biology or to the more general sense. Here, we present two approaches to address this problem by using term co-occurrences and document clustering. To evaluate our method we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an …


Mutual Information Without The Influence Of Phylogeny Or Entropy Dramatically Improves Residue Contact Prediction, Stanley Dunn, Lindi Wahl, Gregory Gloor Dec 2007

Stanley D Dunn

Motivation: Compensating alterations during the evolution of protein families give rise to coevolving positions that contain important structural and functional information. However, a high background composed of random noise and phylogenetic components interferes with the identification of coevolving positions.

Results: We have developed a rapid, simple and general method based on information theory that accurately estimates the level of background mutual information for each pair of positions in a given protein family. Removal of this background results in a metric, MIp, that correctly identifies substantially more coevolving positions in protein families than any existing method. A significant fraction of these …
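As an illustration of the idea of subtracting a per-pair background from mutual information, the sketch below computes MI between alignment columns and removes an average-product estimate of the background. The excerpt above does not give the formula, so the average-product form used here is an assumption, as are the function names `mutual_information` and `background_corrected_mi`. Gaps and sequence weighting are ignored.

```python
from collections import Counter
from math import log


def mutual_information(col_a, col_b):
    """Mutual information (in nats) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    # sum over observed pairs of p(x,y) * log(p(x,y) / (p(x) p(y)))
    return sum((c / n) * log(c * n / (pa[x] * pb[y]))
               for (x, y), c in pab.items())


def background_corrected_mi(alignment):
    """Return a matrix of MI minus an assumed average-product background.

    alignment: list of equal-length sequences (strings), at least 2 columns.
    """
    cols = list(zip(*alignment))
    m = len(cols)
    mi = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(a + 1, m):
            mi[a][b] = mi[b][a] = mutual_information(cols[a], cols[b])
    # Mean MI of each position with every other position.
    row_mean = [sum(mi[a]) / (m - 1) for a in range(m)]
    # Mean MI over all pairs of distinct positions.
    overall = sum(row_mean) / m
    corrected = [[0.0] * m for _ in range(m)]
    for a in range(m):
        for b in range(m):
            if a != b:
                # Assumed background: product of row means over overall mean.
                bg = row_mean[a] * row_mean[b] / overall if overall else 0.0
                corrected[a][b] = mi[a][b] - bg
    return corrected


# Toy alignment: columns 0 and 1 covary perfectly; column 2 is independent.
toy = ["AAC", "AAD", "GGC", "GGD"]
scores = background_corrected_mi(toy)
```

In this toy case the covarying pair keeps a positive score after correction while the independent pairs score zero.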


Finding Molecular Complexes Through Multiple Layer Clustering Of Protein Interaction Networks, Bill Andreopoulos, Aijun An, Xiangji Huang, Xiaogang Wang Dec 2006

William B. Andreopoulos

Clustering protein-protein interaction networks (PINs) helps to identify complexes that guide the cell machinery. Clustering algorithms often create a flat clustering, without considering the layered structure of PINs. We propose the MULIC clustering algorithm that produces layered clusters. We applied MULIC to five PINs. Clusters correlate with known MIPS protein complexes. For example, a cluster of 79 proteins overlaps with a known complex of 88 proteins. Proteins in top cluster layers tend to be more representative of complexes than proteins in bottom layers. Lab work on finding unknown complexes or determining drug effects can be guided by top layer proteins.


Bi-Level Clustering Of Mixed Categorical And Numerical Biomedical Data, Bill Andreopoulos, Aijun An, Xiaogang Wang Jun 2006

William B. Andreopoulos

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for “Bi-Level Clustering of Mixed categorical and numerical data types”. BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedical data sets of mixed types, such as hepatitis, thyroid disease and yeast gene expression data with Gene Ontology annotations, more accurately than if using one type alone.