Bioinformatics Commons

Open Access. Powered by Scholars. Published by Universities.

30 Institutions 428 Full-Text Articles 838 Authors 59,709 Downloads

Recent Articles in Bioinformatics

Functional Analysis Of Transcription Factor Binding Sites In Human Promoters, Troy W. Whitfield, Jie Wang, Patrick J. Collins, E. Christopher Partridge, Shelley Force Aldred, Nathan D. Trinklein, Richard M. Myers, Zhiping Weng University of Massachusetts Medical School

Program in Bioinformatics and Integrative Biology Publications and Presentations

BACKGROUND: The binding of transcription factors to specific locations in the genome is integral to the orchestration of transcriptional regulation in cells. To characterize transcription factor binding site function on a large scale, we predicted and mutagenized 455 binding sites in human promoters. We carried out functional tests on these sites in four different immortalized human cell lines using transient transfections with a luciferase reporter assay, primarily for the transcription factors CTCF, GABP, GATA2, E2F, STAT, and YY1.

RESULTS: In each cell line, between 36% and 49% of binding sites made a functional contribution to the promoter activity; the overall ...


Understanding Transcriptional Regulation By Integrative Analysis Of Transcription Factor Binding Data, Chao Cheng, Roger Alexander, Rengqiang Min, Jing Leng, Kevin Y. Yip, Joel Rozowsky, Koon-Kiu Yan, Xianjun Dong, Sarah Djebali, Yijun Ruan, Carrie A. Davis, Piero Carninci, Timo Lassman, Thomas R. Gingeras, Roderic Guigo, Ewan Birney, Zhiping Weng, Michael Snyder, Mark B. Gerstein University of Massachusetts Medical School

Program in Bioinformatics and Integrative Biology Publications and Presentations

Statistical models have been used to quantify the relationship between gene expression and transcription factor (TF) binding signals. Here we apply the models to the large-scale data generated by the ENCODE project to study transcriptional regulation by TFs. Our results reveal a notable difference in the prediction accuracy of expression levels of transcription start sites (TSSs) captured by different technologies and RNA extraction protocols. In general, the expression levels of TSSs with high CpG content are more predictable than those with low CpG content. For genes with alternative TSSs, the expression levels of downstream TSSs are more predictable than those ...


Sequence Features And Chromatin Structure Around The Genomic Regions Bound By 119 Human Transcription Factors, Jie Wang, Jiali Zhuang, Sowmya Iyer, Xinying Lin, Troy W. Whitfield, Melissa C. Greven, Brian G. Pierce, Xianjun Dong, Anshul Kundaje, Yong Cheng, Oliver J. Rando, Ewan Birney, Richard M. Myers, William S. Noble, Michael Snyder, Zhiping Weng University of Massachusetts Medical School

Program in Bioinformatics and Integrative Biology Publications and Presentations

Chromatin immunoprecipitation coupled with high-throughput sequencing (ChIP-seq) has become the dominant technique for mapping transcription factor (TF) binding regions genome-wide. We performed an integrative analysis centered around 457 ChIP-seq data sets on 119 human TFs generated by the ENCODE Consortium. We identified highly enriched sequence motifs in most data sets, revealing new motifs and validating known ones. The motif sites (TF binding sites) are highly conserved evolutionarily and show distinct footprints upon DNase I digestion. We frequently detected secondary motifs in addition to the canonical motifs of the TFs, indicating tethered binding and cobinding between multiple TFs. We observed significant ...


Factorbook.org: A Wiki-Based Database For Transcription Factor-Binding Data Generated By The ENCODE Consortium, Jie Wang, Jiali Zhuang, Sowmya Iyer, Xinying Lin, Melissa C. Greven, Bong-Hyun Kim, Jill Moore, Brian G. Pierce, Xianjun Dong, Daniel Virgil, Ewan Birney, Jui-Hung Hung, Zhiping Weng University of Massachusetts Medical School

Program in Bioinformatics and Integrative Biology Publications and Presentations

The Encyclopedia of DNA Elements (ENCODE) consortium aims to identify all functional elements in the human genome, including transcripts and transcriptional regulatory regions, along with their chromatin states and DNA methylation patterns. The ENCODE project generates data using a variety of techniques that can enrich for regulatory regions, such as chromatin immunoprecipitation (ChIP), micrococcal nuclease (MNase) digestion and DNase I digestion, followed by deep sequencing of the resulting DNA. As part of the ENCODE project, we have developed a Web-accessible repository at http://factorbook.org. In Wiki format, factorbook is a transcription factor (TF)-centric repository of all ENCODE ChIP-seq datasets ...


Strand-Specific Libraries For High Throughput RNA Sequencing (RNA-Seq) Prepared Without Poly(A) Selection, Zhao Zhang, William E. Theurkauf, Zhiping Weng, Phillip D. Zamore University of Massachusetts Medical School

Program in Bioinformatics and Integrative Biology Publications and Presentations

BACKGROUND: High throughput DNA sequencing technology has enabled quantification of all the RNAs in a cell or tissue, a method widely known as RNA sequencing (RNA-Seq). However, non-coding RNAs such as rRNA are highly abundant and can consume >70% of sequencing reads. A common approach is to extract only polyadenylated mRNA; however, such approaches are blind to RNAs with short or no poly(A) tails, leading to an incomplete view of the transcriptome. Another challenge of preparing RNA-Seq libraries is to preserve the strand information of the RNAs.

DESIGN: Here, we describe a procedure for preparing RNA-Seq libraries from 1 ...


Networking Development By Boolean Logic, Shikui Tu, Thoru Pederson, Zhiping Weng University of Massachusetts Medical School

Program in Bioinformatics and Integrative Biology Publications and Presentations

Eric Davidson at Caltech has spent several decades investigating the molecular basis of animal development using the sea urchin embryo as an experimental system (1, 2), although his scholarship extends to all of embryology as embodied in several editions of his landmark book (3). In recent years his laboratory has become a leading force in constructing gene regulatory networks (GRNs) operating in sea urchin development (4). This axis of his work has its roots in this laboratory's cDNA cloning of an actin mRNA from the sea urchin embryo (for the timeline see ref. 1), one of the first eukaryotic ...


How I Would Like Semantic Web To Be, For My Children., Raghava Mutharaju Wright State University

Kno.e.sis Publications

Semantic Web, since its inception, has gone through a lot of developments in its relatively nascent existence, from people's perception, to the standards, to its adoption by industry and, more importantly, by the scientific community. This impressive growth only seems to increase. In this paper, we project this growth to the next 10 years and highlight some of the facets on which Semantic Web could have a major impact. We also present the challenges that Semantic Web and its community have to deal with in order to get there.


Reconciling OWL And Non-Monotonic Rules For The Semantic Web, Matthias Knorr, Pascal Hitzler, Frederick Maier Wright State University

Kno.e.sis Publications

We propose a description logic extending SROIQ (the description logic underlying OWL 2 DL) and at the same time encompassing some of the most prominent monotonic and nonmonotonic rule languages, in particular Datalog extended with the answer set semantics. Our proposal could be considered a substantial contribution towards fulfilling the quest for a unifying logic for the Semantic Web. As a case in point, two non-monotonic extensions of description logics considered to be of distinct expressiveness until now are covered in our proposal. In contrast to earlier such proposals, our language has the 'look and feel' of a description logic ...


Reasoning With Fuzzy-EL+ Ontologies Using MapReduce, Zhangquan Zhou, Guilin Qi, Chang Lui, Pascal Hitzler, Raghava Mutharaju Wright State University

Kno.e.sis Publications

Fuzzy extension of Description Logics (DLs) allows the formal representation and handling of fuzzy knowledge. In this paper, we consider fuzzy-EL+, which is a fuzzy extension of EL+. We first present revised completion rules for fuzzy-EL+ that can be handled by MapReduce programs. We then propose an algorithm for scale reasoning with fuzzy-EL+ ontologies based on MapReduce.


A Tableau Algorithm For Description Logics With Nominal Schema, Adila Krisnadhi, Pascal Hitzler Wright State University

Kno.e.sis Publications

Nominal schema is an expressive description logic (DL) construct that was proposed in recent efforts to integrate DLs and (logic programming) rule-based paradigms for the Semantic Web [1] represented by two “diverging” W3C standards: the DL-based Web Ontology Language (OWL) [2] whose major variant, OWL 2 DL, is based on the description logic SROIQ [3]; and the rule-based Rule Interchange Format (RIF) whose core variant, called RIF Core [4], is essentially Datalog, i.e., function-free Horn logic.


iExplore: A Provenance-Based Application For Exploring Biomedical Knowledge, Vinh Nguyen, Oliver Bodenreider, Thomas Rindflesch, Amit Sheth Wright State University

Kno.e.sis Publications

No abstract provided.


Traffic Analytics Using Probabilistic Graphical Models Enhanced With Knowledge Bases, Pramod Anatharam, Krishnaprasad Thirunarayan, Amit Sheth Wright State University

Kno.e.sis Publications

Graphical models have been successfully used to deal with uncertainty, incompleteness, and dynamism within many domains. These models built from data often ignore preexisting declarative knowledge about the domain in the form of ontologies and Linked Open Data (LOD) that is increasingly available on the web. In this paper, we present an approach to leverage such 'top-down' domain knowledge to enhance 'bottom-up' building of graphical models. Specifically, we propose three operations on the graphical model structure to enrich it with nodes, edges, and edge directions. We illustrate the enrichment process using traffic data from 511.org and declarative knowledge from ...


What Kind Of #Conversation Is Twitter? Mining #Psycholinguistic Cues For Emergency Coordination, Hemant Purohit, Andrew Hampton, Valerie L. Shalin, Amit Sheth, John Flach, Shreyansh Bhatt Wright State University

Kno.e.sis Publications

The information overload created by social media messages in emergency situations challenges response organizations to find targeted content and users. We aim to select useful messages by detecting the presence of conversation as an indicator of coordinated citizen action. Using simple linguistic indicators associated with conversation analysis in social science, we model the presence of conversation in the communication landscape of Twitter in a large corpus of 1.5M tweets for various disaster and non-disaster events spanning different periods, lengths of time and varied social significance. Within Replies, Retweets and tweets that mention other Twitter users, we found that domain-independent ...


Comparative Analyses Of Microbial Genomes To Identify Molecular Markers For Different Groups Of Prokaryotes, Vaibhav Bhandari McMaster University

Open Access Dissertations and Theses

Currently centered on molecular data, bacterial and archaeal relationships are often based on their relative branching in 16S rRNA based phylogenetic trees. The availability of numerous bacterial genome sequences over the past two decades has provided new information for insights previously inaccessible to the field of taxonomy. Through utilization of comparative genomics, numerous molecular markers in the form of insertions and deletions within conserved regions of proteins, also known as Conserved Signature Indels or CSIs, have been discovered for various prokaryotic taxa. Using these techniques, we have analyzed relationships among the bacterial phyla of Thermotogae and Synergistetes and the conglomeration ...


A Polyglot Approach To Bioinformatics Data Integration: Phylogenetic Analysis Of HIV-1, Steven Reisman, Catherine Putonti, George K. Thiruvathukal, Konstantin Läufer Loyola University Chicago

Computer Science: Faculty Publications & Other Works

RNA-interference has potential therapeutic use against HIV-1 by targeting highly-functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, inherent in this treatment is the fact that RNAi sequences must be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus’ fitness in the host. Taking a phylogenetic approach, we ...


Pathway Distiller - Multisource Biological Pathway Consolidation, Mark S. Doderer, Zachry Anguiano, Uthra Suresh, Ravi Dashnamoorthy, Alexander J. R. Bishop, Yidong Chen University of Massachusetts Medical School

Open Access Articles

BACKGROUND: One method to understand and evaluate an experiment that produces a large set of genes, such as a gene expression microarray analysis, is to identify overrepresentation or enrichment for biological pathways. Because pathways are able to functionally describe the set of genes, much effort has been made to collect curated biological pathways into publicly accessible databases. When combining disparate databases, highly related or redundant pathways exist, making their consolidation into pathway concepts essential. This will facilitate unbiased, comprehensive yet streamlined analysis of experiments that result in large gene sets.

METHODS: After gene set enrichment finds representative pathways for large ...


Interpretation, Stratification And Validation Of Sequence Variants Affecting mRNA Splicing In Complete Human Genome Sequences, Ben C. Shirley Western University

University of Western Ontario - Electronic Thesis and Dissertation Repository

The Shannon Human Splicing Pipeline software has been developed to analyze variants on a genome-scale. Evidence is provided that this software predicts variants affecting mRNA splicing. Variants are examined through information-based analysis and the context of novel mutations as well as common and rare SNPs with splicing effects are displayed. Potential natural and cryptic mRNA splicing variants are identified, and inactivating mutations are distinguished from leaky mutations. Mutations and rare SNPs were predicted in genomes of three cancer cell lines (U2OS, U251 and A431), supported by expression analyses. After filtering, tractable numbers of potentially deleterious variants are predicted by the ...


An Investigation Of Gene Networks Influenced By Low Dose Ionizing Radiation Using Statistical And Graph Theoretical Algorithms, Sudhir Naswa University of Tennessee, Knoxville

Doctoral Dissertations

Increased application of radiation in health and security sectors has raised concerns about its deleterious effects. Ionizing radiation (IR) of less than 10 cGy is considered low dose ionizing radiation (LDIR) by the National Research Committee to assess health risks from exposure to low levels of IR.

It is hard to extract the effects of a mild stimulus such as LDIR on gene expression profiles using simple differential expression. We hypothesized that differential correlation instead would capture the effects of LDIR on mutual relationships between genes. We tested this hypothesis on expression profiles from five inbred strains of mice treated with LDIR. Whereas ...
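
As a rough illustration of the differential-correlation idea mentioned in this abstract, the sketch below compares how strongly two genes are correlated in control versus treated samples. It is not the dissertation's method, which the truncated abstract does not specify: the use of Pearson correlation, the Fisher z-transform test, the function names, and the toy expression values are all assumptions made for illustration.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / math.sqrt(sxx * syy)

def differential_correlation(a_ctrl, b_ctrl, a_trt, b_trt):
    """Return (r_control, r_treated, z) for one gene pair.

    z is the difference of Fisher z-transformed correlations divided by
    its approximate standard error; it needs more than 3 samples per
    group and correlations strictly between -1 and 1.
    """
    r_c = pearson(a_ctrl, b_ctrl)
    r_t = pearson(a_trt, b_trt)
    se = math.sqrt(1.0 / (len(a_ctrl) - 3) + 1.0 / (len(a_trt) - 3))
    z = (math.atanh(r_t) - math.atanh(r_c)) / se
    return r_c, r_t, z

# Hypothetical toy data: gene B has nearly the same mean in both groups,
# so a per-gene differential expression test would see little change,
# but its relationship to gene A flips from positive to negative.
gene_a = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
gene_b_ctrl = [1.2, 1.9, 3.3, 3.8, 5.1, 6.2]
gene_b_trt = [6.1, 5.2, 3.9, 3.1, 2.2, 0.8]

r_c, r_t, z = differential_correlation(gene_a, gene_b_ctrl, gene_a, gene_b_trt)
print(f"control r = {r_c:.3f}, treated r = {r_t:.3f}, z = {z:.2f}")
```

In this toy case the per-gene means barely move while the sign of the correlation reverses, which is the kind of change in mutual relationships that a correlation-based comparison can pick up and a mean-based comparison cannot.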


Utilizing NMR Spectroscopy And Molecular Docking As Tools For The Structural Determination And Functional Annotation Of Proteins, Jaime Stark University of Nebraska - Lincoln

Student Research Projects, Dissertations, and Theses - Chemistry Department

With the completion of the Human Genome Project in 2001 and the subsequent explosion of organisms with sequenced genomes, we are now aware of nearly 28 million proteins. Determining the role of each of these proteins is essential to our understanding of biology and the development of medical advances. Unfortunately, the experimental approaches to determine protein function are too slow to investigate every protein. Bioinformatics approaches, such as sequence and structure homology, have helped to annotate the functions of many similar proteins. However, despite these computational approaches, approximately 40% of proteins still have no known function. Alleviating this deficit will ...


Radiomics Of NSCLC: Quantitative CT Image Feature Characterization And Tumor Shrinkage Prediction, Luke Hunter Texas Medical Center Library

UT GSBS Dissertations and Theses (Open Access)

Radiomics is the high-throughput extraction and analysis of quantitative image features. For non-small cell lung cancer (NSCLC) patients, radiomics can be applied to standard of care computed tomography (CT) images to improve tumor diagnosis, staging, and response assessment.

The first objective of this work was to show that CT image features extracted from pre-treatment NSCLC tumors could be used to predict tumor shrinkage in response to therapy. This is important since tumor shrinkage is an important cancer treatment endpoint that is correlated with probability of disease progression and overall survival. Accurate prediction of tumor shrinkage could also lead to individually ...