Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Selected Works

2019

Computational Biology

Articles 1 - 5 of 5

Full-Text Articles in Life Sciences

Genome-Wide Discovery Of Missing Genes In Biological Pathways Of Prokaryotes., Yong Chen, Fenglou Mao, Guojun Li, Ying Xu Sep 2019

Yong Chen

BACKGROUND: Reconstruction of biological pathways is typically done by mapping well-characterized pathways of model organisms onto a target genome using orthologous genes. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways: some genes in a target pathway may have no corresponding genes in the template pathways, the so-called "missing gene" problem.

METHODS: We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, to fill holes caused by missing genes as well as to expand the …


Prioritizing Protein Complexes Implicated In Human Diseases By Network Optimization., Yong Chen, Thibault Jacquemin, Shuyan Zhang, Rui Jiang Sep 2019

Yong Chen

BACKGROUND: Detecting associations between protein complexes and human inherited diseases is of great importance for understanding disease mechanisms. Dysfunction of a protein complex usually arises from disturbance of its member proteins and consequently results in certain diseases. Although individual disease proteins have been widely predicted, computational methods for systematically investigating disease-related protein complexes are still lacking.

RESULTS: We propose a method, MAXCOM, for the prioritization of candidate protein complexes. MAXCOM performs a maximum information flow algorithm to optimize relationships between a query disease and candidate protein complexes through a heterogeneous network that is constructed by combining protein-protein interactions …


CLIMP: Clustering Motifs Via Maximal Cliques With Parallel Computing Design., Shaoqiang Zhang, Yong Chen Sep 2019

Yong Chen

A set of conserved binding sites recognized by a transcription factor is called a motif; motifs can be found by many comparative-genomics applications that identify over-represented segments. When numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph in which the motifs are connected to one another. An efficient algorithm is still needed to cluster motifs that belong to the same group, separate motifs that belong to different groups, and remove spurious ones. In this work, a new motif …
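
The abstract is truncated before it describes the method, so the following is only a minimal sketch of the general idea named in the title: representing motif similarities as a graph and enumerating its maximal cliques. It uses the classic Bron-Kerbosch algorithm with pivoting, which is an assumption and not necessarily what CLIMP implements. The function names, the motif labels, the similarity scores, and the threshold are all hypothetical.

```python
def build_similarity_graph(similarities, threshold):
    """Build an undirected graph from pairwise motif similarity scores.

    `similarities` maps (motif_a, motif_b) pairs to a score; an edge is
    added when the score reaches `threshold` (both are illustrative).
    """
    adjacency = {}
    for (a, b), score in similarities.items():
        adjacency.setdefault(a, set())
        adjacency.setdefault(b, set())
        if a != b and score >= threshold:
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency


def maximal_cliques(adjacency):
    """Enumerate all maximal cliques (Bron-Kerbosch with pivoting)."""
    cliques = []

    def expand(current, candidates, excluded):
        if not candidates and not excluded:
            cliques.append(frozenset(current))
            return
        # Pick the pivot with the most neighbours among the candidates,
        # so that fewer branches need to be explored.
        pivot = max(candidates | excluded,
                    key=lambda v: len(adjacency[v] & candidates))
        for v in list(candidates - adjacency[pivot]):
            expand(current | {v},
                   candidates & adjacency[v],
                   excluded & adjacency[v])
            candidates = candidates - {v}
            excluded = excluded | {v}

    expand(set(), set(adjacency), set())
    return cliques


# Hypothetical similarity scores between five putative motifs.
scores = {
    ("m1", "m2"): 0.9, ("m1", "m3"): 0.8, ("m2", "m3"): 0.85,
    ("m3", "m4"): 0.7, ("m4", "m5"): 0.1,
}
graph = build_similarity_graph(scores, threshold=0.5)
groups = maximal_cliques(graph)
```

In this toy graph, motifs that end up alone in a clique (here `m5`) are the kind of isolated predictions a clustering step might treat as spurious; how the real tool handles them is not stated in the visible part of the abstract.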


Identifying Potential Cancer Driver Genes By Genomic Data Integration., Yong Chen, Jingjing Hao, Wei Jiang, Tong He, Xuegong Zhang, Tao Jiang, Rui Jiang Sep 2019

Yong Chen

Cancer is a genomic disease associated with a plethora of gene mutations resulting in a loss of control over vital cellular functions. Among these mutated genes, driver genes are defined as being causally linked to oncogenesis, while passenger genes are thought to be irrelevant for cancer development. With increasing numbers of large-scale genomic datasets available, integrating these genomic data to identify driver genes from aberration regions of cancer genomes becomes an important goal of cancer genome analysis and investigations into mechanisms responsible for cancer development. A computational method, MAXDRIVER, is proposed here to identify potential driver genes on the basis …


FisherMP: Fully Parallel Algorithm For Detecting Combinatorial Motifs From Large ChIP-Seq Datasets., Shaoqiang Zhang, Ying Liang, Xiangyun Wang, Zhengchang Su, Yong Chen Sep 2019

Yong Chen

Detecting binding motifs of combinatorial transcription factors (TFs) from chromatin immunoprecipitation sequencing (ChIP-seq) experiments is an important and challenging computational problem for understanding gene regulation. Although a number of motif-finding algorithms have been presented, most are either time-consuming or have sub-optimal accuracy when processing large-scale datasets. In this article, we present a fully parallelized algorithm for detecting combinatorial motifs from ChIP-seq datasets by using Fisher's combined method and an OpenMP parallel design. Large-scale validation on both synthetic data and 350 ChIP-seq datasets from the ENCODE database showed that FisherMP is not only very fast on large datasets, but also …
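
For readers unfamiliar with the statistical ingredient mentioned above, the following is a minimal sketch of Fisher's combined probability test in general, not of FisherMP itself. How FisherMP derives or uses its p-values is not described in the visible part of the abstract, so the function name and the example inputs are hypothetical. The method combines k independent p-values into the statistic X = -2 Σ ln(p_i), which follows a chi-square distribution with 2k degrees of freedom under the null hypothesis; because the degrees of freedom are even, the tail probability has a closed form and needs only the standard library.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method.

    Returns P(chi2 with 2k degrees of freedom >= X), where
    X = -2 * sum(ln p_i) and k = len(pvalues).
    """
    if not pvalues:
        raise ValueError("at least one p-value is required")
    if any(p <= 0.0 or p > 1.0 for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")

    k = len(pvalues)
    half_stat = -sum(math.log(p) for p in pvalues)  # this is X / 2

    # For even degrees of freedom 2k the chi-square survival function is
    # exp(-X/2) * sum_{i=0}^{k-1} (X/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_stat / i
        total += term
    return min(1.0, math.exp(-half_stat) * total)


# Hypothetical p-values, e.g. from two independent enrichment tests.
combined = fisher_combined_pvalue([0.05, 0.05])
```

A single p-value is returned unchanged by this function, and two moderately small p-values combine into a smaller one, which is the behaviour that makes the method useful for pooling evidence from several tests.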