Open Access. Powered by Scholars. Published by Universities.®

Genetics and Genomics Commons

Full-Text Articles in Genetics and Genomics

Discerning Novel Splice Junctions Derived From RNA-Seq Alignment: A Deep Learning Approach, Yi Zhang, Xinan Liu, James N. MacLeod, Jinze Liu Dec 2018

Computer Science Faculty Publications

Background: Exon splicing is a regulated cellular process in the transcription of protein-coding genes. Technological advancements and cost reductions in RNA sequencing have made quantitative and qualitative assessments of the transcriptome both possible and widely available. RNA-seq provides unprecedented resolution to identify gene structures and resolve the diversity of splicing variants. However, currently available ab initio aligners are vulnerable to spurious alignments due to random sequence matches and sample-reference genome discordance. As a consequence, a significant set of false-positive exon junction predictions is introduced, which further confuses downstream analyses of splice variant discovery and abundance estimation.

Results: …


SeqOthello: Querying RNA-Seq Experiments At Scale, Ye Yu, Jinpeng Liu, Xinan Liu, Yi Zhang, Eamonn Magner, Erik Lehnert, Chen Qian, Jinze Liu Oct 2018

Computer Science Faculty Publications

We present SeqOthello, an ultra-fast and memory-efficient indexing structure to support arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello takes only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of tier-1 fusions curated by the TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which are tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.


iMapSplice: Alleviating Reference Bias Through Personalized RNA-Seq Alignment, Xinan Liu, James N. MacLeod, Jinze Liu Aug 2018

Computer Science Faculty Publications

Genomic variants in both coding and non-coding sequences can have functionally important and sometimes deleterious effects on exon splicing of gene transcripts. For transcriptome profiling using RNA-seq, the accurate alignment of reads across exon junctions is a critical step. Existing algorithms that utilize a standard reference genome as a template sometimes have difficulty in mapping reads that carry genomic variants. These problems can lead to allelic ratio biases and the failure to detect splice variants created by splice site polymorphisms. To improve RNA-seq read alignment, we have developed a novel approach called iMapSplice that enables personalized mRNA transcriptome profiling. The …


Prediction Of lncRNA-Disease Associations Based On Inductive Matrix Completion, Chengqian Lu, Mengyun Yang, Feng Luo, Fang-Xiang Wu, Min Li, Yi Pan, Yaohang Li, Jianxin Wang Apr 2018

Computer Science Faculty Publications

Motivation: Accumulating evidence indicates that long non-coding RNAs (lncRNAs) play pivotal roles in various biological processes. Mutations and dysregulation of lncRNAs are implicated in a range of human diseases. Predicting lncRNA–disease associations is beneficial to disease diagnosis as well as treatment. Although many computational methods have been developed, precisely identifying lncRNA–disease associations, especially for novel lncRNAs, remains challenging.

Results: In this study, we propose a method (named SIMCLDA) for predicting potential lncRNA–disease associations based on inductive matrix completion. We compute the Gaussian interaction profile kernel of lncRNAs from known lncRNA–disease interactions and the functional similarity of diseases based on disease–gene and gene–gene ontology …
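
The abstract above is truncated, so how SIMCLDA uses the kernel downstream is not shown here. As a rough illustration of the Gaussian interaction profile kernel it mentions, the sketch below uses the commonly cited definition, K(i, j) = exp(-γ‖IP(i) − IP(j)‖²), with γ set to a constant divided by the mean squared norm of the profiles. The function name `gip_kernel`, the parameter `gamma_prime`, and the toy matrix are assumptions made for this example and are not taken from the paper.

```python
import math


def gip_kernel(profiles, gamma_prime=1.0):
    """Gaussian interaction profile kernel over rows of a 0/1 association matrix.

    profiles[i] is the vector of known disease associations for lncRNA i.
    gamma_prime is an assumed bandwidth constant (hypothetical default of 1.0).
    """
    n = len(profiles)
    # Bandwidth: gamma_prime divided by the mean squared norm of the profiles.
    mean_sq_norm = sum(sum(v * v for v in row) for row in profiles) / n
    if mean_sq_norm == 0:
        raise ValueError("no known associations; kernel bandwidth is undefined")
    gamma = gamma_prime / mean_sq_norm

    kernel = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            # Squared Euclidean distance between the two interaction profiles.
            sq_dist = sum((a - b) ** 2 for a, b in zip(profiles[i], profiles[j]))
            kernel[i][j] = math.exp(-gamma * sq_dist)
    return kernel


# Made-up toy data: 3 lncRNAs x 4 diseases, 1 = known association.
toy = [
    [1, 0, 1, 0],
    [1, 0, 0, 0],
    [0, 1, 0, 1],
]
K = gip_kernel(toy)
```

In this toy matrix the first two lncRNAs share a disease, so their kernel value is higher than that of the first and third, which share none.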


An Investigation Of Atomic Structures Derived From X-Ray Crystallography And Cryo-Electron Microscopy Using Distal Blocks Of Side-Chains, Lin Chen, Jing He, Salim Sazzed, Rayshawn Walker Jan 2018

Computer Science Faculty Publications

Cryo-electron microscopy (cryo-EM) is a structure determination method for large molecular complexes. As more and more atomic structures are determined using this technique, it is becoming possible to perform statistical characterization of side-chain conformations. Two data sets were used to characterize block lengths for each of the 18 types of amino acids. One set contains 9131 structures resolved using X-ray crystallography from density maps with resolutions of 1.5 Å or better, and the other contains 237 protein structures derived from cryo-EM density maps with resolutions of 2-4 Å. The results show that the normalized probability density function of block …