Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 12 of 12

Full-Text Articles in Physical Sciences and Mathematics

Discerning Novel Splice Junctions Derived From RNA-Seq Alignment: A Deep Learning Approach, Yi Zhang, Xinan Liu, James N. Macleod, Jinze Liu Dec 2018

Computer Science Faculty Publications

Background: Exon splicing is a regulated cellular process in the transcription of protein-coding genes. Technological advancements and cost reductions in RNA sequencing have made quantitative and qualitative assessments of the transcriptome both possible and widely available. RNA-seq provides unprecedented resolution to identify gene structures and resolve the diversity of splicing variants. However, currently available ab initio aligners are vulnerable to spurious alignments due to random sequence matches and sample-reference genome discordance. As a consequence, a significant set of false positive exon junction predictions is introduced, which further confuses downstream analyses of splice variant discovery and abundance estimation.

Results: …


X-Search: An Open Access Interface For Cross-Cohort Exploration Of The National Sleep Research Resource, Licong Cui, Ningzhou Zeng, Matthew Kim, Remo Mueller, Emily Ruth Hankosky, Susan Redline, Guo-Qiang Zhang Nov 2018

Computer Science Faculty Publications

Background: The National Sleep Research Resource (NSRR) is a large-scale, openly shared data repository of de-identified, highly curated clinical sleep data from multiple NIH-funded epidemiological studies. Although many data repositories allow users to browse their content, few support fine-grained, cross-cohort query and exploration at the study-subject level. We introduce a cross-cohort query and exploration system, called X-search, to enable researchers to query patient cohort counts across a growing number of completed, NIH-funded studies in NSRR and explore the feasibility or likelihood of reusing the data for research studies.

Methods: X-search has been designed as a general framework with two loosely-coupled components: …


SeqOthello: Querying RNA-Seq Experiments At Scale, Ye Yu, Jinpeng Liu, Xinan Liu, Yi Zhang, Eamonn Magner, Erik Lehnert, Chen Qian, Jinze Liu Oct 2018

Computer Science Faculty Publications

We present SeqOthello, an ultra-fast and memory-efficient indexing structure to support arbitrary sequence queries against large collections of RNA-seq experiments. SeqOthello takes only 5 min and 19.1 GB of memory to conduct a global survey of 11,658 fusion events against 10,113 TCGA Pan-Cancer RNA-seq datasets. The query recovers 92.7% of tier-1 fusions curated by TCGA Fusion Gene Database and reveals 270 novel occurrences, all of which are present as tumor-specific. By providing a reference-free, alignment-free, and parameter-free sequence search system, SeqOthello will enable large-scale integrative studies using sequence-level data, an undertaking not previously practicable for many individual labs.


An Outlier Detection Algorithm Based On Cross-Correlation Analysis For Time Series Dataset, Hui Lu, Yaxian Liu, Zongming Fei, Chongchong Guan Sep 2018

Computer Science Faculty Publications

Outlier detection is an essential problem in a variety of application areas. Many detection methods are deficient for high-dimensional time series data sets containing both isolated and assembled outliers. In this paper, we propose an Outlier Detection method based on Cross-correlation Analysis (ODCA). ODCA consists of three key parts: data preprocessing, outlier analysis, and outlier rank. First, we investigate a linear interpolation method to convert assembled outliers into isolated ones. Second, a detection mechanism based on the cross-correlation analysis is proposed for translating the high-dimensional data sets into a 1-D cross-correlation function, according to which the isolated outlier …
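As a rough illustration of the general idea of reducing paired series to a one-dimensional correlation signal, the sketch below computes a lag-0 Pearson correlation over sliding windows and flags windows where it drops. This is not the ODCA algorithm, whose details are not given in the abstract above; the function names, the window size, and the threshold are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def windowed_correlation(x, y, window):
    """Return the lag-0 Pearson correlation of x and y over each sliding window.

    Hypothetical helper for illustration; not taken from the ODCA paper.
    """
    scores = []
    for start in range(len(x) - window + 1):
        xs = x[start:start + window]
        ys = y[start:start + window]
        mx, my = mean(xs), mean(ys)
        sx, sy = pstdev(xs), pstdev(ys)
        if sx == 0 or sy == 0:
            # Correlation is undefined for a constant window; report 0.0.
            scores.append(0.0)
            continue
        cov = mean([(a - mx) * (b - my) for a, b in zip(xs, ys)])
        scores.append(cov / (sx * sy))
    return scores


def flag_low_correlation(scores, threshold):
    """Return the start indices of windows whose correlation is below threshold."""
    return [i for i, s in enumerate(scores) if s < threshold]


# Two series that move together, except for one corrupted reading in y.
x = [1, 2, 3, 4, 5, 6, 7, 8]
y = [2, 4, 6, 8, 10, -50, 14, 16]
scores = windowed_correlation(x, y, window=3)
suspect_windows = flag_low_correlation(scores, threshold=0.95)
```

Every window that contains the corrupted reading shows a reduced correlation, so the flagged window indices bracket the position of the bad sample.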


iMapSplice: Alleviating Reference Bias Through Personalized RNA-Seq Alignment, Xinan Liu, James N. Macleod, Jinze Liu Aug 2018

Computer Science Faculty Publications

Genomic variants in both coding and non-coding sequences can have functionally important and sometimes deleterious effects on exon splicing of gene transcripts. For transcriptome profiling using RNA-seq, the accurate alignment of reads across exon junctions is a critical step. Existing algorithms that utilize a standard reference genome as a template sometimes have difficulty in mapping reads that carry genomic variants. These problems can lead to allelic ratio biases and the failure to detect splice variants created by splice site polymorphisms. To improve RNA-seq read alignment, we have developed a novel approach called iMapSplice that enables personalized mRNA transcriptome profiling. The …


Query-Constraint-Based Mining Of Association Rules For Exploratory Analysis Of Clinical Datasets In The National Sleep Research Resource, Rashmie Abeysinghe, Licong Cui Jul 2018

Computer Science Faculty Publications

Background: Association Rule Mining (ARM) has been widely used by biomedical researchers to perform exploratory data analysis and uncover potential relationships among variables in biomedical datasets. However, when biomedical datasets are high-dimensional, performing ARM on such datasets will yield a large number of rules, many of which may be uninteresting. Especially for imbalanced datasets, performing ARM directly would result in uninteresting rules that are dominated by certain variables that capture general characteristics.

Methods: We introduce a query-constraint-based ARM (QARM) approach for exploratory analysis of multiple, diverse clinical datasets in the National Sleep Research Resource (NSRR). QARM enables rule mining on …
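For readers unfamiliar with association rule mining, the two standard rule measures, support and confidence, can be sketched as follows. This shows only the generic definitions, not the QARM approach itself, whose details are truncated above; the function names and the toy records are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in itemset."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)


def confidence(transactions, antecedent, consequent):
    """Support of (antecedent and consequent) divided by support of antecedent."""
    s_antecedent = support(transactions, antecedent)
    if s_antecedent == 0:
        # The rule never applies, so report zero confidence.
        return 0.0
    both = set(antecedent) | set(consequent)
    return support(transactions, both) / s_antecedent


# Toy records; each set lists the attributes present for one subject.
records = [
    {"a", "b"},
    {"a", "b", "c"},
    {"a"},
    {"b", "c"},
]
rule_support = support(records, {"a", "b"})
rule_confidence = confidence(records, {"a"}, {"b"})
```

A rule is usually kept only if both measures exceed chosen minimums, which is why high-dimensional data tends to produce many rules that pass the thresholds without being interesting.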


Dynamic Non-Rigid Objects Reconstruction With A Single RGB-D Sensor, Sen Wang, Xinxin Zuo, Chao Du, Runxiao Wang, Jiangbin Zheng, Ruigang Yang Mar 2018

Computer Science Faculty Publications

This paper deals with the 3D reconstruction problem for dynamic non-rigid objects with a single RGB-D sensor. It is a challenging task as we consider the almost inevitable accumulation error issue in some previous sequential fusion methods and also the possible failure of surface tracking in a long sequence. Therefore, we propose a global non-rigid registration framework and tackle the drifting problem via an explicit loop closure. Our novel scheme starts with a fusion step to get multiple partial scans from the input sequence, followed by a pairwise non-rigid registration and loop detection step to obtain correspondences between neighboring partial …


Scheduling Based On Interruption Analysis And PSO For Strictly Periodic And Preemptive Partitions In Integrated Modular Avionics, Hui Lu, Qianlin Zhou, Zongming Fei, Rongrong Zhou Mar 2018

Computer Science Faculty Publications

Integrated modular avionics introduces the concept of partitions and has been widely used in the avionics industry. Partitions share computing resources. Partition scheduling plays a key role in guaranteeing correct execution of partitions. In this paper, a strictly periodic and preemptive partition scheduling strategy is investigated. First, we propose a partition scheduling model that allows a partition to be interrupted by other partitions, but minimizes the number of interruptions. The model not only retains the execution reliability of the simple partition sets that can be scheduled without interruptions, but also enhances the schedulability of the complex partition sets that …


Kratylos: A Tool For Sharing Interlinearized And Lexical Data In Diverse Formats, Daniel Kaufman, Raphael Finkel Mar 2018

Computer Science Faculty Publications

In this paper we present Kratylos, at www.kratylos.org/, a web application that creates searchable multimedia corpora from data collections in diverse formats, including collections of interlinearized glossed text (IGT) and dictionaries. There exists a crucial lacuna in the electronic ecology that supports language documentation and linguistic research. Vast amounts of IGT are produced in stand-alone programs without an easy way to share them publicly as dynamic databases. Solving this problem will not only unlock an enormous amount of linguistic information that can be shared easily across the web, it will also improve accountability by allowing us to verify analyses …


Auditing SNOMED CT Hierarchical Relations Based On Lexical Features Of Concepts In Non-Lattice Subgraphs, Licong Cui, Olivier Bodenreider, Jay Shi, Guo-Qiang Zhang Feb 2018

Computer Science Faculty Publications

Objective—We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations.

Methods—Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT’s IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor …


Learning To Generate Natural Language Rationales For Game Playing Agents, Upol Ehsan, Pradyumna Tambwekar, Larry Chan, Brent Harrison, Mark O. Riedl Jan 2018

Computer Science Faculty Publications

Many computer games feature non-player character (NPC) teammates and companions; however, playing with or against NPCs can be frustrating when they perform unexpectedly. These frustrations can be avoided if the NPC has the ability to explain its actions and motivations. When NPC behavior is controlled by a black box AI system, it can be hard to generate the necessary explanations. In this paper, we present a system that generates human-like, natural language explanations, called rationales, of an agent's actions in a game environment regardless of how the decisions are made by a black box AI. We outline a robust data collection …


Random Models Of Very Hard 2QBF And Disjunctive Programs: An Overview, Giovanni Amendola, Francesco Ricca, Miroslaw Truszczynski Jan 2018

Computer Science Faculty Publications

We present an overview of models of random quantified Boolean formulas and their natural random disjunctive ASP program counterparts that we have recently proposed. The models have a simple structure but also theoretical and empirical properties that make them useful for further advancement of SAT, QBF, and ASP solvers.