Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Full-Text Articles in Life Sciences

An Intelligent Data-Centric Approach Toward Identification Of Conserved Motifs In Protein Sequences, Kathryn Dempsey Cooper, Benjamin Currall, Richard Hallworth, Hesham Ali Jan 2010

Interdisciplinary Informatics Faculty Proceedings & Presentations

The continued integration of the computational and biological sciences has revolutionized genomic and proteomic studies. However, efficient collaboration between these fields requires the creation of shared standards. A common problem arises when biological input does not fit the expectations of the algorithm, which can lead to misinterpretation of the output. This mismatch between input and output is a particular drawback of motif-finding software. Here we propose a method for improving output by selecting input based upon evolutionary distance, domain architecture, and known function. This method improved detection of both known and unknown motifs in two separate case studies. …


Improving Predicted Protein Loop Structure Ranking Using A Pareto-Optimality Consensus Method, Yaohang Li, Ionel Rata, See-Wing Chiu, Erik Jakobsson Jan 2010

Computer Science Faculty Publications

Background

Accurate protein loop structure models are important for understanding the functions of many proteins. Identifying the native or near-native models by distinguishing them from the misfolded ones is a critical step in protein loop structure prediction.

Results

We have developed a Pareto Optimal Consensus (POC) method, a consensus model-ranking approach that integrates multiple knowledge- or physics-based scoring functions. Identifying the best-quality models in a model set involves two steps: 1) identifying the models at the Pareto optimal front with respect to a set of scoring functions, and 2) ranking them based on the fuzzy …
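The first step of that procedure, finding the Pareto optimal front, can be illustrated with a minimal sketch. This is not the authors' implementation: the model names and score values are hypothetical, and the sketch assumes every scoring function is energy-like, so a lower score is better. The second (ranking) step is truncated in the abstract above and is not sketched here.

```python
from typing import Dict, List


def dominates(a: List[float], b: List[float]) -> bool:
    """Return True if score vector `a` dominates `b`.

    Assumption: lower is better for every scoring function. `a` dominates `b`
    when it is no worse on every score and strictly better on at least one.
    """
    no_worse = all(x <= y for x, y in zip(a, b))
    strictly_better = any(x < y for x, y in zip(a, b))
    return no_worse and strictly_better


def pareto_front(scores: Dict[str, List[float]]) -> List[str]:
    """Return the names of models that no other model dominates."""
    return [
        name
        for name, vec in scores.items()
        if not any(
            dominates(other_vec, vec)
            for other, other_vec in scores.items()
            if other != name
        )
    ]


# Hypothetical loop models, each scored by two hypothetical scoring functions.
models = {
    "loop_a": [1.0, 5.0],
    "loop_b": [2.0, 2.0],
    "loop_c": [3.0, 6.0],  # worse than loop_a on both scores
    "loop_d": [4.0, 1.0],
}

front = pareto_front(models)
print(front)
```

In this toy set, only `loop_c` is dominated, so the other three models form the front. The pairwise comparison is quadratic in the number of models, which is adequate for an illustration but may be too slow for large model sets.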


Provenance Aware Linked Sensor Data, Harshal Kamlesh Patni, Satya S. Sahoo, Cory Andrew Henson, Amit P. Sheth Jan 2010

Kno.e.sis Publications

Provenance, from the French word “provenir”, describes the lineage or history of a data entity. In the sensor domain, provenance is critical for identifying a sensor and analyzing its observation data over time and geographical space. In this paper, we present a framework to model and query the provenance information associated with sensor data exposed as part of the Web of Data using the Linked Open Data conventions. This is accomplished by developing an ontology-driven provenance management infrastructure that includes a representation model and query infrastructure. This provenance infrastructure, called the Sensor Provenance Management System (PMS), is …


Getting Code Near The Data: A Study Of Generating Customized Data Intensive Scientific Workflows With Domain Specific Language, Ashwin Manjunatha, Ajith Harshana Ranabahu, Paul E. Anderson, Amit P. Sheth Jan 2010

Kno.e.sis Publications

The amount of data produced in modern biological experiments such as Nuclear Magnetic Resonance (NMR) analysis far exceeds the processing capability of a single machine. The present state of the art is to take the “data to code”, the philosophy followed by many current service-oriented workflow systems. However, this is not feasible in some cases, such as NMR data analysis, primarily because of the large scale of the data.

The objective of this research is to bring “code to data”, which is preferred when the data is extremely large. We present a DSL-based approach to develop customized data intensive scientific …


Scale: A Scalable Framework For Efficiently Clustering Transactional Data, Hua Yan, Keke Chen, Ling Liu, Zhang Yi Jan 2010

Kno.e.sis Publications

This paper presents SCALE, a fully automated transactional clustering framework. The SCALE design highlights three unique features. First, we introduce the concept of Weighted Coverage Density as a categorical similarity measure for efficient clustering of transactional datasets. The concept is intuitive, and it allows the weight of each item in a cluster to change dynamically according to the occurrences of items. Second, we develop a clustering algorithm based on the weighted coverage density measure that is fast, memory-efficient, and scalable for analyzing transactional data. Third, we introduce two clustering validation metrics and show that these domain …


GenBank, Dennis A. Benson, Ilene Karsch-Mizrachi, David J. Lipman, James Ostell, Eric W. Sayers Jan 2010

Harold W. Manter Laboratory: Library Materials

GenBank® is a comprehensive database that contains publicly available nucleotide sequences for more than 380,000 organisms named at the genus level or lower, obtained primarily through submissions from individual laboratories and batch submissions from large-scale sequencing projects, including whole genome shotgun (WGS) and environmental sampling projects. Most submissions are made using the web-based BankIt or standalone Sequin programs, and accession numbers are assigned by GenBank staff upon receipt. Daily data exchange with the European Nucleotide Archive (ENA) and the DNA Data Bank of Japan (DDBJ) ensures worldwide coverage. GenBank is accessible through the NCBI Entrez retrieval system that integrates data …


A Sketch-Based Language For Representing Uncertainty In The Locations Of Origin Of Herbarium Specimens, Barry J. Kronenfeld, Andrew Weeks Jan 2010

Faculty Research and Creative Activity

Uncertainty fields have been suggested as an appropriate model for retrospective georeferencing of herbarium specimens. Previous work has focused only on automated data capture methods, but techniques for manual data specification may be able to harness human spatial cognition skills to quickly interpret complex spatial propositions. This paper develops a formal modeling language by which location uncertainty fields can be derived from manually sketched features. The language consists of low-level specification of critical probability isolines from which a surface can be uniquely derived, and high-level specification of features and predicates from which low-level isolines can be derived. In a case …

