Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Full-Text Articles in Entire DC Network

Word Sense Disambiguation In Biomedical Ontologies With Term Co-Occurrence Analysis And Document Clustering, Bill Andreopoulos, Dimitra Alexopoulou, Michael Schroeder Sep 2008

Faculty Publications, Computer Science

With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the Gene Ontology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as "development" can refer to developmental biology or to the more general sense. Here, we present two approaches to address this problem by using term co-occurrences and document clustering. To evaluate our method we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an …


Knowledge-Based Analysis Of Genomic Expression Data By Using Different Machine Learning Algorithms For The Purpose Of Diagnostic, Prognostic Or Therapeutic Application, Venkata Jagan Mohan Thodima Aug 2008

Dissertations

With ever more biological information being generated, the most pressing task of bioinformatics has become analyzing and interpreting various types of data, including nucleotide and amino acid sequences, protein structures, and gene expression profiles. In this dissertation, we apply the data mining techniques of feature generation, feature selection, and feature integration with learning algorithms to tackle the problems of disease phenotype classification, clinical outcome prediction, and patient survival prediction from gene expression profiles.

We analyzed the effect of batch noise in microarray data on classification performance. Batchmatch, a batch-adjusting algorithm based on a double scaling method …


An Improved String Composition Method For Sequence Comparison, Guoquing Lu, Shunpu Zhang, Xiang Fang May 2008

Biology Faculty Publications

Background: Historically, two categories of computational algorithms (alignment-based and alignment-free) have been applied to sequence comparison, one of the most fundamental issues in bioinformatics. Multiple sequence alignment, although the dominant approach among biologists, has both fundamental and computational limitations. Consequently, alignment-free methods have been explored as important alternatives for estimating sequence similarity. Of the alignment-free methods, the string composition vector (CV) methods, which use the frequencies of nucleotide or amino acid strings to represent sequence information, show promising results in genome sequence comparison of prokaryotes. The existing CV-based methods, however, suffer from certain statistical problems, thereby underestimating the amount of evolutionary …
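
As a rough illustration of the basic idea behind composition vectors described in this abstract, the sketch below represents each sequence by the relative frequencies of its overlapping strings of length k and compares two sequences by cosine similarity. This is only a minimal baseline under stated assumptions, not the improved method the paper proposes: the function names, the choice of k, and the use of cosine similarity are all illustrative choices that do not come from the listing.

```python
from collections import Counter
from math import sqrt


def kmer_frequencies(seq, k):
    """Relative frequency of every overlapping length-k string in seq."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts.values())
    return {kmer: n / total for kmer, n in counts.items()}


def cosine_similarity(u, v):
    """Cosine of the angle between two sparse frequency vectors (dicts)."""
    dot = sum(u[key] * v.get(key, 0.0) for key in u)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        # At least one sequence was shorter than k, so it has no k-mers.
        return 0.0
    return dot / (norm_u * norm_v)


if __name__ == "__main__":
    # Toy sequences, chosen only for illustration.
    a = kmer_frequencies("ACGTACGTAC", 3)
    b = kmer_frequencies("ACGTTCGTAC", 3)
    print(cosine_similarity(a, b))
```

Raw frequencies like these carry no correction for background composition; the statistical problems mentioned in the abstract concern that kind of correction, and this sketch does not attempt it.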


Semantics And Services Enabled Problem Solving Environment For Trypanosoma Cruzi, Amit P. Sheth, Rick L. Tarleton, Mark Musen, Satya S. Sahoo, Prashant Doshi, Natasha Noy Jan 2008

Kno.e.sis Publications

No abstract provided.


On The Tradeoff Between Speedup And Energy Consumption In High Performance Computing – A Bioinformatics Case Study, Sachin Pawaskar, Hesham Ali Jan 2008

Computer Science Faculty Proceedings & Presentations

High Performance Computing has been very useful to researchers in bioinformatics, medicine, and related fields. The bioinformatics domain is rich in applications that require extracting useful information from very large and continuously growing sequence databases. Automated techniques such as DNA sequencers, DNA microarrays, and others continually add to the data stored in large public databases such as GenBank and the Protein Data Bank. Most methods used for analyzing genetic and protein data have been found to be extremely computationally intensive, which motivates the use of powerful computers or systems with high throughput characteristics. In this paper, we provide a …


Mutual Information Without The Influence Of Phylogeny Or Entropy Dramatically Improves Residue Contact Prediction, Stanley Dunn, Lindi Wahl, Gregory Gloor Dec 2007

Stanley D Dunn

Motivation: Compensating alterations during the evolution of protein families give rise to coevolving positions that contain important structural and functional information. However, a high background composed of random noise and phylogenetic components interferes with the identification of coevolving positions.

Results: We have developed a rapid, simple and general method based on information theory that accurately estimates the level of background mutual information for each pair of positions in a given protein family. Removal of this background results in a metric, MIp, that correctly identifies substantially more coevolving positions in protein families than any existing method. A significant fraction of these …
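
The abstract is truncated here and gives no formula, so the following is a hedged sketch rather than the authors' implementation. It assumes the background term is the average product correction, APC(a, b) = MI(a, x̄) · MI(b, x̄) / mean MI, and computes MIp(a, b) = MI(a, b) − APC(a, b) for every pair of alignment columns. The function names and the toy alignment are hypothetical, and gap characters are treated like any other symbol, which a real analysis would probably handle differently.

```python
from collections import Counter
from math import log2


def column_mi(col_a, col_b):
    """Mutual information (in bits) between two alignment columns."""
    n = len(col_a)
    pa, pb = Counter(col_a), Counter(col_b)
    pab = Counter(zip(col_a, col_b))
    return sum(
        (c / n) * log2((c / n) / ((pa[x] / n) * (pb[y] / n)))
        for (x, y), c in pab.items()
    )


def mip_scores(alignment):
    """MI minus the assumed average-product background, per column pair.

    alignment: list of equal-length strings with at least two columns.
    """
    cols = list(zip(*alignment))
    m = len(cols)
    mi = {(a, b): column_mi(cols[a], cols[b])
          for a in range(m) for b in range(a + 1, m)}

    def lookup(a, b):
        return mi[(a, b)] if a < b else mi[(b, a)]

    # Mean MI of each column with every other column.
    mean_col = [sum(lookup(a, x) for x in range(m) if x != a) / (m - 1)
                for a in range(m)]
    overall = sum(mi.values()) / len(mi)
    if overall == 0.0:
        # No signal anywhere, so there is no background to subtract.
        return {pair: 0.0 for pair in mi}
    return {(a, b): value - mean_col[a] * mean_col[b] / overall
            for (a, b), value in mi.items()}


if __name__ == "__main__":
    toy = ["AAA", "AAC", "CCA", "CCC"]  # hypothetical four-sequence alignment
    for pair, score in sorted(mip_scores(toy).items()):
        print(pair, round(score, 3))
```

In the toy alignment the first two columns vary together while the third varies independently, so only the pair (0, 1) keeps a positive score after the correction.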