Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Bioinformatics

Articles 91 - 120 of 123

Full-Text Articles in Physical Sciences and Mathematics

New Computational Approaches For Multiple RNA Alignment And RNA Search, Daniel Deblasio Jan 2009

Electronic Theses and Dissertations

In this thesis we explore the theory and history behind RNA alignment. Normal sequence alignments as studied by computer scientists can be completed in O(n²) time in the naive case. The process involves taking two input sequences and finding the list of edits that can transform one sequence into the other. This process is applied to biology in many forms, such as the creation of multiple alignments and the search of genomic sequences. When you take into account the RNA sequence structure the problem becomes even harder. Multiple RNA structure alignment is particularly challenging because covarying mutations make sequence …
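The quadratic-time pairwise alignment mentioned in this abstract can be illustrated with a minimal sketch of the textbook edit-distance dynamic program. It is not the thesis's own algorithm: the function name `edit_distance` and the unit cost for every insertion, deletion, and substitution are assumptions made for illustration, and RNA structure is ignored entirely.

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character edits turning a into b.

    Classic dynamic program: O(len(a) * len(b)) time, O(len(b)) memory.
    Unit costs are an illustrative assumption, not taken from the thesis.
    """
    # prev[j] = distance between the processed prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into the empty string takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # match or substitute
            ))
        prev = curr
    return prev[-1]


if __name__ == "__main__":
    print(edit_distance("ACGU", "AGU"))
```

Recovering the actual list of edits, as the abstract describes, would additionally require keeping the full table and tracing back from the last cell.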


Improving Remote Homology Detection Using A Sequence Property Approach, Gina Marie Cooper Jan 2009

Browse all Theses and Dissertations

Understanding the structure and function of proteins is a key part of understanding biological systems. Although proteins are complex biological macromolecules, they are made up of only 20 basic building blocks known as amino acids. The makeup of a protein can be described as a sequence of amino acids. One of the most important tools in modern bioinformatics is the ability to search for biological sequences (such as protein sequences) that are similar to a given query sequence. There are many tools for doing this (Altschul et al., 1990, Hobohm and Sander, 1995, Thomson et al., 1994, Karplus and Barrett, …


Word Sense Disambiguation In Biomedical Ontologies With Term Co-Occurrence Analysis And Document Clustering, Bill Andreopoulos, Dimitra Alexopoulou, Michael Schroeder Sep 2008

Faculty Publications, Computer Science

With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the GeneOntology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as "development" can refer to developmental biology or to the more general sense. Here, we present two approaches to address this problem by using term co-occurrences and document clustering. To evaluate our method we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an …


Word Sense Disambiguation In Biomedical Ontologies With Term Co-Occurrence Analysis And Document Clustering, Bill Andreopoulos, Dimitra Alexopoulou, Michael Schroeder Sep 2008

William B. Andreopoulos

With more and more genomes being sequenced, a lot of effort is devoted to their annotation with terms from controlled vocabularies such as the GeneOntology. Manual annotation based on relevant literature is tedious, but automation of this process is difficult. One particularly challenging problem is word sense disambiguation. Terms such as "development" can refer to developmental biology or to the more general sense. Here, we present two approaches to address this problem by using term co-occurrences and document clustering. To evaluate our method we defined a corpus of 331 documents on development and developmental biology. Term co-occurrence analysis achieves an …


Semantics And Services Enabled Problem Solving Environment For Trypanosoma Cruzi, Amit P. Sheth, Rick L. Tarleton, Mark Musen, Satya S. Sahoo, Prashant Doshi, Natasha Noy Jan 2008

Kno.e.sis Publications

No abstract provided.


On The Tradeoff Between Speedup And Energy Consumption In High Performance Computing – A Bioinformatics Case Study, Sachin Pawaskar, Hesham Ali Jan 2008

Computer Science Faculty Proceedings & Presentations

High Performance Computing has been very useful to researchers in the Bioinformatics, Medical and related fields. The bioinformatics domain is rich in applications that require extracting useful information from very large and continuously growing sequence of databases. Automated techniques such as DNA sequencers, DNA microarrays & others are continually growing the dataset that is stored in large public databases such as GenBank and Protein DataBank. Most methods used for analyzing genetic/protein data have been found to be extremely computationally intensive, providing motivation for the use of powerful computers or systems with high throughput characteristics. In this paper, we provide a …


Graphics Processor Based Implementation Of Bioinformatics Codes, Andrew Bellenir, Christian Trefftz, Greg Wolffe Jan 2008

Student Summer Scholars Manuscripts

We created a powerful computing platform based on video cards with the goal of accelerating the performance of bioinformatics codes. To satisfy the demands of the video gaming industry, modern graphics processing units (GPUs) have become very advanced computational devices, using a large set of stream processors to render multiple pixels in parallel. Recently, computer scientists have taken interest in a GPU's ability to execute a single instruction on multiple data (SIMD computation) for general applications, as opposed to graphics processing only. This is known as general purpose computation on a graphics processing unit, or GPGPU.

Our project was comprised …


The Impact Of Directionality In Predications On Text Mining, Gondy Leroy, Marcelo Fiszman, Thomas C. Rindflesch Jan 2008

CGU Faculty Publications and Research

The number of publications in biomedicine is increasing enormously each year. To help researchers digest the information in these documents, text mining tools are being developed that present co-occurrence relations between concepts. Statistical measures are used to mine interesting subsets of relations. We demonstrate how directionality of these relations affects interestingness. Support and confidence, simple data mining statistics, are used as proxies for interestingness metrics. We first built a test bed of 126,404 directional relations extracted from biomedical abstracts, which we represent as graphs containing a central starting concept and 2 rings of associated relations. We manipulated directionality in four …
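Support and confidence, the two statistics named in this abstract, can be sketched as follows. This uses the standard association-rule definitions over a toy set of records; the record contents, the function name `support_confidence`, and the way co-occurrence is counted are assumptions for illustration and are not taken from the paper.

```python
def support_confidence(records, antecedent, consequent):
    """Support and confidence of the directed relation antecedent -> consequent.

    records: iterable of sets of concepts (e.g. one set per abstract).
    support    = P(antecedent and consequent)
    confidence = P(consequent | antecedent)
    """
    records = list(records)
    n = len(records)
    both = sum(1 for r in records if antecedent in r and consequent in r)
    ante = sum(1 for r in records if antecedent in r)
    support = both / n if n else 0.0
    confidence = both / ante if ante else 0.0
    return support, confidence


# Hypothetical toy data: each set stands for the concepts found in one abstract.
TOY_RECORDS = [
    {"aspirin", "headache"},
    {"aspirin", "headache"},
    {"aspirin"},
    {"headache"},
    {"headache"},
]

if __name__ == "__main__":
    print(support_confidence(TOY_RECORDS, "aspirin", "headache"))
    print(support_confidence(TOY_RECORDS, "headache", "aspirin"))
```

Support is symmetric in the two concepts, while confidence depends on which concept is treated as the starting point, which is one simple way direction can change how interesting a relation looks.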


Algorithmic Techniques Employed In The Isolation Of Codon Usage Biases In Prokaryotic Genomes, Douglas W. Raiford III Jan 2008

Browse all Theses and Dissertations

While genomic sequencing projects are an abundant source of information for biological studies ranging from the molecular to the ecological in scale, much of the information present may yet be hidden from casual analysis. One such information domain, trends in codon usage, can provide a wealth of information about an organism's genes and their expression. Degeneracy in the genetic code allows more than one triplet codon to code for the same amino acid, and usage of these codons is often biased such that one or more of these synonymous codons is preferred. Isolation of translational efficiency bias can have important …
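Synonymous codon preference, as described in this abstract, can be made concrete with a small sketch that computes, for each codon in a coding sequence, its share among the codons for the same amino acid. The function name `synonymous_codon_fractions` and the deliberately partial codon table are assumptions for illustration; this is a simple frequency count, not the bias-isolation techniques developed in the dissertation.

```python
from collections import Counter, defaultdict

# Partial standard genetic code (DNA alphabet). Only three amino acids are
# listed; this subset is an illustrative assumption, not a complete table.
CODON_TO_AA = {
    "CTT": "Leu", "CTC": "Leu", "CTA": "Leu", "CTG": "Leu",
    "TTA": "Leu", "TTG": "Leu",
    "AAA": "Lys", "AAG": "Lys",
    "GAA": "Glu", "GAG": "Glu",
}


def synonymous_codon_fractions(cds: str) -> dict:
    """Fraction of each observed codon among its synonymous codons.

    Codons missing from CODON_TO_AA and any trailing partial codon are ignored.
    """
    usable = len(cds) - len(cds) % 3
    codons = [cds[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TO_AA)

    # Total number of observed codons per amino acid.
    totals = defaultdict(int)
    for codon, n in counts.items():
        totals[CODON_TO_AA[codon]] += n

    return {codon: n / totals[CODON_TO_AA[codon]] for codon, n in counts.items()}


if __name__ == "__main__":
    print(synonymous_codon_fractions("CTGCTGCTGTTAAAAAAG"))
```

In the toy sequence, leucine is encoded three times by CTG and once by TTA, so CTG would be the preferred synonymous codon under this count.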


Medical Language Processing For Patient Diagnosis Using Text Classification And Negation Labelling, Brian Mac Namee, John D. Kelleher, Sarah Jane Delany Jan 2008

Conference papers

This paper describes the approach of the DIT AIGroup to the i2b2 Obesity Challenge to build a system to diagnose obesity and related co-morbidities from narrative, unstructured patient records. Based on experimental results a system was developed which used knowledge-light text classification using decision trees, and negation labelling.


Computational Intelligence Based Classifier Fusion Models For Biomedical Classification Applications, Xiujuan Chen Nov 2007

Computer Science Dissertations

The generalization abilities of machine learning algorithms often depend on the algorithms’ initialization, parameter settings, training sets, or feature selections. For instance, SVM classifier performance largely relies on whether the selected kernel functions are suitable for real application data. To enhance the performance of individual classifiers, this dissertation proposes classifier fusion models using computational intelligence knowledge to combine different classifiers. The first fusion model called T1FFSVM combines multiple SVM classifiers through constructing a fuzzy logic system. T1FFSVM can be improved by tuning the fuzzy membership functions of linguistic variables using genetic algorithms. The improved model is called GFFSVM. To better …


Informative SNP Selection And Validation, Diana Mohan Babu Aug 2007

Computer Science Theses

The search for genetic regions associated with complex diseases, such as cancer or Alzheimer's disease, is an important challenge that may lead to better diagnosis and treatment. The existence of millions of DNA variations, primarily single nucleotide polymorphisms (SNPs), may allow the fine dissection of such associations. However, studies seeking disease association are limited by the cost of genotyping SNPs. Therefore, it is essential to find a small subset of informative SNPs (tag SNPs) that may be used as good representatives of the rest of the SNPs. Several informative SNP selection methods have been developed. Our experiments compare favorably to …


A Domain-Specific Conceptual Query System, Xiuyun Shen Aug 2007

Computer Science Theses

This thesis presents the architecture and implementation of a query system resulting from a domain-specific conceptual data modeling and querying methodology. The query system is built for a high-level conceptual query language that supports dynamically user-defined domain-specific functions and application-specific functions. It is DBMS-independent and can be translated to SQL and OQL through a normal form. Currently, it has been implemented in the neuroscience domain and can be applied to any other domain.


Structure Pattern Analysis Using Term Rewriting And Clustering Algorithm, Xuezheng Fu Jun 2007

Computer Science Dissertations

Biological data is accumulated at a fast pace. However, raw data are generally difficult to understand and not useful unless we unlock the information hidden in the data. Knowledge/information can be extracted as the patterns or features buried within the data. Thus data mining, which aims at uncovering underlying rules, relationships, and patterns in data, has emerged as one of the most exciting fields in computational science. In this dissertation, we develop efficient approaches to the structure pattern analysis of RNA and protein three-dimensional structures. The major techniques used in this work include term rewriting and clustering algorithms. Firstly, a …


Evolutionary Granular Kernel Machines, Bo Jin May 2007

Computer Science Dissertations

Kernel machines such as Support Vector Machines (SVMs) have been widely used in various data mining applications with good generalization properties. Performance of SVMs for solving nonlinear problems is highly affected by kernel functions. The complexity of SVMs training is mainly related to the size of a training dataset. How to design a powerful kernel, how to speed up SVMs training and how to train SVMs with millions of examples are still challenging problems in the SVMs research. For these important problems, powerful and flexible kernel trees called Evolutionary Granular Kernel Trees (EGKTs) are designed to incorporate prior domain knowledge. …


Finding Molecular Complexes Through Multiple Layer Clustering Of Protein Interaction Networks, Bill Andreopoulos, Aijun An, Xiangji Huang, Xiaogang Wang Jan 2007

Faculty Publications, Computer Science

Clustering protein-protein interaction networks (PINs) helps to identify complexes that guide the cell machinery. Clustering algorithms often create a flat clustering, without considering the layered structure of PINs. We propose the MULIC clustering algorithm that produces layered clusters. We applied MULIC to five PINs. Clusters correlate with known MIPS protein complexes. For example, a cluster of 79 proteins overlaps with a known complex of 88 proteins. Proteins in top cluster layers tend to be more representative of complexes than proteins in bottom layers. Lab work on finding unknown complexes or determining drug effects can be guided by top layer proteins.


Sequence Similarity Search Portal, Arokiya Louis Monica Joseph Jan 2007

Theses Digitization Project

This project brings the bioinformatics community a new development concept in which users can access all data and applications hosted in the main research centers as if they were installed on their local machines, providing seamless integration between disparate services. The project moves to integrate the sequence similarity searching at EBI and NCBI by using web services. It also intends to allow molecular biologists to save their searches and act as a log book for their sequence similarity searches. The project will also allow the biologists to share their sequences and results with others.


Computational Methods For The Objective Review Of Forensic DNA Testing Results, Jason R. Gilder Jan 2007

Browse all Theses and Dissertations

Since the advent of criminal investigations, investigators have sought a "gold standard" for the evaluation of forensic evidence. Currently, deoxyribonucleic acid (DNA) technology is the most reliable method of identification. Short Tandem Repeat (STR) DNA genotyping has the potential for impressive match statistics, but the methodology is not infallible. The condition of an evidentiary sample and potential issues with the handling and testing of a sample can lead to significant issues with the interpretation of DNA testing results. Forensic DNA interpretation standards are determined by laboratory validation studies that often involve small sample sizes. This dissertation presents novel methodologies to address …


Finding Molecular Complexes Through Multiple Layer Clustering Of Protein Interaction Networks, Bill Andreopoulos, Aijun An, Xiangji Huang, Xiaogang Wang Dec 2006

William B. Andreopoulos

Clustering protein-protein interaction networks (PINs) helps to identify complexes that guide the cell machinery. Clustering algorithms often create a flat clustering, without considering the layered structure of PINs. We propose the MULIC clustering algorithm that produces layered clusters. We applied MULIC to five PINs. Clusters correlate with known MIPS protein complexes. For example, a cluster of 79 proteins overlaps with a known complex of 88 proteins. Proteins in top cluster layers tend to be more representative of complexes than proteins in bottom layers. Lab work on finding unknown complexes or determining drug effects can be guided by top layer proteins.


Fuzzy-Granular Based Data Mining For Effective Decision Support In Biomedical Applications, Yuanchen He Dec 2006

Computer Science Dissertations

Due to the complexity of biomedical problems, adaptive and intelligent knowledge discovery and data mining systems are highly needed to help humans understand the inherent mechanism of diseases. For biomedical classification problems, typically it is impossible to build a perfect classifier with 100% prediction accuracy. Hence a more realistic target is to build an effective Decision Support System (DSS). In this dissertation, a novel adaptive Fuzzy Association Rules (FARs) mining algorithm, named FARM-DS, is proposed to build such a DSS for binary classification problems in the biomedical domain. Empirical studies show that FARM-DS is competitive with state-of-the-art classifiers in terms …


Bi-Level Clustering Of Mixed Categorical And Numerical Biomedical Data, Bill Andreopoulos, Aijun An, Xiaogang Wang Jun 2006

Faculty Publications, Computer Science

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for "Bi-Level Clustering of Mixed categorical and numerical data types". BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedical data sets of mixed types, such as hepatitis, thyroid disease and yeast gene expression data with Gene Ontology annotations, more accurately than if using one type alone.


Bi-Level Clustering Of Mixed Categorical And Numerical Biomedical Data, Bill Andreopoulos, Aijun An, Xiaogang Wang Jun 2006

William B. Andreopoulos

Biomedical data sets often have mixed categorical and numerical types, where the former represent semantic information on the objects and the latter represent experimental results. We present the BILCOM algorithm for "Bi-Level Clustering of Mixed categorical and numerical data types". BILCOM performs a pseudo-Bayesian process, where the prior is categorical clustering. BILCOM partitions biomedical data sets of mixed types, such as hepatitis, thyroid disease and yeast gene expression data with Gene Ontology annotations, more accurately than if using one type alone.


Granular Support Vector Machines Based On Granular Computing, Soft Computing And Statistical Learning, Yuchun Tang May 2006

Computer Science Dissertations

With the emergence of biomedical informatics, Web intelligence, and E-business, new challenges are arising for knowledge discovery and data mining modeling problems. In this dissertation work, a framework named Granular Support Vector Machines (GSVM) is proposed to systematically and formally combine statistical learning theory, granular computing theory and soft computing theory to address challenging predictive data modeling problems effectively and/or efficiently, with specific focus on binary classification problems. In general, GSVM works in 3 steps. Step 1 is granulation to build a sequence of information granules from the original dataset or from the original feature space. Step 2 is modeling Support …


Bioinformatics Framework For Genotyping Microarray Data Analysis, Kai Zhang Jan 2006

Dissertations

Functional genomics is a flourishing science enabled by recent technological breakthroughs in high-throughput instrumentation and microarray data analysis. Genotyping microarrays establish the genotypes of DNA sequences containing single nucleotide polymorphisms (SNPs), and can help biologists probe the functions of different genes and/or construct complex gene interaction networks. The enormous amount of data from these experiments makes it infeasible to perform manual processing to obtain accurate and reliable results in daily routines. Advanced algorithms as well as an integrated software toolkit are needed to help perform reliable and fast data analysis.

The author developed a MATLAB™-based software package, called …


RNA Structure Analysis: Algorithms And Applications, Jianghui Liu Aug 2005

Dissertations

In this doctoral thesis, efficient algorithms for aligning RNA secondary structures and mining unknown RNA motifs are presented. As the major contribution, a structure alignment algorithm, which combines both primary and secondary structure information, can find the optimal alignment between two given structures where one of them could be either a pattern structure of a known motif or a real query structure and the other be a subject structure.

Motivated by widely used algorithms for RNA folding, the proposed algorithm decomposes an RNA secondary structure into a set of atomic structural components that can be further organized in a tree …


Translation Initiation Sites Prediction With Mixture Gaussian Models In Human cDNA Sequences, G. Li, Tze-Yun Leong, Louxin Zhang Aug 2005

Research Collection School Of Computing and Information Systems

Translation initiation sites (TISs) are important signals in cDNA sequences. Many research efforts have tried to predict TISs in cDNA sequences. In this paper, we propose to use mixture Gaussian models for TIS prediction. Using both local features and some features generated from global measures, the proposed method predicts TISs with a sensitivity of 98 percent and a specificity of 93.6 percent. Our method outperforms many other existing methods in sensitivity while keeping specificity high. We attribute the improvement in sensitivity to the nature of the global features and the mixture Gaussian models. © 2005 IEEE.


A Brief History Of BioPerl, Colin Crossman, Arti K. Rai Jan 2005

Faculty Scholarship

Large-scale open-source projects face a litany of pitfalls and difficulties. Problems of contribution quality, credit for contributions, project coordination, funding, and mission-creep are ever-present. Of these, long-term funding and project coordination can interact to form a particularly difficult problem for open-source projects in an academic environment.

BioPerl was chosen as an example of a successful academic open-source project. Several of the roadblocks and hurdles encountered and overcome in the development of BioPerl are examined through the telling of the history of the project. Along the way, key points of open-source law are explained, such as license choice and copyright.

The …


New Techniques For Improving Biological Data Quality Through Information Integration, Katherine Grace Herbert May 2004

Dissertations

As databases become more pervasive through the biological sciences, various data quality concerns are emerging. Biological databases tend to develop data quality issues regarding data legacy, data uniformity and data duplication. Due to the nature of this data, each of these problems is non-trivial and can cause many problems for the database. For biological data to be corrected and standardized, methods and frameworks must be developed to handle both structural and traditional data.

The BIG-AJAX framework has been developed for solving these problems through both data cleaning and data integration. This framework exploits declarative data cleaning and exploratory data mining …


An Approximate Search Engine For Structure, Huiyuan Shan May 2004

Dissertations

As the size of structural databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art.

In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label …


A Hidden Markov Model Capable Of Predicting And Discriminating β-Barrel Outer Membrane Proteins, Pantelis G. Bagos, Theodore D. Liakopoulos, Ioannis C. Spyropoulos, Stavros J. Hamodrakas Jan 2004

Pantelis Bagos

BACKGROUND: Integral membrane proteins constitute about 20-30% of all proteins in the fully sequenced genomes. They come in two structural classes, the alpha-helical and the beta-barrel membrane proteins, demonstrating different physicochemical characteristics, structure and localization. While transmembrane segment prediction for the alpha-helical integral membrane proteins appears to be an easy task nowadays, the same is much more difficult for the beta-barrel membrane proteins. We developed a method, based on a Hidden Markov Model, capable of predicting the transmembrane beta-strands of the outer membrane proteins of gram-negative bacteria, and discriminating those from water-soluble proteins in large datasets. The model is trained …