Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 11 of 11

Full-Text Articles in Entire DC Network

Parsimony-Based Genetic Algorithm For Haplotype Resolution And Block Partitioning, Nadezhda A. Sazonova Dec 2007

Graduate Theses, Dissertations, and Problem Reports

This dissertation proposes a new algorithm for performing simultaneous haplotype resolution and block partitioning. The algorithm is based on a genetic algorithm approach and the parsimony principle. The multilocus LD measure (Normalized Entropy Difference) is used as a block identification criterion. The proposed algorithm incorporates missing data as a part of the model and allows blocks of arbitrary length. In addition, the algorithm provides scores for the block boundaries, which measure the strength of the boundaries at specific positions. The performance of the proposed algorithm was validated by running it on several publicly available data sets including the HapMap data …


Computational Intelligence Based Classifier Fusion Models For Biomedical Classification Applications, Xiujuan Chen Nov 2007

Computer Science Dissertations

The generalization abilities of machine learning algorithms often depend on the algorithms’ initialization, parameter settings, training sets, or feature selections. For instance, SVM classifier performance largely relies on whether the selected kernel functions are suitable for real application data. To enhance the performance of individual classifiers, this dissertation proposes classifier fusion models that use computational intelligence techniques to combine different classifiers. The first fusion model, called T1FFSVM, combines multiple SVM classifiers by constructing a fuzzy logic system. T1FFSVM can be improved by tuning the fuzzy membership functions of linguistic variables using genetic algorithms. The improved model is called GFFSVM. To better …


Informative Snp Selection And Validation, Diana Mohan Babu Aug 2007

Computer Science Theses

The search for genetic regions associated with complex diseases, such as cancer or Alzheimer's disease, is an important challenge that may lead to better diagnosis and treatment. The existence of millions of DNA variations, primarily single nucleotide polymorphisms (SNPs), may allow the fine dissection of such associations. However, studies seeking disease association are limited by the cost of genotyping SNPs. Therefore, it is essential to find a small subset of informative SNPs (tag SNPs) that may be used as good representatives of the rest of the SNPs. Several informative SNP selection methods have been developed. Our experiments compare favorably to …
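The idea of tag SNPs standing in for the remaining SNPs can be illustrated with a small sketch. This is not the method evaluated in the thesis, which the truncated abstract does not describe; it is a generic greedy selection based on pairwise r² between SNPs, and the function names, the 0/1 haplotype encoding, and the 0.8 threshold are all assumptions made for illustration.

```python
from itertools import combinations


def r_squared(a, b):
    """Squared correlation (r^2) between two SNPs given as 0/1 allele vectors."""
    n = len(a)
    pa = sum(a) / n
    pb = sum(b) / n
    pab = sum(x * y for x, y in zip(a, b)) / n
    denom = pa * (1 - pa) * pb * (1 - pb)
    if denom == 0:
        return 0.0  # a monomorphic SNP carries no correlation information
    d = pab - pa * pb
    return d * d / denom


def greedy_tag_snps(haplotypes, threshold=0.8):
    """Pick tag SNPs greedily (illustrative sketch, hypothetical helper).

    haplotypes: list of rows, each row a list of 0/1 alleles, one per SNP.
    A SNP is considered represented when its r^2 with a chosen tag is at
    least `threshold` (an assumed cutoff).
    """
    n_snps = len(haplotypes[0])
    columns = [[row[j] for row in haplotypes] for j in range(n_snps)]

    # covers[i] is the set of SNPs that SNP i can represent, itself included.
    covers = {i: {i} for i in range(n_snps)}
    for i, j in combinations(range(n_snps), 2):
        if r_squared(columns[i], columns[j]) >= threshold:
            covers[i].add(j)
            covers[j].add(i)

    untagged = set(range(n_snps))
    tags = []
    while untagged:
        # Choose the SNP that represents the most still-untagged SNPs.
        best = max(range(n_snps), key=lambda i: len(covers[i] & untagged))
        tags.append(best)
        untagged -= covers[best]
    return tags


# Toy data: SNPs 0, 1 and 2 are perfectly correlated; SNP 3 is independent.
toy_haplotypes = [
    [0, 0, 1, 0],
    [0, 0, 1, 1],
    [1, 1, 0, 0],
    [1, 1, 0, 1],
]
tags = greedy_tag_snps(toy_haplotypes)
```

On the toy data, one SNP from the correlated group plus the independent SNP is enough to represent all four, which is the cost saving that tag SNP selection aims for.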


A Domain-Specific Conceptual Query System, Xiuyun Shen Aug 2007

Computer Science Theses

This thesis presents the architecture and implementation of a query system resulting from a domain-specific conceptual data modeling and querying methodology. The query system is built for a high-level conceptual query language that supports dynamically user-defined domain-specific functions and application-specific functions. The language is DBMS-independent and can be translated to SQL and OQL through a normal form. Currently, the system has been implemented in the neuroscience domain and can be applied to any other domain.


Parallelization Of The Maximum Likelihood Approach To Phylogenetic Inference, Janine Garnham Aug 2007

Theses

Phylogenetic inference refers to the reconstruction of evolutionary relationships among various species, usually presented in the form of a tree. DNA sequences are most often used to determine these relationships. The results of phylogenetic inference have many important applications, including protein function determination, drug discovery, disease tracking and forensics. There are several popular computational methods used for phylogenetic inference, among them distance-based (e.g., neighbor joining), maximum parsimony, maximum likelihood, and Bayesian methods. This thesis focuses on the maximum likelihood method, which is regarded as one of the most accurate methods, with its computational demand being the main hindrance to its …


Structure Pattern Analysis Using Term Rewriting And Clustering Algorithm, Xuezheng Fu Jun 2007

Computer Science Dissertations

Biological data are accumulating at a fast pace. However, raw data are generally difficult to understand and not useful unless we unlock the information hidden in them. Knowledge can be extracted as the patterns or features buried within the data. Thus data mining, which aims at uncovering underlying rules, relationships, and patterns in data, has emerged as one of the most exciting fields in computational science. In this dissertation, we develop efficient approaches to the structure pattern analysis of RNA and protein three-dimensional structures. The major techniques used in this work include term rewriting and clustering algorithms. Firstly, a …


Bioinformatics Tool Development And Sequence Analysis Of Rosaceae Family Expressed Sequence Tags, Margaret Staton May 2007

All Dissertations

BACKGROUND: An international community of researchers has generated a significant number of Expressed Sequence Tags (ESTs) for the Rosaceae, an economically important plant family that includes most temperate fruits such as apple, cherry, peach, and strawberry as well as other commercially valuable members. ESTs are fragments of expressed genes that can be used for gene discovery, marker development for mapping, and cultivar improvement via marker-assisted selection. Efficient dissemination and integration of this data is best facilitated through a centralized and curated database with associated sequence analysis tools.

DESCRIPTION: The Genome Database for Rosaceae (GDR) was initiated to provide a …


Evolutionary Granular Kernel Machines, Bo Jin May 2007

Computer Science Dissertations

Kernel machines such as Support Vector Machines (SVMs) have been widely used in various data mining applications with good generalization properties. The performance of SVMs on nonlinear problems is highly affected by the choice of kernel functions. The complexity of SVM training is mainly related to the size of the training dataset. How to design a powerful kernel, how to speed up SVM training, and how to train SVMs with millions of examples are still challenging problems in SVM research. To address these problems, powerful and flexible kernel trees called Evolutionary Granular Kernel Trees (EGKTs) are designed to incorporate prior domain knowledge. …


Software Tools For Comparing Genomic Sequence, Morel Henley Jan 2007

Master's Theses and Capstones

We describe three software tools related to research in comparative genomics, a growing research area that explores the variation within and between organisms. We developed a set of tools that explore sequence similarities and differences in genomes. Two of these tools are specifically aimed at examining DNA sequence data from two or more genomes: (1) The Magenta's OPUS tool compares genomic sequences to identify shared or unique segments between closely related species. This tool looks for functional similarities and differences in genomic data by classifying sequences into groups based on genomic categories: Orthologs, Paralogs, and Unique Sequence. (2) The DSNP …


Sequence Similarity Search Portal, Arokiya Louis Monica Joseph Jan 2007

Theses Digitization Project

This project brings the bioinformatics community a new development concept in which users can access all data and applications hosted in the main research centers as if they were installed on their local machines, providing seamless integration between disparate services. The project integrates the sequence similarity searching offered at EBI and NCBI by using web services. It is also intended to let molecular biologists save their searches, acting as a log book for their sequence similarity searches. The project will also allow biologists to share their sequences and results with others.


Computational Methods For The Objective Review Of Forensic Dna Testing Results, Jason R. Gilder Jan 2007

Browse all Theses and Dissertations

Since the advent of criminal investigations, investigators have sought a "gold standard" for the evaluation of forensic evidence. Currently, deoxyribonucleic acid (DNA) technology is the most reliable method of identification. Short Tandem Repeat (STR) DNA genotyping has the potential for impressive match statistics, but the methodology is not infallible. The condition of an evidentiary sample and potential issues with the handling and testing of a sample can lead to significant issues with the interpretation of DNA testing results. Forensic DNA interpretation standards are determined by laboratory validation studies that often involve small sample sizes. This dissertation presents novel methodologies to address …
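To give a sense of where STR match statistics come from, the sketch below applies the textbook product rule: genotype frequencies are computed per locus under Hardy-Weinberg proportions and multiplied across loci assumed to be independent. It is not taken from the dissertation; the locus names, allele labels, and frequencies are invented placeholders, and real casework typically applies further corrections that are omitted here.

```python
from math import prod


def genotype_frequency(allele_a, allele_b, freqs):
    """Expected genotype frequency at one locus under Hardy-Weinberg proportions."""
    p = freqs[allele_a]
    q = freqs[allele_b]
    # Homozygote: p^2. Heterozygote: 2pq.
    return p * p if allele_a == allele_b else 2 * p * q


def random_match_probability(profile, allele_freqs):
    """Multiply per-locus genotype frequencies (product rule).

    profile: {locus: (allele, allele)}
    allele_freqs: {locus: {allele: population frequency}}
    Assumes independence between loci; hypothetical helper for illustration.
    """
    return prod(
        genotype_frequency(a, b, allele_freqs[locus])
        for locus, (a, b) in profile.items()
    )


# Invented placeholder data, not real population frequencies.
example_freqs = {
    "LOCUS_A": {12: 0.1, 14: 0.2},
    "LOCUS_B": {8: 0.5},
}
example_profile = {
    "LOCUS_A": (12, 14),  # heterozygote: 2 * 0.1 * 0.2
    "LOCUS_B": (8, 8),    # homozygote: 0.5 ** 2
}
rmp = random_match_probability(example_profile, example_freqs)
```

Because the per-locus values multiply, each additional locus shrinks the result, which is why profiles typed at many loci can yield very small match probabilities. The same arithmetic also shows why sample condition and interpretation matter: a single miscalled allele changes the factor for its locus and therefore the entire product.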