Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Model-Based Deep Autoencoders For Characterizing Discrete Data With Application To Genomic Data Analysis, Tian Tian May 2019

Dissertations

Deep learning techniques have achieved tremendous success in a wide range of real-world applications in recent years. For dimension reduction, deep neural networks (DNNs) provide a natural way to parameterize a non-linear function that maps the original high-dimensional data to a lower-dimensional latent space. An autoencoder is a kind of DNN used to learn efficient feature representations in an unsupervised manner. Deep autoencoders have been widely explored and applied to the analysis of continuous data, but they remain understudied for characterizing discrete data. This dissertation focuses on developing model-based deep autoencoders for modeling discrete data. A motivating example of …


Development And Evaluation Of Machine Learning Algorithms For Biomedical Applications, Turki Talal Turki Apr 2017

Dissertations

Gene network inference and drug response prediction are two important problems in computational biomedicine. The former helps scientists better understand the functional elements and regulatory circuits of cells. The latter helps physicians gain a full understanding of which treatments are effective for patients. Both problems have been widely studied, though current solutions are far from perfect. More research is needed to improve the accuracy of existing approaches.

This dissertation develops machine learning and data mining algorithms and applies them to these two biomedical problems. Specifically, to tackle the gene network inference problem, the dissertation proposes (i) new techniques …


Fast And Sensitive Genome-Hashing Software And Its Application In Using NGS As A Detection Agent For Bacterial Presence In Oral Metagenomic Samples, Paul Michael Gontarz Jun 2015

Dissertations

Next generation sequencing has increased the throughput of sequenced DNA into the range of billions of nucleotides per day. The increased speed of DNA sequencing and the short length of the reads produced by next generation sequencers have created a significant challenge: quickly and accurately assembling the hundreds of millions of short reads produced by modern sequencing instruments into their full genomic sequences. As throughput rises and the time and cost of DNA sequencing fall, novel applications for DNA sequencing are being considered. Among them is a methodology by …


Data Mining In Computational Proteomics And Genomics, Yang Song May 2015

Dissertations

This dissertation addresses data mining in bioinformatics by investigating two important problems, namely peak detection and structure matching. Peak detection is useful for biological pattern discovery while structure matching finds many applications in clustering and classification.

The first part of this dissertation focuses on elastic peak detection in 2D liquid chromatographic mass spectrometry (LC-MS) data used in proteomics research. These data can be modeled as a time series, in which the X-axis represents time points and the Y-axis represents intensity values. A peak occurs in a set of 2D LC-MS data when the sum of the intensity values in a …


Computational Methods For The Analysis Of Next Generation Sequencing Data, Wei Wang May 2014

Dissertations

Recently, next generation sequencing (NGS) technology has emerged as a powerful approach and has dramatically transformed biomedical research on an unprecedented scale. NGS is expected to replace the traditional hybridization-based microarray technology because of its affordable cost and high digital resolution. Although NGS has significantly extended the ability to study the human genome and to better understand the biology of genomes, the new technology has required profound changes to data analysis. There is a substantial need for computational methods that allow convenient analysis of these overwhelmingly high-throughput data sets and address an increasing number of compelling biological questions which …


Gene Regulatory Network Analysis And Web-Based Application Development, Yi Yang Dec 2013

Dissertations

Microarray data is a valuable source for gene regulatory network analysis. Using earthworm microarray data analysis as an example, this dissertation demonstrates that a bioinformatics-guided reverse engineering approach can be applied to time-series data to uncover the underlying molecular mechanism. My network reconstruction results reinforce previous findings that certain neurotransmitter pathways are the target of two chemicals, carbaryl and RDX. This study also concludes that perturbations to these pathways by sublethal concentrations of these two chemicals were temporary, and that earthworms were capable of fully recovering. Moreover, differential networks (DNs) analysis indicates that many pathways other than those related …


Algorithms In Comparative Genomics, Satish Chikkagoudar Jan 2010

Dissertations

The field of comparative genomics is abundant with problems of interest to computer scientists. In this thesis, the author presents solutions to three contemporary problems: obtaining better alignments for phylogeny reconstruction, identifying related RNA sequences in genomes, and ranking Single Nucleotide Polymorphisms (SNPs) in genome-wide association studies (GWAS).

Sequence alignment is a basic and widely used task in bioinformatics. Its applications include identifying protein structure, RNAs and transcription factor binding sites in genomes, and phylogeny reconstruction. Phylogenetic descriptions depend not only on the employed reconstruction technique, but also on the underlying sequence alignment. The author has studied and established a …


Bioinformatics Framework For Genotyping Microarray Data Analysis, Kai Zhang Jan 2006

Dissertations

Functional genomics is a flourishing science enabled by recent technological breakthroughs in high-throughput instrumentation and microarray data analysis. Genotyping microarrays establish the genotypes of DNA sequences containing single nucleotide polymorphisms (SNPs), and can help biologists probe the functions of different genes and/or construct complex gene interaction networks. The enormous amount of data from these experiments makes it infeasible to perform manual processing to obtain accurate and reliable results in daily routines. Advanced algorithms as well as an integrated software toolkit are needed to help perform reliable and fast data analysis.

The author developed a MATLAB™-based software package, called …


RNA Structure Analysis: Algorithms And Applications, Jianghui Liu Aug 2005

Dissertations

In this doctoral thesis, efficient algorithms for aligning RNA secondary structures and mining unknown RNA motifs are presented. The major contribution is a structure alignment algorithm that combines primary and secondary structure information to find the optimal alignment between two given structures, where one is either a pattern structure of a known motif or a real query structure and the other is a subject structure.

Motivated by widely used algorithms for RNA folding, the proposed algorithm decomposes an RNA secondary structure into a set of atomic structural components that can be further organized in a tree …


New Techniques For Improving Biological Data Quality Through Information Integration, Katherine Grace Herbert May 2004

Dissertations

As databases become more pervasive throughout the biological sciences, various data quality concerns are emerging. Biological databases tend to develop data quality issues regarding data legacy, data uniformity and data duplication. Due to the nature of these data, each of these issues is non-trivial and can cause many problems for the database. For biological data to be corrected and standardized, methods and frameworks must be developed to handle both structural and traditional data.

The BIG-AJAX framework has been developed for solving these problems through both data cleaning and data integration. This framework exploits declarative data cleaning and exploratory data mining …


An Approximate Search Engine For Structure, Huiyuan Shan May 2004

Dissertations

As the size of structural databases grows, the need for efficiently searching these databases arises. Thanks to previous and ongoing research, searching by attribute-value and by text has become commonplace in these databases. However, searching by topological or physical structure, especially for large databases and especially for approximate matches, is still an art.

In this dissertation, efficient search techniques are presented for retrieving trees from a database that are similar to a given query tree. Rooted ordered labeled trees, rooted unordered labeled trees and free trees are considered. Ordered labeled trees are trees in which each node has a label …


Knowledge Discovery And Modeling In Genomic Databases, Michael M. Yin Aug 2002

Dissertations

This dissertation research aims to develop effective and accurate methods for identifying gene structures in the genomes of higher eukaryotes, such as vertebrate organisms. Several effective hidden Markov models (HMMs) are developed to represent the consensus and degeneracy features of the functional sites, including protein-translation start sites and mRNA splicing junction donor and acceptor sites in vertebrate genes. The HMM system based on the developed models is fully trained using an expectation maximization (EM) algorithm, and the system performance is evaluated using a 10-way cross-validation method. Experimental results show that the proposed HMM system achieves high sensitivity and specificity in …