Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Bioinformatics

Articles 121 - 150 of 189

Full-Text Articles in Physical Sciences and Mathematics

Random Forests Based Rule Learning And Feature Elimination, Sheng Liu Jan 2014

Electronic Theses and Dissertations

Much research combines data from multiple sources in an effort to understand the underlying problems. It is important to find and interpret the most important information from these sources. Thus it will be beneficial to have an effective algorithm that can simultaneously extract decision rules and select critical features for good interpretation while preserving the prediction performance. We propose an efficient approach, combining rule extraction and feature elimination, based on 1-norm regularized random forests. This approach simultaneously extracts a small number of rules generated by random forests and selects important features. To evaluate this approach, we have applied it to …


Measures For The Degree Of Overlap Of Gene Signatures And Applications To TCGA, Shuangge Ma Dec 2013

Shuangge Ma

For cancer and many other complex diseases, a large number of gene signatures have been generated. In this study, we use cancer as an example and note that other diseases can be analyzed in a similar manner. For signatures generated in multiple studies on the same cancer type/outcome, and for signatures on different cancer types, it is of interest to evaluate their degree of overlap. Many of the existing studies simply count the number (or percentage) of overlapped genes shared by two signatures. Such an approach has serious limitations. In this study, as a demonstrating example, we consider cancer prognosis …


Gene Regulatory Network Analysis And Web-Based Application Development, Yi Yang Dec 2013

Dissertations

Microarray data is a valuable source for gene regulatory network analysis. Using earthworm microarray data analysis as an example, this dissertation demonstrates that a bioinformatics-guided reverse engineering approach can be applied to analyze time-series data to uncover the underlying molecular mechanism. My network reconstruction results reinforce previous findings that certain neurotransmitter pathways are the target of two chemicals - carbaryl and RDX. This study also concludes that perturbations to these pathways by sublethal concentrations of these two chemicals were temporary, and earthworms were capable of fully recovering. Moreover, differential networks (DNs) analysis indicates that many pathways other than those related …


Integrative Analysis Of High-Throughput Cancer Studies With Contrasted Penalization, Shuangge Ma Oct 2013

Shuangge Ma

In cancer studies with high-throughput genetic and genomic measurements, integrative analysis provides a way to effectively pool and analyze heterogeneous raw data from multiple independent studies and outperforms "classic" meta-analysis and single-dataset analysis. When marker selection is of interest, the genetic basis of multiple datasets can be described using the homogeneity model or the heterogeneity model. In this study, we consider marker selection under the heterogeneity model, which includes the homogeneity model as a special case and can be more flexible. Penalization methods have been developed in the literature for marker selection. This study advances from the published ones by …


A Polyglot Approach To Bioinformatics Data Integration: Phylogenetic Analysis Of HIV-1, Steven Reisman, Catherine Putonti, George K. Thiruvathukal, Konstantin Läufer Jul 2013

George K. Thiruvathukal

RNA-interference has potential therapeutic use against HIV-1 by targeting highly-functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, inherent in this treatment is the fact that RNAi sequences must be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus’ fitness in the host. Taking a phylogenetic approach, we …


Energy Awareness And Scheduling In Mobile Devices And High End Computing, Sachin S. Pawaskaw Jul 2013

Student Work

In the context of the big picture as energy demands rise due to growing economies and growing populations, there will be greater emphasis on sustainable supply, conservation, and efficient usage of this vital resource. Even at a smaller level, the need for minimizing energy consumption continues to be compelling in embedded, mobile, and server systems such as handheld devices, robots, spaceships, laptops, cluster servers, sensors, etc. This is due to the direct impact of constrained energy sources such as battery size and weight, as well as cooling expenses in cluster-based systems to reduce heat dissipation. Energy management therefore plays a …


A Polyglot Approach To Bioinformatics Data Integration: Phylogenetic Analysis Of HIV-1, Steven Reisman, Catherine Putonti, George K. Thiruvathukal, Konstantin Läufer Apr 2013

Computer Science: Faculty Publications and Other Works

RNA-interference has potential therapeutic use against HIV-1 by targeting highly-functional mRNA sequences that contribute to the virulence of the virus. Empirical work has shown that within cell lines, all of the HIV-1 genes are affected by RNAi-induced gene silencing. While promising, inherent in this treatment is the fact that RNAi sequences must be highly specific. HIV, however, mutates rapidly, leading to the evolution of viral escape mutants. In fact, such strains are under strong selection to include mutations within the targeted region, evading the RNAi therapy and thus increasing the virus’ fitness in the host. Taking a phylogenetic approach, we …


Detecting And Correcting Batch Effects In High-Throughput Genomic Experiments, Sarah Reese Apr 2013

Theses and Dissertations

Batch effects are due to probe-specific systematic variation between groups of samples (batches) resulting from experimental features that are not of biological interest. Principal components analysis (PCA) is commonly used as a visual tool to determine whether batch effects exist after applying a global normalization method. However, PCA yields linear combinations of the variables that contribute maximum variance and thus will not necessarily detect batch effects if they are not the largest source of variability in the data. We present an extension of principal components analysis to quantify the existence of batch effects, called guided PCA (gPCA). We describe a …


A Novel Algorithm For Validating Peptide Identification From A Shotgun Proteomics Search Engine, Ling Jian, Xinnan Niu, Zhonghang Xia, Parimal Samir, Chiranthani Sumanasekera, Zheng Mu, Jennifer L. Jennings, Kristen L. Hoek, Tara Allos, Leigh M. Howard, Kathryn M. Edwards, P. Anthony Weil, Andrew J. Link Feb 2013

Chemistry Faculty Research

Liquid chromatography coupled with tandem mass spectrometry (LC–MS/MS) has revolutionized the proteomics analysis of complexes, cells, and tissues. In a typical proteomic analysis, the tandem mass spectra from an LC–MS/MS experiment are assigned to a peptide by a search engine that compares the experimental MS/MS peptide data to theoretical peptide sequences in a protein database. The peptide-spectrum matches are then used to infer a list of identified proteins in the original sample. However, the search engines often fail to distinguish between correct and incorrect peptide assignments. In this study, we designed and implemented a novel algorithm called De-Noise to …


Utilizing NMR Spectroscopy And Molecular Docking As Tools For The Structural Determination And Functional Annotation Of Proteins, Jaime Stark Feb 2013

Department of Chemistry: Dissertations, Theses, and Student Research

With the completion of the Human Genome Project in 2001 and the subsequent explosion of organisms with sequenced genomes, we are now aware of nearly 28 million proteins. Determining the role of each of these proteins is essential to our understanding of biology and the development of medical advances. Unfortunately, the experimental approaches to determine protein function are too slow to investigate every protein. Bioinformatics approaches, such as sequence and structure homology, have helped to annotate the functions of many similar proteins. However, despite these computational approaches, approximately 40% of proteins still have no known function. Alleviating this deficit will …


Computational Methods For Comparative Non-Coding RNA Analysis: From Structural Motif Identification To Genome-Wide Functional Classification, Cuncong Zhong Jan 2013

Electronic Theses and Dissertations

Recent advances in biological research point out that many ribonucleic acids (RNAs) are transcribed from the genome to perform a variety of cellular functions, rather than merely acting as information carriers for protein synthesis. These RNAs are usually referred to as the non-coding RNAs (ncRNAs). The versatile regulation mechanisms and functionalities of the ncRNAs contribute to the amazing complexity of the biological system. The ncRNAs perform their biological functions by folding into specific structures. In this case, the comparative study of the ncRNA structures is key to the inference of their molecular and cellular functions. We are especially interested in …


Disulfide By Design 2.0: A Web-Based Tool For Disulfide Engineering In Proteins, Douglas B. Craig, Alan A. Dombkowski Jan 2013

Wayne State University Associated BioMed Central Scholarship

Abstract

Background

Disulfide engineering is an important biotechnological tool that has advanced a wide range of research. The introduction of novel disulfide bonds into proteins has been used extensively to improve protein stability, modify functional characteristics, and to assist in the study of protein dynamics. Successful use of this technology is greatly enhanced by software that can predict pairs of residues that will likely form a disulfide bond if mutated to cysteines.

Results

We had previously developed and distributed software for this purpose: Disulfide by Design (DbD). The original DbD program has been widely used; however, it has a number …


Computational Approaches To Anti-Toxin Therapies And Biomarker Identification, Rebecca Jane Swett Jan 2013

Wayne State University Dissertations

This work describes the fundamental study of two bacterial toxins with computational methods, the rational design of a potent inhibitor using molecular dynamics, as well as the development of two bioinformatic methods for mining genomic data.

Clostridium difficile is an opportunistic bacillus which produces two large glucosylating toxins. These toxins, TcdA and TcdB, cause severe intestinal damage. As Clostridium difficile harbors considerable antibiotic resistance, one treatment strategy is to prevent the tissue damage that the toxins cause. The catalytic glucosyltransferase domain of TcdA and TcdB was studied using molecular dynamics in the presence of both a protein-protein binding partner and …


System Designs To Perform Bioinformatics Sequence Alignment, Çağlar Yilmaz, Mustafa Gök Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

The emerging field of bioinformatics uses computing as a tool to understand biology. Biological data of organisms (nucleotide and amino acid sequences) are stored in databases that contain billions of records. In order to process the vast amount of data in a reasonable time, high-performance analysis systems are developed. The main operation shared by the analysis tools is the search for matching patterns between sequences of data (sequence alignment). In this paper, we present 2 systems that can perform pairwise and multiple sequence alignment operations. Through the optimized design methods, proposed systems achieve up to 3.6 times more performance compared …


An Automated Signal Alignment Algorithm Based On Dynamic Time Warping For Capillary Electrophoresis Data, Fethullah Karabiber Jan 2013

Turkish Journal of Electrical Engineering and Computer Sciences

Correcting the retention time variation and measuring the similarity of time series are among the most common challenges in analyzing capillary electrophoresis (CE) data. In this study, an automated signal alignment method is proposed by modifying the dynamic time warping (DTW) approach to align the time-series data. Preprocessing tools and further optimizations were developed to increase the performance of the algorithm. As a demonstrative case study, the developed algorithm is applied to the analysis of CE data from a selective 2'-hydroxyl acylation analyzed by primer extension (SHAPE) evaluation of the RNA secondary structure. The time-shift problem …
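
For orientation, the textbook form of dynamic time warping that the abstract builds on can be sketched as follows. This is the unmodified classic recurrence, not the author's modified algorithm; the function name `dtw_distance` and the absolute-difference local cost are assumptions made for illustration.

```python
def dtw_distance(a, b):
    """Classic DTW distance between two numeric sequences (illustrative sketch)."""
    n, m = len(a), len(b)
    inf = float("inf")
    # cost[i][j] holds the minimal cumulative cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            # Local cost: absolute difference (an assumed choice).
            d = abs(a[i - 1] - b[j - 1])
            # Extend the cheapest of: step in a only, step in b only, step in both.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

A peak that is merely shifted in time, such as `[0, 1, 2, 1, 0]` against `[0, 0, 1, 2, 1, 0]`, has DTW distance 0 under this recurrence, which is why the technique suits retention-time correction.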


Error Correction In Next Generation DNA Sequencing Data, Michael Z. Molnar Dec 2012

Electronic Thesis and Dissertation Repository

Motivation: High throughput Next Generation Sequencing (NGS) technologies can sequence the genome of a species quickly and cheaply. Errors that are introduced by NGS technologies limit the full potential of the applications that rely on their data. Current techniques used to correct these errors are not sufficient, and a more efficient and accurate program is needed to correct errors.

Results: We have designed and implemented RACER (Rapid Accurate Correction of Errors in Reads), an error correction program that targets the Illumina genome sequencer, which is currently the dominant NGS technology. RACER combines advanced data structures with an intricate analysis of …


Utilization Of Probabilistic Models In Short Read Assembly From Second-Generation Sequencing, Matthew W. Segar May 2012

Honors Theses

With the advent of cheaper and faster DNA sequencing technologies, assembly methods have greatly changed. Instead of outputting reads that are thousands of base pairs long, new sequencers parallelize the task by producing read lengths between 35 and 400 base pairs. Reconstructing an organism’s genome from these millions of reads is a computationally expensive task. Our algorithm solves this problem by organizing and indexing the reads using n-grams, which are short, fixed-length DNA sequences of length n. These n-grams are used to efficiently locate putative read joins, thereby eliminating the need to perform an exhaustive search over all possible read …
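
The indexing idea described above can be illustrated with a small sketch. It is not the thesis's algorithm: the function names `build_ngram_index` and `candidate_joins`, and the rule that a join is proposed when the last n bases of one read occur anywhere in another, are assumptions chosen to show how an n-gram index avoids comparing every pair of reads.

```python
from collections import defaultdict


def build_ngram_index(reads, n):
    """Map each length-n substring to the set of (read_id, offset) where it occurs."""
    index = defaultdict(set)
    for rid, read in enumerate(reads):
        for pos in range(len(read) - n + 1):
            index[read[pos:pos + n]].add((rid, pos))
    return index


def candidate_joins(reads, n):
    """Return sorted pairs (i, j) where the last n bases of read i occur in read j."""
    index = build_ngram_index(reads, n)
    joins = set()
    for i, read in enumerate(reads):
        if len(read) < n:
            continue  # read too short to contribute a suffix n-gram
        # Look up only the reads that share this suffix n-gram.
        for j, _pos in index.get(read[-n:], ()):
            if j != i:
                joins.add((i, j))
    return sorted(joins)
```

Each lookup touches only reads that actually contain the suffix n-gram, so the number of read pairs examined depends on how often n-grams repeat rather than on the square of the read count.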


On The Hardness Of Counting And Sampling Center Strings, Christina Boucher, Mohamed Omar Jan 2012

All HMC Faculty Publications and Research

Given a set S of n strings, each of length ℓ, and a nonnegative value d, we define a center string as a string of length ℓ that has Hamming distance at most d from each string in S. The #CLOSEST STRING problem aims to determine the number of center strings for a given set of strings S and input parameters n, ℓ, and d. We show #CLOSEST STRING is impossible to solve exactly or even approximately in polynomial time, and that restricting #CLOSEST STRING so that any one of the parameters n, ℓ, or d is fixed leads to …
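
To make the definition concrete, a brute-force counter for center strings might look like the sketch below. It enumerates every string of length ℓ over the alphabet, so it runs in exponential time and is only usable on tiny inputs, in keeping with the hardness result the abstract states. The function names and the default DNA alphabet are assumptions for illustration.

```python
from itertools import product


def hamming(s, t):
    """Number of positions at which two equal-length sequences differ."""
    return sum(x != y for x, y in zip(s, t))


def count_center_strings(strings, d, alphabet="ACGT"):
    """Count strings within Hamming distance d of every input string (brute force)."""
    length = len(strings[0])
    if any(len(s) != length for s in strings):
        raise ValueError("all input strings must have the same length")
    # Enumerate all |alphabet|**length candidates and keep those close to every input.
    return sum(
        1
        for cand in product(alphabet, repeat=length)
        if all(hamming(cand, s) <= d for s in strings)
    )
```

For a single input string of length 4 over a four-letter alphabet with d = 1, the count is 13: the string itself plus 4 positions times 3 substitutions.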


Data-Intensive Computing For Bioinformatics Using Virtualization Technologies And HPC Infrastructures, Pengfei Xuan Dec 2011

All Theses

Bioinformatics applications often involve many computational components and massive data sets, which are very difficult to deploy on a single computing machine. In this thesis, we designed a data-intensive computing platform for bioinformatics applications using virtualization technologies and high performance computing (HPC) infrastructures with the concept of multi-tier architecture, which can seamlessly integrate the web user interface (presentation tier), scientific workflow (logic tier) and computing infrastructure (data/computing tier). We demonstrated our platform on two bioinformatics projects. First, we redesigned and deployed the cotton marker database (CMD) (http://www.cottonmarker.org), a centralized web portal in the cotton research community, using the …


Graph Kernels And Applications In Bioinformatics, Marco Alvarez Vega May 2011

All Graduate Theses and Dissertations, Spring 1920 to Summer 2023

Nowadays, machine learning techniques are widely used for extracting knowledge from data in a large number of bioinformatics problems. It turns out that in many such problems, data observations can be naturally represented by discrete structures such as graphs, networks, trees, or sequences. For example, a protein can be seen as a cloud of interconnected atoms lying in a 3-dimensional space. The focus of this dissertation is on the development and application of machine learning techniques to bioinformatics problems wherein the data can be represented by graphs. In particular, we focus our attention on proteins, which are essential elements …


Parallel Progressive Multiple Sequence Alignment On Reconfigurable Meshes, Ken Nguyen, Yi Pan, Ge Nong Jan 2011

Computer Science Faculty Publications

Background: One of the most fundamental and challenging tasks in bioinformatics is to identify related sequences and their hidden biological significance. The most popular and proven best-practice method to accomplish this task is aligning multiple sequences together. However, multiple sequence alignment is a computationally intensive task. In addition, the advancement in DNA/RNA and protein sequencing techniques has created a vast amount of sequences to be analyzed that exceeds the capability of traditional computing models. Therefore, an effective parallel multiple sequence alignment model capable of resolving these issues is in great demand.

Results: We design O(1) run-time solutions …


A Noise Reducing Sampling Approach For Uncovering Critical Properties In Large Scale Biological Networks, Karthik Duraisamy, Kathryn Dempsey Cooper, Hesham Ali, Sanjukta Bhowmick Jan 2011

Interdisciplinary Informatics Faculty Proceedings & Presentations

A correlation network is a graph-based representation of relationships among genes or gene products, such as proteins. The advent of high-throughput bioinformatics has resulted in the generation of volumes of data that require sophisticated in silico models, such as the correlation network, for in-depth analysis. Each element in our network represents expression levels of multiple samples of one gene and an edge connecting two nodes reflects the correlation level between the two corresponding genes in the network according to the Pearson correlation coefficient. Biological networks made in this manner are generally found to adhere to a scale-free structural nature, that …
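
The construction described above, one node per gene and an edge weighted by the Pearson correlation of two expression profiles, can be sketched as follows. The cutoff `threshold=0.9` on the absolute correlation and the function names are assumptions for illustration; the abstract does not state how edges are filtered. The sketch also assumes no gene has a constant expression profile, since Pearson correlation is undefined in that case.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)  # assumes neither profile is constant


def correlation_network(expression, threshold=0.9):
    """Return {(gene1, gene2): r} for gene pairs with |r| >= threshold (assumed cutoff)."""
    edges = {}
    for g1, g2 in combinations(sorted(expression), 2):
        r = pearson(expression[g1], expression[g2])
        if abs(r) >= threshold:
            edges[(g1, g2)] = r
    return edges
```

Here `expression` is assumed to map a gene name to its list of expression levels across samples, with every list in the same sample order.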


The Maximum Clique Problem: Algorithms, Applications, And Implementations, John David Eblen Aug 2010

Doctoral Dissertations

Computationally hard problems are routinely encountered during the course of solving practical problems. This is commonly dealt with by settling for less than optimal solutions, through the use of heuristics or approximation algorithms. This dissertation examines the alternate possibility of solving such problems exactly, through a detailed study of one particular problem, the maximum clique problem. It discusses algorithms, implementations, and the application of maximum clique results to real-world problems. First, the theoretical roots of the algorithmic method employed are discussed. Then a practical approach is described, which separates out important algorithmic decisions so that the algorithm can be easily …


A Dynamic Energy-Aware Model For Scheduling Computationally Intensive Bioinformatics Applications, Sachin Pawaskar, Hesham Ali Jul 2010

Computer Science Faculty Proceedings & Presentations

High Performance Computing (HPC) resources are housed in large datacenters, which consume huge amounts of energy and are quickly demanding attention from businesses as they result in high operating costs. On the other hand, HPC environments have been very useful to researchers in many emerging areas in life sciences such as Bioinformatics and Medical Informatics. In this paper, we provide a dynamic model for energy aware scheduling (EAS) in an HPC environment; we use a widely used bioinformatics tool named BLAT (BLAST-like alignment tool) running in an HPC environment as our case study. Our proposed EAS model incorporates two phases: an …


The Gel Documentation System: A Cornerstone To The Implementation Of The Introduction To Biotechnology And Introduction To Bioinformatics Cross-Disciplinary Course Series, Marcy Kelly, Gregory Lampard, Constance Knapp Jun 2010

Cornerstone 3 Reports : Interdisciplinary Informatics

No abstract provided.


Algorithms In Comparative Genomics, Satish Chikkagoudar Jan 2010

Dissertations

The field of comparative genomics is abundant with problems of interest to computer scientists. In this thesis, the author presents solutions to three contemporary problems: obtaining better alignments for phylogeny reconstruction, identifying related RNA sequences in genomes, and ranking Single Nucleotide Polymorphisms (SNPs) in genome-wide association studies (GWAS).

Sequence alignment is a basic and widely used task in bioinformatics. Its applications include identifying protein structure, RNAs and transcription factor binding sites in genomes, and phylogeny reconstruction. Phylogenetic descriptions depend not only on the employed reconstruction technique, but also on the underlying sequence alignment. The author has studied and established a …


Biomedical Relationship Extraction From Literature Based On Bio-Semantic Token Subsequences, Ying Xie, Jayasimha R. Katukuri, Vijay V. Raghavan Jan 2010

Faculty and Research Publications

Relationship Extraction (RE) from biomedical literature is an important and challenging problem in both text mining and bioinformatics. Although various approaches have been proposed to extract protein–protein interaction types, their accuracy rates leave considerable room for improvement. In this paper, two supervised learning algorithms based on newly defined "bio-semantic token subsequence" are proposed for multi-class biomedical relationship classification. The first approach calculates a "bio-semantic token subsequence kernel", whereas the second one explicitly extracts weighted features from bio-semantic token subsequences. The two proposed approaches outperform several alternatives reported in the literature on multi-class protein–protein interaction classification.


A Proposed Syntax For Minimotif Semantics, Version 1., Jay Vyas, Ronald J. Nowling, Mark W. Maciejewski, Sanguthevar Rajasekaran, Michael R. Gryk, Martin R. Schiller Aug 2009

Life Sciences Faculty Research

BACKGROUND:

One of the most important developments in bioinformatics over the past few decades has been the observation that short linear peptide sequences (minimotifs) mediate many classes of cellular functions such as protein-protein interactions, molecular trafficking and post-translational modifications. As both the creators and curators of a database which catalogues minimotifs, Minimotif Miner, the authors have a unique perspective on the commonalities of the many functional roles of minimotifs. There is an obvious usefulness in standardizing functional annotations both in allowing for the facile exchange of data between various bioinformatics resources, as well as the internal clustering of sets of …


New Computational Approaches For Multiple RNA Alignment And RNA Search, Daniel Deblasio Jan 2009

Electronic Theses and Dissertations

In this thesis we explore the theory and history behind RNA alignment. Normal sequence alignments as studied by computer scientists can be completed in O(n²) time in the naive case. The process involves taking two input sequences and finding the list of edits that can transform one sequence into the other. This process is applied to biology in many forms, such as the creation of multiple alignments and the search of genomic sequences. When you take into account the RNA sequence structure, the problem becomes even harder. Multiple RNA structure alignment is particularly challenging because covarying mutations make sequence …
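
The quadratic-time sequence comparison mentioned above can be illustrated with the standard edit-distance dynamic program. This sketch uses unit costs for insertion, deletion, and substitution (an assumption; alignment tools generally use scoring matrices and gap penalties), ignores RNA structure entirely, and returns only the number of edits rather than the edit list.

```python
def edit_distance(s, t):
    """Minimum number of single-character edits turning s into t (unit costs)."""
    # prev[j] is the distance between the prefix of s processed so far and t[:j].
    prev = list(range(len(t) + 1))
    for i, cs in enumerate(s, start=1):
        curr = [i]  # turning s[:i] into the empty string takes i deletions
        for j, ct in enumerate(t, start=1):
            curr.append(
                min(
                    prev[j] + 1,               # delete cs
                    curr[j - 1] + 1,           # insert ct
                    prev[j - 1] + (cs != ct),  # match or substitute
                )
            )
        prev = curr
    return prev[-1]
```

The two nested loops visit each pair of positions once, giving running time proportional to the product of the sequence lengths, which is quadratic for two sequences of length n.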


Improving Remote Homology Detection Using A Sequence Property Approach, Gina Marie Cooper Jan 2009

Browse all Theses and Dissertations

Understanding the structure and function of proteins is a key part of understanding biological systems. Although proteins are complex biological macromolecules, they are made up of only 20 basic building blocks known as amino acids. The makeup of a protein can be described as a sequence of amino acids. One of the most important tools in modern bioinformatics is the ability to search for biological sequences (such as protein sequences) that are similar to a given query sequence. There are many tools for doing this (Altschul et al., 1990, Hobohm and Sander, 1995, Thomson et al., 1994, Karplus and Barrett, …