Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Bioinformatics

PDF

Theses/Dissertations

2017

Articles 1 - 26 of 26

Full-Text Articles in Physical Sciences and Mathematics

Knowledge Driven Approaches And Machine Learning Improve The Identification Of Clinically Relevant Somatic Mutations In Cancer Genomics, Benjamin John Ainscough Dec 2017

Arts & Sciences Electronic Theses and Dissertations

For cancer genomics to fully expand its utility from research discovery to clinical adoption, somatic variant detection pipelines must be optimized and standardized to ensure identification of clinically relevant mutations and to reduce laborious and error-prone post-processing steps. To address the need for improved catalogues of clinically and biologically important somatic mutations, we developed DoCM, a Database of Curated Mutations in Cancer (http://docm.info), as described in Chapter 2. DoCM is an open source, openly licensed resource to enable the cancer research community to aggregate, store and track biologically and clinically important cancer variants. DoCM is currently comprised of 1,364 variants …


Functional Data Analysis Methods For Predicting Disease Status., Sarah Kendrick Dec 2017

Electronic Theses and Dissertations

Introduction: Differential scanning calorimetry (DSC) is used to determine thermally-induced conformational changes of biomolecules within a blood plasma sample. Recent research has indicated that DSC curves (or thermograms) may have different characteristics based on disease status and, thus, may be useful as a monitoring and diagnostic tool for some diseases. Since thermograms are curves measured over a range of temperature values, they are often considered as functional data. In this dissertation we propose and apply functional data analysis (FDA) techniques to analyze DSC data from the Lupus Family Registry and Repository (LFRR). The aim is to develop FDA methods to …


NBPMF: Novel Network-Based Inference Methods For Peptide Mass Fingerprinting, Zhewei Liang Nov 2017

Electronic Thesis and Dissertation Repository

Proteins are large, complex molecules that perform a vast array of functions in every living cell. A proteome is the set of proteins produced in an organism, and proteomics is the large-scale study of proteomes. Several high-throughput technologies have been developed in proteomics, of which the most commonly applied are mass spectrometry (MS) based approaches. MS is an analytical technique for determining the composition of a sample. Recently it has become a primary tool for protein identification, quantification, and post-translational modification (PTM) characterization in proteomics research. There are usually two different ways to identify proteins: top-down and bottom-up. Top-down approaches …


Multiple Testing Correction With Repeated Correlated Outcomes: Applications To Epigenetics, Katie Leap Oct 2017

Masters Theses

Epigenetic changes (specifically DNA methylation) have been associated with adverse health outcomes; however, unlike genetic markers that are fixed over the lifetime of an individual, methylation can change. Given that there are a large number of methylation sites, measuring them repeatedly introduces multiple testing problems beyond those that exist in a static genetic context. Using simulations of epigenetic data, we considered different methods of controlling the false discovery rate. We considered several underlying associations between an exposure and methylation over time.

We found that testing each site with a linear mixed effects model and then controlling the false discovery rate …
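The abstract above does not say which false discovery rate procedure was used. As a general illustration only, the following is a minimal sketch of the Benjamini-Hochberg step-up procedure, one common way to control the false discovery rate across many per-site p-values. The function name `benjamini_hochberg` and the example p-values are hypothetical and are not taken from the thesis.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the hypothesis is rejected.

    Illustrative Benjamini-Hochberg step-up procedure; assumed here as an
    example of FDR control, not the method confirmed by the thesis.
    """
    m = len(p_values)
    if m == 0:
        return []
    # Indices of the p-values, sorted from smallest to largest p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest rank k such that p_(k) <= (k / m) * alpha.
    max_k = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            max_k = rank
    # Reject every hypothesis whose rank is at most max_k.
    rejected = [False] * m
    for idx in order[:max_k]:
        rejected[idx] = True
    return rejected


# Hypothetical p-values, one per methylation site.
site_p_values = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
print(benjamini_hochberg(site_p_values, alpha=0.05))
```

Because the procedure is step-up, a p-value that misses its own threshold can still be rejected when a larger p-value meets a later threshold.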


A Combinatorial Framework For Multiple RNA Interaction Prediction, Syed Ali Ahmed Sep 2017

Dissertations, Theses, and Capstone Projects

The interaction of two RNA molecules involves a complex interplay between folding and binding that warranted recent developments in RNA-RNA interaction algorithms. However, biological mechanisms in which more than two RNAs take part in an interaction also exist.

A typical algorithmic approach to such problems is to find the minimum energy structure. Often the computationally optimal solution does not represent the biologically correct structure of the interaction. In addition, different biological structures may be observed, depending on several factors. Furthermore, scoring techniques often miss critical details about dependencies within different parts of the structure, which typically leads to lower scores …


Morphogenesis And Growth Driven By Selection Of Dynamical Properties, Yuri Cantor Sep 2017

Dissertations, Theses, and Capstone Projects

Organisms are understood to be complex adaptive systems that evolved to thrive in hostile environments. Though widely studied, the phenomena of organism development and growth, and their relationship to organism dynamics, are not well understood. Indeed, the large number of components, their interconnectivity, and complex system interactions all obscure our ability to see, describe, and understand the functioning of biological organisms.

Here we take a synthetic and computational approach to the problem, abstracting the organism as a cellular automaton. Such systems are discrete digital models of real-world environments, making them more accessible and easier to study than their physical world …


Mass Spectrometry-Based Structural Proteomics: Methodology And Application Of Fast Photochemical Oxidation Of Proteins (FPOP), Ben Niu Aug 2017

Arts & Sciences Electronic Theses and Dissertations

This dissertation focuses solely on using mass spectrometry to characterize protein higher-order structures (HOS), with emphasis on hydroxyl radical footprinting by fast photochemical oxidation of proteins (FPOP) coupled to a bottom-up MS approach. Detailed background on FPOP, together with the corresponding method developments and applications, is covered.

The first chapter is a comprehensive review of FPOP. Following this, chapters 2, 3, and 4 focus on method development. Chapter 2 describes an isotope dilution GC-MS method to quantitate OH radicals in FPOP; chapter 3 describes the incorporation of Leu-enkephalin as a reporter peptide for a …


Machine Learning Based Protein Sequence To (Un)Structure Mapping And Interaction Prediction, Sumaiya Iqbal Aug 2017

University of New Orleans Theses and Dissertations

Proteins are the fundamental macromolecules within a cell that carry out most biological functions. The computational study of protein structure and function, using machine learning and data analytics, is essential to advancing life-science research, given the fast-growing volume of biological data and the complexity of analyzing it to discover meaningful insights. We do not limit the mapping of a protein’s primary sequence to its structure; we extend it to the disordered component, known as intrinsically disordered proteins or regions (IDPs/IDRs), and hence to the dynamics involved, which help us explain complex interactions within a cell that …


Unsupervised Biomedical Named Entity Recognition, Omid Ghiasvand Aug 2017

Theses and Dissertations

Named entity recognition (NER) from text is an important task for several applications, including in the biomedical domain. Supervised machine-learning-based systems have been the most successful on the NER task; however, they require large quantities of correct annotations for training. Annotating text manually is very labor intensive and also requires domain expertise. The purpose of this research is to reduce human annotation effort and to decrease the cost of annotation for building NER systems in the biomedical domain. The method developed in this work is based on leveraging the availability of resources such as the Unified Medical Language System (UMLS) that contain …


Microbial Community Richness Distinguishes Shark Species Microbiomes In South Florida, Rachael Cassandra Karns Jul 2017

HCNSO Student Theses and Dissertations

The microbiome (microbial community) of individuals is crucial when characterizing and understanding processes that are required for organism function and survival. Microbial organisms, which make up an individual’s microbiome, can be linked to disease or function of the host organism. In humans, individuals differ substantially in their microbiome compositions in various areas of the body. The cause of much of this compositional diversity is as yet unexplained; however, it is speculated that habitat, diet, and early exposure to microbes could be altering the microbiomes of individuals (Human Microbiome Project Consortium, 2012b, 2012a). To date, only one study has reported on microbiome …


An Integrated Bioinformatic/Experimental Approach For Discovering Novel Type II Polyketides Encoded In Actinobacterial Genomes, Wubin Gao Jul 2017

Chemistry and Chemical Biology ETDs

Discovery of new natural products (NPs) is critical for both disease treatment and crop protection. Numerous NP biosynthetic gene clusters (BGCs) in sequenced microbial genomes allow identification of new NPs through genome mining. Developing an integrated bioinformatic/experimental approach for discovering novel type II polyketides (PK-IIs) facilitates investigation of this family of NPs in an efficient, systematic way. Here, we developed an approach to analyze ketosynthase α/β (KSα/β) gene sequences to predict PK-II core structures, allowing us to target novel PK-II BGCs either from isolated genomic DNA or genomes from the NCBI databank, and to isolate novel PK-IIs produced by these …


Signet: A Neural Network Architecture For Predicting Protein-Protein Interactions, Muhammad S. Ahmed Jul 2017

Electronic Thesis and Dissertation Repository

The study of protein-protein interactions (PPI) is critically important within the field of molecular biology, as proteins facilitate key organismal functions including the maintenance of both cellular structure and function. Current experimental methods for elucidating PPIs are greatly hindered by large operating costs, lengthy wait times, and low accuracy. The recent development of computational PPI prediction techniques has worked to address many of these issues. Despite this, many of these methods utilize over-engineered features and naive learning algorithms. With the recent advances in machine learning and artificial intelligence, we attempt to view this problem through a novel, deep …


Lichen Conservation In Eastern North America: Population Genomics, Climate Change, And Translocations, Jessica Allen Jun 2017

Dissertations, Theses, and Capstone Projects

Conservation biology is a scientific discipline that draws on methods from diverse fields to address specific conservation concerns and inform conservation actions. This field is overwhelmingly focused on charismatic animals and vascular plants, often ignoring other diverse and ecologically important groups. This trend is slowly changing in some ways; for example, an increasing number of fungal species are being added to the IUCN Red List. However, a strong taxonomic bias still exists. Here I contribute four research chapters to further the conservation of lichens, one group of frequently overlooked organisms. I address specific conservation concerns in eastern North America using modern methods. …


Mapping Analyte-Signal Relations In LC-MS Based Untargeted Metabolomics, Nathaniel Guy Mahieu May 2017

Arts & Sciences Electronic Theses and Dissertations

The goal of untargeted metabolomics is to profile metabolism by measuring as many metabolites as possible. A major advantage of the untargeted approach is the detection of unexpected or unknown metabolites. These metabolites have chemical structures, metabolic pathways, or cellular functions that have not been previously described. Hence, they represent exciting opportunities to advance our understanding of biology. This beneficial approach, however, also adds considerable complexity to the analysis of metabolomics data: an individual signal cannot be readily identified as a unique metabolite. As such, a major challenge faced by the untargeted metabolomic workflow is extracting the analyte content …


Comparison Of The Regulatory Dynamics Of Related Small Gene Regulatory Networks That Control The Response To Cold Shock In Saccharomyces Cerevisiae, Natalie Williams May 2017

Honors Thesis

The Dahlquist Lab investigates the global transcriptional response of Saccharomyces cerevisiae, baker’s yeast, to the environmental stress of cold shock, using DNA microarrays for the wild type strain and strains deleted for a particular regulatory transcription factor. Gene regulatory networks (GRNs) consist of transcription factors (TFs), genes, and the regulatory connections between them that control the resulting mRNA and protein expression levels. We use mathematical modeling to determine the dynamics of the GRN controlling the cold shock response and thereby the relative influence of each transcription factor in the network. A family of GRNs has been derived from the …


Novel Statistical Approaches For Missing Values In Truncated High-Dimensional Metabolomics Data With A Detection Threshold., Jasmit Sureshkumar Shah May 2017

Electronic Theses and Dissertations

Despite considerable advances in high-throughput technology over the last decade, new challenges have emerged related to the analysis, interpretation, and integration of high-dimensional data. The arrival of omics datasets has contributed to the rapid improvement of systems biology, which seeks to understand complex biological systems. Metabolomics is an emerging omics field in which mass spectrometry technologies generate high-dimensional datasets. As this area advances, better analysis methods are required to provide correct and adequate results. While in other omics sectors such as genomics or proteomics there has been and continues to be critical understanding …


Statistical Methods For Two Problems In Cancer Research: Analysis Of RNA-Seq Data From Archival Samples And Characterization Of Onset Of Multiple Primary Cancers, Jialu Li May 2017

Dissertations & Theses (Open Access)

My dissertation is focused on quantitative methodology development and application for two important topics in translational and clinical cancer research.

The first topic was motivated by the challenge of applying transcriptome sequencing (RNA-seq) to formalin-fixed, paraffin-embedded (FFPE) tumor samples for reliable diagnostic development. We designed a biospecimen study to directly compare gene expression results from different protocols to prepare libraries for RNA-seq from human breast cancer tissues, with randomization to fresh-frozen (FF) or FFPE conditions. To comprehensively evaluate the FFPE RNA-seq data quality for expression profiling, we developed multiple computational methods for assessment, such as the uniformity and continuity …


Network Exploration Of Correlated Multivariate Protein Data For Alzheimer's Disease Association, Matthew J. Lane Apr 2017

Theses

Alzheimer's disease (AD) is difficult to diagnose using genetic testing or other traditional methods. Unlike diseases with simple genetic risk components, AD has no single marker that determines whether someone will develop it. Furthermore, AD is highly heterogeneous, and different subgroups of individuals develop the disease due to differing factors. Traditional diagnostic methods that rely on perceivable cognitive deficiencies often come too late, because the brain has already suffered damage from decades of disease progression. To observe AD at early stages, before cognitive deficiencies appear, biomarkers with greater accuracy are required. By using …


Novel Neuroevolution Techniques For The Life Science Domain, Timothy Manning Jan 2017

Theses

The life science domain is a high-value research area, both in terms of the benefits in increased knowledge and in societal impact. Much of the research funding has focused on wet-lab-based approaches to increase visibility into biological processes and to produce maximal relevant information on which to base decisions. Given the complexity of biological functions, in many cases this has led to an information overload. Researchers are now able to routinely generate and access petabytes of data as a result of high-throughput experiments, and this capability is growing. This data can be difficult to interpret and intractable …


Micro-Spectroscopy Of Bio-Assemblies At The Single Cell Level, Jeslin Kera Jan 2017

Honors Undergraduate Theses

In this thesis, we investigate biological molecules on a micron scale in the ultraviolet spectral region using non-destructive confocal absorption microscopy. The setup combines a confocal microscope with a UV light excitation beam to measure optical absorption spectra with a spatial resolution of 1.4 μm in the lateral direction and 3.6 μm in the axial direction. Confocal absorption microscopy has the benefits of requiring no labels and only low light intensity for excitation while providing a strong signal from the contrast generated by the attenuation of propagating light due to absorption. This enables spatially resolved measurements of single …


K-Mer Analysis Pipeline For Classification Of DNA Sequences From Metagenomic Samples, Russell Kaehler Jan 2017

Graduate Student Theses, Dissertations, & Professional Papers

Biological sequence datasets are increasing at a prodigious rate. The volume of data in these datasets surpasses what is observed in many other fields of science. New developments wherein metagenomic DNA from complex bacterial communities is recovered and sequenced are producing a new kind of data known as metagenomic data, which is comprised of DNA fragments from many genomes. Developing a utility to analyze such metagenomic data and predict the sample class from which it originated has many possible implications for ecological and medical applications. Within this document is a description of a series of analytical techniques used to process …
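The truncated abstract above does not describe the pipeline's steps. As a generic illustration of the k-mer idea named in the title, the sketch below counts overlapping k-mers in a DNA read; such count profiles are a common starting point for sequence classification. The function name `count_kmers`, the choice to skip k-mers containing ambiguous bases, and the example read are assumptions made for illustration, not details from the thesis.

```python
from collections import Counter

DNA_ALPHABET = frozenset("ACGT")


def count_kmers(sequence, k):
    """Count overlapping k-mers in a DNA sequence.

    k-mers containing characters other than A, C, G, T (for example N)
    are skipped; this is an illustrative choice, not a documented one.
    """
    if k <= 0:
        raise ValueError("k must be a positive integer")
    sequence = sequence.upper()
    counts = Counter()
    # Slide a window of width k across the sequence, one base at a time.
    for start in range(len(sequence) - k + 1):
        kmer = sequence[start:start + k]
        if DNA_ALPHABET.issuperset(kmer):
            counts[kmer] += 1
    return counts


# Hypothetical read from a metagenomic sample.
read = "ACGTACGTNNACG"
for kmer, n in sorted(count_kmers(read, 3).items()):
    print(kmer, n)
```

In practice, the counts for each sample would typically be normalized to frequencies before being compared or passed to a classifier.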


A Functional Data Analytic Approach For Region Level Differential DNA Methylation Detection, Mohamed Salem F. Milad Jan 2017

Doctoral Dissertations

DNA methylation is an epigenetic modification that can alter gene expression without a DNA sequence change. The role of DNA methylation in biological processes and human health is important to understand, with many studies identifying associations between specific methylation patterns and diseases such as cancer. In mammals, DNA methylation almost always occurs when a methyl group attaches to a cytosine followed by a guanine (i.e. CpG dinucleotides) on the DNA sequence. Many statistical methods have been developed to test for a difference in DNA methylation levels between groups (e.g. healthy vs disease) at individual cytosines. Site level testing is often …


Label-Free Raman Imaging To Monitor Breast Tumor Signatures, John Ciubuc Jan 2017

Open Access Theses & Dissertations

Methods built on Raman spectroscopy have shown major potential in describing and discriminating between malignant and benign specimens. Accurate, real-time medical diagnosis stands to benefit substantially from this vibrational optical method. Not only is acquisition of data possible in milliseconds and analysis in minutes, but Raman also allows concurrent detection and monitoring of all biological components. Besides validating a significant Raman signature distinction between non-tumorigenic (MCF-10A) and tumorigenic (MCF-7) breast epithelial cells, this study reveals a label-free method to assess overexpression of epidermal growth factor receptors (EGFR) in tumor cells. EGFR overexpression gives rise to Raman features associated with phosphorylated threonine and serine, and …


Analysis Of Microbial Diversity In Disturbed Soil, Tyler G. Sanda Jan 2017

Williams Honors College, Honors Research Projects

This paper uses the composition and abundance of microbial species to analyze soil recovery in disturbed land. Surface mining disturbs ecological communities throughout the world. As organizations seek to reclaim these disturbed lands, a proper analysis of recovery is needed. In previous studies, recovery of disturbed land was limited to surface examinations, which do not characterize the possible unseen devastating effects on the subsoil. Soil microorganisms are extremely sensitive to environmental changes such as strip mining. It is proposed that these microorganisms may serve as better indicators of recovery post disturbance. Our analysis indicates microbial recovery; however, it may not …


Computational Methods For Prediction And Classification Of G Protein-Coupled Receptors, Khodeza Begum Jan 2017

Open Access Theses & Dissertations

G protein-coupled receptors (GPCRs) constitute the largest group of membrane receptor proteins in eukaryotes. Due to their significant roles in many physiological processes such as vision, smell, and inflammation, GPCRs are the targets of many prescribed drugs. However, the functional and structural diversity of GPCRs has kept their prediction and classification from amino acid sequence data a challenging bioinformatics problem. As existing computational methods to predict and classify GPCRs are focused on mammalian (mostly human) data, the ultimate goal of our project is to establish an ensemble approach and implement web-based software that can be used reliably …


Horizontal And Vertical Integration Of Bio-Molecular Data, Tin Chi Nguyen Jan 2017

Wayne State University Dissertations

Modern biomedical research lies at the crossroads of data gathering, interpretation, and hypothesis testing. Because of noise, study bias, or changes in biological signals between disease and healthy states that are too small to detect, individual studies often fail to identify the true phenomenon. Data integration is the key to obtaining the power needed to pinpoint the biological mechanisms of disease states. Given this, we tried to make important contributions in both horizontal and vertical integration of high-throughput data; the former is meta-analysis of independent studies, while the latter is the integration of multi-omics data.

For horizontal meta-analysis, we developed two frameworks: DANUBE and the …