Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Articles 1 - 30 of 43

Full-Text Articles in Life Sciences

FisherMP: Fully Parallel Algorithm For Detecting Combinatorial Motifs From Large ChIP-Seq Datasets., Shaoqiang Zhang, Ying Liang, Xiangyun Wang, Zhengchang Su, Yong Chen Jun 2019

Faculty Scholarship for the College of Science & Mathematics

Detecting binding motifs of combinatorial transcription factors (TFs) from chromatin immunoprecipitation sequencing (ChIP-seq) experiments is an important and challenging computational problem for understanding gene regulation. Although a number of motif-finding algorithms have been presented, most are either time-consuming or have sub-optimal accuracy when processing large-scale datasets. In this article, we present a fully parallelized algorithm for detecting combinatorial motifs from ChIP-seq datasets using Fisher's combined method and an OpenMP parallel design. Large-scale validations on both synthetic data and 350 ChIP-seq datasets from the ENCODE database showed that FisherMP is not only very fast on large datasets, but also …
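As a rough illustration of the statistical ingredient named in this abstract, the sketch below combines independent p-values with Fisher's method. It is not the FisherMP implementation: the function name `fisher_combined_pvalue` and the example p-values are assumptions made for illustration, and the abstract does not say how the tool derives or applies its p-values.

```python
import math


def fisher_combined_pvalue(pvalues):
    """Combine independent p-values with Fisher's method (illustrative helper)."""
    if not pvalues:
        raise ValueError("need at least one p-value")
    if any(not (0.0 < p <= 1.0) for p in pvalues):
        raise ValueError("p-values must lie in (0, 1]")
    k = len(pvalues)
    # Test statistic X = -2 * sum(ln p_i); under the null hypothesis it follows
    # a chi-squared distribution with 2k degrees of freedom. We keep X / 2.
    half_x = -sum(math.log(p) for p in pvalues)
    # For an even number of degrees of freedom the chi-squared survival
    # function has a closed form:
    #   P(X >= x) = exp(-x/2) * sum_{i=0}^{k-1} (x/2)^i / i!
    term = 1.0
    total = 1.0
    for i in range(1, k):
        term *= half_x / i
        total += term
    return math.exp(-half_x) * total


# Hypothetical p-values for one candidate motif across three datasets.
print(fisher_combined_pvalue([0.04, 0.10, 0.20]))
```

With a single p-value the function returns that p-value unchanged, which is a quick sanity check on the closed form.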


Incorporating Pathway Information Into Feature Selection Towards Better Performed Gene Signatures, Suyan Tian, Chi Wang, Bing Wang Apr 2019

Biostatistics Faculty Publications

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, pathway-based feature selection is first divided into three major categories, namely, pathway-level selection, …


Genotype Fingerprints Enable Fast And Private Comparison Of Genetic Testing Results For Research And Direct-To-Consumer Applications., Max Robinson, Gustavo Glusman Oct 2018

Articles, Abstracts, and Reports

Genetic testing has expanded out of the research laboratory into medical practice and the direct-to-consumer market. Rapid analysis of the resulting genotype data now has a significant impact. We present a method for summarizing personal genotypes as 'genotype fingerprints' that meets these needs. Genotype fingerprints can be derived from any single nucleotide polymorphism-based assay, and remain comparable as chip designs evolve to higher marker densities. We demonstrate that these fingerprints support distinguishing types of relationships among closely related individuals and closely related individuals from individuals from the same background population, as well as high-throughput identification of identical genotypes, individuals in …


Lineage Marker Synchrony In Hematopoietic Genealogies Refutes The PU.1/GATA1 Toggle Switch Paradigm., Michael K Strasser, Philipp S Hoppe, Dirk Loeffler, Konstantinos D Kokkaliaris, Timm Schroeder, Fabian J Theis, Carsten Marr Jul 2018

Articles, Abstracts, and Reports

Molecular regulation of cell fate decisions underlies health and disease. To identify molecules that are active or regulated during a decision, and not before or after, the decision time point is crucial. However, cell fate markers are usually delayed and the time of decision therefore unknown. Fortunately, dividing cells induce temporal correlations in their progeny, which allow for retrospective inference of the decision time point. We present a computational method to infer decision time points from correlated marker signals in genealogies and apply it to differentiating hematopoietic stem cells. We find that myeloid lineage decisions happen generations before lineage marker …


A Protein Standard That Emulates Homology For The Characterization Of Protein Inference Algorithms., Matthew The, Fredrik Edfors, Yasset Perez-Riverol, Samuel H Payne, Michael R Hoopmann, Magnus Palmblad, Björn Forsström, Lukas Käll May 2018

Articles, Abstracts, and Reports

A natural way to benchmark the performance of an analytical experimental setup is to use samples of known composition and see to what degree one can correctly infer the content of such a sample from the data. For shotgun proteomics, one of the inherent problems of interpreting data is that the measured analytes are peptides and not the actual proteins themselves. As some proteins share proteolytic peptides, there might be more than one possible causative set of proteins resulting in a given set of peptides and there is a need for mechanisms that infer proteins from lists of detected peptides. …


Moonlighting Newborn Screening Markers: The Incidental Discovery Of A Second-Tier Test For Pompe Disease, Silvia Tortorelli, Jason S. Eckerman, Joseph J. Orsini, Colleen Stevens, Jeremy Hart, Patricia L. Hall, John J. Alexander, Dimitar Gavrilov, Devin Oglesbee, Kimiyo Raymond, Dietrich Matern, Piero Rinaldo Nov 2017

Pathology and Laboratory Medicine Faculty Publications

Purpose: To describe a novel biochemical marker in dried blood spots suitable to improve the specificity of newborn screening for Pompe disease.

Methods: The new marker is a ratio calculated between the creatine/creatinine (Cre/Crn) ratio as the numerator and the activity of acid α-glucosidase (GAA) as the denominator. Using Collaborative Laboratory Integrated Reports (CLIR), the new marker was incorporated in a dual scatter plot that can achieve almost complete segregation between Pompe disease and false-positive cases.

Results: The (Cre/Crn)/GAA ratio was measured in residual dried blood spots of five Pompe cases and was found to be elevated (range 4.41–13.26; 99%ile …


Solving The Influence Maximization Problem Reveals Regulatory Organization Of The Yeast Cell Cycle., David L Gibbs, Ilya Shmulevich Jun 2017

Articles, Abstracts, and Reports

The Influence Maximization Problem (IMP) aims to discover the set of nodes with the greatest influence on network dynamics. The problem has previously been applied in epidemiology and social network analysis. Here, we demonstrate the application to cell cycle regulatory network analysis for Saccharomyces cerevisiae. Fundamentally, gene regulation is linked to the flow of information. Therefore, our implementation of the IMP was framed as an information theoretic problem using network diffusion. Utilizing more than 26,000 regulatory edges from YeastMine, gene expression dynamics were encoded as edge weights using time lagged transfer entropy, a method for quantifying information transfer between variables. …


Predicting Disease-Related Genes Using Integrated Biomedical Networks, Jiajie Peng, Kun Bai, Xuequn Shang, Guohua Wang, Hansheng Xue, Shuilin Jin, Liang Cheng, Yadong Wang, Jin Chen Jan 2017

Institute for Biomedical Informatics Faculty Publications

Background: Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. However, none of the existing computational approaches is able to integrate the huge amount of omics data into a weighted integrated network and utilize it to enhance disease-related gene discovery.

Results: We propose a new network-based disease …


An Open Data Format For Visualization And Analysis Of Cross-Linked Mass Spectrometry Results., Michael R Hoopmann, Luis Mendoza, Eric W Deutsch, David Shteynberg, Robert L Moritz Nov 2016

Articles, Abstracts, and Reports

Protein-protein interactions are an important element in the understanding of protein function, and chemical cross-linking shotgun mass spectrometry is rapidly becoming a routine approach to identify these specific interfaces and topographical interactions. Protein cross-link data analysis is aided by dozens of algorithm choices, but hindered by a lack of a common format for representing results. Consequently, interoperability between algorithms and pipelines utilizing chemical cross-linking remains a challenge. pepXML is an open, widely-used format for representing spectral search algorithm results that has facilitated information exchange and pipeline development for typical shotgun mass spectrometry analyses. We describe an extension of this format …


Fastpop: A Rapid Principal Component Derived Method To Infer Intercontinental Ancestry Using Genetic Data, Yafang Li, Jinyoung Byun, Guoshuai Cai, Xiangjun Xiao, Younghun Han, Olivier Cornelis, James E. Dinulos, Joe Dennis, Douglas Easton, Ivan Gorlov, Michael F. Seldin, Christopher I. Amos Mar 2016

Dartmouth Scholarship

Identifying subpopulations within a study and inferring intercontinental ancestry of the samples are important steps in genome wide association studies. Two software packages are widely used in analysis of substructure: Structure and Eigenstrat. Structure assigns each individual to a population by using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and lacks the ability to create scores that could be used as covariates. Eigenstrat uses a principal component analysis method to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the …


Climp: Clustering Motifs Via Maximal Cliques With Parallel Computing Design., Shaoqiang Zhang, Yong Chen Jan 2016

Faculty Scholarship for the College of Science & Mathematics

A set of conserved binding sites recognized by a transcription factor is called a motif, which can be found by many applications of comparative genomics for identifying over-represented segments. Moreover, when numerous putative motifs are predicted from a collection of genome-wide data, their similarity data can be represented as a large graph, where these motifs are connected to one another. However, an efficient algorithm is needed to cluster motifs that belong to the same group, separate motifs that belong to different groups, and even remove spurious ones. In this work, a new motif …


Leveraging Global Gene Expression Patterns To Predict Expression Of Unmeasured Genes, James Rudd, René A. Zelaya, Eugene Demidenko, Ellen L. Goode, Casey S. Greene, Jennifer A. Doherty Dec 2015

Dartmouth Scholarship

Background: Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes.


Application Of Subspace Clustering In DNA Sequence Analysis, Tim Wallace, Ali Sekmen, Xiaofei Wang Sep 2015

Computer Science Faculty Research

Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. …


The Role Of Visualization And 3-D Printing In Biological Data Mining, Talia L. Weiss, Amanda Zieselman, Douglas P. Hill, Solomon G. Diamond, Li Shen, Andrew J. Saykin, Jason H. Moore Aug 2015

Dartmouth Scholarship

Background:

Biological data mining is a powerful tool that can provide a wealth of information about patterns of genetic and genomic biomarkers of health and disease. A potential disadvantage of data mining is the volume and complexity of the results, which can often be overwhelming. It is our working hypothesis that visualization methods can greatly enhance our ability to make sense of data mining results. More specifically, we propose that 3-D printing has an important role to play as a visualization technology in biological data mining. We provide here a brief review of 3-D printing along with a case study to …


Loregic: A Method To Characterize The Cooperative Logic Of Regulatory Factors, Daifeng Wang, Koon-Kiu Yan, Cristina Sisu, Chao Cheng, Joel Rozowsky, William Meyerson, Mark B. Gerstein Apr 2015

Dartmouth Scholarship

The topology of the gene-regulatory network has been extensively analyzed. Now, given the large amount of available functional genomic data, it is possible to go beyond this and systematically study regulatory circuits in terms of logic elements. To this end, we present Loregic, a computational method integrating gene expression and regulatory network data, to characterize the cooperativity of regulatory factors. Loregic uses all 16 possible two-input-one-output logic gates (e.g. AND or XOR) to describe triplets of two factors regulating a common target. We attempt to find the gate that best matches each triplet’s observed gene expression pattern across many conditions. …
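The gate-matching idea in this abstract can be sketched in a few lines. The example below enumerates all 16 two-input-one-output gates as truth tables and picks the one that agrees with the most binarized samples for a triplet. It is only a sketch under stated assumptions: the agreement fraction, the function names, and the sample data are hypothetical, and the abstract does not describe Loregic's actual scoring.

```python
from itertools import product

# The four possible binarized states of the two regulatory factors (RF1, RF2).
INPUT_STATES = [(0, 0), (0, 1), (1, 0), (1, 1)]

# A gate is a truth table: one output bit per input state, giving 2**4 = 16 gates.
ALL_GATES = list(product((0, 1), repeat=4))

AND_GATE = (0, 0, 0, 1)
XOR_GATE = (0, 1, 1, 0)


def gate_output(gate, rf1, rf2):
    """Look up the gate's output for one pair of binarized inputs."""
    return gate[INPUT_STATES.index((rf1, rf2))]


def best_matching_gate(triplet_samples):
    """Return (gate, agreement) for the gate that agrees with the most samples.

    triplet_samples: list of (rf1, rf2, target) tuples of 0/1 values, one per
    condition. The agreement fraction is an illustrative score only.
    """
    def agreement(gate):
        hits = sum(gate_output(gate, a, b) == t for a, b, t in triplet_samples)
        return hits / len(triplet_samples)

    gate = max(ALL_GATES, key=agreement)
    return gate, agreement(gate)


# Hypothetical binarized expression of two factors and their target
# across six conditions.
samples = [(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0), (0, 1, 1), (1, 1, 0)]
gate, score = best_matching_gate(samples)
print(gate == XOR_GATE, score)  # prints: True 1.0
```

Representing each gate as a 4-tuple keeps the enumeration exhaustive by construction, so no gate needs to be hand-coded.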


An Approach For Determining And Measuring Network Hierarchy Applied To Comparing The Phosphorylome And The Regulome, Chao Cheng, Erik Andrews, Koon-Kiu Yan, Matthew Ung, Daifeng Wang, Mark Gerstein Mar 2015

Dartmouth Scholarship

Many biological networks naturally form a hierarchy with a preponderance of downward information flow. In this study, we define a score to quantify the degree of hierarchy in a network and develop a simulated-annealing algorithm to maximize the hierarchical score globally over a network. We apply our algorithm to determine the hierarchical structure of the phosphorylome in detail and investigate the correlation between its hierarchy and kinase properties. We also compare it to the regulatory network, finding that the phosphorylome is more hierarchical than the regulome.


Spectral Gene Set Enrichment (SGSE), H Robert Frost, Zhigang Li, Jason H. Moore Mar 2015

Dartmouth Scholarship

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …


Mapping The Pareto Optimal Design Space For A Functionally Deimmunized Biotherapeutic Candidate, Regina S. Salvat, Andrew S. Parker, Yoonjoo Choi, Chris Bailey-Kellogg, Karl E. Griswold Jan 2015

Dartmouth Scholarship

The immunogenicity of biotherapeutics can bottleneck development pipelines and poses a barrier to widespread clinical application. As a result, there is a growing need for improved deimmunization technologies. We have recently described algorithms that simultaneously optimize proteins for both reduced T cell epitope content and high-level function. In silico analysis of this dual objective design space reveals that there is no single global optimum with respect to protein deimmunization. Instead, mutagenic epitope deletion yields a spectrum of designs that exhibit tradeoffs between immunogenic potential and molecular function. The leading edge of this design space is the Pareto frontier, i.e. the …
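To make the notion of a Pareto frontier over two competing objectives concrete, the sketch below filters a list of candidate designs down to the non-dominated ones. It is a generic illustration, not the authors' algorithm: the design names and the immunogenicity and function scores are hypothetical, and lower immunogenicity with higher function is assumed to be better.

```python
def dominates(a, b):
    """True if design a is no worse than b on both objectives and better on one.

    Each design is (name, immunogenicity_score, function_score); lower
    immunogenicity and higher function are assumed to be better.
    """
    no_worse = a[1] <= b[1] and a[2] >= b[2]
    strictly_better = a[1] < b[1] or a[2] > b[2]
    return no_worse and strictly_better


def pareto_frontier(designs):
    """Keep only designs that no other design dominates."""
    return [d for d in designs if not any(dominates(other, d) for other in designs)]


# Hypothetical candidates: (name, predicted epitope score, relative activity).
designs = [
    ("wild_type", 10.0, 1.00),
    ("variant_a", 5.0, 0.80),
    ("variant_b", 7.0, 0.70),  # dominated by variant_a on both objectives
    ("variant_c", 2.0, 0.40),
]

for name, immunogenicity, function in pareto_frontier(designs):
    print(name, immunogenicity, function)
```

The pairwise check is quadratic in the number of designs, which is fine for a small illustration but would need a sorted sweep for large design spaces.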


Dprp: A Database Of Phenotype-Specific Regulatory Programs Derived From Transcription Factor Binding Data, David T. W. Tzeng, Yu-Ting Tseng, Matthew Ung, I-En Liao, Chun-Chi Liu, Chao Cheng Dec 2014

Dartmouth Scholarship

Gene expression profiling has been extensively used in the past decades, resulting in an enormous amount of expression data available in public databases. These data sets are informative in elucidating transcriptional regulation of genes underlying various biological and clinical conditions. However, it is usually difficult to identify transcription factors (TFs) responsible for gene expression changes directly from their own expression, as TF activity is often regulated at the posttranscriptional level. In recent years, technical advances have made it possible to systematically determine the target genes of TFs by ChIP-seq experiments. To identify the regulatory programs underlying gene expression profiles, we …


OrthoClust: An Orthology-Based Network Framework For Clustering Data Across Multiple Species, Koon-Kiu Yan, Daifeng Wang, Joel Rozowsky, Henry Zheng, Chao Cheng, Mark Gerstein Aug 2014

Dartmouth Scholarship

Increasingly, high-dimensional genomics data are becoming available for many organisms. Here, we develop OrthoClust for simultaneously clustering data across multiple species. OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific. We demonstrate the application of OrthoClust using the RNA-Seq expression profiles of Caenorhabditis elegans and Drosophila melanogaster from the modENCODE consortium. A potential application of cross-species modules is to infer putative analogous functions of uncharacterized elements like non-coding RNAs based on guilt-by-association.


Characteristics And Prediction Of RNA Structure., Hengwu Li, Daming Zhu, Caiming Zhang, Huijian Han, Keith A Crandall Jan 2014

Computational Biology Institute

RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. Real RNA secondary structures often have local instead of global optimization because of kinetic reasons. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study is a novel report on RNA folding that accords with the golden mean characteristic based on the statistical analysis of the real RNA secondary structures of all 480 sequences from RNA …


Prioritizing Protein Complexes Implicated In Human Diseases By Network Optimization., Yong Chen, Thibault Jacquemin, Shuyan Zhang, Rui Jiang Jan 2014

Faculty Scholarship for the College of Science & Mathematics

BACKGROUND: The detection of associations between protein complexes and human inherited diseases is of great importance in understanding mechanisms of diseases. Dysfunctions of a protein complex are usually defined by its member disturbance and consequently result in certain diseases. Although individual disease proteins have been widely predicted, computational methods are still absent for systematically investigating disease-related protein complexes.

RESULTS: We propose a method, MAXCOM, for the prioritization of candidate protein complexes. MAXCOM performs a maximum information flow algorithm to optimize relationships between a query disease and candidate protein complexes through a heterogeneous network that is constructed by combining protein-protein interactions …


Pathoscope: Species Identification And Strain Attribution With Unassembled Sequencing Data., Owen E Francis, Matthew Bendall, Solaiappan Manimaran, Changjin Hong, Nathan L Clement, Eduardo Castro-Nallar, Quinn Snell, G Bruce Schaalje, Mark J Clement, Keith A Crandall, W Evan Johnson Oct 2013

Computational Biology Institute

Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and for use in clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence …


Identification Of SNPs Associated With Variola Virus Virulence, Anne Gatewood Hoen, Shea N. Gardner, Jason H. Moore Feb 2013

Dartmouth Scholarship

Background: Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Findings: Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The …


Cross-Ontology Multi-Level Association Rule Mining In The Gene Ontology., Prashanti Manda, Seval Ozkan, Hui Wang, Fiona M. McCarthy, Susan M. Bridges Oct 2012

Bagley College of Engineering Publications and Scholarship

The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by …
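For readers unfamiliar with association rule mining, the sketch below computes the two standard rule statistics, support and confidence, for a cross-ontology rule of the form "term X implies term Y". It does not reproduce the authors' multi-level generalization procedure: the term labels (prefixed MF: and BP:) and the annotation sets are hypothetical placeholders rather than real GO identifiers.

```python
def rule_stats(annotation_sets, antecedent, consequent):
    """Return (support, confidence) for the rule antecedent -> consequent.

    annotation_sets: one set of term labels per gene product.
    support    = fraction of gene products annotated with both terms
    confidence = fraction of antecedent-annotated products that also
                 carry the consequent
    """
    n = len(annotation_sets)
    with_antecedent = sum(1 for terms in annotation_sets if antecedent in terms)
    with_both = sum(
        1 for terms in annotation_sets if antecedent in terms and consequent in terms
    )
    support = with_both / n if n else 0.0
    confidence = with_both / with_antecedent if with_antecedent else 0.0
    return support, confidence


# Hypothetical annotations for four gene products; labels are placeholders.
annotations = [
    {"MF:kinase_activity", "BP:signal_transduction"},
    {"MF:kinase_activity", "BP:signal_transduction"},
    {"MF:kinase_activity", "BP:cell_cycle"},
    {"MF:dna_binding", "BP:signal_transduction"},
]

support, confidence = rule_stats(
    annotations, "MF:kinase_activity", "BP:signal_transduction"
)
print(support, round(confidence, 3))  # prints: 0.5 0.667
```

Mining at multiple levels of abstraction would additionally replace each term with its ancestors in the ontology before counting, a step omitted here.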


Gene Ontology Analysis Of Pairwise Genetic Associations In Two Genome-Wide Studies Of Sporadic ALS, Nora Chung Kim, Peter C. Andrews, Folkert W. Asselbergs, H Robert Frost, Scott M. Williams, Brent T. Harris, Cynthia Read, Kathleen D. Askland, Jason H. Moore Jul 2012

Dartmouth Scholarship

It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO).


Phylogenetic Search Through Partial Tree Mixing., Kenneth Sundberg, Mark Clement, Quinn Snell, Dan Ventura, Michael Whiting, Keith Crandall Jan 2012

Computational Biology Institute

BACKGROUND: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques.

RESULTS: When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda

CONCLUSIONS: The use of Partial Tree Mixing …


Planning Combinatorial Disulfide Cross-Links For Protein Fold Determination, Fei Xiong, Alan M Friedman, Chris Bailey-Kellogg Nov 2011

Dartmouth Scholarship

Fold recognition techniques take advantage of the limited number of overall structural organizations, and have become increasingly effective at identifying the fold of a given target sequence. However, in the absence of sufficient sequence identity, it remains difficult for fold recognition methods to always select the correct model. While a native-like model is often among a pool of highly ranked models, it is not necessarily the highest-ranked one, and the model rankings depend sensitively on the scoring function used. Structure elucidation methods can then be employed to decide among the models based on relatively rapid biochemical/biophysical experiments.


Comparison Of Four ChIP-Seq Analytical Algorithms Using Rice Endosperm H3K27 Trimethylation Profiling Data., Brandon M. Malone, Feng Tan, Susan M. Bridges, Zhaohua Peng Sep 2011

Bagley College of Engineering Publications and Scholarship

Chromatin immunoprecipitation coupled with high throughput DNA Sequencing (ChIP-Seq) has emerged as a powerful tool for genome wide profiling of the binding sites of proteins associated with DNA such as histones and transcription factors. However, no peak calling program has gained consensus acceptance by the scientific community as the preferred tool for ChIP-Seq data analysis. Analyzing the large data sets generated by ChIP-Seq studies remains highly challenging for most molecular biology laboratories. Here we profile H3K27me3 enrichment sites in rice young endosperm using the ChIP-Seq approach and analyze the data using four peak calling algorithms (FindPeaks, PeakSeq, USeq, and MACS). Comparison …


Evolving Hard Problems: Generating Human Genetics Datasets With A Complex Etiology, Daniel S Himmelstein, Casey S Greene, Jason H Moore Jul 2011

Dartmouth Scholarship

Background: A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models.