Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 22 of 22

Full-Text Articles in Entire DC Network

Incorporating Pathway Information Into Feature Selection Towards Better Performed Gene Signatures, Suyan Tian, Chi Wang, Bing Wang Apr 2019

Incorporating Pathway Information Into Feature Selection Towards Better Performed Gene Signatures, Suyan Tian, Chi Wang, Bing Wang

Biostatistics Faculty Publications

To analyze gene expression data with sophisticated grouping structures and to extract hidden patterns from such data, feature selection is of critical importance. It is well known that genes do not function in isolation but rather work together within various metabolic, regulatory, and signaling pathways. If the biological knowledge contained within these pathways is taken into account, the resulting method is a pathway-based algorithm. Studies have demonstrated that a pathway-based method usually outperforms its gene-based counterpart in which no biological knowledge is considered. In this article, a pathway-based feature selection is firstly divided into three major categories, namely, pathway-level selection, …


Fastpop: A Rapid Principal Component Derived Method To Infer Intercontinental Ancestry Using Genetic Data, Yafang Li, Jinyoung Byun, Guoshuai Cai, Xiangjun Xiao, Younghun Han, Olivier Cornelis, James E. Dinulos, Joe Dennis, Douglas Easton, Ivan Gorlov, Michael F. Seldin, Christopher I. Amos Mar 2016

Fastpop: A Rapid Principal Component Derived Method To Infer Intercontinental Ancestry Using Genetic Data, Yafang Li, Jinyoung Byun, Guoshuai Cai, Xiangjun Xiao, Younghun Han, Olivier Cornelis, James E. Dinulos, Joe Dennis, Douglas Easton, Ivan Gorlov, Michael F. Seldin, Christopher I. Amos

Dartmouth Scholarship

Identifying subpopulations within a study and inferring intercontinental ancestry of the samples are important steps in genome wide association studies. Two software packages are widely used in analysis of substructure: Structure and Eigenstrat. Structure assigns each individual to a population by using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and lacks the ability to create scores that could be used as covariates. Eigenstrat uses a principal component analysis method to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the …


Leveraging Global Gene Expression Patterns To Predict Expression Of Unmeasured Genes, James Rudd, René A. Zelaya, Eugene Demidenko, Ellen L. Goode, Casey S. Greene S. Greene, Jennifer A. Doherty Dec 2015

Leveraging Global Gene Expression Patterns To Predict Expression Of Unmeasured Genes, James Rudd, René A. Zelaya, Eugene Demidenko, Ellen L. Goode, Casey S. Greene S. Greene, Jennifer A. Doherty

Dartmouth Scholarship

BackgroundLarge collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes.


Application Of Subspace Clustering In Dna Sequence Analysis, Tim Wallace, Ali Sekmen, Xiaofei Wang Sep 2015

Application Of Subspace Clustering In Dna Sequence Analysis, Tim Wallace, Ali Sekmen, Xiaofei Wang

Computer Science Faculty Research

Identification and clustering of orthologous genes plays an important role in developing evolutionary models such as validating convergent and divergent phylogeny and predicting functional proteins in newly sequenced species of unverified nucleotide protein mappings. Here, we introduce an application of subspace clustering as applied to orthologous gene sequences and discuss the initial results. The working hypothesis is based upon the concept that genetic changes between nucleotide sequences coding for proteins among selected species and groups may lie within a union of subspaces for clusters of the orthologous groups. Estimates for the subspace dimensions were computed for a small population sample. …


Loregic: A Method To Characterize The Cooperative Logic Of Regulatory Factors, Daifeng Wang, Koon-Kiu Yan, Cristina Sisu, Chao Cheng, Joel Rozowsky, William Meyerson, Mark B. Gerstein Apr 2015

Loregic: A Method To Characterize The Cooperative Logic Of Regulatory Factors, Daifeng Wang, Koon-Kiu Yan, Cristina Sisu, Chao Cheng, Joel Rozowsky, William Meyerson, Mark B. Gerstein

Dartmouth Scholarship

The topology of the gene-regulatory network has been extensively analyzed. Now, given the large amount of available functional genomic data, it is possible to go beyond this and systematically study regulatory circuits in terms of logic elements. To this end, we present Loregic, a computational method integrating gene expression and regulatory network data, to characterize the cooperativity of regulatory factors. Loregic uses all 16 possible two-input-one-output logic gates (e.g. AND or XOR) to describe triplets of two factors regulating a common target. We attempt to find the gate that best matches each triplet’s observed gene expression pattern across many conditions. …


Mapping The Pareto Optimal Design Space For A Functionally Deimmunized Biotherapeutic Candidate, Regina S. Salvat, Andrew S. Parker, Yoonjoo Choi, Chris Bailey-Kellogg, Karl E. Griswold Jan 2015

Mapping The Pareto Optimal Design Space For A Functionally Deimmunized Biotherapeutic Candidate, Regina S. Salvat, Andrew S. Parker, Yoonjoo Choi, Chris Bailey-Kellogg, Karl E. Griswold

Dartmouth Scholarship

The immunogenicity of biotherapeutics can bottleneck development pipelines and poses a barrier to widespread clinical application. As a result, there is a growing need for improved deimmunization technologies. We have recently described algorithms that simultaneously optimize proteins for both reduced T cell epitope content and high-level function. In silico analysis of this dual objective design space reveals that there is no single global optimum with respect to protein deimmunization. Instead, mutagenic epitope deletion yields a spectrum of designs that exhibit tradeoffs between immunogenic potential and molecular function. The leading edge of this design space is the Pareto frontier, i.e. the …


Orthoclust: An Orthology-Based Network Framework For Clustering Data Across Multiple Species, Koon-Kiu Yan, Daifeng Wang, Joel Rozowsky, Henry Zheng, Chao Cheng, Mark Gerstein Gerstein Aug 2014

Orthoclust: An Orthology-Based Network Framework For Clustering Data Across Multiple Species, Koon-Kiu Yan, Daifeng Wang, Joel Rozowsky, Henry Zheng, Chao Cheng, Mark Gerstein Gerstein

Dartmouth Scholarship

Increasingly, high-dimensional genomics data are becoming available for many organisms.Here, we develop OrthoClust for simultaneously clustering data across multiple species. OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific. We demonstrate the application of OrthoClust using the RNA-Seq expression profiles of Caenorhabditis elegans and Drosophila melanogaster from the modENCODE consortium. A potential application of cross-species modules is to infer putative analogous functions of uncharacterized elements like non-coding RNAs based on guilt-by-association.


Characteristics And Prediction Of Rna Structure., Hengwu Li, Daming Zhu, Caiming Zhang, Huijian Han, Keith A Crandall Jan 2014

Characteristics And Prediction Of Rna Structure., Hengwu Li, Daming Zhu, Caiming Zhang, Huijian Han, Keith A Crandall

Computational Biology Institute

RNA secondary structures with pseudoknots are often predicted by minimizing free energy, which is NP-hard. Most RNAs fold during transcription from DNA into RNA through a hierarchical pathway wherein secondary structures form prior to tertiary structures. Real RNA secondary structures often have local instead of global optimization because of kinetic reasons. The performance of RNA structure prediction may be improved by considering dynamic and hierarchical folding mechanisms. This study is a novel report on RNA folding that accords with the golden mean characteristic based on the statistical analysis of the real RNA secondary structures of all 480 sequences from RNA …


Pathoscope: Species Identification And Strain Attribution With Unassembled Sequencing Data., Owen E Francis, Matthew Bendall, Solaiappan Manimaran, Changjin Hong, Nathan L Clement, Eduardo Castro-Nallar, Quinn Snell, G Bruce Schaalje, Mark J Clement, Keith A Crandall, W Evan Johnson Oct 2013

Pathoscope: Species Identification And Strain Attribution With Unassembled Sequencing Data., Owen E Francis, Matthew Bendall, Solaiappan Manimaran, Changjin Hong, Nathan L Clement, Eduardo Castro-Nallar, Quinn Snell, G Bruce Schaalje, Mark J Clement, Keith A Crandall, W Evan Johnson

Computational Biology Institute

Emerging next-generation sequencing technologies have revolutionized the collection of genomic data for applications in bioforensics, biosurveillance, and for use in clinical settings. However, to make the most of these new data, new methodology needs to be developed that can accommodate large volumes of genetic data in a computationally efficient manner. We present a statistical framework to analyze raw next-generation sequence reads from purified or mixed environmental or targeted infected tissue samples for rapid species identification and strain attribution against a robust database of known biological agents. Our method, Pathoscope, capitalizes on a Bayesian statistical framework that accommodates information on sequence …


Identification Of Snps Associated With Variola Virus Virulence, Anne Gatewood Hoen, Shea N. Gardner, Jason H. Moore Feb 2013

Identification Of Snps Associated With Variola Virus Virulence, Anne Gatewood Hoen, Shea N. Gardner, Jason H. Moore

Dartmouth Scholarship

Background: Decades after the eradication of smallpox, its etiological agent, variola virus (VARV), remains a threat as a potential bioweapon. Outbreaks of smallpox around the time of the global eradication effort exhibited variable case fatality rates (CFRs), likely attributable in part to complex viral genetic determinants of smallpox virulence. We aimed to identify genome-wide single nucleotide polymorphisms associated with CFR. We evaluated unadjusted and outbreak geographic location-adjusted models of single SNPs and two- and three-way interactions between SNPs. Findings: Using the data mining approach multifactor dimensionality reduction (MDR), we identified five VARV SNPs in models significantly associated with CFR. The …


Cross-Ontology Multi-Level Association Rule Mining In The Gene Ontology., Prashanti Manda, Seval Ozkan, Hui Wang, Fiona M. Mccarthy, Susan M. Bridges Oct 2012

Cross-Ontology Multi-Level Association Rule Mining In The Gene Ontology., Prashanti Manda, Seval Ozkan, Hui Wang, Fiona M. Mccarthy, Susan M. Bridges

Bagley College of Engineering Publications and Scholarship

The Gene Ontology (GO) has become the internationally accepted standard for representing function, process, and location aspects of gene products. The wealth of GO annotation data provides a valuable source of implicit knowledge of relationships among these aspects. We describe a new method for association rule mining to discover implicit co-occurrence relationships across the GO sub-ontologies at multiple levels of abstraction. Prior work on association rule mining in the GO has concentrated on mining knowledge at a single level of abstraction and/or between terms from the same sub-ontology. We have developed a bottom-up generalization procedure called Cross-Ontology Data Mining-Level by …


Gene Ontology Analysis Of Pairwise Genetic Associations In Two Genome-Wide Studies Of Sporadic Als, Nora Chung Kim, Peter C. Andrews, Folkert W. Asselbergs, H Robert Frost, Scott M. Williams, Brent T. Harris, Cynthia Read, Kathleen D. Askland, Jason H. Moore Jul 2012

Gene Ontology Analysis Of Pairwise Genetic Associations In Two Genome-Wide Studies Of Sporadic Als, Nora Chung Kim, Peter C. Andrews, Folkert W. Asselbergs, H Robert Frost, Scott M. Williams, Brent T. Harris, Cynthia Read, Kathleen D. Askland, Jason H. Moore

Dartmouth Scholarship

It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO).


Phylogenetic Search Through Partial Tree Mixing., Kenneth Sundberg, Mark Clement, Quinn Snell, Dan Ventura, Michael Whiting, Keith Crandall Jan 2012

Phylogenetic Search Through Partial Tree Mixing., Kenneth Sundberg, Mark Clement, Quinn Snell, Dan Ventura, Michael Whiting, Keith Crandall

Computational Biology Institute

BACKGROUND: Recent advances in sequencing technology have created large data sets upon which phylogenetic inference can be performed. Current research is limited by the prohibitive time necessary to perform tree search on a reasonable number of individuals. This research develops new phylogenetic algorithms that can operate on tens of thousands of species in a reasonable amount of time through several innovative search techniques.

RESULTS: When compared to popular phylogenetic search algorithms, better trees are found much more quickly for large data sets. These algorithms are incorporated in the PSODA application available at http://dna.cs.byu.edu/psoda

CONCLUSIONS: The use of Partial Tree Mixing …


Planning Combinatorial Disulfide Cross-Links For Protein Fold Determination, Fei Xiong, Alan M Friedman, Chris Bailey-Kellogg Nov 2011

Planning Combinatorial Disulfide Cross-Links For Protein Fold Determination, Fei Xiong, Alan M Friedman, Chris Bailey-Kellogg

Dartmouth Scholarship

Fold recognition techniques take advantage of the limited number of overall structural organizations, and have become increasingly effective at identifying the fold of a given target sequence. However, in the absence of sufficient sequence identity, it remains difficult for fold recognition methods to always select the correct model. While a native-like model is often among a pool of highly ranked models, it is not necessarily the highest-ranked one, and the model rankings depend sensitively on the scoring function used. Structure elucidation methods can then be employed to decide among the models based on relatively rapid biochemical/biophysical experiments.


Evolving Hard Problems: Generating Human Genetics Datasets With A Complex Etiology, Daniel S Himmelstein, Casey S Greene, Jason H Moore Jul 2011

Evolving Hard Problems: Generating Human Genetics Datasets With A Complex Etiology, Daniel S Himmelstein, Casey S Greene, Jason H Moore

Dartmouth Scholarship

BackgroundA goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models.


Improved Ibd Detection Using Incomplete Haplotype Information, Giulio Genovese, Gregory Leibon, Martin R. Pollak, Daniel N. Rockmore Jun 2010

Improved Ibd Detection Using Incomplete Haplotype Information, Giulio Genovese, Gregory Leibon, Martin R. Pollak, Daniel N. Rockmore

Dartmouth Scholarship

The availability of high density genetic maps and genotyping platforms has transformed human genetic studies. The use of these platforms has enabled population-based genome-wide association studies. However, in inheritance-based studies, current methods do not take full advantage of the information present in such genotyping analyses. In this paper we describe an improved method for identifying genetic regions shared identical-by-descent (IBD) from recent common ancestors. This method improves existing methods by taking advantage of phase information even if it is less than fully accurate or missing. We present an analysis of how using phase information increases the accuracy of IBD detection …


Optimization Algorithms For Functional Deimmunization Of Therapeutic Proteins, Andrew S. Parker, Wei Zheng, Karl E. Griswold, Chris Bailey-Kellogg Apr 2010

Optimization Algorithms For Functional Deimmunization Of Therapeutic Proteins, Andrew S. Parker, Wei Zheng, Karl E. Griswold, Chris Bailey-Kellogg

Dartmouth Scholarship

To develop protein therapeutics from exogenous sources, it is necessary to mitigate the risks of eliciting an anti-biotherapeutic immune response. A key aspect of the response is the recognition and surface display by antigen-presenting cells of epitopes, short peptide fragments derived from the foreign protein. Thus, developing minimal-epitope variants represents a powerful approach to deimmunizing protein therapeutics. Critically, mutations selected to reduce immunogenicity must not interfere with the protein's therapeutic activity.


Minimum Criteria For Dna Damage-Induced Phase Advances In Circadian Rhythms, Christian I. Hong, Judit Zámborszky, Attila Csikász-Nagy May 2009

Minimum Criteria For Dna Damage-Induced Phase Advances In Circadian Rhythms, Christian I. Hong, Judit Zámborszky, Attila Csikász-Nagy

Dartmouth Scholarship

Robust oscillatory behaviors are common features of circadian and cell cycle rhythms. These cyclic processes, however, behave distinctively in terms of their periods and phases in response to external influences such as light, temperature, nutrients, etc. Nevertheless, several links have been found between these two oscillators. Cell division cycles gated by the circadian clock have been observed since the late 1950s. On the other hand, ionizing radiation (IR) treatments cause cells to undergo a DNA damage response, which leads to phase shifts (mostly advances) in circadian rhythms. Circadian gating of the cell cycle can be attributed to the cell cycle …


Multifactor Dimensionality Reduction Analysis Identifies Specific Nucleotide Patterns Promoting Genetic Polymorphisms, Eric Arehart, Scott Gleim, Bill White, John Hwa, Jason H. Moore Mar 2009

Multifactor Dimensionality Reduction Analysis Identifies Specific Nucleotide Patterns Promoting Genetic Polymorphisms, Eric Arehart, Scott Gleim, Bill White, John Hwa, Jason H. Moore

Dartmouth Scholarship

The fidelity of DNA replication serves as the nidus for both genetic evolution and genomic instability fostering disease. Single nucleotide polymorphisms (SNPs) constitute greater than 80% of the genetic variation between individuals. A new theory regarding DNA replication fidelity has emerged in which selectivity is governed by base-pair geometry through interactions between the selected nucleotide, the complementary strand, and the polymerase active site. We hypothesize that specific nucleotide combinations in the flanking regions of SNP fragments are associated with mutation.


A Novel Ensemble Learning Method For De Novo Computational Identification Of Dna Binding Sites, Arijit Chakravarty, Jonathan M. Carlson, Radhika S. Khetani, Robert H H. Gross Jul 2007

A Novel Ensemble Learning Method For De Novo Computational Identification Of Dna Binding Sites, Arijit Chakravarty, Jonathan M. Carlson, Radhika S. Khetani, Robert H H. Gross

Dartmouth Scholarship

Despite the diversity of motif representations and search algorithms, the de novo computational identification of transcription factor binding sites remains constrained by the limited accuracy of existing algorithms and the need for user-specified input parameters that describe the motif being sought.ResultsWe present a novel ensemble learning method, SCOPE, that is based on the assumption that transcription factor binding sites belong to one of three broad classes of motifs: non-degenerate, degenerate and gapped motifs. SCOPE employs a unified scoring metric to combine the results from three motif finding algorithms each aimed at the discovery of one of these classes of motifs. …


Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross May 2006

Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross

Dartmouth Scholarship

The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.


Principal Component Analysis For Predicting Transcription-Factor Binding Motifs From Array-Derived Data, Yunlong Liu, Matthew P Vincenti, Hiroki Yokota Nov 2005

Principal Component Analysis For Predicting Transcription-Factor Binding Motifs From Array-Derived Data, Yunlong Liu, Matthew P Vincenti, Hiroki Yokota

Dartmouth Scholarship

The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially to transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and an array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD …