Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

Algorithms

Articles 1 - 30 of 30

Full-Text Articles in Bioinformatics

Random Forest-Integrated Analysis In AD And LATE Brain Transcriptome-Wide Data To Identify Disease-Specific Gene Expression, Xinxing Wu, Chong Peng, Peter T. Nelson, Qiang Cheng Sep 2021

Sanders-Brown Center on Aging Faculty Publications

Alzheimer's disease (AD) is a complex neurodegenerative disorder that affects thinking, memory, and behavior. Limbic-predominant age-related TDP-43 encephalopathy (LATE) is a recently identified common neurodegenerative disease that mimics the clinical symptoms of AD. The development of drugs to prevent or treat these neurodegenerative diseases has been slow, partly because the genes associated with these diseases are incompletely understood. A notable hindrance from a data analysis perspective is that the clinical samples for patients and controls are usually highly imbalanced, which makes it challenging to apply most existing machine learning algorithms directly to such datasets. Meeting this data analysis challenge is …


Genome-Wide Discovery Of Missing Genes In Biological Pathways Of Prokaryotes., Yong Chen, Fenglou Mao, Guojun Li, Ying Xu Sep 2019

Yong Chen

BACKGROUND: Reconstruction of biological pathways is typically done through mapping well-characterized pathways of model organisms to a target genome, through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways, e.g., some genes in a target pathway may not have corresponding genes in the template pathways, the so-called "missing gene" problem.

METHODS: We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, to fill holes caused by missing genes as well as to expand the …


Auditing SNOMED CT Hierarchical Relations Based On Lexical Features Of Concepts In Non-Lattice Subgraphs, Licong Cui, Olivier Bodenreider, Jay Shi, Guo-Qiang Zhang Feb 2018

Computer Science Faculty Publications

Objective—We introduce a structural-lexical approach for auditing SNOMED CT using a combination of non-lattice subgraphs of the underlying hierarchical relations and enriched lexical attributes of fully specified concept names. Our goal is to develop a scalable and effective approach that automatically identifies missing hierarchical IS-A relations.

Methods—Our approach involves 3 stages. In stage 1, all non-lattice subgraphs of SNOMED CT’s IS-A hierarchical relations are extracted. In stage 2, lexical attributes of fully-specified concept names in such non-lattice subgraphs are extracted. For each concept in a non-lattice subgraph, we enrich its set of attributes with attributes from its ancestor …


Predicting Disease-Related Genes Using Integrated Biomedical Networks, Jiajie Peng, Kun Bai, Xuequn Shang, Guohua Wang, Hansheng Xue, Shuilin Jin, Liang Cheng, Yadong Wang, Jin Chen Jan 2017

Institute for Biomedical Informatics Faculty Publications

Background: Identifying the genes associated with human diseases is crucial for disease diagnosis and drug design. Computational approaches, especially network-based approaches, have recently been developed to identify disease-related genes effectively from existing biomedical networks. Meanwhile, advances in biotechnology enable researchers to produce multi-omics data, enriching our understanding of human diseases and revealing the complex relationships between genes and diseases. However, none of the existing computational approaches is able to integrate the huge amount of omics data into a weighted integrated network and utilize it to enhance disease-related gene discovery.

Results: We propose a new network-based disease …


Algorithms For Glycan Structure Identification With Tandem Mass Spectrometry, Weiping Sun Sep 2016

Electronic Thesis and Dissertation Repository

Glycosylation is a frequently observed post-translational modification (PTM) of proteins. It has been estimated that over half of eukaryotic proteins in nature are glycoproteins. Glycoprotein analysis plays a vital role in drug preparation. Thus, characterization of glycans that are linked to proteins has become necessary in glycoproteomics. Mass spectrometry has become an effective analytical technique for glycoproteomics analysis because of its high throughput and sensitivity. The large amount of spectral data collected in a mass spectrometry experiment makes manual interpretation impossible and requires effective computational approaches for automated analysis. Different algorithmic solutions have been proposed to address the challenges in glycoproteomics …


A Dynamic Run-Profile Energy-Aware Approach For Scheduling Computationally Intensive Bioinformatics Applications, Sachin Pawaskar, Hesham Ali Jul 2016

Computer Science Faculty Proceedings & Presentations

High Performance Computing (HPC) resources are housed in large datacenters, which consume exorbitant amounts of energy and are quickly demanding attention from businesses as they result in high operating costs. On the other hand, HPC environments have been very useful to researchers in many emerging areas in life sciences such as Bioinformatics and Medical Informatics. In an earlier work, we introduced a dynamic model for energy-aware scheduling (EAS) in an HPC environment; the model is domain agnostic and incorporates both the deadline parameter and energy parameters for computationally intensive applications. Our proposed EAS model incorporates two phases. In …


FastPop: A Rapid Principal Component Derived Method To Infer Intercontinental Ancestry Using Genetic Data, Yafang Li, Jinyoung Byun, Guoshuai Cai, Xiangjun Xiao, Younghun Han, Olivier Cornelis, James E. Dinulos, Joe Dennis, Douglas Easton, Ivan Gorlov, Michael F. Seldin, Christopher I. Amos Mar 2016

Dartmouth Scholarship

Identifying subpopulations within a study and inferring intercontinental ancestry of the samples are important steps in genome wide association studies. Two software packages are widely used in analysis of substructure: Structure and Eigenstrat. Structure assigns each individual to a population by using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and lacks the ability to create scores that could be used as covariates. Eigenstrat uses a principal component analysis method to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the …


Evaluating And Improving The Efficiency Of Software And Algorithms For Sequence Data Analysis, Hugh L. Eaves Jan 2016

Theses and Dissertations

With the ever-growing size of sequence data sets, data processing and analysis are an increasingly large portion of the time and money spent on nucleic acid sequencing projects. Correspondingly, the performance of the software and algorithms used to perform that analysis has a direct effect on the time and expense involved. Although the analytical methods are widely varied, certain types of software and algorithms are applicable to a number of areas. Targeting improvements to these common elements has the potential for wide reaching rewards. This dissertation research consisted of several projects to characterize and improve upon the efficiency of several …


Fizzy: Feature Subset Selection For Metagenomics., Gregory Ditzler, J Calvin Morrison, Yemin Lan, Gail L Rosen Nov 2015

Henry M. Rowan College of Engineering Faculty Scholarship

BACKGROUND: Some of the current software tools for comparative metagenomics provide ecologists with the ability to investigate and explore bacterial communities using α- & β-diversity. Feature subset selection--a sub-field of machine learning--can also provide a unique insight into the differences between metagenomic or 16S phenotypes. In particular, feature subset selection methods can obtain the operational taxonomic units (OTUs), or functional features, that have a high-level of influence on the condition being studied. For example, in a previous study we have used information-theoretic feature selection to understand the differences between protein family abundances that best discriminate between age groups in the …


SeqNLS: Nuclear Localization Signal Prediction Based On Frequent Pattern Mining And Linear Motif Scoring, J.-R. Lin, Jianjun Hu Jun 2015

Jianjun Hu

Nuclear localization signals (NLSs) are stretches of residues in proteins mediating their importing into the nucleus. NLSs are known to have diverse patterns, of which only a limited number are covered by currently known NLS motifs. Here we propose a sequential pattern mining algorithm SeqNLS to effectively identify potential NLS patterns without being constrained by the limitation of current knowledge of NLSs. The extracted frequent sequential patterns are used to predict NLS candidates which are then filtered by a linear motif-scoring scheme based on predicted sequence disorder and by the relatively local conservation (IRLC) based masking. The experiment results on …


optCluster: An R Package For Determining The Optimal Clustering Algorithm And Optimal Number Of Clusters., Michael N. Sekula May 2015

Electronic Theses and Dissertations

Determining the best clustering algorithm and ideal number of clusters for a particular dataset is a fundamental difficulty in unsupervised clustering analysis. In biological research, data generated from Next Generation Sequencing technology and microarray gene expression data are becoming more and more common, so new tools and resources are needed to group such high dimensional data using clustering analysis. Different clustering algorithms can group data very differently. Therefore, there is a need to determine the best groupings in a given dataset using the most suitable clustering algorithm for that data. This paper presents the R package optCluster as an efficient …


Spectral Gene Set Enrichment (SGSE), H Robert Frost, Zhigang Li, Jason H. Moore Mar 2015

Dartmouth Scholarship

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …


Extracting City Traffic Events From Social Streams, Pramod Anantharam, Payam Barnaghi, Krishnaprasad Thirunarayan, Amit P. Sheth Jan 2015

Kno.e.sis Publications

Cities are composed of complex systems with physical, cyber, and social components. Current works on extracting and understanding city events mainly rely on technology enabled infrastructure to observe and record events. In this work, we propose an approach to leverage citizen observations of various city systems and services such as traffic, public transport, water supply, weather, sewage, and public safety as a source of city events. We investigate the feasibility of using such textual streams for extracting city events from annotated text. We formalize the problem of annotating social streams such as microblogs as a sequence labeling problem. We present …


Algorithms And Tools For Computational Analysis Of Human Transcriptome Using RNA-Seq, Nan Deng Jan 2014

Wayne State University Dissertations

Alternative splicing plays a key role in regulating gene expression, and more than 90% of human genes are alternatively spliced through different types of alternative splicing. Dysregulated alternative splicing events have been linked to a number of human diseases. Recently, high-throughput RNA-Seq technologies have provided unprecedented opportunities to better characterize and understand transcriptomes, in particular useful for the detection of splicing variants between healthy and diseased human transcriptomes.

We have developed two novel algorithms and tools and a computational workflow to interrogate human transcriptomes between healthy and diseased conditions. The first is a read count-based Expectation-Maximization (EM) algorithm and tool, …


A Semantic-Based Method For Extracting Concept Definitions From Scientific Publications: Evaluation In The Autism Phenotype Domain, Saeed Hassanpour, Martin J. O’Connor, Amar K. Das Apr 2013

Dartmouth Scholarship

Background: A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text.

Results: Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and …


SeqNLS: Nuclear Localization Signal Prediction Based On Frequent Pattern Mining And Linear Motif Scoring, J.-R. Lin, Jianjun Hu Jan 2013

Faculty Publications

Nuclear localization signals (NLSs) are stretches of residues in proteins mediating their importing into the nucleus. NLSs are known to have diverse patterns, of which only a limited number are covered by currently known NLS motifs. Here we propose a sequential pattern mining algorithm SeqNLS to effectively identify potential NLS patterns without being constrained by the limitation of current knowledge of NLSs. The extracted frequent sequential patterns are used to predict NLS candidates which are then filtered by a linear motif-scoring scheme based on predicted sequence disorder and by the relatively local conservation (IRLC) based masking.

The experiment results on …


Algorithms For Optimizing Cross-Overs In DNA Shuffling, Lu He, Alan M. Friedman, Chris Bailey-Kellogg Aug 2012

Dartmouth Scholarship

DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library.

Results: This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. …


Gene Ontology Analysis Of Pairwise Genetic Associations In Two Genome-Wide Studies Of Sporadic ALS, Nora Chung Kim, Peter C. Andrews, Folkert W. Asselbergs, H Robert Frost, Scott M. Williams, Brent T. Harris, Cynthia Read, Kathleen D. Askland, Jason H. Moore Jul 2012

Dartmouth Scholarship

It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO).


Computing Inconsistency Measure Based On Paraconsistent Semantics, Pascal Hitzler, Yue Ma, Guilin Qi Dec 2011

Computer Science and Engineering Faculty Publications

Measuring inconsistency in knowledge bases has been recognized as an important problem in several research areas. Many methods have been proposed to solve this problem and a main class of them is based on some kind of paraconsistent semantics. However, existing methods suffer from two limitations: (i) they are mostly restricted to propositional knowledge bases; (ii) very few of them discuss computational aspects of computing inconsistency measures. In this article, we try to solve these two limitations by exploring algorithms for computing an inconsistency measure of first-order knowledge bases. After introducing a four-valued semantics for first-order logic, we define an …


Planning Combinatorial Disulfide Cross-Links For Protein Fold Determination, Fei Xiong, Alan M Friedman, Chris Bailey-Kellogg Nov 2011

Dartmouth Scholarship

Fold recognition techniques take advantage of the limited number of overall structural organizations, and have become increasingly effective at identifying the fold of a given target sequence. However, in the absence of sufficient sequence identity, it remains difficult for fold recognition methods to always select the correct model. While a native-like model is often among a pool of highly ranked models, it is not necessarily the highest-ranked one, and the model rankings depend sensitively on the scoring function used. Structure elucidation methods can then be employed to decide among the models based on relatively rapid biochemical/biophysical experiments.


Evolving Hard Problems: Generating Human Genetics Datasets With A Complex Etiology, Daniel S Himmelstein, Casey S Greene, Jason H Moore Jul 2011

Dartmouth Scholarship

Background: A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models.


Genome-Wide Discovery Of Missing Genes In Biological Pathways Of Prokaryotes., Yong Chen, Fenglou Mao, Guojun Li, Ying Xu Feb 2011

Faculty Scholarship for the College of Science & Mathematics

BACKGROUND: Reconstruction of biological pathways is typically done through mapping well-characterized pathways of model organisms to a target genome, through orthologous gene mapping. A limitation of such pathway-mapping approaches is that the mapped pathway models are constrained by the composition of the template pathways, e.g., some genes in a target pathway may not have corresponding genes in the template pathways, the so-called "missing gene" problem.

METHODS: We present a novel pathway-expansion method for identifying additional genes that are possibly involved in a target pathway after pathway mapping, to fill holes caused by missing genes as well as to expand the …


Parallel Progressive Multiple Sequence Alignment On Reconfigurable Meshes, Ken Nguyen, Yi Pan, Ge Nong Jan 2011

Computer Science Faculty Publications

Background: One of the most fundamental and challenging tasks in bioinformatics is to identify related sequences and their hidden biological significance. The most popular and proven best-practice method to accomplish this task is aligning multiple sequences together. However, multiple sequence alignment is a computationally intensive task. In addition, advances in DNA/RNA and protein sequencing techniques have created a vast number of sequences to be analyzed that exceeds the capability of traditional computing models. Therefore, an effective parallel multiple sequence alignment model capable of resolving these issues is in great demand.

Results: We design O(1) run-time solutions …


A Comparison Of The Functional Modules Identified From Time Course And Static PPI Network Data, Xiwei Tang, Jianxin Wang, Binbin Liu, Min Li, Gang Chen, Yi Pan Jan 2011

Computer Science Faculty Publications

Background: Cellular systems are highly dynamic and responsive to cues from the environment. Cellular function and response patterns to external stimuli are regulated by biological networks. A protein-protein interaction (PPI) network with static connectivity is dynamic in the sense that the nodes implement so-called functional activities that evolve in time. The shift from static to dynamic network analysis is essential for further understanding of molecular systems.

Results: In this paper, Time Course Protein Interaction Networks (TC-PINs) are reconstructed by incorporating time series gene expression into PPI networks. Then, a clustering algorithm is used to create functional modules from three …
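
The idea of combining time series gene expression with a static PPI network can be illustrated with a minimal sketch. The truncated abstract does not say how expression is incorporated, so the rule used here (keep an interaction at a time point only when both proteins meet an expression threshold) is an assumption, and the function name `time_course_networks` and its parameters are hypothetical.

```python
def time_course_networks(ppi_edges, expression, threshold):
    """Derive one edge list per time point from a static PPI edge list.

    ppi_edges:  iterable of (protein_a, protein_b) pairs (static network)
    expression: dict mapping protein -> list of expression values,
                one value per time point (all lists the same length)
    threshold:  assumed activity cutoff; a protein counts as active at
                time t when its expression value is >= threshold
    """
    n_points = len(next(iter(expression.values())))
    networks = []
    for t in range(n_points):
        # Proteins considered active at this time point (assumed rule).
        active = {p for p, values in expression.items() if values[t] >= threshold}
        # Keep only interactions whose two endpoints are both active.
        networks.append([(a, b) for a, b in ppi_edges if a in active and b in active])
    return networks


if __name__ == "__main__":
    edges = [("A", "B"), ("B", "C")]
    expr = {"A": [1.0, 0.1], "B": [0.9, 0.8], "C": [0.2, 0.9]}
    for t, net in enumerate(time_course_networks(edges, expr, 0.5)):
        print(t, net)
```

Each per-time-point edge list could then be passed to any clustering routine, which is where the comparison with the static network would take place.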


Optimization Algorithms For Functional Deimmunization Of Therapeutic Proteins, Andrew S. Parker, Wei Zheng, Karl E. Griswold, Chris Bailey-Kellogg Apr 2010

Dartmouth Scholarship

To develop protein therapeutics from exogenous sources, it is necessary to mitigate the risks of eliciting an anti-biotherapeutic immune response. A key aspect of the response is the recognition and surface display by antigen-presenting cells of epitopes, short peptide fragments derived from the foreign protein. Thus, developing minimal-epitope variants represents a powerful approach to deimmunizing protein therapeutics. Critically, mutations selected to reduce immunogenicity must not interfere with the protein's therapeutic activity.


Multifactor Dimensionality Reduction Analysis Identifies Specific Nucleotide Patterns Promoting Genetic Polymorphisms, Eric Arehart, Scott Gleim, Bill White, John Hwa, Jason H. Moore Mar 2009

Dartmouth Scholarship

The fidelity of DNA replication serves as the nidus for both genetic evolution and genomic instability fostering disease. Single nucleotide polymorphisms (SNPs) constitute greater than 80% of the genetic variation between individuals. A new theory regarding DNA replication fidelity has emerged in which selectivity is governed by base-pair geometry through interactions between the selected nucleotide, the complementary strand, and the polymerase active site. We hypothesize that specific nucleotide combinations in the flanking regions of SNP fragments are associated with mutation.


A Novel Ensemble Learning Method For De Novo Computational Identification Of DNA Binding Sites, Arijit Chakravarty, Jonathan M. Carlson, Radhika S. Khetani, Robert H. Gross Jul 2007

Dartmouth Scholarship

Despite the diversity of motif representations and search algorithms, the de novo computational identification of transcription factor binding sites remains constrained by the limited accuracy of existing algorithms and the need for user-specified input parameters that describe the motif being sought.

Results: We present a novel ensemble learning method, SCOPE, that is based on the assumption that transcription factor binding sites belong to one of three broad classes of motifs: non-degenerate, degenerate and gapped motifs. SCOPE employs a unified scoring metric to combine the results from three motif finding algorithms each aimed at the discovery of one of these classes of motifs. …


Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross May 2006

Dartmouth Scholarship

The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.
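
For orientation, the exhaustive word-counting strategy mentioned above can be sketched in a few lines. This is the naive baseline the abstract describes as inadequate for degenerate elements, not the authors' bounded search; the function names, the add-one pseudocount, and the `min_ratio` cutoff are all assumptions made for illustration.

```python
from collections import Counter


def kmer_sequence_counts(sequences, k):
    """Count, for each k-mer, how many sequences contain it at least once."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        # A set ensures each word is counted once per sequence.
        seen = {seq[i:i + k] for i in range(len(seq) - k + 1)}
        counts.update(seen)
    return counts


def overrepresented_words(foreground, background, k, min_ratio=2.0):
    """Return (word, ratio) pairs for k-mers enriched in the foreground set.

    foreground: upstream sequences of coregulated genes
    background: upstream sequences used as a reference set
    min_ratio:  assumed enrichment cutoff (hypothetical parameter)
    """
    fg = kmer_sequence_counts(foreground, k)
    bg = kmer_sequence_counts(background, k)
    hits = []
    for word, n_fg in fg.items():
        fg_freq = n_fg / len(foreground)
        # Add-one pseudocount so words absent from the background
        # do not cause a division by zero.
        bg_freq = (bg.get(word, 0) + 1) / (len(background) + 1)
        ratio = fg_freq / bg_freq
        if ratio >= min_ratio:
            hits.append((word, ratio))
    # Most enriched first; ties broken alphabetically.
    return sorted(hits, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    fg = ["AATGACGT", "CCTGACGA", "GTGACGTT"]
    bg = ["AAAAAAAA", "CCCCCCCC", "GGGGGGGG", "TTTTTTTT"]
    print(overrepresented_words(fg, bg, k=5)[:3])
```

Because only exact words are counted, a site that tolerates substitutions at several positions is split across many low-count words, which is the limitation the abstract points to.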


GPNN: Power Studies And Applications Of A Neural Network Method For Detecting Gene-Gene Interactions In Studies Of Human Disease, Alison A. Motsinger, Stephen L. Lee, George Mellick, Marylyn D. Ritchie Jan 2006

Dartmouth Scholarship

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.


A Novel Approach To Phylogenetic Tree Construction Using Stochastic Optimization And Clustering, Ling Qin, Yixin Chen, Yi Pan, Ling Chen Jan 2006

Computer Science Faculty Publications

Background: The problem of inferring the evolutionary history and constructing the phylogenetic tree with high performance has become one of the major problems in computational biology.

Results: A new phylogenetic tree construction method from a given set of objects (proteins, species, etc.) is presented. As an extension of ant colony optimization, this method proposes an adaptive phylogenetic clustering algorithm based on a digraph to find a tree structure that defines the ancestral relationships among the given objects.

Conclusion: Our phylogenetic tree construction method is tested to compare its results with those of the genetic algorithm (GA). Experimental results show that …