Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons


Dartmouth College


Articles 1 - 30 of 30

Full-Text Articles in Bioinformatics

DNA Methylation-Based Epigenetic Biomarkers In Cell-Type Deconvolution And Tumor Tissue Of Origin Identification, Ze Zhang Dec 2023


Dartmouth College Ph.D Dissertations

DNA methylation is an epigenetic modification that regulates gene expression and is essential to establishing and preserving cellular identity. Genome-wide DNA methylation arrays provide a standardized and cost-effective approach to measuring DNA methylation. When combined with a cell-type reference library, DNA methylation measures allow the assessment of underlying cell-type proportions in heterogeneous mixtures. This approach, known as DNA methylation deconvolution or methylation cytometry, offers a standardized and cost-effective method for evaluating cell-type proportions. While this approach has succeeded in discerning cell types in various human tissues like blood, brain, tumors, skin, breast, and buccal swabs, the existing methods have major …
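The reference-based idea described in this abstract can be illustrated with a minimal sketch. It is not the dissertation's method: it assumes only two cell types, a simple linear mixing model, and made-up beta values, and the function name `estimate_proportion` is hypothetical.

```python
def estimate_proportion(mixture, ref_a, ref_b):
    """Least-squares estimate of the fraction of cell type A in a two-type mixture.

    Assumed model (for illustration only): at each CpG site i,
    mixture[i] = p * ref_a[i] + (1 - p) * ref_b[i], with p in [0, 1].
    """
    # Closed-form least-squares solution for the single unknown p.
    num = sum((m - b) * (a - b) for m, a, b in zip(mixture, ref_a, ref_b))
    den = sum((a - b) ** 2 for a, b in zip(ref_a, ref_b))
    if den == 0:
        raise ValueError("reference profiles are identical; p is not identifiable")
    # Clip to the valid range so the result is a proportion.
    return min(1.0, max(0.0, num / den))


# Hypothetical beta values (fraction methylated) at four CpG sites.
ref_a = [0.9, 0.1, 0.8, 0.2]
ref_b = [0.1, 0.9, 0.3, 0.7]
# Simulate a sample that is 70% cell type A and 30% cell type B.
mixture = [0.7 * a + 0.3 * b for a, b in zip(ref_a, ref_b)]
p_hat = estimate_proportion(mixture, ref_a, ref_b)
```

With more than two cell types the same model is usually solved as a constrained regression over all proportions at once; the two-type case is used here only because it has a closed form.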


Tracing Evolution Of Gene Transfer Agents Using Comparative Genomics, Roman Kogay Nov 2023


Dartmouth College Ph.D Dissertations

Accumulating evidence suggests that viruses and their components can be domesticated by their hosts, equipping them with convenient molecular toolkits for various functions. One such domesticated system is the gene transfer agent (GTA), which is produced by some bacteria and archaea. GTAs morphologically resemble small phage-like particles and contain random fragments of their host genome. They are produced only by a small fraction of the microbial population and are released through lysis of the host cell. Bioinformatic analyses suggest that GTAs are especially abundant in the taxonomic class of Alphaproteobacteria, where they are vertically inherited and evolve …


Genome-Scale Methylation Analysis In Blood And Tumor Identifies Immune Profile, Age Acceleration, And DNA Methylation Alterations Associated With Bladder Cancer Outcomes, Ji-Qing Chen Aug 2023


Dartmouth College Ph.D Dissertations

Bladder cancer patients receive frequent screening due to the high tumor recurrence rate (more than 60%). Conventional monitoring relies on cystoscopy, which is highly invasive and, with frequent follow-up, increases patient morbidity and the burden on the health care system. Novel markers related to bladder cancer outcomes are therefore urgently needed. Immune profiles have been associated with cancer outcomes and may have the potential to be biomarkers for outcomes management. However, little work has been conducted to investigate the associations of immune cell profiles with bladder cancer outcomes. Here, I utilized the …


Cell-Typing And Interaction Analysis Of The Immune Compartment Of The Tumor Microenvironment Using High-Resolution Omics Modalities, Courtney Taylor Schiebout Apr 2023


Dartmouth College Ph.D Dissertations

Single-cell RNA-sequencing (scRNA-seq) has provided a new frontier for the investigation of complex tissues. One ideal candidate for the utilization of this method is the tumor microenvironment (TME). The TME is often host to a complex set of cell populations and behaviors that can be highly influential for cancer inhibition or progression. This is especially true of the immune compartment of the TME: the presence of certain types of immune cells in the TME and their expression profiles can significantly affect cancer prognosis in some cases. By providing individual cell-level gene expression data, scRNA-seq can be highly informative for characterizing …


Deep Learning Methods For Prediction Of And Escape From Protein Recognition, Bowen Dai Mar 2023


Dartmouth College Ph.D Dissertations

Protein interactions drive diverse processes essential to living organisms, and thus numerous biomedical applications center on understanding, predicting, and designing how proteins recognize their partners. While unfortunately the number of interactions of interest still vastly exceeds the capabilities of experimental determination methods, computational methods promise to fill the gap. My thesis pursues the development and application of computational methods for several protein interaction prediction and design tasks. First, to improve protein-glycan interaction specificity prediction, I developed GlyBERT, which learns biologically relevant glycan representations encapsulating the components most important for glycan recognition within their structures. GlyBERT encodes glycans with a branched …


Characterization Of Cell Type-Specific Molecular Heterogeneity In Cancer Using Multi-Omic Approaches, Min Kyung Lee Jan 2023


Dartmouth College Ph.D Dissertations

Tumors are composed of heterogeneous cell types, each with its own unique molecular profile. Recent advances in single cell genomics technologies have begun to increase our understanding of the molecular heterogeneity that exists in tumors, with particular focus on gene expression and chromatin accessibility profiles. However, due to limitations in methods for certain sample types and the high cost of single cell genomics, bulk tumor molecular profiling has been and remains widely used. In addition, other facets of single cell epigenomic profiling, particularly methylation and hydroxymethylation, remain underexplored. Thus, investigations to understand the cell type specific epigenetic heterogeneity and the cooperation …


Multi-Omic And Single-Cell Characterization Of A 3D Skin-Like Tissue Model Of Systemic Sclerosis With A Focus On Epigenetics, Tamar R. Abel Jan 2023


Dartmouth College Ph.D Dissertations

Systemic sclerosis (SSc) is a rare fibrotic autoimmune disease with high mortality and limited FDA approved therapies. Clinical concordance among twins is low; however, a modest familial heritability implicates complex interactions between polygenic risk alleles and the environment as causal – interactions that are mediated by epigenetics. Although mechanisms of tissue fibrosis have been successfully identified and targeted in established in vitro models for SSc, these molecules have ultimately failed in clinical trials. This likely reflects the lack of a reliable in vitro model that faithfully recapitulates the disease.

I utilized a 3D skin-like tissue model to study SSc skin …


The Roles Of Epithelial–Mesenchymal Plasticity In Tumor Heterogeneity, Metastasis, And Patient Survival In Breast Cancer, Meredith Septer Brown Jul 2022


Dartmouth College Ph.D Dissertations

The Epithelial-to-Mesenchymal transition, a critical cellular process in development, is frequently co-opted by solid tumors to promote invasion and metastasis. In particular, the hybrid or intermediate EMT state, possessing both epithelial and mesenchymal characteristics, is associated with increased cancer stemness and plasticity. Similarly, intra-tumoral heterogeneity in solid tumors, in particular breast cancer, is associated with poor prognosis, tumor growth, proliferation, drug resistance, and metastasis. We sought to understand the link between the generation of intra-tumoral heterogeneity and the intermediate EMT state and their impact on tumor progression and patient prognosis. As part of my thesis work, I developed a model …


Deciphering Taxa-Function Relationships In Population-Level Studies Of Human Gut Microbiomes, Quang P. Nguyen Jun 2022


Dartmouth College Ph.D Dissertations

The human gut microbiome is a complex and dynamic ecosystem, featuring a multitude of microbes all interacting with their hosts in an elaborate manner. Even though this exchange is often mediated through microbial metabolic and functional outputs, such as the production of certain metabolites, environmental exposures and host lifestyle are highly influential in shaping the presence of microbial species irrespective of their individual roles. As such, a comprehensive understanding of the microbiome requires researchers to examine the relationship between taxonomic abundance and function simultaneously. Assessing microbial contributions to important ecosystem services can enable identification of robust functions supported by a …


Discretized Geometric Approaches To The Analysis Of Protein Structures, John Holland Sep 2021


Dartmouth College Ph.D Dissertations

Proteins play crucial roles in a variety of biological processes. While we know that their amino acid sequence determines their structure, which in turn determines their function, we do not know why particular sequences fold into particular structures. My work focuses on discretized geometric descriptions of protein structure—conceptualizing native structure space as composed of mostly discrete, geometrically defined fragments—to better understand the patterns underlying why particular sequence elements correspond to particular structure elements. This discretized geometric approach is applied to multiple levels of protein structure, from conceptualizing contacts between residues as interactions between discrete structural elements to treating protein structures …


A Multi-Resolution Graph Convolution Network For Contiguous Epitope Prediction, Lisa Oh Jan 2021


Dartmouth College Master’s Theses

Computational methods for predicting binding interfaces between antigens and antibodies (epitopes and paratopes) are faster and cheaper than traditional experimental structure determination methods. A sufficiently reliable computational predictor that could scale to large sets of available antibody sequence data could thus inform and expedite many biomedical pursuits, such as better understanding immune responses to vaccination and natural infection and developing better drugs and vaccines. However, current state-of-the-art predictors produce discontiguous predictions, e.g., predicting the epitope in many different spots on an antigen, even though in reality they typically comprise a single localized region. We seek to produce contiguous predicted epitopes, …


Detecting Gene-Gene Interactions Using A Permutation-Based Random Forest Method, Jing Li, James D. Malley, Angeline S. Andrew, Margaret R. Karagas, Jason H. Moore Apr 2016


Dartmouth Scholarship

Identifying gene-gene interactions is essential to understand disease susceptibility and to detect genetic architectures underlying complex diseases. Here, we aimed to develop a permutation-based methodology relying on a machine learning method, random forest (RF), to detect gene-gene interactions. Our approach, called permuted random forest (pRF), identified the top interacting single nucleotide polymorphism (SNP) pairs by estimating how much the power of a random forest classification model is influenced by removing pairwise interactions.


FastPop: A Rapid Principal Component Derived Method To Infer Intercontinental Ancestry Using Genetic Data, Yafang Li, Jinyoung Byun, Guoshuai Cai, Xiangjun Xiao, Younghun Han, Olivier Cornelis, James E. Dinulos, Joe Dennis, Douglas Easton, Ivan Gorlov, Michael F. Seldin, Christopher I. Amos Mar 2016


Dartmouth Scholarship

Identifying subpopulations within a study and inferring intercontinental ancestry of the samples are important steps in genome wide association studies. Two software packages are widely used in analysis of substructure: Structure and Eigenstrat. Structure assigns each individual to a population by using a Bayesian method with multiple tuning parameters. It requires considerable computational time when dealing with thousands of samples and lacks the ability to create scores that could be used as covariates. Eigenstrat uses a principal component analysis method to model all sources of sampling variation. However, it does not readily provide information directly relevant to ancestral origin; the …


Identifying Gene-Gene Interactions That Are Highly Associated With Body Mass Index Using Quantitative Multifactor Dimensionality Reduction (QMDR), Rishika De, Shefali S. Verma, Fotios Drenos, Emily R. Holzinger Dec 2015


Dartmouth Scholarship

Despite heritability estimates of 40–70% for obesity, less than 2% of its variation is explained by the Body Mass Index (BMI) associated loci that have been identified so far. Epistasis, or gene-gene interaction, is a plausible source to explain portions of the missing heritability of BMI. Using genotypic data from 18,686 individuals across five study cohorts – ARIC, CARDIA, FHS, CHS, MESA – we filtered SNPs (Single Nucleotide Polymorphisms) using two parallel approaches. SNPs were filtered either on the strength of their main effects of association with BMI, or on the number of knowledge sources supporting a specific SNP-SNP interaction in …


Spectral Gene Set Enrichment (SGSE), H Robert Frost, Zhigang Li, Jason H. Moore Mar 2015


Dartmouth Scholarship

Gene set testing is typically performed in a supervised context to quantify the association between groups of genes and a clinical phenotype. In many cases, however, a gene set-based interpretation of genomic data is desired in the absence of a phenotype variable. Although methods exist for unsupervised gene set testing, they predominantly compute enrichment relative to clusters of the genomic variables with performance strongly dependent on the clustering algorithm and number of clusters. We propose a novel method, spectral gene set enrichment (SGSE), for unsupervised competitive testing of the association between gene sets and empirical data sources. SGSE first computes …


A Classification And Characterization Of Two-Locus, Pure, Strict, Epistatic Models For Simulation And Detection, Ryan J. Urbanowicz, Ambrose L. S. Granizo-Mackenzie, Jeff Kiralis, Jason H Moore Jun 2014


Dartmouth Scholarship

Background: The statistical genetics phenomenon of epistasis is widely acknowledged to confound disease etiology. In order to evaluate strategies for detecting these complex multi-locus disease associations, simulation studies are required. The development of the GAMETES software for the generation of complex genetic models has provided the means to randomly generate an architecturally diverse population of epistatic models that are both pure and strict, i.e. all n loci, but no fewer, are predictive of phenotype. Previous theoretical work characterizing complex genetic models has yet to examine pure, strict epistasis, which should be the most challenging to detect. This study addresses three goals: …


Integrated Assessment Of Predicted MHC Binding And Cross-Conservation With Self Reveals Patterns Of Viral Camouflage, Lu He, Anne S. De Groot, Andres H. Gutierrez, William D. Martin, Lenny Moise, Chris Bailey-Kellogg Mar 2014


Dartmouth Scholarship

Immune recognition of foreign proteins by T cells hinges on the formation of a ternary complex sandwiching a constituent peptide of the protein between a major histocompatibility complex (MHC) molecule and a T cell receptor (TCR). Viruses have evolved means of "camouflaging" themselves, avoiding immune recognition by reducing the MHC and/or TCR binding of their constituent peptides. Computer-driven T cell epitope mapping tools have been used to evaluate the degree to which particular viruses have used this means of avoiding immune response, but most such analyses focus on MHC-facing 'agretopes'. Here we set out a new means of evaluating the …


A Semantic-Based Method For Extracting Concept Definitions From Scientific Publications: Evaluation In The Autism Phenotype Domain, Saeed Hassanpour, Martin J. O’Connor, Amar K. Das Apr 2013


Dartmouth Scholarship

Background: A variety of informatics approaches have been developed that use information retrieval, NLP and text-mining techniques to identify biomedical concepts and relations within scientific publications or their sentences. These approaches have not typically addressed the challenge of extracting more complex knowledge such as biomedical definitions. In our efforts to facilitate knowledge acquisition of rule-based definitions of autism phenotypes, we have developed a novel semantic-based text-mining approach that can automatically identify such definitions within text.

Results: Using an existing knowledge base of 156 autism phenotype definitions and an annotated corpus of 26 source articles containing such definitions, we evaluated and …


How Long Is A Piece Of Loop?, Yoonjoo Choi, Sumeet Agarwal, Charlotte M. Deane Feb 2013


Dartmouth Scholarship

Loops are irregular structures that connect two secondary structure elements in proteins. They often play important roles in function, including enzyme reactions and ligand binding. Despite their importance, their structure remains difficult to predict. Most protein loop structure prediction methods sample local loop segments and score them. In particular, protein loop classifications and database search methods depend heavily on local properties of loops. Here we examine the distance between a loop's end points (span). We find that the distribution of loop span appears to be independent of the number of residues in the loop; in other words, the separation between …
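The span defined in this abstract, the distance between a loop's two end points, can be sketched as a plain Euclidean distance. Which atoms mark the end points is an assumption here (for example, the alpha carbons of the anchoring residues); the function name `loop_span` and the coordinates are hypothetical.

```python
import math


def loop_span(start_xyz, end_xyz):
    """Euclidean distance between the two end points of a loop.

    Coordinates are (x, y, z) tuples in the same units (for example, angstroms).
    """
    return math.sqrt(sum((e - s) ** 2 for s, e in zip(start_xyz, end_xyz)))


# Hypothetical end-point coordinates; the number of residues in between is not used.
span = loop_span((0.0, 0.0, 0.0), (3.0, 4.0, 12.0))
```

Note that the calculation takes no account of loop length, so spans of loops with different residue counts can be compared directly.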


Algorithms For Optimizing Cross-Overs In DNA Shuffling, Lu He, Alan M. Friedman, Chris Bailey-Kellogg Aug 2012


Dartmouth Scholarship

DNA shuffling generates combinatorial libraries of chimeric genes by stochastically recombining parent genes. The resulting libraries are subjected to large-scale genetic selection or screening to identify those chimeras with favorable properties (e.g., enhanced stability or enzymatic activity). While DNA shuffling has been applied quite successfully, it is limited by its homology-dependent, stochastic nature. Consequently, it is used only with parents of sufficient overall sequence identity, and provides no control over the resulting chimeric library.

Results: This paper presents efficient methods to extend the scope of DNA shuffling to handle significantly more diverse parents and to generate more predictable, optimized libraries. …


Gene Ontology Analysis Of Pairwise Genetic Associations In Two Genome-Wide Studies Of Sporadic ALS, Nora Chung Kim, Peter C. Andrews, Folkert W. Asselbergs, H Robert Frost, Scott M. Williams, Brent T. Harris, Cynthia Read, Kathleen D. Askland, Jason H. Moore Jul 2012


Dartmouth Scholarship

It is increasingly clear that common human diseases have a complex genetic architecture characterized by both additive and nonadditive genetic effects. The goal of the present study was to determine whether patterns of both additive and nonadditive genetic associations aggregate in specific functional groups as defined by the Gene Ontology (GO).


DNA Methylation Arrays As Surrogate Measures Of Cell Mixture Distribution, Eugene Houseman, William P. Accomando, Devin C. Koestler, Brock C. Christensen, Carmen J. Marsit May 2012


Dartmouth Scholarship

There has been a long-standing need in biomedical research for a method that quantifies the normally mixed composition of leukocytes beyond what is possible by simple histological or flow cytometric assessments. The latter is restricted by the labile nature of protein epitopes, requirements for cell processing, and timely cell analysis. In a diverse array of diseases and following numerous immune-toxic exposures, leukocyte composition will critically inform the underlying immuno-biology of most chronic medical conditions. Emerging research demonstrates that DNA methylation is responsible for cellular differentiation, and when measured in whole peripheral blood, serves to distinguish cancer cases from controls.


Data Sharing In Neuroimaging Research, Jean-Baptiste Poline, Janis L. Breeze, Satrajit Ghosh, Krzysztof Gorgolewski, Yaroslav O. Halchenko Apr 2012


Dartmouth Scholarship

Significant resources around the world have been invested in neuroimaging studies of brain function and disease. Easier access to this large body of work should have profound impact on research in cognitive neuroscience and psychiatry, leading to advances in the diagnosis and treatment of psychiatric and neurological disease. A trend toward increased sharing of neuroimaging data has emerged in recent years. Nevertheless, a number of barriers continue to impede momentum. Many researchers and institutions remain uncertain about how to share data or lack the tools and expertise to participate in data sharing. The use of electronic data capture (EDC) methods …


Planning Combinatorial Disulfide Cross-Links For Protein Fold Determination, Fei Xiong, Alan M Friedman, Chris Bailey-Kellogg Nov 2011


Dartmouth Scholarship

Fold recognition techniques take advantage of the limited number of overall structural organizations, and have become increasingly effective at identifying the fold of a given target sequence. However, in the absence of sufficient sequence identity, it remains difficult for fold recognition methods to always select the correct model. While a native-like model is often among a pool of highly ranked models, it is not necessarily the highest-ranked one, and the model rankings depend sensitively on the scoring function used. Structure elucidation methods can then be employed to decide among the models based on relatively rapid biochemical/biophysical experiments.


Evolving Hard Problems: Generating Human Genetics Datasets With A Complex Etiology, Daniel S Himmelstein, Casey S Greene, Jason H Moore Jul 2011


Dartmouth Scholarship

Background: A goal of human genetics is to discover genetic factors that influence individuals' susceptibility to common diseases. Most common diseases are thought to result from the joint failure of two or more interacting components instead of single component failures. This greatly complicates both the task of selecting informative genetic variants and the task of modeling interactions between them. We and others have previously developed algorithms to detect and model the relationships between these genetic factors and disease. Previously these methods have been evaluated with datasets simulated according to pre-defined genetic models.


Optimization Algorithms For Functional Deimmunization Of Therapeutic Proteins, Andrew S. Parker, Wei Zheng, Karl E. Griswold, Chris Bailey-Kellogg Apr 2010


Dartmouth Scholarship

To develop protein therapeutics from exogenous sources, it is necessary to mitigate the risks of eliciting an anti-biotherapeutic immune response. A key aspect of the response is the recognition and surface display by antigen-presenting cells of epitopes, short peptide fragments derived from the foreign protein. Thus, developing minimal-epitope variants represents a powerful approach to deimmunizing protein therapeutics. Critically, mutations selected to reduce immunogenicity must not interfere with the protein's therapeutic activity.


Multifactor Dimensionality Reduction Analysis Identifies Specific Nucleotide Patterns Promoting Genetic Polymorphisms, Eric Arehart, Scott Gleim, Bill White, John Hwa, Jason H. Moore Mar 2009


Dartmouth Scholarship

The fidelity of DNA replication serves as the nidus for both genetic evolution and genomic instability fostering disease. Single nucleotide polymorphisms (SNPs) constitute greater than 80% of the genetic variation between individuals. A new theory regarding DNA replication fidelity has emerged in which selectivity is governed by base-pair geometry through interactions between the selected nucleotide, the complementary strand, and the polymerase active site. We hypothesize that specific nucleotide combinations in the flanking regions of SNP fragments are associated with mutation.


A Novel Ensemble Learning Method For De Novo Computational Identification Of DNA Binding Sites, Arijit Chakravarty, Jonathan M. Carlson, Radhika S. Khetani, Robert H. Gross Jul 2007


Dartmouth Scholarship

Despite the diversity of motif representations and search algorithms, the de novo computational identification of transcription factor binding sites remains constrained by the limited accuracy of existing algorithms and the need for user-specified input parameters that describe the motif being sought. Results: We present a novel ensemble learning method, SCOPE, that is based on the assumption that transcription factor binding sites belong to one of three broad classes of motifs: non-degenerate, degenerate and gapped motifs. SCOPE employs a unified scoring metric to combine the results from three motif finding algorithms each aimed at the discovery of one of these classes of motifs. …


Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross May 2006


Dartmouth Scholarship

The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.
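The exact word-counting strategy this abstract refers to can be sketched as follows. This illustrates only the baseline idea of scoring fixed-length words by how overrepresented they are in upstream regions relative to a background set; it is not the authors' bounded search and it does not handle degenerate motifs. The function names, the pseudocount smoothing, and the toy sequences are all assumptions.

```python
from collections import Counter


def count_words(sequences, k):
    """Count every overlapping length-k word across the given sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts


def rank_overrepresented(target_seqs, background_seqs, k, pseudocount=1.0):
    """Rank words seen in the target set by target/background frequency ratio.

    A pseudocount (an illustrative smoothing choice) avoids division by zero
    for words that never occur in the background.
    """
    target = count_words(target_seqs, k)
    background = count_words(background_seqs, k)
    vocab = 4 ** k  # number of possible DNA words of length k
    t_total = sum(target.values()) + pseudocount * vocab
    b_total = sum(background.values()) + pseudocount * vocab
    scores = {}
    for word, t_count in target.items():
        t_freq = (t_count + pseudocount) / t_total
        b_freq = (background[word] + pseudocount) / b_total
        scores[word] = t_freq / b_freq
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy upstream regions sharing the word GTGAC, and an unrelated background.
targets = ["ACGTGACC", "TTGTGACA", "GGGTGACT"]
background = ["ACACACAC", "TTTTGGGG", "CCCCAAAA"]
ranked = rank_overrepresented(targets, background, k=5)
```

Because each exact word is scored separately, occurrences of a degenerate element are split across many words, which is the limitation the abstract points to.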


GPNN: Power Studies And Applications Of A Neural Network Method For Detecting Gene-Gene Interactions In Studies Of Human Disease, Alison A. Motsinger, Stephen L. Lee, George Mellick, Marylyn D. Ritchie Jan 2006


Dartmouth Scholarship

The identification and characterization of genes that influence the risk of common, complex multifactorial disease primarily through interactions with other genes and environmental factors remains a statistical and computational challenge in genetic epidemiology. We have previously introduced a genetic programming optimized neural network (GPNN) as a method for optimizing the architecture of a neural network to improve the identification of gene combinations associated with disease risk. The goal of this study was to evaluate the power of GPNN for identifying high-order gene-gene interactions. We were also interested in applying GPNN to a real data analysis in Parkinson's disease.