Open Access. Powered by Scholars. Published by Universities.®

Computational Biology Commons

Articles 1 - 8 of 8

Full-Text Articles in Computational Biology

Analysis Of Subtelomeric REXTAL Assemblies Using QUAST, Tunazzina Islam, Desh Ranjan, Mohammad Zubair, Eleanor Young, Ming Xiao, Harold Riethman Jan 2021

Computer Science Faculty Publications

Genomic regions of high segmental duplication content and/or structural variation have led to gaps and misassemblies in the human reference sequence, and are refractory to assembly from whole-genome short-read datasets. Human subtelomere regions are highly enriched in both segmental duplication content and structural variations, and as a consequence are both impossible to assemble accurately and highly variable from individual to individual. Recently, we developed a pipeline for improved region-specific assembly called Regional Extension of Assemblies Using Linked-Reads (REXTAL). In this study, we evaluate REXTAL and genome-wide assembly (Supernova) approaches on 10X Genomics linked-reads data sets partitioned and barcoded using the …


Leveraging Global Gene Expression Patterns To Predict Expression Of Unmeasured Genes, James Rudd, René A. Zelaya, Eugene Demidenko, Ellen L. Goode, Casey S. Greene, Jennifer A. Doherty Dec 2015

Dartmouth Scholarship

Background: Large collections of paraffin-embedded tissue represent a rich resource to test hypotheses based on gene expression patterns; however, measurement of genome-wide expression is cost-prohibitive on a large scale. Using the known expression correlation structure within a given disease type (in this case, high grade serous ovarian cancer; HGSC), we sought to identify reduced sets of directly measured (DM) genes which could accurately predict the expression of a maximized number of unmeasured genes.
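
The selection problem described in this abstract can be illustrated with a small sketch. The abstract does not say how the DM genes were chosen, so everything here is an assumption made for illustration: the greedy heuristic, the function names `pearson` and `greedy_select`, the correlation threshold `min_abs_r`, and the toy expression values. It simply picks, one at a time, the gene that is strongly correlated with the most genes not yet covered.

```python
from math import sqrt

def pearson(x, y):
    """Pearson correlation of two equal-length lists (assumes non-constant inputs)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / sqrt(vx * vy)

def greedy_select(expr, n_select, min_abs_r=0.9):
    """Hypothetical heuristic: greedily pick genes that 'cover' the most other genes.

    A gene g covers gene h when |r(g, h)| >= min_abs_r (an assumed threshold).
    Returns the chosen genes and the set of unchosen genes they cover.
    """
    genes = list(expr)
    covers = {
        g: {h for h in genes
            if h != g and abs(pearson(expr[g], expr[h])) >= min_abs_r}
        for g in genes
    }
    chosen, covered = [], set()
    for _ in range(n_select):
        candidates = [g for g in genes if g not in chosen]
        # Gain = genes newly covered that are neither covered nor already chosen.
        best = max(candidates,
                   key=lambda g: len(covers[g] - covered - set(chosen)))
        chosen.append(best)
        covered |= covers[best]
    covered -= set(chosen)
    return chosen, covered

# Made-up expression profiles (one list of sample values per gene).
expr = {
    "A": [1, 2, 3, 4, 5],
    "B": [2, 4, 6, 8, 10.5],
    "C": [5, 4, 3, 2, 1.2],
    "D": [1, 3, 2, 5, 4],
    "E": [1.1, 2.9, 2.2, 4.8, 4.1],
}
chosen, covered = greedy_select(expr, n_select=2)
```

In this toy data, genes A, B, and C move together and D tracks E, so two measured genes are enough to cover the other three. A real analysis would also need a predictive model and held-out validation, which this sketch omits.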


OrthoClust: An Orthology-Based Network Framework For Clustering Data Across Multiple Species, Koon-Kiu Yan, Daifeng Wang, Joel Rozowsky, Henry Zheng, Chao Cheng, Mark Gerstein Aug 2014

Dartmouth Scholarship

Increasingly, high-dimensional genomics data are becoming available for many organisms. Here, we develop OrthoClust for simultaneously clustering data across multiple species. OrthoClust is a computational framework that integrates the co-association networks of individual species by utilizing the orthology relationships of genes between species. It outputs optimized modules that are fundamentally cross-species, which can either be conserved or species-specific. We demonstrate the application of OrthoClust using the RNA-Seq expression profiles of Caenorhabditis elegans and Drosophila melanogaster from the modENCODE consortium. A potential application of cross-species modules is to infer putative analogous functions of uncharacterized elements like non-coding RNAs based on guilt-by-association.


Structural Features Of The Pseudomonas Fluorescens Biofilm Adhesin LapA Required For LapG-Dependent Cleavage, Biofilm Formation, And Cell Surface Localization, Chelsea D. Boyd, T. Jarrod Smith, Sofiane El-Kirat-Chatel, Peter D. Newell, Yves F. Dufrêne, George A. O'Toole May 2014

Dartmouth Scholarship

The localization of the LapA protein to the cell surface is a key step required by Pseudomonas fluorescens Pf0-1 to irreversibly attach to a surface and form a biofilm. LapA is a member of a diverse family of predicted bacterial adhesins, and although lacking a high degree of sequence similarity, family members do share common predicted domains. Here, using mutational analysis, we determine the significance of each domain feature of LapA in relation to its export and localization to the cell surface and function in biofilm formation. Our previous work showed that the N terminus of LapA is required for …


MicroRNAs And The Advent Of Vertebrate Morphological Complexity, Alysha M. Heimberg, Lorenzo F. Sempere, Vanessa N. Moy, Phillip C. J. Donoghue, Kevin J. Peterson Feb 2008

Dartmouth Scholarship

The causal basis of vertebrate complexity has been sought in genome duplication events (GDEs) that occurred during the emergence of vertebrates, but evidence beyond coincidence is wanting. MicroRNAs (miRNAs) have recently been identified as a viable causal factor in increasing organismal complexity through the action of these ≈22-nt noncoding RNAs in regulating gene expression. Because miRNAs are continuously being added to animalian genomes, and, once integrated into a gene regulatory network, are strongly conserved in primary sequence and rarely secondarily lost, their evolutionary history can be accurately reconstructed. Here, using a combination of Northern analyses and genomic searches, we show …


A Novel Ensemble Learning Method For De Novo Computational Identification Of DNA Binding Sites, Arijit Chakravarty, Jonathan M. Carlson, Radhika S. Khetani, Robert H. Gross Jul 2007

Dartmouth Scholarship

Despite the diversity of motif representations and search algorithms, the de novo computational identification of transcription factor binding sites remains constrained by the limited accuracy of existing algorithms and the need for user-specified input parameters that describe the motif being sought. Results: We present a novel ensemble learning method, SCOPE, that is based on the assumption that transcription factor binding sites belong to one of three broad classes of motifs: non-degenerate, degenerate and gapped motifs. SCOPE employs a unified scoring metric to combine the results from three motif finding algorithms each aimed at the discovery of one of these classes of motifs. …


Bounded Search For De Novo Identification Of Degenerate Cis-Regulatory Elements, Jonathan M. Carlson, Arijit Chakravarty, Radhika S. Khetani, Robert H. Gross May 2006

Dartmouth Scholarship

The identification of statistically overrepresented sequences in the upstream regions of coregulated genes should theoretically permit the identification of potential cis-regulatory elements. However, in practice many cis-regulatory elements are highly degenerate, precluding the use of an exhaustive word-counting strategy for their identification. While numerous methods exist for inferring base distributions using a position weight matrix, recent studies suggest that the independence assumptions inherent in the model, as well as the inability to reach a global optimum, limit this approach.
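
For orientation, the exhaustive word-counting strategy mentioned in this abstract can be sketched as follows. This is the baseline the abstract says breaks down for highly degenerate elements, not the authors' bounded search. The function names `count_words` and `enrichment`, the pseudocount, and the toy sequences are all assumptions made for the example.

```python
from collections import Counter

def count_words(sequences, k):
    """Count every length-k word (k-mer) across a list of DNA sequences."""
    counts = Counter()
    for seq in sequences:
        seq = seq.upper()
        for i in range(len(seq) - k + 1):
            counts[seq[i:i + k]] += 1
    return counts

def enrichment(foreground, background, k, pseudocount=1.0):
    """Score each foreground word by its frequency ratio against a background set.

    The pseudocount (an assumed smoothing choice) keeps words that never
    occur in the background from producing a division by zero.
    """
    fg = count_words(foreground, k)
    bg = count_words(background, k)
    fg_total = sum(fg.values())
    bg_total = sum(bg.values())
    n_words = 4 ** k  # number of possible DNA words of length k
    scores = {}
    for word, n in fg.items():
        fg_freq = (n + pseudocount) / (fg_total + pseudocount * n_words)
        bg_freq = (bg[word] + pseudocount) / (bg_total + pseudocount * n_words)
        scores[word] = fg_freq / bg_freq
    return scores

# Made-up upstream regions of "coregulated" genes and a made-up background set.
upstream = ["ACGTGACGTG", "TTACGTGAAA", "GGACGTGCCC"]
background = ["ATATATATAT", "GCGCGCGCGC", "TTTTAAAACC"]

scores = enrichment(upstream, background, k=5)
top_word = max(scores, key=scores.get)
```

Because only exact words are counted, a site that varies at one or two positions is split across many low-count words. This is why exact counting struggles with degenerate elements.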


Principal Component Analysis For Predicting Transcription-Factor Binding Motifs From Array-Derived Data, Yunlong Liu, Matthew P Vincenti, Hiroki Yokota Nov 2005

Dartmouth Scholarship

The responses to interleukin 1 (IL-1) in human chondrocytes constitute a complex regulatory mechanism, where multiple transcription factors interact combinatorially with transcription-factor binding motifs (TFBMs). In order to select a critical set of TFBMs from genomic DNA information and array-derived data, an efficient algorithm to solve a combinatorial optimization problem is required. Although computational approaches based on evolutionary algorithms are commonly employed, an analytical algorithm would be useful to predict TFBMs at nearly no computational cost and evaluate varying modelling conditions. Singular value decomposition (SVD) is a powerful method to derive primary components of a given matrix. Applying SVD …
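
To illustrate what deriving a primary component of a matrix involves, here is a minimal sketch that approximates the largest singular value and its right singular vector by power iteration. Power iteration is a stand-in chosen so the example needs no external library; it is not the procedure used in the paper, whose details are cut off above. The function name `leading_component`, the iteration count, and the toy matrix are assumptions.

```python
from math import sqrt

def leading_component(matrix, iterations=200):
    """Approximate the largest singular value and right singular vector.

    Repeatedly applies A^T A to a starting vector and renormalizes.
    Assumes a non-zero matrix and a start vector that is not orthogonal
    to the leading singular vector.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    v = [1.0 / sqrt(n_cols)] * n_cols
    for _ in range(iterations):
        # u = A v
        u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
        # w = A^T u
        w = [sum(matrix[i][j] * u[i] for i in range(n_rows))
             for j in range(n_cols)]
        norm = sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    # The singular value is the length of A v for the converged unit vector v.
    u = [sum(a * b for a, b in zip(row, v)) for row in matrix]
    sigma = sqrt(sum(x * x for x in u))
    return sigma, v

# Toy matrix whose singular values are 3 and 1.
sigma, v = leading_component([[3.0, 0.0], [0.0, 1.0]])
```

A full SVD routine from a numerical library returns all components at once and is the more practical choice for real data. The loop above only shows where the leading component comes from.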