Open Access. Powered by Scholars. Published by Universities.®

Computational Biology Commons

2009

Articles 1 - 29 of 29

Full-Text Articles in Computational Biology

Genetic Effect Of The Dwarfing Genes On Some Culm Characteristics Associated With Lodging Resistance In Bread Wheat, Md. Mahbub Hasan Dec 2009

Md. Mahbub Hasan

Because screening traits related to lodging resistance under natural field conditions is challenging, selection for lodging-resistant varieties in wheat breeding programs is difficult. The identification of easily measurable culm anatomical traits related to lodging resistance would simplify the selection process. The present study was conducted to determine the effect of dwarfing genes on culm anatomical traits related to lodging resistance in our of basal internode 1. A field and laboratory study was conducted at Shahjalal University of Science and Technology, Sylhet, Bangladesh, with eight wheat genotypes carrying the Rht1 and Rht2 dwarfing genes and a local landrace …


Targeted Genomic Signature Profiling With Quasi-Alignment Statistics, Rao Mallik Kotamarti, Douglas W. Raiford, Michael Hahsler, Yuhang Wang, Monnie McGee, Maggie Dunham Nov 2009

COBRA Preprint Series

Genome databases continue to expand with no change in the basic format of sequence data. The prevalent use of classic alignment-based search tools like BLAST has significantly pushed the limits of genome isolate research. The relatively new frontier of metagenomic research deals with thousands of diverse genomes, with newer demands beyond the current homologue search and analysis. Compressing sequence data into a complex form could facilitate a broader range of sequence analyses. To this end, this research explores reorganizing sequence data as complex Markov signatures, also known as Extensible Markov Models. Markov models have found successful application in …


Integrative Clustering Of Multiple Genomic Data Types Using A Joint Latent Variable Model With Application To Breast And Lung Cancer Subtype Analysis, Ronglai Shen, Adam Olshen, Marc Ladanyi Sep 2009

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

The molecular complexity of a tumor manifests itself at the genomic, epigenomic, transcriptomic, and proteomic levels. Genomic profiling at these multiple levels should allow an integrated characterization of tumor etiology. However, there is a shortage of effective statistical and bioinformatic tools for truly integrative data analysis. The standard approach to integrative clustering is separate clustering followed by manual integration. A more statistically powerful approach would incorporate all data types simultaneously and generate a single integrated cluster assignment. We developed a joint latent variable model for integrative clustering. We call the resulting methodology iCluster. iCluster incorporates flexible modeling of the associations …


Model-Based Quality Assessment And Base-Calling For Second-Generation Sequencing Data, Rafael A. Irizarry, Hector Corrada Bravo Sep 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Second-generation sequencing (sec-gen) technology can sequence millions of short fragments of DNA in parallel, and is capable of assembling complex genomes for a small fraction of the price and time of previous technologies. In fact, a recently formed international consortium, the 1,000 Genomes Project, plans to fully sequence the genomes of approximately 1,200 people. The prospect of comparative analysis at the sequence level of a large number of samples across multiple populations may be achieved within the next five years. These data present unprecedented challenges in statistical analysis. For instance, analysis operates on millions of short nucleotide sequences, or reads—strings …


A Classification Model For Distinguishing Copy Number Variants From Cancer-Related Alterations, Irina Ostrovnaya, Gouri Nanjangud, Adam Olshen Aug 2009

Memorial Sloan-Kettering Cancer Center, Dept. of Epidemiology & Biostatistics Working Paper Series

Both somatic copy number alterations (CNAs) and germline copy number variants (CNVs) that are prevalent in healthy individuals can appear as recurrent changes in comparative genomic hybridization (CGH) analyses of tumors. In order to identify important cancer genes, CNAs and CNVs must be distinguished. Although the Database of Genomic Variants (Iafrate et al., 2004) contains a list of all known CNVs, there is no standard methodology to use the database effectively.

We develop a prediction model that distinguishes CNVs from CNAs based on the information contained in the Database and several other variables, including potential CNV’s length, height, closeness to …


Artificial Intelligence – II: Network Path Optimization Using GA Approach, Madiha Sarfraz, Shaleeza Sohail, Younus Javed, Almas Anjum Aug 2009

International Conference on Information and Communication Technologies

In this paper, we present a variation of the Genetic Algorithm (GA) for finding the optimized shortest path of the network. The algorithm finds the optimal path based on the bandwidth and utilization of the network. The main distinguishing element of this work is the use of “2-point over 1-point crossover”. The population comprises all chromosomes (feasible and infeasible). Moreover, it is of variable length, so that the algorithm can perform efficiently in all scenarios. Rank-based selection is used for the crossover operation. Therefore, the best chromosomes cross over and give the most suitable offspring. If the resulting offspring are least fitted, …


Artificial Intelligence – II: Retinal Image Blood Vessel Segmentation, M. Usman Akram, Anam Tariq, Shoab A. Khan Aug 2009

International Conference on Information and Communication Technologies

The appearance and structure of blood vessels in retinal images play an important role in diagnosis of eye diseases. This paper proposes a method for segmentation of blood vessels in color retinal images. We present a method that uses 2-D Gabor wavelet to enhance the vascular pattern. We locate and segment the blood vessels using adaptive thresholding. The technique is tested on publicly available DRIVE database of manually labeled images which has been established to facilitate comparative studies on segmentation of blood vessels in retinal images. The proposed method achieves an area under the receiver operating characteristic curve of 0.963 …


A Decomposition Of The Pure Parsimony Problem, Allen Holder, Thomas M. Langley Aug 2009

Mathematical Sciences Technical Reports (MSTR)

We partially order a collection of genotypes so that we can represent the problem of inferring the least number of haplotypes in terms of substructures we call g-lattices. This representation allows us to prove that if the genotypes partition into chains with certain structure, then the NP-Hard problem can be solved efficiently. Even without the specified structure, the decomposition shows how to separate the underlying integer programming model into smaller models.


Evaluation Of Annotation Performances Between Automated And Curated Databases Of E. Coli Using The Correlation Coefficient, Reddysalilaja Marpuri Aug 2009

Masters Theses & Specialist Projects

This project compared the performance of the correlation coefficient to show similarities in annotations between a predictive automated bacterial annotation database and the curated EcoCyc database. EcoCyc is a conservative multidimensional annotation system that is exclusively based on experimentally validated findings from over 15,000 publications. The automated annotation system used in the comparison was BASys. It is often used as a first-pass annotation tool that tries to add as many annotations as possible by drawing upon over 30 information sources. Gene ontology served as one basis of comparison between these databases because of the limited common terms in the …


Adding Upstream Sequence And A Downstream Reporter To The Bile Acid Inducible Promoter Of Clostridium Scindens VPI 12708, Bryan Patrick Mason Aug 2009

Masters Theses & Specialist Projects

Bile acids in the small intestines of animals serve to break down fats and fat-soluble vitamins. Most of the bile acids are reabsorbed into the enterohepatic circulation, but approximately five percent of these bile acids pass into the large intestine. These bile acids are swiftly deconjugated by the bacterial population, and then subjected to further intestinal bacterial chemical modifications. The most significant of these modifications are 7α-dehydroxylations, which form secondary bile acids (deoxycholate and lithocholate). Much research has illuminated the 7α-dehydroxylation pathway: of particular interest is the bile acid inducible operon, for which Clostridium scindens VPI 12708 serves as the model …


Searching For The Binding Partners For The Novel Phkg1 Variant Γ 181, Kishore Polireddy Aug 2009

Masters Theses & Specialist Projects

No abstract provided.


Identification Of Synechococcus Sp. IU 625 Phycocyanin Gene And Bioinformatic Analyses Of Cyanobacterial Phycocyanin, Tin-Chun Chu, Aline Oliveira, Arti Rana, Lee Lee Jun 2009

Tin-Chun Chu, Ph.D.

No abstract provided.


Subset Quantile Normalization Using Negative Control Features, Zhijin Wu Jun 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

No abstract provided.


Frozen Robust Multi-Array Analysis (fRMA), Matthew N. McCall, Benjamin M. Bolstad, Rafael A. Irizarry May 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Robust Multi-array Analysis (RMA) is the most widely used preprocessing algorithm for Affymetrix and Nimblegen gene-expression microarrays. RMA performs background correction, normalization, and summarization in a modular way. The last two steps require multiple arrays to be analyzed simultaneously. The ability to borrow information across samples provides RMA various advantages. For example, the summarization step fits a parametric model that accounts for probe-effects, assumed to be fixed across arrays, and improves outlier detection. Residuals, obtained from the fitted model, permit the creation of useful quality metrics. However, the dependence on multiple arrays has two drawbacks: (1) RMA cannot be …


Minimum Criteria For DNA Damage-Induced Phase Advances In Circadian Rhythms, Christian I. Hong, Judit Zámborszky, Attila Csikász-Nagy May 2009

Dartmouth Scholarship

Robust oscillatory behaviors are common features of circadian and cell cycle rhythms. These cyclic processes, however, behave distinctively in terms of their periods and phases in response to external influences such as light, temperature, nutrients, etc. Nevertheless, several links have been found between these two oscillators. Cell division cycles gated by the circadian clock have been observed since the late 1950s. On the other hand, ionizing radiation (IR) treatments cause cells to undergo a DNA damage response, which leads to phase shifts (mostly advances) in circadian rhythms. Circadian gating of the cell cycle can be attributed to the cell cycle …


Resampling-Based Multiple Hypothesis Testing With Applications To Genomics: New Developments In The R/Bioconductor Package Multtest, Houston N. Gilbert, Katherine S. Pollard, Mark J. Van Der Laan, Sandrine Dudoit Apr 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

The multtest package is a standard Bioconductor package containing a suite of functions useful for executing, summarizing, and displaying the results from a wide variety of multiple testing procedures (MTPs). In addition to many popular MTPs, the central methodological focus of the multtest package is the implementation of powerful joint multiple testing procedures. Joint MTPs are able to account for the dependencies between test statistics by effectively making use of (estimates of) the test statistics joint null distribution. To this end, two additional bootstrap-based estimates of the test statistics joint null distribution have been developed for use in the …


Evaluation Of Statistical Methods For Normalization And Differential Expression In mRNA-Seq Experiments, James H. Bullard, Elizabeth A. Purdom, Kasper D. Hansen, Sandrine Dudoit Apr 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

The focus of this article is on the design and analysis of mRNA-Seq experiments, with the aim of inferring transcript levels and identifying differentially expressed genes. We investigate two mRNA-Seq datasets obtained using Illumina's Genome Analyzer platform to measure transcript levels in reference samples considered in the MicroArray Quality Control (MAQC) Project. We address the following four main issues: (1) exploratory data analysis for mapped reads, relating read counts to variables describing input samples and genomic regions of interest; (2) assessment and quantitation of biological effects (e.g., expression levels in Brain vs. UHR) and nuisance experimental effects (e.g., library preparation, …


Gene Set Enrichment Analysis Made Simple, Rafael A. Irizarry, Chi Wang, Yun Zhou, Terence P. Speed Apr 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

Among the many applications of microarray technology, one of the most popular is the identification of genes that are differentially expressed in two conditions. A common statistical approach is to quantify the interest of each gene with a p-value, adjust these p-values for multiple comparisons, choose an appropriate cut-off, and create a list of candidate genes. This approach has been criticized for ignoring biological knowledge regarding how genes work together. Recently, a series of methods that do incorporate biological knowledge have been proposed. However, many of these methods seem overly complicated. Furthermore, the most popular method, Gene Set Enrichment Analysis …


Generalized Liquid Association, Yen-Yi Ho, Leslie Cope, Thomas A. Louis, Giovanni Parmigiani Apr 2009

Johns Hopkins University, Dept. of Biostatistics Working Papers

The analysis of interactions among a group of genes is fundamental to furthering our understanding of their biological interactions in a cell. Several studies suggested that the co-expression relationship of two genes can be modulated by a third controller gene. These controller genes and the corresponding modulated co-expressed gene pairs are the subjects of interest in this study. This “controller-modulated genes” three-way interaction is referred to as liquid association in the literature. Analysis of gene expression data has suggested that these interactions are present in many biological systems.

To quantify the magnitude of liquid association for a given gene …


Joint Multiple Testing Procedures For Graphical Model Selection With Applications To Biological Networks, Houston N. Gilbert, Mark J. Van Der Laan, Sandrine Dudoit Apr 2009

U.C. Berkeley Division of Biostatistics Working Paper Series

Gaussian graphical models have become popular tools for identifying relationships between genes when analyzing microarray expression data. In the classical undirected Gaussian graphical model setting, conditional independence relationships can be inferred from partial correlations obtained from the concentration matrix (= inverse covariance matrix) when the sample size n exceeds the number of parameters p which need to be estimated. In situations where n < p, another approach to graphical model estimation may rely on calculating unconditional (zero-order) and first-order partial correlations. In these settings, the goal is to identify a lower-order conditional independence graph, sometimes referred to as a ‘0-1 graph’. For either choice of graph, model selection may involve a multiple testing problem, in which edges in a graph are drawn only after rejecting hypotheses involving (saturated or lower-order) partial correlation parameters. Most multiple testing procedures applied in previously proposed graphical model selection algorithms rely on standard, marginal testing methods which do not take into account the joint distribution of the test statistics derived from (partial) correlations. We propose and implement a multiple testing framework useful when testing for edge inclusion during graphical model selection. Two features of our methodology include (i) a computationally efficient and asymptotically valid test statistics joint null distribution derived from influence curves for correlation-based parameters, and (ii) the application of empirical Bayes joint multiple testing procedures which can effectively control a variety of popular Type I error rates by incorporating joint null distributions such as those described here (Dudoit and van der Laan, 2008).
Using a dataset from Arabidopsis thaliana, we observe that the use of more sophisticated, modular approaches to multiple testing allows one to identify greater numbers of edges when approximating an undirected graphical model using a 0-1 graph. Our framework may also be extended to edge testing algorithms for other types of graphical models (e.g., for classical undirected, bidirected, and directed acyclic graphs).
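
As a rough illustration of the two quantities this abstract contrasts, the sketch below computes full partial correlations from the concentration matrix and a first-order partial correlation from pairwise correlations. It is a minimal sketch and not the authors' implementation: the function names and the 3-variable correlation matrix are hypothetical, and no testing or error-rate control is attempted.

```python
import math


def invert(matrix):
    """Invert a small square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(matrix)
    # Augment the matrix with the identity on the right-hand side.
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        scale = aug[col][col]
        aug[col] = [v / scale for v in aug[col]]
        for r in range(n):
            if r != col:
                factor = aug[r][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]


def full_partial_correlations(cov):
    """Partial correlation of each pair given all other variables.

    Uses rho_ij = -omega_ij / sqrt(omega_ii * omega_jj), where omega is the
    concentration (inverse covariance) matrix.
    """
    omega = invert(cov)
    n = len(cov)
    return [[1.0 if i == j else -omega[i][j] / math.sqrt(omega[i][i] * omega[j][j])
             for j in range(n)] for i in range(n)]


def first_order_partial(r_ij, r_ik, r_jk):
    """Partial correlation of i and j given a single variable k."""
    return (r_ij - r_ik * r_jk) / math.sqrt((1 - r_ik ** 2) * (1 - r_jk ** 2))


# Hypothetical correlation matrix for three genes (illustrative values only).
corr = [[1.0, 0.6, 0.3],
        [0.6, 1.0, 0.5],
        [0.3, 0.5, 1.0]]
partial = full_partial_correlations(corr)
```

With only three variables, the full partial correlation of a pair equals its first-order partial correlation given the remaining variable. In this made-up matrix, genes 0 and 2 have a marginal correlation of 0.3 but a partial correlation of zero given gene 1, so no edge would be drawn between them in the conditional independence graph.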


Multifactor Dimensionality Reduction Analysis Identifies Specific Nucleotide Patterns Promoting Genetic Polymorphisms, Eric Arehart, Scott Gleim, Bill White, John Hwa, Jason H. Moore Mar 2009

Dartmouth Scholarship

The fidelity of DNA replication serves as the nidus for both genetic evolution and genomic instability fostering disease. Single nucleotide polymorphisms (SNPs) constitute greater than 80% of the genetic variation between individuals. A new theory regarding DNA replication fidelity has emerged in which selectivity is governed by base-pair geometry through interactions between the selected nucleotide, the complementary strand, and the polymerase active site. We hypothesize that specific nucleotide combinations in the flanking regions of SNP fragments are associated with mutation.


A Novel Topology For Representing Protein Folds, Mark R. Segal Mar 2009

COBRA Preprint Series

Various topologies for representing three dimensional protein structures have been advanced for purposes ranging from prediction of folding rates to ab initio structure prediction. Examples include relative contact order, Delaunay tessellations, and backbone torsion angle distributions. Here we introduce a new topology based on a novel means for operationalizing three dimensional proximities with respect to the underlying chain. The measure involves first interpreting a rank-based representation of the nearest neighbors of each residue as a permutation, then determining how perturbed this permutation is relative to an unfolded chain. We show that the resultant topology provides improved association with folding and …


Ab Initio Exon Definition Using An Information Theory-Based Approach, Peter K. Rogan Mar 2009

Biochemistry Publications

Transcribed exons in genes are joined together at donor and acceptor splice sites precisely and efficiently to generate mRNAs capable of being translated into proteins. The sequence variability in individual splice sites can be modeled using Shannon information theory. In the laboratory, the degree of individual splice site use is inferred from the structures of mRNAs and their relative abundance. These structures can be predicted using a bipartite information theory framework that is guided by current knowledge of biological mechanisms for exon recognition. We present the results of this analysis for the complete dataset of all expressed human exons.
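
To make the idea of modeling splice-site variability with Shannon information concrete, the sketch below computes per-position information content (2 bits minus the Shannon entropy) for a set of aligned DNA sequences. It is only an assumed, simplified illustration: the function name and the four example donor-like sequences are hypothetical, no small-sample correction is applied, and it is not the bipartite framework described in the abstract.

```python
import math
from collections import Counter


def position_information(sequences):
    """Return the information content in bits at each aligned position.

    For DNA, the maximum is log2(4) = 2 bits (a fully conserved base) and the
    minimum is 0 bits (all four bases equally frequent).
    """
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    info = []
    for pos in range(length):
        counts = Counter(seq[pos] for seq in sequences)
        total = sum(counts.values())
        entropy = -sum((c / total) * math.log2(c / total) for c in counts.values())
        info.append(2.0 - entropy)
    return info


# Hypothetical aligned donor-site-like sequences (illustrative only).
sites = ["GTAAGT", "GTGAGT", "GTAAGA", "GTCAGT"]
bits = position_information(sites)
```

In this toy alignment, the invariant first two positions carry the full 2 bits each, while the variable third position (A, G, A, C) carries 0.5 bits.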


Sparse Linear Discriminant Analysis For Simultaneous Testing For The Significance Of A Gene Set/Pathway And Gene Selection, Michael C. Wu, Lingson Zhang, Zhaoxi Wang, David C. Christiani, Xihong Lin Jan 2009

Harvard University Biostatistics Working Paper Series

No abstract provided.


Letter From The Dean, Lalit Verma Jan 2009

Discovery, The Student Journal of Dale Bumpers College of Agricultural, Food and Life Sciences

No abstract provided.


Identification Of Yeast Transcriptional Regulation Networks Using Multivariate Random Forests, Yuanyuan Xiao, Mark Segal Dec 2008

Mark R Segal

The recent availability of whole-genome scale data sets that investigate complementary and diverse aspects of transcriptional regulation has spawned an increased need for new and effective computational approaches to analyze and integrate these large scale assays. Here, we propose a novel algorithm, based on random forest methodology, to relate gene expression (as derived from expression microarrays) to sequence features residing in gene promoters (as derived from DNA motif data) and transcription factor binding to gene promoters (as derived from tiling microarrays). We extend the random forest approach to model a multivariate response as represented, for example, by time-course gene expression …


The Evolution Of Flight In Insects: Insights From Mayflies And DNA, T. Heath Ogden Dec 2008

T. Heath Ogden

No abstract provided.


Towards A New Paradigm In Mayfly Phylogeny (Ephemeroptera): Combined Analysis Of Morphological And Molecular Data, T. Heath Ogden Dec 2008

T. Heath Ogden

This study represents the first formal morphological and combined (morphological and molecular) phylogenetic analyses of the order Ephemeroptera. Taxonomic sampling comprised 112 species in 107 genera, including 42 recognized families (all major lineages of Ephemeroptera). Morphological data consisted of 101 morphological characters. Molecular data were acquired from DNA sequences of the 12S, 16S, 18S, 28S and H3 genes. The Asian genus Siphluriscus (Siphluriscidae) was supported as sister to all other mayflies. The lineages Carapacea, Furcatergalia, Fossoriae, Pannota, Caenoidea and Ephemerelloidea were supported as monophyletic, as were many of the families. However, some recognized families (for example, Ameletopsidae and Coloburiscidae) and …


Combined Morphological And Molecular Phylogeny Of Ephemerellidae (Ephemeroptera), T. Heath Ogden Dec 2008

T. Heath Ogden

This study represents the first combined molecular and morphological analysis for the mayfly family Ephemerellidae (Ephemeroptera), with a focus on the relationships of genera and species groups of the subfamily Ephemerellinae. The phylogeny was constructed based on DNA sequence data from 3 nuclear (18S rDNA, 28S rDNA, histone H3) and 2 mitochondrial (12S rDNA, 16S rDNA) genes, and 23 morphological characters. Taxon sampling for Ephemerellidae included exemplars from all 25 extant genus groups and additional representatives from those genera with the highest diversity. Ephemerellidae appears to consist of three major clades. Ephemerella, the largest genus of Ephemerellidae, and Serratella were …