Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Bioinformatics

Series

2006

Articles 1 - 30 of 77

Full-Text Articles in Life Sciences

The Plant Structure Ontology, A Unified Vocabulary Of Anatomy And Morphology Of A Flowering Plant, Katica Ilic, Elizabeth Kellogg, Pankaj Jaiswal, Felipe Zapata, Peter Stevens, Leszek Vincent, Shulamit Avraham, Leonore Reiser, Anuradha Pujar, Martin Sachs, Noah Whitman, Susan McCouch, Mary Schaeffer, Doreen Ware, Lincoln Stein, Seung Rhee Dec 2006

Biology Department Faculty Works

Formal description of plant phenotypes and standardized annotation of gene expression and protein localization data require uniform terminology that accurately describes plant anatomy and morphology. This facilitates cross-species comparative studies and quantitative comparison of phenotypes and expression patterns. A major drawback is the variable terminology used to describe plant anatomy and morphology in publications and genomic databases for different species. The same terms are sometimes applied to different plant structures in different taxonomic groups. Conversely, similar structures are named by their species-specific terms. To address this problem, we created the Plant Structure Ontology (PSO), the first generic ontological …


High Sensitivity RNA Pseudoknot Prediction, Xiaolu Huang, Hesham Ali Dec 2006

Information Systems and Quantitative Analysis Faculty Publications

Most ab initio pseudoknot prediction methods provide very few folding scenarios for a given RNA sequence and have low sensitivities. RNA researchers, in many cases, would rather sacrifice specificity for much higher sensitivity in pseudoknot detection. In this study, we introduce the Pseudoknot Local Motif Model and Dynamic Partner Sequence Stacking (PLMM_DPSS) algorithm, which predicts all PLM model pseudoknots within an RNA sequence in a neighboring-region-interference-free fashion. The PLM model is derived from the existing Pseudobase entries. The innovative DPSS approach calculates the optimally lowest stacking energy between two partner sequences. Combined with Mfold, PLMM_DPSS can also …


Development Of Computations In Bioscience And Bioinformatics And Its Application: Review Of The Symposium Of Computations In Bioinformatics And Bioscience (SCBB06), Youping Deng, Jun Ni, Chaoyang Zhang Dec 2006

Faculty Publications

The first Symposium of Computations in Bioinformatics and Bioscience (SCBB06) was held in Hangzhou, China on June 21-22, 2006. Twenty-six peer-reviewed papers were selected for publication in this special issue of BMC Bioinformatics. These papers cover a broad range of topics including bioinformatics theories, algorithms, applications and tool development. The main technical topics include gene expression analysis, sequence analysis, genome analysis, phylogenetic analysis, gene function prediction, molecular interaction and systems biology, genetics and population study, immune strategy, protein structure prediction and proteomics.


SVM Classifier: A Comprehensive Java Interface For Support Vector Machine Classification Of Microarray Data, Mehdi Pirooznia, Youping Deng Dec 2006

Faculty Publications

Motivation

Graphical user interface (GUI) software promotes novelty by allowing users to extend the functionality. SVM Classifier is a cross-platform graphical application that handles very large datasets well. The purpose of this study is to create a GUI application that allows SVM users to perform SVM training, classification and prediction.

Results

The GUI provides user-friendly access to state-of-the-art SVM methods embodied in the LIBSVM implementation of Support Vector Machine. We implemented the Java interface using standard Swing libraries.

We used sample data from a breast cancer study for testing classification accuracy. We achieved 100% accuracy in classification among the …


Implicit Online Learning With Kernels, Li Cheng, S. V. N. Vishwanathan, Dale Schuurmans, Shaojun Wang, Terry Caelli Dec 2006

Kno.e.sis Publications

We present two new algorithms for online learning in reproducing kernel Hilbert spaces. Our first algorithm, ILK (implicit online learning with kernels), employs a new, implicit update technique that can be applied to a wide variety of convex loss functions. We then introduce a bounded memory version, SILK (sparse ILK), that maintains a compact representation of the predictor without compromising solution quality, even in non-stationary environments. We prove loss bounds and analyze the convergence rate of both. Experimental evidence shows that our proposed algorithms outperform current methods on synthetic and real data.


Regression Cubes With Lossless Compression And Aggregation, Yixin Chen, Guozhu Dong, Jiawei Han, Jian Pei, Benjamin W. Wah, Jianyong Wang Dec 2006

Kno.e.sis Publications

As OLAP engines are widely used to support multidimensional data analysis, it is desirable to support in data cubes advanced statistical measures, such as regression and filtering, in addition to the traditional simple measures such as count and average. Such new measures will allow users to model, smooth, and predict the trends and patterns of data. Existing algorithms for simple distributive and algebraic measures are inadequate for efficient computation of statistical measures in a multidimensional space. In this paper, we propose a fundamentally new class of measures, compressible measures, in order to support efficient computation of the statistical models. For …


Semiparametric Regression Of Multi-Dimensional Genetic Pathway Data: Least Squares Kernel Machines And Linear Mixed Models, Dawei Liu, Xihong Lin, Debashis Ghosh Nov 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Penalized Likelihood And Bayesian Methods For Sparse Contingency Tables: An Analysis Of Alternative Splicing In Full-Length cDNA Libraries, Corinne Dahinden, Giovanni Parmigiani, Mark C. Emerick, Peter Buhlmann Nov 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We develop methods to perform model selection and parameter estimation in log-linear models for the analysis of sparse contingency tables to study the interaction of two or more factors. Typically, datasets arising from so-called full-length cDNA libraries, in the context of alternatively spliced genes, lead to such sparse contingency tables. Maximum likelihood estimation of log-linear model coefficients fails to work because of zero cell entries. Therefore, new methods are required to estimate the coefficients and to perform model selection. Our suggestions include computationally efficient penalization (Lasso-type) approaches as well as Bayesian methods using MCMC. We compare these procedures in a …


Multiple Testing With An Empirical Alternative Hypothesis, James E. Signorovitch Nov 2006

Harvard University Biostatistics Working Paper Series

An optimal multiple testing procedure is identified for linear hypotheses under the general linear model, maximizing the expected number of false null hypotheses rejected at any significance level. The optimal procedure depends on the unknown data-generating distribution, but can be consistently estimated. Drawing information together across many hypotheses, the estimated optimal procedure provides an empirical alternative hypothesis by adapting to underlying patterns of departure from the null. Proposed multiple testing procedures based on the empirical alternative are evaluated through simulations and an application to gene expression microarray data. Compared to a standard multiple testing procedure, it is not unusual for …


Active Semantic Electronic Medical Record, Amit P. Sheth, Sangeeta Agrawal, Jonathan Lathem, Nicole Oldham, H. Wingate, K. Gallagher Nov 2006

Kno.e.sis Publications

The healthcare industry is rapidly advancing towards the widespread use of electronic medical records systems to manage the increasingly large amount of patient data and reduce medical errors. In addition to patient data, there is a large amount of data describing procedures, treatments, diagnoses, drugs, insurance plans, coverage, formularies and the relationships between these data sets. While practices have benefited from the use of EMRs, infusing these essential programs with rich domain knowledge and rules can greatly enhance their performance and ability to support clinical decisions. The Active Semantic Electronic Medical Record (ASEMR) application discussed here uses Semantic Web technologies to …


{Ontology: Resource} X {Matching : Mapping} X {Schema : Instance} :: Components Of The Same Challenge, Amit P. Sheth Nov 2006

Kno.e.sis Publications

Ontologies enable us to elevate syntactic and structural processing in an information system/Web to an information system/Web powered with semantic processing. Experience has shown that monolithic and tightly coupled approaches seldom succeed, and the majority of information systems and applications will need to deal with a plurality of ontologies in a loosely coupled environment (i.e., independently evolving ontologies and inter-ontology relationships, existence of different contexts for different users/applications, etc.). Development of such loosely coupled multi-ontology environments entails development of techniques for ontology mapping/alignment, multi-ontology query processing, and much more.


How To Reason With OWL In A Logic Programming System, Markus Krotzsch, Pascal Hitzler, Denny Vrandecic, Michael Sintek Nov 2006

Computer Science and Engineering Faculty Publications

Logic programming has always been a major ontology modeling paradigm, and is frequently being used in large research projects and industrial applications, e.g., by means of the F-Logic reasoning engine OntoBroker or the TRIPLE query, inference, and transformation language and system. At the same time, the Web Ontology Language OWL has been recommended by the W3C for modeling ontologies for the Web. Naturally, it is desirable to investigate the interoperability between both paradigms. In this paper, we do so by studying an expressive fragment of OWL DL for which reasoning can be reduced to the evaluation of Horn logic programs. …


On The Complexity Of Horn Description Logics, Markus Krotzsch, Sebastian Rudolph, Pascal Hitzler Nov 2006

Computer Science and Engineering Faculty Publications

Horn-SHIQ has been identified as a fragment of the description logic SHIQ for which inferencing is in PTIME with respect to the size of the ABox. This enables reasoning with larger ABoxes in situations where the TBox is static, and represents one approach towards tractable description logic reasoning. In this paper, we show that reasoning in Horn-SHIQ, in spite of its low data complexity, is EXPTIME-hard with respect to the overall size of the knowledge base. While this result is not unexpected, the proof is not a mere modification of existing reductions since …


A Framework For Schema-Driven Relationship Discovery From Unstructured Text, Cartic Ramakrishnan, Krzysztof Kochut, Amit P. Sheth Nov 2006

Kno.e.sis Publications

We address the issue of extracting implicit and explicit relationships between entities in biomedical text. We argue that entities seldom occur in text in their simple form and that relationships in text relate the modified, complex forms of entities with each other. We present a rule-based method for (1) the extraction of such complex entities, (2) the extraction of relationships between them, and (3) the conversion of such relationships into RDF. Furthermore, we present results that clearly demonstrate the utility of the generated RDF in discovering knowledge from text corpora by means of locating paths composed of the extracted relationships.


Estimating Genome-Wide Copy Number Using Allele Specific Mixture Models, Wenyi Wang, Benilton Carvalho, Nate Miller, Jonathan Pevsner, Aravinda Chakravarti, Rafael A. Irizarry Oct 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

Genomic changes such as copy number alterations are thought to be one of the major underlying causes of human phenotypic variation among normal and disease subjects [23,11,25,26,5,4,7,18]. These include chromosomal regions with so-called copy number alterations: instead of the expected two copies, a section of the chromosome for a particular individual may have zero copies (homozygous deletion), one copy (hemizygous deletion), or more than two copies (amplification). The canonical example is Down syndrome, which is caused by an extra copy of chromosome 21. Identification of such abnormalities in smaller regions has been of great interest, because it is believed to …


Exploration Of Distributional Models For A Novel Intensity-Dependent Normalization, Nicola Lama, Patrizia Boracchi, Elia Mario Biganzoli Oct 2006

COBRA Preprint Series

Currently used gene intensity-dependent normalization methods, based on regression smoothing techniques, usually approach the two problems of location bias detrending and data re-scaling without taking into account the censoring characteristic of certain gene expressions produced by experiment measurement constraints or by previous normalization steps. Moreover, the bias vs variance balance control of normalization procedures is not often discussed but left to the user's experience. Here an approximate maximum likelihood procedure to fit a model smoothing the dependences of log-fold gene expression differences on average gene intensities is presented. Central tendency and scaling factor were modeled by means of B-splines smoothing …


Single Molecule Detection Systems And Methods, John G. K. Williams, Gregory R. Bashford Oct 2006

Biomedical Imaging and Biosignal Analysis Laboratory

A microfluidic system is provided that includes a substrate, a first microchannel disposed in the substrate for providing a reactant to a reaction zone, a second microchannel disposed in the substrate, and a third microchannel providing fluid communication between the first and second microchannels. The system also typically includes first and second electrodes, positioned at opposite ends of the second microchannel, for providing an electric field within the second microchannel. In operation, when the reactant is in the reaction zone, a reaction product is produced having a net electric charge different from the electric charge of the reactant.


A Fourier Transformation Based Method To Mine Peptide Space For Antimicrobial Activity, Vijayaraj Nagarajan, Navodit Kaushik, Beddhu Murali, Chaoyang Zhang, Sanyogita Lakhera, Mohamed O. Elasri, Youping Deng Sep 2006

Faculty Publications

Background

Naturally occurring antimicrobial peptides are currently being explored as potential candidate peptide drugs. Since antimicrobial peptides are part of the innate immune system of every living organism, it is possible to discover new candidate peptides using the available genomic and proteomic data. High-throughput computational techniques could also be used to virtually scan the entire peptide space to discover new candidate antimicrobial peptides.

Result

We have identified a unique indexing method based on biologically distinct characteristic features of known antimicrobial peptides. Analysis of the entries in the antimicrobial peptide databases, based on our indexing method, using Fourier transformation …


Semantic Interoperability Of Web Services - Challenges And Experiences, Meenakshi Nagarajan, Kunal Verma, Amit P. Sheth, John A. Miller, Jonathan Lathem Sep 2006

Kno.e.sis Publications

With the rising popularity of Web services, both academia and industry have invested considerably in Web service description standards, discovery, and composition techniques. The standards-based approach utilized by Web services has supported interoperability at the syntax level. However, issues of structural and semantic heterogeneity between messages exchanged by Web services are far more complex and crucial to interoperability. It is for these reasons that we recognize the value that schema/data mappings bring to Web service descriptions. In this paper, we examine challenges to interoperability, classify the types of heterogeneities that can occur between interacting services, and present a possible …


Optimal Adaptation In Web Processes With Coordination Constraints, Kunal Verma, Prashant Doshi, Karthik Gomadam, John A. Miller, Amit P. Sheth Sep 2006

Kno.e.sis Publications

We present methods for optimally adapting Web processes to exogenous events while preserving inter-service constraints that necessitate coordination. For example, in a supply chain process, orders placed by a manufacturer may get delayed in arriving. In response to this event, the manufacturer has the choice of either waiting out the delay or changing the supplier. Additionally, there may be compatibility constraints between the different orders, thereby introducing the problem of coordination between them if the manufacturer chooses to change the suppliers. We focus on formulating the decision making models of the managers, who must adapt to external events while satisfying …


Flexible Querying Of XML Documents, Krishnaprasad Thirunarayan, Trivikram Immaneni Sep 2006

Kno.e.sis Publications

Text search engines are inadequate for indexing and searching XML documents because they ignore metadata and aggregation structure implicit in the XML documents. On the other hand, the query languages supported by specialized XML search engines are very complex. In this paper, we present a simple yet flexible query language, and develop its semantics to enable intuitively appealing extraction of relevant fragments of information while simultaneously falling back on retrieval through plain text search if necessary. We also present a simple yet robust relevance ranking for heterogeneous document-centric XML.


Structural Inference In Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Xihong Lin, Donglin Zeng Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Estimation In Semiparametric Transition Measurement Error Models For Longitudinal Data, Wenqin Pan, Donglin Zeng, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Nonparametric Regression Using Local Kernel Estimating Equations For Correlated Failure Time Data, Zhangsheng Yu, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Causal Inference In Hybrid Intervention Trials Involving Treatment Choice, Qi Long, Rod Little, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


A Comparison Of Methods For Estimating The Causal Effect Of A Treatment In Randomized Clinical Trials Subject To Noncompliance, Rod Little, Qi Long, Xihong Lin Aug 2006

Harvard University Biostatistics Working Paper Series

No abstract provided.


Group Additive Regression Models For Genomic Data Analysis, Yihui Luan, Hongzhe Li Aug 2006

UPenn Biostatistics Working Papers

One important problem in genomic research is to identify genomic features such as gene expression data or DNA single nucleotide polymorphisms (SNPs) that are related to clinical phenotypes. Often these genomic data can be naturally divided into biologically meaningful groups such as genes belonging to the same pathways or SNPs within genes. In this paper, we propose group additive regression models and a group gradient descent boosting procedure for identifying groups of genomic features that are related to clinical phenotypes. Our simulation results show that by dividing the variables into appropriate groups, we can obtain better identification of the group …


Extensions To Gene Set Enrichment, Zhen Jiang, Robert Gentleman Aug 2006

Bioconductor Project Working Papers

Motivation: Gene Set Enrichment Analysis (GSEA) has been developed recently to capture moderate but coordinated changes in the expression of sets of functionally related genes. We propose a number of extensions to GSEA that use different statistics to describe the association between genes and the phenotype of interest. We make use of dimension reduction procedures, such as principal component analysis, to identify gene sets containing coordinated genes. We also address the problem of overlap among gene sets in this paper.

Results: We applied our methods to data from a clinical trial in acute lymphoblastic leukemia (ALL) [1]. We identified interesting …


FDR And Bayesian Multiple Comparisons Rules, Peter Muller, Giovanni Parmigiani, Kenneth Rice Jul 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

We discuss Bayesian approaches to multiple comparison problems, using a decision theoretic perspective to critically compare competing approaches. We set up decision problems that lead to the use of FDR-based rules and generalizations. Alternative definitions of the probability model and the utility function lead to different rules and problem-specific adjustments. Using a loss function that controls realized FDR, we derive an optimal Bayes rule that is a variation of the Benjamini and Hochberg (1995) procedure. The cutoff is based on increments in ordered posterior probabilities instead of ordered p-values. Throughout the discussion we take a Bayesian perspective. In particular, …


Exploration, Normalization, And Genotype Calls Of High Density Oligonucleotide SNP Array Data, Benilton Carvalho, Terence P. Speed, Rafael A. Irizarry Jul 2006

Johns Hopkins University, Dept. of Biostatistics Working Papers

In most microarray technologies, a number of critical steps are required to convert raw intensity measurements into the data relied upon by data analysts, biologists and clinicians. These data manipulations, referred to as preprocessing, can influence the quality of the ultimate measurements. In the last few years, the high-throughput measurement of gene expression has been the most popular application of microarray technology. For this application, various groups have demonstrated that the use of modern statistical methodology can substantially improve accuracy and precision of gene expression measurements, relative to ad hoc procedures introduced by designers and manufacturers of the technology. Currently, other applications …