Open Access. Powered by Scholars. Published by Universities.®

Bioinformatics Commons

2004

Articles 1 - 30 of 37

Full-Text Articles in Bioinformatics

An SVD-Based Comparison Of Nine Whole Eukaryotic Genomes Supports A Coelomate Rather Than Ecdysozoan Lineage, Gary W. Stuart, Michael W. Berry Dec 2004

Faculty Publications and Other Works -- General Biology

Background

Eukaryotic whole genome sequences are accumulating at an impressive rate. Effective methods for comparing multiple whole eukaryotic genomes on a large scale are needed. Most attempted solutions involve the production of large scale alignments, and many of these require a high stringency pre-screen for putative orthologs in order to reduce the effective size of the dataset and provide a reasonably high but unknown fraction of correctly aligned homologous sites for comparison. As an alternative, highly efficient methods that do not require the pre-alignment of operationally defined orthologs are also being explored.

Results

A non-alignment method based on the Singular …


Incremental Genetic K-Means Algorithm And Its Application In Gene Expression Data Analysis, Yi Lu, Shiyong Lu, Farhad Fotouhi, Youping Deng, Susan J. Brown Oct 2004

Faculty Publications

Background

In recent years, clustering algorithms have been effectively applied in molecular biology for gene expression data analysis. With the help of clustering algorithms such as K-means, hierarchical clustering, and SOM, genes are partitioned into groups based on the similarity between their expression profiles. In this way, functionally related genes are identified. As the amount of laboratory data in molecular biology grows exponentially each year due to advanced technologies such as microarrays, new efficient and effective methods for clustering must be developed to process this growing amount of biological data.

Results

In this paper, we propose a new clustering algorithm, …


Finding Cancer Subtypes In Microarray Data Using Random Projections, Debashis Ghosh Oct 2004

The University of Michigan Department of Biostatistics Working Paper Series

One of the benefits of profiling of cancer samples using microarrays is the generation of molecular fingerprints that will define subtypes of disease. Such subgroups have typically been found in microarray data using hierarchical clustering. A major problem in interpretation of the output is determining the number of clusters. We approach the problem of determining disease subtypes using mixture models. A novel estimation procedure of the parameters in the mixture model is developed based on a combination of random projections and the expectation-maximization algorithm. Because the approach is probabilistic, our approach provides a measure for the number of true clusters …


Semantic Web Technology In Support Of Bioinformatics For Glycan Expression, Amit P. Sheth, William S. York, Christopher Thomas, Meenakshi Nagarajan, John A. Miller, Krzysztof Kochut, Satya S. Sahoo, Xiaochuan Yi Oct 2004

Kno.e.sis Publications

Due to the complexity of biological systems, data obtained by a single experimental approach can often be interpreted only if viewed from a broader context, taking into account the information obtained by many diverse techniques. The vast amount of interpreted experimental data that is now available via the internet opens the possibility of collecting the relevant pieces of information that will enable scientists to form hypotheses based on the integration of this diverse information. However, the sheer volume of data that is available makes it very difficult to select the information necessary to make a coherent model of …


LSDIS: Large Scale Distributed Information Systems Lab, Amit P. Sheth Oct 2004

Kno.e.sis Publications

The LSDIS (Large Scale Distributed Information Systems) lab was established in 1994 with the guidance and direction provided by Dr. Amit P. Sheth with the help of Dr. John A. Miller and Dr. Krzysztof J. Kochut. In 1998 this faculty group was further strengthened by the addition of Dr. Ismailcem B. Arpinar. LSDIS is the largest research group in Computer Science at UGA and one of the strongest in its area. During Fall 2004, it is funding 15 students (majority of them PhD), and has one research staff.

Over the years LSDIS has been actively involved in research projects in …


Logic Programs, Iterated Function Systems, And Recurrent Radial Basis Function Networks, Sebastian Bader, Pascal Hitzler Sep 2004

Computer Science and Engineering Faculty Publications

Graphs of the single-step operator for first-order logic programs—displayed in the real plane—exhibit self-similar structures known from topological dynamics, i.e., they appear to be fractals, or more precisely, attractors of iterated function systems. We show that this observation can be made mathematically precise. In particular, we give conditions which ensure that those graphs coincide with attractors of suitably chosen iterated function systems, and conditions which allow the approximation of such graphs by iterated function systems or by fractal interpolation. Since iterated function systems can easily be encoded using recurrent radial basis function networks, we eventually obtain connectionist systems which …


Flash Artifact Suppression In Two-Dimensional Ultrasound Imaging, Richard Yung Chiao, Gregory Ray Bashford, Mark Peter Feilen, Cynthia Andrews Owen Jul 2004

Biomedical Imaging and Biosignal Analysis Laboratory

Flash artifacts in ultrasound flow images are suppressed to achieve enhanced flow discrimination. Flash artifacts typically occur as regions of elevated signal strength (brightness or equivalent color) within an image. A flash suppression algorithm includes the steps of estimating the flash within an image and then suppressing the estimated flash. The mechanism for flash suppression is spatial filtering. An extension of this basic method uses information from adjacent frames to estimate the flash and/or to smooth the resulting image sequence. Temporal information from adjacent frames is used as an adjunct to improve performance.


Enhancing Web Services Description And Discovery To Facilitate Composition, Preeda Rajasekaran, John A. Miller, Kunal Verma, Amit P. Sheth Jul 2004

Kno.e.sis Publications

Web services are in the midst of making the transition from being a promising technology to being widely used in the industry. However, most efforts to use Web services have been manual, thus slowing down the ever changing and dynamic businesses of today. In this paper, we contend that more expressive descriptions of Web services will lead to greater automation and thus provide more agility to businesses. We present the METEOR-S front-end tools for source code annotation and semantic Web service description generation. We also present WSDL-S, a language created for incorporating semantic descriptions in the industry wide accepted WSDL, …


Learning Mixture Models With The Regularized Latent Maximum Entropy Principle, Shaojun Wang, Dale Schuurmans, Fuchun Peng, Yunxin Zhao Jul 2004

Kno.e.sis Publications

This paper presents a new approach to estimating mixture models based on a recent inference principle we have proposed: the latent maximum entropy principle (LME). LME is different from Jaynes' maximum entropy principle, standard maximum likelihood, and maximum a posteriori probability estimation. We demonstrate the LME principle by deriving new algorithms for mixture model estimation, and show how robust new variants of the expectation maximization (EM) algorithm can be developed. We show that a regularized version of LME (RLME) is effective at estimating mixture models. It generally yields better results than plain LME, which in turn is often better than …


Workflow Management Systems And ERP Systems: Differences, Commonalities, And Applications, Jorge Cardoso, Robert P. Bostrom, Amit P. Sheth Jul 2004

Kno.e.sis Publications

Two important classes of information systems, Workflow Management Systems (WfMSs) and Enterprise Resource Planning (ERP) systems, have been used to support e-business process redesign, integration, and management. While both technologies can help with business process automation, data transfer, and information sharing, the technological approaches and features of the solutions provided by WfMSs and ERP systems are different. Currently, there is a lack of understanding of these two classes of information systems in industry and academia, which hinders their effective application. In this paper, we present a comprehensive comparison between these two classes of systems. We discuss how the two types of systems …


Discovery Of Web Services In A Federated Registry Environment, Kaarthik Sivashanmugam, Kunal Verma, Amit P. Sheth Jul 2004

Kno.e.sis Publications

The potential of a large scale growth of private and semi-private registries is creating the need for an infrastructure which can support discovery and publication over a group of autonomous registries. Recent versions of UDDI have made changes to accommodate interactions between distributed registries. In this paper, we discuss METEOR-S Web service Discovery Infrastructure, which provides an ontology-based infrastructure to access a group of registries that are divided based on business domains and grouped into federations. We also discuss how Web service discovery is carried out within a federation.


Differential Expression With The Bioconductor Project, Anja Von Heydebreck, Wolfgang Huber, Robert Gentleman Jun 2004

Bioconductor Project Working Papers

A basic yet challenging task in the analysis of microarray gene expression data is the identification of changes in gene expression that are associated with particular biological conditions. We discuss different approaches to this task and illustrate how they can be applied using software from the Bioconductor Project. A central problem is the high dimensionality of gene expression space, which prohibits a comprehensive statistical analysis without focusing on particular aspects of the joint distribution of the genes' expression levels. Possible strategies are to do univariate gene-by-gene analysis, and to perform data-driven nonspecific filtering of genes before the actual statistical analysis. …


Mechanisms And Integration Of Signal Pathway: A Role For Calpains?, Dorothy E. Croall Jun 2004

University of Maine Office of Research Administration: Grant Reports

In order to survive, cells must sense and respond to changes in their environment. Environmental cues trigger a variety of events within cells. The concentration and movements of calcium ions are essential regulators of many of these cellular responses. Proper control of intracellular calcium is essential because at high levels calcium can lead to cell damage or death. Calcium accomplishes its effects through binding to specific proteins such as calmodulin and calpain. Calmodulin, named for its ability to bind calcium and to modulate the activity of other cellular components, is an important mediator of calcium signals and its mechanism of …


SWETO: Large-Scale Semantic Web Test-Bed, Boanerges Aleman-Meza, Chris Halaschek, Amit P. Sheth, I. Budak Arpinar, Gowtham Sannapareddy Jun 2004

Kno.e.sis Publications

The emergent Semantic Web community needs a common infrastructure for testing the scalability and quality of new techniques and software which use machine processable data. Since ontologies are a centerpiece of most approaches, we believe that for an accurate evaluation of tools for quality, scalability and performance, the research community needs a freely available ontology with a large description base. If the use of tools is to be for advanced semantic applications, such as those in business intelligence and national security, then instances in the knowledge base should be highly interconnected. Thus, we propose and describe a Semantic WEb Technology …


Statistical Analyses And Reproducible Research, Robert Gentleman, Duncan Temple Lang May 2004

Bioconductor Project Working Papers

For various reasons, it is important, if not essential, to integrate the computations and code used in data analyses, methodological descriptions, simulations, etc. with the documents that describe and rely on them. This integration allows readers to both verify and adapt the statements in the documents. Authors can easily reproduce them in the future, and they can present the document's contents in a different medium, e.g. with interactive controls. This paper describes a software framework for authoring and distributing these integrated, dynamic documents that contain text, code, data, and any auxiliary content needed to recreate the computations. The documents are …


A Model Based Background Adjustment For Oligonucleotide Expression Arrays, Zhijin Wu, Rafael A. Irizarry, Robert Gentleman, Francisco Martinez Murillo, Forrest Spencer May 2004

Johns Hopkins University, Dept. of Biostatistics Working Papers

High density oligonucleotide expression arrays are widely used in many areas of biomedical research. Affymetrix GeneChip arrays are the most popular. In the Affymetrix system, a fair amount of further pre-processing and data reduction occurs following the image processing step. Statistical procedures developed by academic groups have been successful at improving the default algorithms provided by the Affymetrix system. In this paper we present a solution to one of the pre-processing steps, background adjustment, based on a formal statistical framework. Our solution greatly improves the performance of the technology in various practical applications.

Affymetrix GeneChip arrays use short oligonucleotides to …


Semantic Web Technology Evaluation Ontology (SWETO): A Test Bed For Evaluating Tools And Benchmarking Applications, Boanerges Aleman-Meza, Amit P. Sheth, I. Budak Arpinar, Chris Halaschek May 2004

Kno.e.sis Publications

No abstract provided.


Reproducible Research: A Bioinformatics Case Study, Robert Gentleman May 2004

Bioconductor Project Working Papers

While scientific research and the methodologies involved have gone through substantial technological evolution, the technology involved in the publication of the results of these endeavors has remained relatively stagnant. Publication is largely done in the same manner today as it was fifty years ago. Many journals have adopted electronic formats; however, their orientation and style are little different from those of a printed document. The documents tend to be static and take little advantage of computational resources that might be available. Recent work, Gentleman and Temple Lang (2004), suggests a methodology and basic infrastructure that can be used to publish documents in …


Classification Using Generalized Partial Least Squares, Beiying Ding, Robert Gentleman May 2004

Bioconductor Project Working Papers

Advances in computational biology have made simultaneous monitoring of thousands of features possible. High-throughput technologies not only bring about a much richer information context in which to study various aspects of gene function, but they also present the challenge of analyzing data with a large number of covariates and few samples. As an integral part of machine learning, classification of samples into two or more categories is almost always of interest to scientists. In this paper, we address the question of classification in this setting by extending partial least squares (PLS), a popular dimension reduction tool in chemometrics, in …


Indexing Genomic Databases, Gina Cooper, Michael L. Raymer, Travis E. Doom, Dan E. Krane, Natsuhiko Futamura May 2004

Kno.e.sis Publications

Current biological sequence comparison tools utilize full database searches to find approximate matches between a database and a query. A new approach to sequence comparisons can be performed by indexing the database using a novel indexing scheme. An indexed scheme can immediately eliminate highly mismatched sequences thereby improving performance and accuracy. iBlast is proposed as an indexed version of BLAST. In its initial implementation, iBlast uses a sequence-based index to catalog genomic databases in an NCR Teradata RDBMS. Several types of indexes and querying methods are explored to determine the most efficient solution utilizing the parallel nature of the Teradata …


SEMPL: A Semantic Portal, Matthew Perry, Eric Stiles May 2004

Kno.e.sis Publications

Semantic Web technology is intended for the retrieval, collection, and analysis of meaningful data with significant automation afforded by machine understandability of data [1]. As one illustration of semantic web technology in action, we present SEMPL, a semantic web portal for the Large Scale Distributed Information Systems lab (LSDIS) at the University of Georgia. SEMPL, which is powered by a state of the art commercial system, Semagix Freedom [7], uses an ontology-driven approach to provide semantic browsing, linking, and contextual querying of content within the portal. By using the ontology based information integration technique, SEMPL can specify the context of …


METEOR-S Web Service Annotation Framework, Abhijit A. Patil, Swapna A. Oundhakar, Amit P. Sheth, Kunal Verma May 2004

Kno.e.sis Publications

The World Wide Web is emerging not only as an infrastructure for data, but also for a broader variety of resources that are increasingly being made available as Web services. Relevant current standards like UDDI, WSDL, and SOAP are in their fledgling years and form the basis of making Web services a workable and broadly adopted technology. However, realizing the fuller scope of the promise of Web services and associated service oriented architecture will require further technological advances in the areas of service interoperation, service discovery, service composition, and process orchestration. Semantics, especially as supported by the use of ontologies, …


Quality Of Service For Workflows And Web Service Processes, Jorge Cardoso, Amit P. Sheth, John A. Miller, Jonathan Arnold, Krzysztof J. Kochut Apr 2004

Kno.e.sis Publications

Workflow management systems (WfMSs) have been used to support various types of business processes for more than a decade now. In workflows or Web processes for e-commerce and Web service applications, suppliers and customers define a binding agreement or contract between the two parties, specifying quality of service (QoS) items such as products or services to be delivered, deadlines, quality of products, and cost of services. The management of QoS metrics directly impacts the success of organizations participating in e-commerce. Therefore, when services or products are created or managed using workflows or Web processes, the underlying workflow engine must accept …


Prediction Of RNA-Binding Proteins From Primary Sequence By A Support Vector Machine Approach, Lian Yi Han, Cong Zhong Cai, Siaw Ling Lo, Maxey Chung, Yu Zong Chen Mar 2004

Research Collection School Of Computing and Information Systems

Elucidation of the interaction of proteins with different molecules is of significance in the understanding of cellular processes. Computational methods have been developed for the prediction of protein-protein interactions, but insufficient attention has been paid to the prediction of protein-RNA interactions, which play central roles in regulating gene expression and certain RNA-mediated enzymatic processes. This work explored the use of a machine learning method, support vector machines (SVM), for the prediction of RNA-binding proteins directly from their primary sequence. Based on the knowledge of known RNA-binding and non-RNA-binding proteins, an SVM system was trained to recognize RNA-binding proteins. A total …


Mixture Models For Assessing Differential Expression In Complex Tissues Using Microarray Data, Debashis Ghosh Feb 2004

The University of Michigan Department of Biostatistics Working Paper Series

The use of DNA microarrays has become quite popular in many scientific and medical disciplines, such as in cancer research. One common goal of these studies is to determine which genes are differentially expressed between cancer and healthy tissue, or more generally, between two experimental conditions. A major complication in the molecular profiling of tumors using gene expression data is that the data represent a combination of tumor and normal cells. Much of the methodology developed for assessing differential expression with microarray data has assumed that tissue samples are homogeneous. In this article, we outline a general framework for determining …


Immunity Regulatory DNAs Share Common Organizational Features In Drosophila, Kate Senger, Grant W. Armstrong, William J. Rowell, Jennifer M. Kwan, Michele Markstein, Michael Levine Jan 2004

Michele Markstein

Infection results in the rapid activation of immunity genes in the Drosophila fat body. Two classes of transcription factors have been implicated in this process: the REL-containing proteins, Dorsal, Dif, and Relish, and the GATA factor Serpent. Here we present evidence that REL-GATA synergy plays a pervasive role in the immune response. SELEX assays identified consensus binding sites that permitted the characterization of several immunity regulatory DNAs. The distribution of REL and GATA sites within these DNAs suggests that most or all fat-specific immunity genes contain a common organization of regulatory elements: closely linked REL and GATA binding sites positioned …


Service Oriented Architectures And Semantic Web Processes, Francisco Cubera, Kunal Verma, Amit P. Sheth Jan 2004

Kno.e.sis Publications

No abstract provided.


Semantic Web Research Center Report: LSDIS Lab, Research In Semantic Bioinformatics, Semantic Analytics And Semantic Web Processes, Amit P. Sheth Jan 2004

Kno.e.sis Publications

The LSDIS Lab advances the field of distributed information systems by researching semantic techniques for exploiting heterogeneous multimedia information and improving processes, encompassing the central promise of the Semantic Web initiative. It is pursuing cutting edge research in ontology development for demanding scientific domains, semantic heterogeneity and integration, complex relationships discovery and semantic analytics, and Semantic Web services and processes. Past work of the LSDIS lab can be characterized by keywords: semantic interoperability, syntactic and semantic metadata for text and digital media, metadata based integration of Web content, ontology-driven information systems, multi-ontology query processing, transactional workflows and workflow management. Significant …


Bioconductor: Open Software Development For Computational Biology And Bioinformatics, Robert C. Gentleman, Vincent J. Carey, Douglas J. Bates, Benjamin M. Bolstad, Marcel Dettling, Sandrine Dudoit, Byron Ellis, Laurent Gautier, Yongchao Ge, Jeff Gentry, Kurt Hornik, Torsten Hothorn, Wolfgang Huber, Stefano Iacus, Rafael Irizarry, Friedrich Leisch, Cheng Li, Martin Maechler, Anthony J. Rossini, Guenther Sawitzki, Colin Smith, Gordon K. Smyth, Luke Tierney, Yee Hwa Yang, Jianhua Zhang Jan 2004

Bioconductor Project Working Papers

The Bioconductor project is an initiative for the collaborative creation of extensible software for computational biology and bioinformatics. We detail some of the design decisions, software paradigms and operational strategies that have allowed a small number of researchers to provide a wide variety of innovative, extensible software solutions in a relatively short time. The use of an object oriented programming paradigm, the adoption and development of a software package system, designing by contract, distributed development and collaboration with other projects are elements of this project's success. Individually, each of these concepts is useful and important but when combined they have …


A Framework For Implementing Bioinformatics Knowledge-Exploration Systems, John A. Hayes Jan 2004

Dissertations, Theses, and Masters Projects

No abstract provided.