Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Full-Text Articles in Entire DC Network

Semi-Supervised Conditional Random Fields For Improved Sequence Segmentation And Labeling, Feng Jiao, Shaojun Wang, Chi-Hoon Lee, Russell Greiner, Dale Schuurmans (Jan 2006)

Kno.e.sis Publications

We present a new semi-supervised training procedure for conditional random fields (CRFs) that can be used to train sequence segmentors and labelers from a combination of labeled and unlabeled training data. Our approach is based on extending the minimum entropy regularization framework to the structured prediction case, yielding a training objective that combines unlabeled conditional entropy with labeled conditional likelihood. Although the training objective is no longer concave, it can still be used to improve an initial model (e.g. obtained from supervised training) by iterative ascent. We apply our new training algorithm to the problem of identifying gene and protein …
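As a rough illustration of the kind of objective this abstract describes, labeled conditional likelihood and unlabeled conditional entropy can be combined as below. The notation is an assumption introduced here, not taken from the listing: $\theta$ for the CRF parameters, $N$ labeled pairs $(x^{(i)}, y^{(i)})$, unlabeled inputs $x^{(N+1)}, \dots, x^{(M)}$, and a tradeoff weight $\gamma \ge 0$. Any additional regularization on $\theta$ is omitted from this sketch.

```latex
% Sketch of an entropy-regularized CRF training objective (to be maximized).
% First term: conditional log-likelihood of the labeled sequences.
% Second term: negative conditional entropy on the unlabeled sequences,
% which rewards confident predictions on unlabeled data.
\[
  \mathcal{L}(\theta)
  = \sum_{i=1}^{N} \log p_\theta\!\left(y^{(i)} \mid x^{(i)}\right)
  + \gamma \sum_{j=N+1}^{M} \sum_{y} p_\theta\!\left(y \mid x^{(j)}\right)
      \log p_\theta\!\left(y \mid x^{(j)}\right)
\]
```

The second term is not concave in $\theta$, which is consistent with the abstract's remark that the objective is improved by iterative ascent from an initial supervised model rather than solved to a global optimum.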


An Investigation Of Codon Usage Bias Including Visualization And Quantification In Organisms Exhibiting Multiple Biases, Douglas W. Raiford, Travis E. Doom, Dan E. Krane, Michael L. Raymer (Jan 2006)

Kno.e.sis Publications

Prokaryotic genomic sequence data provides a rich resource for bioinformatic analytic algorithms. Information can be extracted in many ways from the sequence data. One often overlooked process involves investigating an organism’s codon usage. Degeneracy in the genetic code leads to multiple codons coding for the same amino acid. Organisms often preferentially utilize specific codons when coding for an amino acid. This biased codon usage can be a useful trait when predicting a gene’s expressivity or whether the gene originated from horizontal transfer. There can be multiple biases at play in a genome, causing errors in the predictive process. For this …
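To make the idea of preferential codon use concrete, the sketch below computes relative synonymous codon usage (RSCU), a common generic measure of codon bias. It is an illustrative assumption, not the quantification method of the paper, whose abstract is truncated here. The function name `rscu` and the example sequence are hypothetical; the codon table is the standard genetic code.

```python
from collections import Counter
from itertools import product

# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def rscu(coding_sequence):
    """Return RSCU per codon for every amino acid seen in the sequence.

    RSCU = observed codon count / count expected if all synonymous
    codons for that amino acid were used equally often.
    """
    seq = coding_sequence.upper().replace("U", "T")
    usable = len(seq) - len(seq) % 3  # ignore a trailing partial codon
    codons = [seq[i:i + 3] for i in range(0, usable, 3)]
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families, one per amino acid.
    families = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:  # skip stop codons and unseen amino acids
            continue
        expected = total / len(family)
        for codon in family:
            values[codon] = counts[codon] / expected
    return values


if __name__ == "__main__":
    # Hypothetical toy sequence: four alanine codons, three of them GCT.
    for codon, value in sorted(rscu("GCTGCTGCTGCC").items()):
        print(codon, round(value, 2))
```

An RSCU value above 1 marks a codon used more often than its synonyms would be under uniform usage; a single-codon amino acid such as methionine always scores exactly 1.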


Clustering Similarity Comparison Using Density Profiles, Eric Bae, James Bailey, Guozhu Dong (Jan 2006)

Kno.e.sis Publications

The unsupervised nature of cluster analysis means that objects can be clustered in many ways, allowing different clustering algorithms to generate vastly different results. To address this, clustering comparison methods have traditionally been used to quantify the degree of similarity between alternative clusterings. However, existing techniques utilize only the point memberships to calculate the similarity, which can lead to unintuitive results. They also cannot be applied to analyze clusterings which only partially share points, which can be the case in stream clustering. In this paper we introduce a new measure named ADCO, which takes into account density profiles for each …
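For context on what "using only the point memberships" means, the sketch below implements the Rand index, a classic membership-based comparison. It is shown only as an example of the existing family of measures the abstract contrasts against; it is not ADCO, whose definition is cut off in this listing. The function name `rand_index` is a hypothetical choice.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Fraction of point pairs on which two clusterings agree.

    A pair agrees when both clusterings put the two points together,
    or both put them apart. Only memberships are used; the positions
    of the points in feature space play no role.
    """
    if len(labels_a) != len(labels_b):
        # Membership-based measures need the same points in both clusterings.
        raise ValueError("clusterings must label the same set of points")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two points to compare")
    agreements = sum(
        (labels_a[i] == labels_a[j]) == (labels_b[i] == labels_b[j])
        for i, j in pairs
    )
    return agreements / len(pairs)


if __name__ == "__main__":
    # Same partition under different label names.
    print(rand_index([0, 0, 1, 1], [1, 1, 0, 0]))
    # A partition that cuts across the first one.
    print(rand_index([0, 0, 1, 1], [0, 1, 0, 1]))
```

The length check reflects the limitation noted in the abstract: a measure of this kind is undefined when the two clusterings only partially share points.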


Predicting Domain Specific Entities With Limited Background Knowledge, Christopher Thomas, Amit P. Sheth (Jan 2006)

Kno.e.sis Publications

This paper proposes a framework for automatic recognition of domain-specific entities from text, given limited background knowledge, e.g. in the form of an ontology. The algorithm exploits several lightweight natural language processing techniques, such as tokenization and stemming, as well as statistical techniques, such as singular value decomposition (SVD), to suggest the domain relatedness of unknown entities.


Driving Deep Semantics In Middleware And Networks: What, Why And How?, Amit P. Sheth (Jan 2006)

Kno.e.sis Publications

No abstract provided.


Knowledge Modeling And Its Application In Life Sciences: A Tale Of Two Ontologies, Satya S. Sahoo, Christopher Thomas, Amit P. Sheth, William S. York, Samir Tartir (Jan 2006)

Kno.e.sis Publications

High-throughput glycoproteomics, like genomics and proteomics, involves extremely large volumes of distributed, heterogeneous data as a basis for identification and quantification of a structurally diverse collection of biomolecules. Sharing, comparing, querying, and, most critically, correlating datasets using native biological relationships are among the challenges facing glycobiology researchers. To address these challenges, we are building a semantic structure, using a suite of ontologies, that supports management of data and information at each step of the experimental lifecycle. This framework will enable researchers to leverage the large scale of glycoproteomics …