Open Access. Powered by Scholars. Published by Universities.®

Life Sciences Commons

Genetics and Genomics

Protein domain annotation, Gene prediction

Full-Text Articles in Life Sciences

Pitfalls Of Ascertainment Biases In Genome Annotations—Computing Comparable Protein Domain Distributions In Eukarya, Arli A. Parikesit, Lydia Steiner, Peter F. Stadler, Sonja J. Prohaska Jan 2014

Arli A Parikesit

Most investigations into the large-scale patterns of protein evolution are based on gene annotations that have been compiled in reference databases. The use of these resources for quantitative comparisons, however, is complicated by sometimes vast differences in coverage. More importantly, we also observe substantial ascertainment biases that cannot be removed by simple normalization procedures. A striking example is provided by the correlations between protein domains. We observe that statistics derived from different computational gene annotation procedures show dramatic discrepancies, and even qualitative changes from negative to positive correlation, when compared to statistics obtained from annotation databases.
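
The kind of comparison the abstract describes can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the domain labels `domA` and `domB`, the genome labels, and all counts are hypothetical and exist only to show how the sign of a domain-domain correlation might differ between two annotation sources for the same genomes.

```python
from math import sqrt
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / sqrt(var_x * var_y)


def domain_correlation(counts, domain_a, domain_b):
    """Correlate the per-genome counts of two domains.

    `counts` maps genome name -> {domain name -> count}; a domain that
    is absent from a genome is counted as 0.
    """
    genomes = sorted(counts)
    xs = [counts[g].get(domain_a, 0) for g in genomes]
    ys = [counts[g].get(domain_b, 0) for g in genomes]
    return pearson(xs, ys)


# Hypothetical counts taken from a reference annotation database.
database_counts = {
    "genome1": {"domA": 10, "domB": 40},
    "genome2": {"domA": 20, "domB": 30},
    "genome3": {"domA": 30, "domB": 20},
    "genome4": {"domA": 40, "domB": 10},
}

# Hypothetical counts for the same genomes from a uniform
# computational annotation procedure.
predicted_counts = {
    "genome1": {"domA": 12, "domB": 15},
    "genome2": {"domA": 22, "domB": 24},
    "genome3": {"domA": 31, "domB": 33},
    "genome4": {"domA": 41, "domB": 45},
}

r_database = domain_correlation(database_counts, "domA", "domB")
r_predicted = domain_correlation(predicted_counts, "domA", "domB")
print(f"database:  r = {r_database:+.3f}")
print(f"predicted: r = {r_predicted:+.3f}")
```

With these made-up numbers the two sources disagree on the sign of the correlation, which is the qualitative effect the abstract reports. Real data would need many more genomes and a check that the difference is not due to sampling noise.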


Evolution And Quantitative Comparison Of Genome-Wide Protein Domain Distributions, Arli A. Parikesit, Peter F. Stadler, Sonja J. Prohaska Jan 2011

Arli A Parikesit

The metabolic and regulatory capabilities of an organism are implicit in its protein content. Protein content is often hard to estimate, however, because of ascertainment biases inherent in the available genome annotations. An organism's complement of recognizable functional protein domains and their combinations conveys essentially the same information and is much more readily accessible, although protein domain models trained for one phylogenetic group frequently fail on distantly related sequences. Pooling related domain models based on their GO annotation, in combination with de novo gene prediction methods, provides estimates that seem to be less affected by phylogenetic biases. We show …


Quantitative Comparison Of Genomic-Wide Protein Domain Distributions, Arli A. Parikesit, Peter F. Stadler, Sonja J. Prohaska Jan 2010

Arli A Parikesit

Investigations into the origins and evolution of regulatory mechanisms require quantitative estimates of the abundance and co-occurrence of functional protein domains among distantly related genomes. Currently available databases, such as SUPERFAMILY, are not designed for quantitative comparisons, since they are built upon transcript and protein annotations provided by the various genome annotation projects. Large biases are introduced by differences in genome annotation protocols, which strongly depend on the availability of transcript information and of well-annotated, closely related organisms. Here we show that the combination of de novo gene predictors and subsequent HMM-based annotation of SCOP domains in the …