Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Electrical and Computer Engineering

2005

BRC

Full-Text Articles in Engineering

Truncated Profile Hidden Markov Models, Jennifer A. Smith Nov 2005

Electrical and Computer Engineering Faculty Publications and Presentations

The profile hidden Markov model (HMM) is a powerful method for remote homolog database search. However, evaluating the score of each database sequence against a profile HMM is computationally demanding. The computation time required for score evaluation is proportional to the number of states in the profile HMM. This paper examines whether the number of states can be truncated without reducing the ability of the HMM to find proteins containing members of a protein domain family. A genetic algorithm (GA) is presented which finds a good truncation of the HMM states. The results of using truncation on searches of the …
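
As a rough illustration of why scoring cost grows with the number of states, the sketch below scores one sequence with a generic Viterbi recursion over a small HMM. It is not the paper's method: a real profile HMM has match, insert, and delete states, and the names `viterbi_log_score`, `start`, `trans`, and `emit` are hypothetical. In this dense form the work per residue grows with the square of the state count; a profile HMM has a fixed number of transitions per state, which is what makes its cost linear in the number of states.

```python
import math

NEG_INF = -math.inf


def viterbi_log_score(seq, states, start, trans, emit):
    """Best-path log score of `seq` under a small HMM (illustrative only).

    start[s]    : log probability of starting in state s
    trans[p][s] : log probability of moving from state p to state s
    emit[s][x]  : log probability that state s emits symbol x
    Missing entries are treated as probability zero (log = -inf).
    """
    if not seq:
        raise ValueError("seq must contain at least one symbol")

    # Initialise with the first symbol.
    best = {
        s: start.get(s, NEG_INF) + emit.get(s, {}).get(seq[0], NEG_INF)
        for s in states
    }

    # One pass per remaining symbol; each pass touches every state,
    # so removing states from the model removes work from every pass.
    for symbol in seq[1:]:
        nxt = {}
        for s in states:
            incoming = max(
                best[p] + trans.get(p, {}).get(s, NEG_INF) for p in states
            )
            nxt[s] = incoming + emit.get(s, {}).get(symbol, NEG_INF)
        best = nxt

    return max(best.values())
```

Passing a shorter `states` list (with matching tables) is the simplest way to see the effect of a smaller model on the amount of work per sequence.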


A Study Of Style Effects On OCR Errors In The MEDLINE Database, Penny Garrison, Diane Davis, Tim Andersen, Elisa Barney Smith Jan 2005

Electrical and Computer Engineering Faculty Publications and Presentations

The National Library of Medicine has developed a system for the automatic extraction of data from scanned journal articles to populate the MEDLINE database. Although the 5-engine OCR system used in this process exhibits good performance overall, it does make errors in character recognition that must be corrected in order for the process to achieve the requisite accuracy. The correction process works by feeding words that have characters with less than 100% confidence (as determined automatically by the OCR engine) to a human operator who then must manually verify the word or correct the error. The majority of these errors …
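
The routing rule described in this abstract (any word with a character below 100% confidence goes to a human operator) can be sketched as follows. This is a minimal illustration under assumed data structures, not the National Library of Medicine's system: `OcrWord`, `needs_review`, and `route_words` are hypothetical names, and confidences are assumed to be reported per character on a 0.0 to 1.0 scale.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class OcrWord:
    """A recognized word plus one confidence value per character (assumed 0.0 to 1.0)."""
    text: str
    char_confidences: List[float]


def needs_review(word: OcrWord, threshold: float = 1.0) -> bool:
    """True if any character in the word falls below the confidence threshold."""
    return any(conf < threshold for conf in word.char_confidences)


def route_words(words: List[OcrWord]) -> Tuple[List[OcrWord], List[OcrWord]]:
    """Split words into (accepted automatically, sent to a human operator)."""
    accepted: List[OcrWord] = []
    review: List[OcrWord] = []
    for word in words:
        if needs_review(word):
            review.append(word)
        else:
            accepted.append(word)
    return accepted, review
```

With a threshold of 1.0, a single uncertain character is enough to send the whole word for manual verification, which is why reducing low-confidence characters directly reduces operator workload.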


Text Degradations And OCR Training, Elisa H. Barney Smith, Tim Andersen Jan 2005

Electrical and Computer Engineering Faculty Publications and Presentations

Printing and scanning of text documents introduces degradations to the characters which can be modeled. Interestingly, certain combinations of the parameters that govern the degradations introduced by the printing and scanning process affect characters in such a way that the degraded characters have a similar appearance, while other degradations leave the characters with an appearance that is very different. It is well known that (generally speaking) a test set that more closely matches a training set will be recognized with higher accuracy than one that matches the training set less well. Likewise, classifiers tend to perform better on data sets …


Searching For Protein Classification Features, Jennifer A. Smith Jan 2005

Electrical and Computer Engineering Faculty Publications and Presentations

A genetic algorithm is used to search for a set of classification features for a protein superfamily which is as unique as possible to the superfamily. These features may then be used for very fast classification of a query sequence into a protein superfamily. The features are based on windows onto modified consensus sequences of multiple aligned members of a training set for the protein superfamily. The efficacy of the method is demonstrated using receiver operating characteristic (ROC) values, and the performance of the resulting algorithm is compared with other database search algorithms.
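
For readers unfamiliar with ROC values, the sketch below computes the area under the ROC curve from classifier scores by counting how often a superfamily member outscores a non-member. The abstract does not say which ROC variant the paper uses, so this plain pairwise form is an assumption, and `roc_auc` is a hypothetical helper name.

```python
from typing import Sequence


def roc_auc(pos_scores: Sequence[float], neg_scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison.

    Returns the fraction of (member, non-member) pairs in which the
    member scores higher; ties count as half a win.
    """
    if not pos_scores or not neg_scores:
        raise ValueError("both score lists must be non-empty")

    wins = 0.0
    for p in pos_scores:
        for n in neg_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos_scores) * len(neg_scores))
```

A value of 1.0 means every member outscores every non-member, while 0.5 is what random scoring would give on average.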