Full-Text Articles in Engineering
Non-Coding RNA Covariance Model Combination Using Mixed Primary-Secondary Structure Alignment, Jennifer Smith
Jennifer A. Smith
Covariance models are very effective for finding new members of non-coding RNA sequence families in genomic data. However, the computational burden of applying CM-based search algorithms can be prohibitive. When annotating the genome of a newly sequenced organism it is usually desired to search the sequence data using a large number of ncRNA families. Computational burden can be reduced if the families are clustered into statistically similar models and a single cluster-average representative model produced. The database is then searched with the representative model for each cluster at a relatively low detection threshold. The output of this pre-filtered database is …
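The pre-filtering idea in this abstract can be illustrated with a minimal sketch. The abstract is truncated before it says what happens to the pre-filtered output, so the second stage here (re-scoring survivors with each family model in the cluster) is an assumption. The function `two_stage_search`, the toy `match_scorer`, and all thresholds are hypothetical stand-ins, not the author's method or a real covariance model scorer.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Scorer = Callable[[str], float]


def match_scorer(consensus: str) -> Scorer:
    """Toy scorer (assumption): counts positions that match a consensus string."""
    def score(window: str) -> float:
        return float(sum(a == b for a, b in zip(window, consensus)))
    return score


def two_stage_search(
    windows: Sequence[str],
    representative: Scorer,
    family_models: Dict[str, Scorer],
    prefilter_threshold: float,
    final_threshold: float,
) -> List[Tuple[str, str, float]]:
    """Score every window with the cheap cluster representative first,
    then apply the per-family models only to windows that pass."""
    hits = []
    for window in windows:
        # Stage 1: low detection threshold, so few true members are lost.
        if representative(window) < prefilter_threshold:
            continue
        # Stage 2 (assumed): full per-family scoring of the survivors.
        for name, model in family_models.items():
            score = model(window)
            if score >= final_threshold:
                hits.append((name, window, score))
    return hits


windows = ["GGCAC", "GGCAU", "AAAAA"]
hits = two_stage_search(
    windows,
    representative=match_scorer("GGCAN"),
    family_models={"fam_a": match_scorer("GGCAC"), "fam_b": match_scorer("GGCAU")},
    prefilter_threshold=3.0,
    final_threshold=5.0,
)
```

In this toy run the window "AAAAA" is rejected by the representative and never reaches the two family scorers, which is where the saving would come from when the family models are expensive.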
Integrating Thermodynamic And Observed-Frequency Data For Non-Coding RNA Gene Search, Jennifer Smith, Kay Wiese
Jennifer A. Smith
Among the most powerful and commonly used methods for finding new members of non-coding RNA gene families in genomic data are covariance models. The parameters of these models are estimated from the observed position-specific frequencies of insertions, deletions, and mutations in a multiple alignment of known non-coding RNA family members. Since the vast majority of positions in the multiple alignment have no observed changes, yet there is no reason to rule them out, some form of prior is applied to the estimate. Currently, observed-frequency priors are generated from non-family members based on model node type and child node type allowing …
Efficient Non-Coding RNA Gene Searches Through Classical And Evolutionary Methods, Jennifer Smith
Jennifer A. Smith
Successful non-coding RNA gene searching requires examination of long-range intramolecular base pairing possibilities. This results in search algorithms with extremely long run times such that large-scale use of the algorithms often becomes computationally infeasible. Methods for the efficient search of the solution space are examined. A review of the standard dynamic-programming covariance model search algorithm is given. An analysis of the statistically probable regions of the search space is undertaken and a method of limiting the traditional dynamic-programming algorithm to this region is shown. An alternative search method using a Genetic Algorithm (GA) which favours the probable region of the …
Joint Loop End Modeling Improves Covariance Model Based Non-Coding RNA Gene Search, Jennifer Smith
Jennifer A. Smith
The effect of more detailed modeling of the interface between stem and loop in non-coding RNA hairpin structures on efficacy of covariance-model-based non-coding RNA gene search is examined. Currently, the prior probabilities of the two stem nucleotides and two loop-end nucleotides at the interface are treated the same as any other stem and loop nucleotides respectively. Laboratory thermodynamic studies show that hairpin stability is dependent on the identities of these four nucleotides, but this is not taken into account in current covariance models. It is shown that separate estimation of emission priors for these nucleotides and joint treatment of substitution …
Computation Intelligence Method To Find Generic Non-Coding RNA Search Models, Jennifer A. Smith
Jennifer A. Smith
Fairly effective methods exist for finding new noncoding RNA genes using search models based on known families of ncRNA genes (for example covariance models). However, these models only find new members of the existing families and are not useful in finding potential members of novel ncRNA families. Other problems with family-specific search include large processing requirements, ambiguity in defining which sequences form a family and lack of sufficient numbers of known sequences to properly estimate model parameters. An ncRNA search model is proposed which includes a collection of non-overlapping RNA hairpin structure covariance models. The hairpin models are chosen from …
RNA Search With Decision Trees And Partial Covariance Models, Jennifer A. Smith
Jennifer A. Smith
The use of partial covariance models to search for RNA family members in genomic sequence databases is explored. The partial models are formed from contiguous subranges of the overall RNA family multiple alignment columns. A binary decision-tree framework is presented for choosing the order to apply the partial models and the score thresholds on which to make the decisions. The decision trees are chosen to minimize computation time subject to the constraint that all of the training sequences are passed to the full covariance model for final evaluation. Computational intelligence methods are suggested to select the decision tree since the …
Improved Covariance Model Parameter Estimation Using RNA Thermodynamic Properties, Jennifer A. Smith, Kay C. Wiese
Jennifer A. Smith
Covariance models are a powerful description of non-coding RNA (ncRNA) families that can be used to search nucleotide databases for new members of these ncRNA families. Currently, estimation of the parameters of a covariance model (state transition and emission scores) is based only on the observed frequencies of mutations, insertions, and deletions in known ncRNA sequences. For families with very few known members, this can result in rather uninformative models where the consensus sequence has a good score and most deviations from consensus have a fairly uniform poor score. It is proposed here to combine the traditional observed-frequency information with …
Protein Family Classification Using Structural And Sequence Information, Jennifer A. Smith
Jennifer A. Smith
Protein family classification usually relies on sequence information (as in the case of hidden Markov models and position-specific scoring matrices) or on structural information where some sort of average positional error between the atomic locations is used. The positional error method requires that the structure of all the proteins to be classified is known. Sequence methods have the advantage that a much larger number of proteins can be classified (since far more sequences are known than structures). However, sequence methods discard a large amount of useful information contained in the structures of the subset of proteins in the family for …
RNA Gene Finding With Biased Mutation Operators, Jennifer A. Smith
Jennifer A. Smith
The use of genetic algorithms for non-coding RNA gene finding has previously been investigated and found to be a potentially viable method for accelerating covariance-model-based database search relative to full dynamic-programming methods. The mutation operators in previous work chose new alignment insertion and deletion locations uniformly over the length of the model consensus sequence. Since the covariance models are estimated from multiple known members of a non-coding RNA family, information is available as to the likelihood of insertions or deletions at the individual model positions. This information is implicit in the state-transition parameters of the estimated covariance models. In the …
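The contrast between uniform and biased choice of insertion or deletion locations can be sketched as weighted sampling. The abstract is truncated before it describes the actual operator, so this is only an illustration: `pick_indel_position` and the per-position weights are hypothetical, and how such weights would be derived from a covariance model's state-transition parameters is not shown.

```python
import random
from typing import Sequence


def pick_indel_position(indel_weights: Sequence[float], rng: random.Random) -> int:
    """Pick a model position with probability proportional to its weight.

    indel_weights[i] is an assumed, non-negative propensity for an insertion
    or deletion at consensus position i. Equal weights reproduce the uniform
    operator described for the previous work.
    """
    positions = range(len(indel_weights))
    return rng.choices(positions, weights=indel_weights, k=1)[0]


rng = random.Random(0)

# Uniform operator: every position is equally likely.
uniform_pick = pick_indel_position([1.0, 1.0, 1.0, 1.0], rng)

# Biased operator: position 2 is the only one where indels are ever proposed.
biased_picks = [pick_indel_position([0.0, 0.0, 1.0, 0.0], rng) for _ in range(50)]
```

A biased operator of this kind spends fewer mutations on positions where the family alignment shows insertions or deletions to be unlikely.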
Searching For Protein Classification Features, Jennifer A. Smith
Jennifer A. Smith
A genetic algorithm is used to search for a set of classification features for a protein superfamily which is as unique as possible to the superfamily. These features may then be used for very fast classification of a query sequence into a protein superfamily. The features are based on windows onto modified consensus sequences of multiple aligned members of a training set for the protein superfamily. The efficacy of the method is demonstrated using receiver operating characteristic (ROC) values and the performance of the resulting algorithm is compared with other database search algorithms.
Accelerated Non-Coding RNA Searches With Covariance Model Approximations, Jennifer A. Smith
Jennifer A. Smith
Covariance models (CMs) are a very sensitive tool for finding non-coding RNA (ncRNA) genes in DNA sequence data. However, CMs are extremely slow. One reason why CMs are so slow is that they allow all possible combinations of insertions and deletions relative to the consensus model even though the vast majority of these are never seen in practice. In this paper we examine reduction in the number of states in covariance models. A simplified CM with reduced states which can be scored much faster is introduced. A comparison of the results of a full CM versus a reduced-state model found …
Truncated Profile Hidden Markov Models, Jennifer A. Smith
Jennifer A. Smith
The profile hidden Markov model (HMM) is a powerful method for remote homolog database search. However, evaluating the score of each database sequence against a profile HMM is computationally demanding. The computation time required for score evaluation is proportional to the number of states in the profile HMM. This paper examines whether the number of states can be truncated without reducing the ability of the HMM to find proteins containing members of a protein domain family. A genetic algorithm (GA) is presented which finds a good truncation of the HMM states. The results of using truncation on searches of the …
A Genetic Algorithms Approach To Non-Coding RNA Gene Searches, Jennifer A. Smith
Jennifer A. Smith
A genetic algorithm is proposed as an alternative to the traditional linear programming method for scoring covariance models in non-coding RNA (ncRNA) gene searches. The standard method is guaranteed to find the best score, but it is too slow for general use. The observation that most of the search space investigated by the linear programming method does not even remotely resemble any observed sequence in real sequence data can be used to motivate the use of genetic algorithms (GAs) to quickly reject regions of the search space. A search space with many local minima makes gradient descent an unattractive alternative. …
An Asynchronous GALS Interface With Applications, Jennifer A. Smith
Jennifer A. Smith
A low-latency asynchronous interface for use in globally-asynchronous locally-synchronous (GALS) integrated circuits is presented. The interface is compact and does not alter the local clocks of the interfaced local clock domains in any way (unlike many existing GALS interfaces). Two applications of the interface to GALS systems are shown. The first is a single-chip shared-memory multiprocessor for generic supercomputing use. The second is an application-specific coprocessor for hardware acceleration of the Smith-Waterman algorithm. This is a bioinformatics algorithm used for sequence alignment (similarity searching) between DNA or amino acid (protein) sequences and sequence databases such as the recently completed human …
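For readers unfamiliar with the Smith-Waterman algorithm named in this abstract, a minimal software sketch of its local-alignment score follows. It is a plain reference version, not the coprocessor design described in the paper; the scoring values (`match`, `mismatch`, and a linear `gap` penalty) are assumed defaults chosen for illustration.

```python
def smith_waterman_score(a: str, b: str, match: int = 2,
                         mismatch: int = -1, gap: int = -2) -> int:
    """Return the best local alignment score between sequences a and b.

    Uses the standard recurrence with a linear gap penalty and keeps only
    two rows of the dynamic-programming table, since no traceback is needed.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            substitution = match if a[i - 1] == b[j - 1] else mismatch
            curr[j] = max(
                0,                           # start a new local alignment
                prev[j - 1] + substitution,  # align a[i-1] with b[j-1]
                prev[j] + gap,               # gap in b
                curr[j - 1] + gap,           # gap in a
            )
            best = max(best, curr[j])
        prev = curr
    return best


identical = smith_waterman_score("ACGT", "ACGT")
shared_core = smith_waterman_score("TTACGTT", "GGACGGG")
unrelated = smith_waterman_score("AAAA", "TTTT")
```

Every cell depends only on its left, upper, and upper-left neighbours, so cells along an anti-diagonal can be computed in parallel, which is one reason the algorithm is a common target for hardware acceleration.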
Covariance Searches For ncRNA Gene Finding, Jennifer A. Smith
Jennifer A. Smith
The use of covariance models for non-coding RNA gene finding is extremely powerful and also extremely computationally demanding. A major reason for the high computational burden of this algorithm is that the search proceeds through every possible start position in the database and every possible sequence length between zero and a user-defined maximum length at every one of these start positions. Furthermore, for every start position and sequence length, all possible combinations of insertions and deletions leading to the given sequence length are searched. It has been previously shown that a large portion of this search space is nowhere near …