Open Access. Powered by Scholars. Published by Universities.®

Medicine and Health Sciences Commons

Engineering

Faculty & Staff Scholarship

K-mers

Full-Text Articles in Medicine and Health Sciences

SSAW: A New Sequence Similarity Analysis Method Based on the Stationary Discrete Wavelet Transform, Jie Lin, Donald Adjeroh, Bing-Hua Jiang, Yue Jiang Jan 2018

Faculty & Staff Scholarship

Background: Alignment-free sequence similarity analysis methods often lead to significant savings in computational time over alignment-based counterparts.

Results: A new alignment-free sequence similarity analysis method, called SSAW, is proposed. SSAW stands for Sequence Similarity Analysis using the Stationary Discrete Wavelet Transform (SDWT). It extracts k-mers from a sequence, then maps each k-mer to a complex number field. The resulting series of complex numbers is then transformed into feature vectors using the stationary discrete wavelet transform. After these steps, the original sequence has been turned into a feature vector with numeric values, which can then be used for clustering and/or classification.
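The steps named in the Results (k-mer extraction, mapping to complex numbers, stationary wavelet transform, feature vector) can be sketched roughly as follows. This is an illustrative sketch only, not the authors' implementation: the abstract does not specify the k-mer-to-complex mapping, the wavelet family, or how coefficients are summarized. The `BASE_TO_COMPLEX` table, the averaging in `kmer_to_complex`, the Haar-style filters with circular boundaries, and the per-level energy summary are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical nucleotide-to-complex mapping; the abstract does not give one.
BASE_TO_COMPLEX = {"A": 1 + 0j, "C": -1 + 0j, "G": 0 + 1j, "T": 0 - 1j}


def extract_kmers(seq: str, k: int) -> List[str]:
    """Return all overlapping substrings of length k, in order."""
    return [seq[i:i + k] for i in range(len(seq) - k + 1)]


def kmer_to_complex(kmer: str) -> complex:
    """Assumed mapping: mean of the per-base complex values.

    This ignores the order of bases inside the k-mer and is for illustration only.
    """
    return sum(BASE_TO_COMPLEX[b] for b in kmer) / len(kmer)


def haar_swt_level(signal: List[complex], level: int) -> Tuple[List[complex], List[complex]]:
    """One level of an undecimated (stationary) Haar-style transform.

    No downsampling is done; instead the filter taps are spread 2**(level-1)
    samples apart, with circular boundary handling. Filters are unnormalized
    averages and differences.
    """
    n = len(signal)
    step = 2 ** (level - 1)
    approx = [(signal[i] + signal[(i + step) % n]) / 2 for i in range(n)]
    detail = [(signal[i] - signal[(i + step) % n]) / 2 for i in range(n)]
    return approx, detail


def sdwt_features(seq: str, k: int = 3, levels: int = 2) -> List[float]:
    """Turn a sequence into a fixed-length numeric vector.

    Assumed summary: mean energy of the detail coefficients at each level,
    followed by the mean energy of the final approximation.
    """
    series = [kmer_to_complex(km) for km in extract_kmers(seq.upper(), k)]
    if not series:
        raise ValueError("sequence must be at least k characters long")
    features: List[float] = []
    approx = series
    for level in range(1, levels + 1):
        approx, detail = haar_swt_level(approx, level)
        features.append(sum(abs(d) ** 2 for d in detail) / len(detail))
    features.append(sum(abs(a) ** 2 for a in approx) / len(approx))
    return features


if __name__ == "__main__":
    print(sdwt_features("ACGTACGTTGCA", k=3, levels=2))
```

Because the vector has `levels + 1` entries regardless of sequence length, sequences of different lengths can be compared with an ordinary distance measure before clustering or classification; that fixed-length property is a consequence of the summary chosen here, not something stated in the abstract.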

Conclusions: Using two different types of applications, namely, clustering and classification, we compared SSAW against the state-of-the-art alignment-free sequence analysis methods. SSAW demonstrates competitive or …


Feature-Based and String-Based Models for Predicting RNA-Protein Interaction, Donald Adjeroh, Maen Allaga, Jun Tan, Jie Lin, Yue Jiang, Ahmed Abbasi, Xiaobo Zhou Jan 2018

Faculty & Staff Scholarship

In this work, we study two approaches for the problem of RNA-Protein Interaction (RPI). In the first approach, we use a feature-based technique by combining extracted features from both sequences and secondary structures. The feature-based approach enhanced the prediction accuracy as it included much more available information about the RNA-protein pairs. In the second approach, we apply search algorithms and data structures to extract effective string patterns for prediction of RPI, using both sequence information (protein and RNA sequences), and structure information (protein and RNA secondary structures). This led to different string-based models for predicting interacting RNA-protein pairs. We show …