Full-Text Articles in Engineering
Suffix Tree, Minwise Hashing And Streaming Algorithms For Big Data Analysis In Bioinformatics, Sairam Behera
Department of Computer Science and Engineering: Dissertations, Theses, and Student Research
In this dissertation, we worked on several algorithmic problems in bioinformatics using three main approaches: (a) a streaming model, (b) suffix-tree-based indexing, and (c) minwise hashing (minhash) and locality-sensitive hashing (LSH). Streaming models are useful for large-data problems where a good approximation must be achieved with limited space. We developed an approximation algorithm (Kmer-Estimate) using the streaming approach to obtain a better estimate of the frequency of k-mer counts. A k-mer, a substring of length k, plays an important role in many bioinformatics analyses such as genome distance estimation. We also developed new methods that use …
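The k-mer and minhash ideas named in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's Kmer-Estimate algorithm or any of its methods: it counts k-mers exactly rather than in a streaming fashion, and the function names (`kmers`, `kmer_counts`, `minhash_signature`, `estimate_jaccard`), the seeded BLAKE2b hash, and the default of 64 hash functions are all assumptions made for illustration.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every contiguous substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_counts(seq, k):
    """Exact k-mer counts (illustrative; not a streaming estimator)."""
    return Counter(kmers(seq, k))


def _hash(kmer, seed):
    """Seeded 64-bit hash of a k-mer (hypothetical helper, BLAKE2b-based)."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(kmer_set, num_hashes=64):
    """Keep the minimum hash value under each seeded hash function.

    Assumes kmer_set is non-empty.
    """
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of signature positions that agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)
```

For example, `kmer_counts("ACGTACGT", 3)` counts `ACG` and `CGT` twice each and `GTA` and `TAC` once each. Two sequences can then be compared by passing `set(kmers(seq, k))` for each to `minhash_signature` and comparing the signatures with `estimate_jaccard`; a higher value suggests a larger shared k-mer fraction, which is the kind of quantity genome distance estimates are typically derived from.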