Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Engineering

University of Nebraska - Lincoln

Series

2020

Assembly

Full-Text Articles in Physical Sciences and Mathematics

Suffix Tree, Minwise Hashing And Streaming Algorithms For Big Data Analysis In Bioinformatics, Sairam Behera Dec 2020

Department of Computer Science and Engineering: Dissertations, Theses, and Student Research

In this dissertation, we worked on several algorithmic problems in bioinformatics using mainly three approaches: (a) a streaming model, (b) suffix-tree based indexing, and (c) minwise-hashing (minhash) and locality-sensitive hashing (LSH). The streaming models are useful for large data problems where a good approximation needs to be achieved with limited space usage. We developed an approximation algorithm (Kmer-Estimate) using the streaming approach to obtain a better estimation of the frequency of k-mer counts. A k-mer, a subsequence of length k, plays an important role in many bioinformatics analyses such as genome distance estimation. We also developed new methods that use …
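
As a rough illustration of two ideas named in the abstract, the sketch below counts k-mers exactly and then uses a textbook minhash signature to estimate the Jaccard similarity between the k-mer sets of two sequences. It is not the dissertation's Kmer-Estimate algorithm, which is a streaming approximation not described in this excerpt. All function names, the choice of BLAKE2b as the hash, and the default of 64 hash functions are assumptions made only for this example.

```python
import hashlib
from collections import Counter


def kmers(seq, k):
    """Yield every contiguous substring of length k in seq."""
    for i in range(len(seq) - k + 1):
        yield seq[i:i + k]


def kmer_counts(seq, k):
    """Exact k-mer frequencies (no streaming, no approximation)."""
    return Counter(kmers(seq, k))


def _hash(kmer, seed):
    """Seeded 64-bit hash; BLAKE2b is an arbitrary choice for this sketch."""
    data = f"{seed}:{kmer}".encode()
    return int.from_bytes(hashlib.blake2b(data, digest_size=8).digest(), "big")


def minhash_signature(seq, k, num_hashes=64):
    """For each seeded hash function, keep the minimum hash over the k-mer set."""
    kmer_set = set(kmers(seq, k))
    if not kmer_set:
        raise ValueError("sequence is shorter than k")
    return [min(_hash(km, seed) for km in kmer_set) for seed in range(num_hashes)]


def estimate_jaccard(sig_a, sig_b):
    """Fraction of positions where two signatures agree; estimates Jaccard similarity."""
    matches = sum(a == b for a, b in zip(sig_a, sig_b))
    return matches / len(sig_a)


def exact_jaccard(seq_a, seq_b, k):
    """Exact Jaccard similarity of the two k-mer sets, for comparison."""
    a, b = set(kmers(seq_a, k)), set(kmers(seq_b, k))
    return len(a & b) / len(a | b)
```

Calling `kmer_counts("ACGTACGT", 4)` counts the 4-mer `ACGT` twice and `CGTA`, `GTAC`, and `TACG` once each. The minhash estimate approaches `exact_jaccard` as `num_hashes` grows; a genome distance can then be derived from the estimated similarity, though the specific distance used in the dissertation is not stated here.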