Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

LSU Doctoral Dissertations

Computer Sciences

2017

Dictionary Matching

Articles 1 - 1 of 1

Full-Text Articles in Physical Sciences and Mathematics

Succinct Data Structures For Parameterized Pattern Matching And Related Problems, Arnab Ganguly Jan 2017

Succinct Data Structures For Parameterized Pattern Matching And Related Problems, Arnab Ganguly

LSU Doctoral Dissertations

Let T be a fixed text-string of length n and P be a varying pattern-string of length |P| <= n. Both T and P contain characters from a totally ordered alphabet Sigma of size sigma <= n. Suffix tree is the ubiquitous data structure for answering a pattern matching query: report all the positions i in T such that T[i + k - 1] = P[k], 1 <= k <= |P|. Compressed data structures support pattern matching queries, using much lesser space than the suffix tree, mainly by relying on a crucial property of the leaves in the tree. Unfortunately, in many suffix tree variants (such as parameterized suffix tree, order-preserving suffix tree, and 2-dimensional suffix tree), this property does not hold. Consequently, compressed representations of these suffix tree variants have been elusive. We present the first compressed data structures for two important variants of the pattern matching problem: (1) Parameterized Matching -- report a position i in T if T[i + k - 1] = f(P[k]), 1 <= k <= |P|, for a one-to-one function f that renames the characters in P to the characters in T[i,i+|P|-1], and (2) Order-preserving Matching -- report a position i in T if T[i + j - 1] and T[i + k -1] have the same relative order as that of P[j] and P[k], 1 <= j < k <= |P|. For each of these two problems, the existing suffix tree variant requires O(n*log n) bits of space and answers a query in O(|P|*log sigma + occ) time, where occ is the number of starting positions where a match exists. We present data structures that require O(n*log sigma) bits of space and answer a query in O((|P|+occ) poly(log n)) time. As a byproduct, we obtain compressed data structures for a few other variants, as well as introduce two new techniques (of independent interest) for designing compressed data structures for pattern matching.