Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Ankur Gupta

Selected Works

2008

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

On Searching Compressed String Collections Cache-Obliviously, Ankur Gupta, Paolo Ferragina, Roberto Grossi, Rahul Shah, Jeffrey Vitter Apr 2008

Ankur Gupta

Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and study how close their space occupancy is to the information-theoretic minimum. The moral is that they are not just heuristics. Our second contribution is a novel dictionary encoding scheme that builds upon such linearizations and achieves nearly optimal space, offers competitive I/O-search time, and is also conscious …
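For readers unfamiliar with front coding, the idea can be sketched as follows: in a sorted list of strings, each string is stored as the length of the prefix it shares with its predecessor plus the remaining suffix. This is only a minimal illustration of plain front coding, not the linearizations or the dictionary encoding scheme proposed in the paper; the function names `front_encode` and `front_decode` and the sample words are assumptions made for the example.

```python
def front_encode(sorted_words):
    """Front-code a lexicographically sorted list of strings.

    Each string becomes (lcp, suffix), where lcp is the length of the
    longest common prefix with the previous string in the list.
    Hypothetical helper written for illustration only.
    """
    encoded = []
    prev = ""
    for word in sorted_words:
        limit = min(len(prev), len(word))
        lcp = 0
        while lcp < limit and prev[lcp] == word[lcp]:
            lcp += 1
        encoded.append((lcp, word[lcp:]))
        prev = word
    return encoded


def front_decode(encoded):
    """Invert front_encode by rebuilding each string from its predecessor."""
    words = []
    prev = ""
    for lcp, suffix in encoded:
        word = prev[:lcp] + suffix
        words.append(word)
        prev = word
    return words


# Sample data (made up for this sketch).
words = sorted(["trie", "tried", "tries", "trim", "tree"])
encoded = front_encode(words)
print(encoded)
# [(0, 'tree'), (2, 'ie'), (4, 'd'), (4, 's'), (3, 'm')]
```

Decoding a string in this plain form requires scanning from the start of the list, which is one reason practical variants restart the coding at regular intervals.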


Nearly Tight Bounds On The Encoding Length Of The Burrows-Wheeler Transform, Roberto Grossi, Ankur Gupta, Jeffrey Vitter Dec 2007

Ankur Gupta

In this paper, we present a nearly tight analysis of the encoding length of the Burrows-Wheeler Transform (BWT) that is motivated by the text indexing setting. For a text T of n symbols drawn from an alphabet Σ, our encoding scheme achieves bounds in terms of the hth-order empirical entropy H_h of the text, and takes linear time for encoding and decoding. We also describe a lower bound on the encoding length of the BWT that constructs an infinite (non-trivial) class of texts that are among the hardest to compress using the BWT. We then show that our upper …
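The two quantities named in this abstract can be illustrated with a small sketch. The code below computes the BWT by sorting all rotations and computes an hth-order empirical entropy from symbol counts per length-h context. It is not the paper's encoding scheme: the rotation sort is a naive construction that is far from linear time, the `$` sentinel is an assumed end-of-text marker, and the entropy uses one common definition (contexts taken as the h symbols preceding each position). The names `bwt` and `empirical_entropy` are hypothetical.

```python
import math
from collections import Counter, defaultdict


def bwt(text, sentinel="$"):
    """Naive Burrows-Wheeler Transform via sorted rotations.

    Assumes `sentinel` does not occur in `text` and sorts before
    every symbol of `text`. Illustrative only, not linear time.
    """
    s = text + sentinel
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rotation[-1] for rotation in rotations)


def empirical_entropy(text, h):
    """hth-order empirical entropy of `text`, in bits per symbol.

    Groups each symbol by the h symbols preceding it, sums the
    zeroth-order entropy of each group weighted by its size, and
    divides by the text length n.
    """
    n = len(text)
    if n == 0:
        return 0.0
    contexts = defaultdict(Counter)
    for i in range(h, n):
        contexts[text[i - h:i]][text[i]] += 1
    total_bits = 0.0
    for counts in contexts.values():
        group_size = sum(counts.values())
        total_bits -= sum(c * math.log2(c / group_size) for c in counts.values())
    return total_bits / n


print(bwt("banana"))                 # annb$aa
print(empirical_entropy("abab", 0))  # 1.0
print(empirical_entropy("abab", 1))  # 0.0
```

The `abab` example shows why higher-order entropy matters: each symbol looks random on its own (1 bit per symbol at order 0) but is fully determined by the preceding symbol (0 bits at order 1).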