Physical Sciences and Mathematics | Open Access Articles

A Framework For Dynamizing Succinct Data Structures, Ankur Gupta, Wing K. Hon, Rahul Shah, Jeffery S. Vitter Feb 2010

A Framework For Dynamizing Succinct Data Structures, Ankur Gupta, Wing K. Hon, Rahul Shah, Jeffery S. Vitter

Ankur Gupta

We present a framework to dynamize succinct data structures, to encourage their use over non-succinct versions in a wide variety of important application areas. Our framework can dynamize most stateof-the-art succinct data structures for dictionaries, ordinal trees, labeled trees, and text collections.

Go to article

On Searching Compressed String Collections Cache-Obliviously, Ankur Gupta, Paolo Ferragina, Roberto Grossi, Rahul Shah, Jeffrey Vitter Apr 2008

On Searching Compressed String Collections Cache-Obliviously, Ankur Gupta, Paolo Ferragina, Roberto Grossi, Rahul Shah, Jeffrey Vitter

Ankur Gupta

Current data structures for searching large string collections either fail to achieve minimum space or cause too many cache misses. In this paper we discuss some edge linearizations of the classic trie data structure that are simultaneously cache-friendly and compressed. We provide new insights on front coding [24], introduce other novel linearizations, and study how close their space occupancy is to the information-theoretic minimum. The moral is that they are not just heuristics. Our second contribution is a novel dictionary encoding scheme that builds upon such linearizations and achieves nearly optimal space, offers competitive I/O-search time, and is also conscious …

Go to article

Nearly Tight Bounds On The Encoding Length Of The Burrows-Wheeler Transform., Roberto Grossi, Ankur Gupta, Jeffery Vitter Dec 2007

Nearly Tight Bounds On The Encoding Length Of The Burrows-Wheeler Transform., Roberto Grossi, Ankur Gupta, Jeffery Vitter

Ankur Gupta

In this paper, we present a nearly tight analysis of the encoding length of the Burrows-Wheeler Transform (BWT) that is motivated by the text indexing setting. For a text T of n symbols drawn from an alphabet Σ, our encoding scheme achieves bounds in terms of the hth-order empirical entropy Hh of the text, and takes linear time for encoding and decoding. We also describe a lower bound on the encoding length of the BWT that constructs an infinite (non-trivial) class of texts that are among the hardest to compress using the BWT. We then show that our upper …

Go to article

On The Size Of Succinct Indices, Alexander Golynski, Ankur Gupta, Roberto Grossi, Rajeev Raman, Satti Rao Dec 2006

On The Size Of Succinct Indices, Alexander Golynski, Ankur Gupta, Roberto Grossi, Rajeev Raman, Satti Rao

Ankur Gupta

A succinct data structure occupies an amount of space that is close to the information-theoretic minimum plus an additional term. The latter is not necessarily a lower-order term and, in several cases, completely dominates the space occupancy both in theory and in practice. In this paper, we present several solutions to partially overcome this problem, introducing new techniques of independent interest that allow us to improve over previously known upper and lower bounds.

Go to article

Compressed Dictionaries: Space Measures, Data Sets, And Experiments Dec 2005

Compressed Dictionaries: Space Measures, Data Sets, And Experiments

Ankur Gupta

In this paper, we present an experimental study of the spacetime tradeoffs for the dictionary problem, where we design a data structure to represent set data, which consist of a subset S of n items out of a universe U = {0, 1,...,u − 1} supporting various queries on S. Our primary goal is to reduce the space required for such a dictionary data structure. Many compression schemes have been developed for dictionaries, which fall generally in the categories of combinatorial encodings and data-aware methods and still support queries efficiently. We show that for many (real-world) datasets, data-aware methods lead …

Go to article

When Indexing Equals Compression: Experiments With Compressing Suffix Arrays And Applications, Roberto Grossi, Ankur Gupta, Jeffrey Vitter Dec 2003

When Indexing Equals Compression: Experiments With Compressing Suffix Arrays And Applications, Roberto Grossi, Ankur Gupta, Jeffrey Vitter

Ankur Gupta

We report on a new and improved version of high-order entropy-compressed suffix arrays, which has theoretical performance guarantees similar to those in our earlier work [16], yet represents an improvement in practice. Our experiments indicate that the resulting text index offers state-of-the-art compression. In particular, we require roughly 20 % of the original text size—without requiring a separate instance of the text—and support fast and powerful searches. To our knowledge, this is the best known method in terms of space for fast searching. 1

Go to article

Physical Sciences and Mathematics Commons^™

Full-Text Articles in Physical Sciences and Mathematics

A Framework For Dynamizing Succinct Data Structures, Ankur Gupta, Wing K. Hon, Rahul Shah, Jeffery S. Vitter

Ankur Gupta

On Searching Compressed String Collections Cache-Obliviously, Ankur Gupta, Paolo Ferragina, Roberto Grossi, Rahul Shah, Jeffrey Vitter

Ankur Gupta

Nearly Tight Bounds On The Encoding Length Of The Burrows-Wheeler Transform., Roberto Grossi, Ankur Gupta, Jeffery Vitter

Ankur Gupta

On The Size Of Succinct Indices, Alexander Golynski, Ankur Gupta, Roberto Grossi, Rajeev Raman, Satti Rao

Ankur Gupta

Compressed Dictionaries: Space Measures, Data Sets, And Experiments

Ankur Gupta

When Indexing Equals Compression: Experiments With Compressing Suffix Arrays And Applications, Roberto Grossi, Ankur Gupta, Jeffrey Vitter

Ankur Gupta