Compressing Semi-Structured Text Using Hierarchical Phrase Identifications, Dan R. Olsen Jr., Craig G. Nevill-Manning, Ian H. Witten
Faculty Publications
The structure of this paper is as follows. We begin by identifying some characteristics of semi-structured text that have special relevance to data compression. We then give a brief account of a particular large textual database, and describe a compression scheme that exploits its structure. In addition to providing compression, the system gives some insight into the structure of the database. Finally, we show how the hierarchical grammar can be generalized, first manually and then automatically, to yield further improvements in compression performance.