Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

1996

Brigham Young University

Database

Full-Text Articles in Physical Sciences and Mathematics

Compressing Semi-Structured Text Using Hierarchical Phrase Identifications, Dan R. Olsen Jr., Craig G. Nevill-Manning, Ian H. Witten, Apr 1996

Faculty Publications

The structure of this paper is as follows. We begin by identifying some characteristics of semi-structured text that have special relevance to data compression. We then give a brief account of a particular large textual database, and describe a compression scheme that exploits its structure. In addition to providing compression, the system gives some insight into the structure of the database. Finally, we show how the hierarchical grammar can be generalized, first manually and then automatically, to yield further improvements in compression performance.