Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

University of Massachusetts Amherst

Selected Works

Full-Text Articles in Entire DC Network

Table Extraction Using Conditional Random Fields, David Pinto, Andrew McCallum, Xing Wei, W. Bruce Croft, Jan 2003

Andrew McCallum

The ability to find tables and extract information from them is a necessary component of data mining, question answering, and other information retrieval tasks. Documents often contain tables to communicate densely packed, multi-dimensional information. Tables do this by using layout patterns to indicate fields and records efficiently in two-dimensional form. Their rich combination of formatting and content, however, presents difficulties for traditional language modeling techniques. This paper presents the use of conditional random fields (CRFs) for table extraction and compares them with hidden Markov models (HMMs). Unlike HMMs, CRFs support the use of many rich and overlapping layout …
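The abstract's central contrast is that CRFs, unlike HMMs, can use many overlapping features. A minimal sketch of that idea follows, assuming a line-by-line labeling setup. The three labels (NONTABLE, HEADER, DATAROW), the feature names, and the hand-set weights are all hypothetical stand-ins, not the paper's label set, feature set, or trained parameters. Each line gets features that deliberately overlap (a data row typically fires both `many_gaps` and a high `digit_frac`). The model scores a whole label sequence, normalizes globally with the forward algorithm to get p(labels | lines), and decodes the best sequence with Viterbi. Training is omitted.

```python
import math
import re
from typing import Dict, List, Sequence

# Hypothetical, simplified label set; not the label inventory used in the paper.
LABELS = ["NONTABLE", "HEADER", "DATAROW"]


def line_features(line: str) -> Dict[str, float]:
    """Overlapping layout/content features for one text line (illustrative only)."""
    stripped = line.strip()
    tokens = stripped.split()
    # Runs of 2+ spaces between non-space characters suggest column gaps.
    gaps = len(re.findall(r"(?<=\S) {2,}(?=\S)", line))
    digits = sum(ch.isdigit() for ch in stripped)
    return {
        "bias": 1.0,
        "many_gaps": 1.0 if gaps >= 2 else 0.0,
        "digit_frac": digits / max(len(stripped), 1),
        "all_alpha": 1.0 if tokens and all(t.isalpha() for t in tokens) else 0.0,
        "ends_with_period": 1.0 if stripped.endswith(".") else 0.0,
    }


# Hand-set weights stand in for parameters a CRF would learn by maximizing
# conditional log-likelihood; the values are purely illustrative assumptions.
STATE_W = {
    "NONTABLE": {"bias": 0.5, "ends_with_period": 1.5, "many_gaps": -2.0, "digit_frac": -1.0},
    "HEADER": {"many_gaps": 1.5, "all_alpha": 1.5, "digit_frac": -2.0},
    "DATAROW": {"many_gaps": 1.5, "digit_frac": 3.0, "all_alpha": -1.0},
}
TRANS_W = {
    ("NONTABLE", "NONTABLE"): 1.0,
    ("NONTABLE", "HEADER"): 0.5,
    ("NONTABLE", "DATAROW"): -1.0,
    ("HEADER", "DATAROW"): 1.0,
    ("DATAROW", "DATAROW"): 1.0,
    ("DATAROW", "NONTABLE"): 0.2,
}


def state_score(label: str, feats: Dict[str, float]) -> float:
    """Weighted sum of a line's features for one label (a log-potential)."""
    weights = STATE_W[label]
    return sum(weights.get(name, 0.0) * value for name, value in feats.items())


def trans_score(prev: str, cur: str) -> float:
    return TRANS_W.get((prev, cur), 0.0)


def _logsumexp(xs: Sequence[float]) -> float:
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sequence_log_prob(lines: Sequence[str], labels: Sequence[str]) -> float:
    """log p(labels | lines): unnormalized path score minus log Z(lines)."""
    feats = [line_features(l) for l in lines]
    score = state_score(labels[0], feats[0])
    for t in range(1, len(lines)):
        score += trans_score(labels[t - 1], labels[t]) + state_score(labels[t], feats[t])
    # Forward algorithm: alpha[y] = log-sum of scores of all prefixes ending in y.
    alpha = {y: state_score(y, feats[0]) for y in LABELS}
    for t in range(1, len(lines)):
        alpha = {
            y: _logsumexp([alpha[p] + trans_score(p, y) for p in LABELS]) + state_score(y, feats[t])
            for y in LABELS
        }
    return score - _logsumexp(list(alpha.values()))


def viterbi(lines: Sequence[str]) -> List[str]:
    """Most probable label sequence under the linear-chain model."""
    if not lines:
        return []
    feats = [line_features(l) for l in lines]
    best = {y: state_score(y, feats[0]) for y in LABELS}
    backptrs: List[Dict[str, str]] = []
    for t in range(1, len(lines)):
        new_best, ptrs = {}, {}
        for y in LABELS:
            prev = max(LABELS, key=lambda p: best[p] + trans_score(p, y))
            new_best[y] = best[prev] + trans_score(prev, y) + state_score(y, feats[t])
            ptrs[y] = prev
        best = new_best
        backptrs.append(ptrs)
    # Follow back-pointers from the best final label to recover the path.
    path = [max(LABELS, key=lambda y: best[y])]
    for ptrs in reversed(backptrs):
        path.append(ptrs[path[-1]])
    return path[::-1]


EXAMPLE_LINES = [
    "Table 1 shows yearly sales figures.",
    "Region    Sales    Growth",
    "North     120      5",
    "South     98       3",
]

if __name__ == "__main__":
    print(viterbi(EXAMPLE_LINES))  # ['NONTABLE', 'HEADER', 'DATAROW', 'DATAROW']
```

Because this model conditions on the line features rather than generating them, correlated features such as `many_gaps` and `digit_frac` can be added side by side without the independence assumptions an HMM's emission model would need.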