Full-Text Articles in Engineering
Data Extraction From Web Tables: The Devil Is In The Details, George Nagy, Sharad C. Seth, Dongpu Jin, David W. Embley, Spencer Machado, Mukkai Krishnamoorthy
CSE Conference and Workshop Papers
We present a method based on header paths for efficient and complete extraction of labeled data from tables meant for humans. Although many table configurations yield to the proposed syntactic analysis, some require access to semantic knowledge. Clicking on one or two critical cells per table, through a simple interface, is sufficient to resolve most of these problem tables. Header paths, a purely syntactic representation of visual tables, can be transformed (“factored”) into existing representations of structured data such as category trees, relational tables, and RDF triples. From a random sample of 200 web tables from ten large statistical web …
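The abstract says that header paths can be transformed into category trees, relational tables, and RDF triples. The sketch below shows that idea under a simplifying assumption: each data cell is already paired with its row header path and its column header path. The names `HeaderPath`, `Cell`, `to_triples`, `to_category_tree`, and `to_relation` are hypothetical. The mapping is a naive flattening, not the authors' factoring procedure, and the table labels and values are invented toy data.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: a header path is the sequence of header labels
# leading to a cell, e.g. ("Region A", "District 1").
HeaderPath = Tuple[str, ...]
# A cell is (row header path, column header path, cell value).
Cell = Tuple[HeaderPath, HeaderPath, str]


def to_triples(cells: List[Cell]) -> List[Tuple[str, str, str]]:
    """Naively map each cell to a (subject, predicate, object) triple.

    The row header path becomes the subject and the column header path
    the predicate. Labels in each path are joined with "/".
    """
    return [("/".join(r), "/".join(c), v) for r, c, v in cells]


def to_category_tree(paths: List[HeaderPath]) -> Dict[str, dict]:
    """Merge header paths that share a prefix into a nested category tree."""
    tree: Dict[str, dict] = {}
    for path in paths:
        node = tree
        for label in path:
            # Reuse an existing child with the same label, or create one.
            node = node.setdefault(label, {})
    return tree


def to_relation(cells: List[Cell]) -> List[Tuple[str, ...]]:
    """Emit one relational row per cell: row labels, column labels, value."""
    return [tuple(r) + tuple(c) + (v,) for r, c, v in cells]


if __name__ == "__main__":
    # Toy 2x2 table with invented labels and values (not from the paper).
    cells: List[Cell] = [
        (("Region A", "District 1"), ("Count", "Year 1"), "10"),
        (("Region A", "District 1"), ("Count", "Year 2"), "20"),
        (("Region A", "District 2"), ("Count", "Year 1"), "30"),
        (("Region A", "District 2"), ("Count", "Year 2"), "40"),
    ]
    for triple in to_triples(cells):
        print(triple)
    print(to_category_tree([r for r, _, _ in cells]))
    print(to_relation(cells)[0])
```

Keeping the row and column header paths as separate tuples is what makes all three outputs cheap to derive. Triples join each path into a single label. The category tree merges shared prefixes. The relational form spreads each path level into its own column.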