Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons


Computer Sciences

University of Nebraska - Lincoln

2017

Header factoring


Full-Text Articles in Computer Engineering

End-to-End Conversion of HTML Tables for Populating a Relational Database, George Nagy, David W. Embley, Sharad C. Seth, Jul 2017


CSE Conference and Workshop Papers

Automating the conversion of human-readable HTML tables into machine-readable relational tables will enable end-user query processing of the millions of data tables found on the web. Theoretically sound and experimentally successful methods for index-based segmentation, extraction of category hierarchies, and construction of a canonical table suitable for direct input to a relational database are demonstrated on 200 heterogeneous web tables. The methods are scalable: two of the tables could not be indexed, and for the remaining 198 the program generates Access-compatible CSV files in about 0.1 s per table.
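The abstract does not spell out the algorithms, but one plausible reading of the end product, a canonical table in which each data cell becomes one row keyed by its full row-header and column-header paths, can be illustrated. The Python sketch below is a generic "unpivot", not the authors' method. It assumes the header region has already been located (the hypothetical parameters `header_rows` and `header_cols`) and that spanning header cells have been repeated into every column they cover; the function names `to_canonical` and `write_csv` and the sample data are illustrative only.

```python
import csv
import io


def to_canonical(grid, header_rows, header_cols):
    """Unpivot a grid of strings into one record per data cell.

    Assumption (hypothetical): the top `header_rows` rows hold column
    headers, the leftmost `header_cols` columns hold row headers, and
    spanning header cells have already been copied into each covered column.
    """
    records = []
    for r in range(header_rows, len(grid)):
        # Row-header path for this data row, e.g. ["North"].
        row_path = [grid[r][c] for c in range(header_cols)]
        for c in range(header_cols, len(grid[r])):
            # Column-header path for this data column, e.g. ["2016", "Q1"].
            col_path = [grid[h][c] for h in range(header_rows)]
            records.append(row_path + col_path + [grid[r][c]])
    return records


def write_csv(records, field_names):
    """Serialize records as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(field_names)
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical two-level column hierarchy (Year > Quarter), one row-header column.
GRID = [
    ["Region", "2016", "2016", "2017", "2017"],
    ["", "Q1", "Q2", "Q1", "Q2"],
    ["North", "10", "12", "11", "14"],
    ["South", "7", "8", "9", "6"],
]

if __name__ == "__main__":
    rows = to_canonical(GRID, header_rows=2, header_cols=1)
    print(write_csv(rows, ["Region", "Year", "Quarter", "Value"]))
```

Each of the eight data cells becomes one record such as `North,2016,Q1,10`, so every category in the header hierarchy turns into its own column and the file can be loaded directly as a relational table. A production pipeline would also need to type the values and handle the segmentation step that this sketch takes as given.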