Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Engineering

University of Nebraska - Lincoln

CSE Conference and Workshop Papers

Table segmentation

Full-Text Articles in Physical Sciences and Mathematics

End-to-End Conversion of HTML Tables for Populating a Relational Database, George Nagy, David W. Embley, Sharad C. Seth, Jul 2017

CSE Conference and Workshop Papers

Automating the conversion of human-readable HTML tables into machine-readable relational tables will enable end-user query processing of the millions of data tables found on the web. Theoretically sound and experimentally successful methods for index-based segmentation, extraction of category hierarchies, and construction of a canonical table suitable for direct input to a relational database are demonstrated on 200 heterogeneous web tables. The methods are scalable: the program generates 198 Access-compatible CSV files in ~0.1 s per table (two of the 200 tables could not be indexed).
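
To make the idea of a canonical table concrete, the following is a minimal sketch, not the authors' program. It assumes the simplest case: a table already segmented into one level of row headers, one level of column headers, and a grid of data cells. The function names `to_canonical` and `to_csv`, the field names, and the toy data are all hypothetical.

```python
import csv
import io


def to_canonical(col_headers, row_headers, data):
    """Emit one record per data cell: (row category, column category, value).

    Assumes a single level of row and column headers; real web tables
    often have nested headers, which this sketch does not handle.
    """
    records = []
    for r, row_label in enumerate(row_headers):
        for c, col_label in enumerate(col_headers):
            records.append((row_label, col_label, data[r][c]))
    return records


def to_csv(records, field_names):
    """Serialize the canonical records as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(field_names)
    writer.writerows(records)
    return buf.getvalue()


# Hypothetical toy table: rows are regions, columns are years.
col_headers = ["2015", "2016"]
row_headers = ["Nebraska", "Iowa"]
data = [[10, 12], [7, 9]]

records = to_canonical(col_headers, row_headers, data)
csv_text = to_csv(records, ["Region", "Year", "Value"])
```

Each data cell becomes one row keyed by its header labels, which is the shape a relational database can import directly.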


Transforming Web Tables to a Relational Database, David W. Embley, George Nagy, Sharad C. Seth, Jan 2014

CSE Conference and Workshop Papers

HTML tables represent a significant fraction of web data. The often complex headers of such tables are determined accurately using their indexing property. Isolated headers are factored to extract category hierarchies. Web tables are then transformed into a canonical form and imported into a relational database. The proposed processing allows for the formulation of arbitrary SQL queries over the collection of induced relational tables.
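
As an illustration of the last step, the sketch below loads a few canonical-form records into an in-memory SQLite database and runs an SQL query over them. It is an assumption-laden example rather than the paper's pipeline: the table name `canonical`, its columns, and the sample rows are hypothetical, and SQLite merely stands in for whichever relational database is used.

```python
import sqlite3

# Hypothetical canonical-form records: (region, year, value).
rows = [
    ("Nebraska", "2015", 10.0),
    ("Nebraska", "2016", 12.0),
    ("Iowa", "2015", 7.0),
    ("Iowa", "2016", 9.0),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE canonical (region TEXT, year TEXT, value REAL)")
conn.executemany("INSERT INTO canonical VALUES (?, ?, ?)", rows)

# Once the table is relational, ordinary SQL applies, e.g. a total per region.
totals = conn.execute(
    "SELECT region, SUM(value) FROM canonical GROUP BY region ORDER BY region"
).fetchall()
conn.close()
```

Because every data cell is a row keyed by its categories, grouping, filtering, and joining across tables need no knowledge of the original HTML layout.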