Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
End-To-End Conversion Of Html Tables For Populating A Relational Database, George Nagy, David W. Embley, Sharad C. Seth
CSE Conference and Workshop Papers
Automating the conversion of human-readable HTML tables into machine-readable relational tables will enable end-user query processing of the millions of data tables found on the web. Theoretically sound and experimentally successful methods for index-based segmentation, extraction of category hierarchies, and construction of a canonical table suitable for direct input to a relational database are demonstrated on 200 heterogeneous web tables. The methods are scalable: the program generates 198 Access-compatible CSV files in about 0.1 s per table (two of the 200 tables could not be indexed).
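As a rough illustration of what a canonical table could look like, the sketch below flattens a small grid into one record per data cell, with each record carrying its row-header and column-header labels, and then serializes the records as CSV. It is not the authors' program: the function names `to_canonical` and `to_csv`, the sample grid, and the field names are all hypothetical, and the header extents (`n_header_rows`, `n_header_cols`) are supplied by hand here, whereas the abstract describes determining them by index-based segmentation.

```python
import csv
import io


def to_canonical(grid, n_header_rows, n_header_cols):
    """Return one record per data cell: row labels, column labels, value.

    Assumes the header region sizes are already known (hypothetical input;
    the paper derives them automatically).
    """
    records = []
    for r in range(n_header_rows, len(grid)):
        row_path = list(grid[r][:n_header_cols])
        for c in range(n_header_cols, len(grid[r])):
            # Read the column-header labels from top to bottom.
            col_path = [grid[h][c] for h in range(n_header_rows)]
            records.append(row_path + col_path + [grid[r][c]])
    return records


def to_csv(records, field_names):
    """Serialize the records as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(field_names)
    writer.writerows(records)
    return buf.getvalue()


# Made-up example table: two header rows (year, sex) and one header column.
grid = [
    ["", "2010", "2010", "2011", "2011"],
    ["", "Male", "Female", "Male", "Female"],
    ["Norway", "10", "12", "11", "13"],
    ["Sweden", "20", "22", "21", "23"],
]

records = to_canonical(grid, n_header_rows=2, n_header_cols=1)
csv_text = to_csv(records, ["Country", "Year", "Sex", "Value"])
```

Each of the eight data cells becomes its own row, so every value can be addressed by its category labels rather than by its position in the original layout.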
Transforming Web Tables To A Relational Database, David W. Embley, George Nagy, Sharad C. Seth
CSE Conference and Workshop Papers
HTML tables represent a significant fraction of web data. The often complex headers of such tables are determined accurately using their indexing property. Isolated headers are factored to extract category hierarchies. Web tables are then transformed into a canonical form and imported into a relational database. The proposed processing allows for the formulation of arbitrary SQL queries over the collection of induced relational tables.
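To make the last step concrete, the sketch below loads a few canonical-form records into a relational table and runs an SQL query over them. It uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in for whatever database the authors used; the table name `canonical`, its columns, and the sample rows are hypothetical.

```python
import sqlite3

# Hypothetical canonical-form records: (country, year, sex, value).
rows = [
    ("Norway", "2010", "Male", 10),
    ("Norway", "2010", "Female", 12),
    ("Norway", "2011", "Male", 11),
    ("Norway", "2011", "Female", 13),
    ("Sweden", "2010", "Male", 20),
    ("Sweden", "2010", "Female", 22),
    ("Sweden", "2011", "Male", 21),
    ("Sweden", "2011", "Female", 23),
]

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE canonical (country TEXT, year TEXT, sex TEXT, value INTEGER)"
)
conn.executemany("INSERT INTO canonical VALUES (?, ?, ?, ?)", rows)

# Once the data is relational, ordinary SQL applies: total per country for 2011.
totals = conn.execute(
    "SELECT country, SUM(value) FROM canonical "
    "WHERE year = ? GROUP BY country ORDER BY country",
    ("2011",),
).fetchall()
conn.close()
```

Because each category becomes a column, filters and aggregations over any header dimension reduce to plain `WHERE` and `GROUP BY` clauses.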