Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Engineering

CSE Conference and Workshop Papers

Category structure of table headers

Full-Text Articles in Physical Sciences and Mathematics

Clustering Header Categories Extracted From Web Tables, George Nagy, David W. Embley, Mukkai Krishnamoorthy, Sharad C. Seth, Feb 2015

CSE Conference and Workshop Papers

Revealing related content among heterogeneous web tables is part of our long-term objective of formulating queries over multiple sources of information. Two hundred HTML tables from institutional web sites are segmented, and each table cell is classified according to the fundamental indexing property of row and column headers. The categories that correspond to the multi-dimensional data cube view of a table are extracted by factoring the (often multi-row/column) headers. To reveal commonalities between tables from diverse sources, the Jaccard distances between pairs of category headers (and also table titles) are computed. We show how about one third of our …
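
The Jaccard distance mentioned in the abstract can be illustrated with a minimal Python sketch. The abstract does not say how header text is turned into sets, so the tokenization here (lowercased alphanumeric words) is an assumption, and the function names `tokens`, `jaccard_distance`, and `pairwise_distances` are hypothetical, not taken from the paper.

```python
import re
from itertools import combinations


def tokens(label):
    """Split a header label into a set of lowercase word tokens (assumed tokenization)."""
    return set(re.findall(r"[a-z0-9]+", label.lower()))


def jaccard_distance(a, b):
    """Jaccard distance between two sets: 1 - |A ∩ B| / |A ∪ B|."""
    if not a and not b:
        return 0.0  # convention: two empty sets are treated as identical
    return 1.0 - len(a & b) / len(a | b)


def pairwise_distances(headers):
    """Map each unordered pair of header names to the distance between their token sets."""
    sets = {name: tokens(text) for name, text in headers.items()}
    return {
        (p, q): jaccard_distance(sets[p], sets[q])
        for p, q in combinations(sets, 2)
    }


# Illustrative category headers (made up for this sketch, not from the paper's data).
example = {
    "t1": "Year 2010 2011 2012",
    "t2": "Year 2011 2012 2013",
    "t3": "Province Male Female",
}
distances = pairwise_distances(example)
```

In this sketch a distance of 0 means two headers share all their tokens and a distance of 1 means they share none, so a clustering step would group headers whose pairwise distances are small.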