Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 2 of 2

Full-Text Articles in Entire DC Network

Data Extraction From Web Tables: The Devil Is In The Details, George Nagy, Sharad C. Seth, Dongpu Jin, David W. Embley, Spencer Machado, Mukkai Krishnamoorthy, Jul 2017

CSE Conference and Workshop Papers

We present a method based on header paths for efficient and complete extraction of labeled data from tables meant for humans. Although many table configurations yield to the proposed syntactic analysis, some require access to semantic knowledge. Clicking on one or two critical cells per table, through a simple interface, is sufficient to resolve most of these problem tables. Header paths, a purely syntactic representation of visual tables, can be transformed (“factored”) into existing representations of structured data such as category trees, relational tables, and RDF triples. From a random sample of 200 web tables from ten large statistical web …
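As a rough illustration of the header-path idea described in this abstract, the sketch below labels each data cell of a small table with a row header path and a column header path. It is not the authors' implementation: the function names (column_paths, extract), the convention that an empty string marks a spanned header cell, and the sample table are all assumptions made up for this example.

```python
from typing import List, Tuple

Path = Tuple[str, ...]


def column_paths(header_rows: List[List[str]]) -> List[Path]:
    """Build one top-to-bottom header path per data column.

    Assumption: an empty string means the cell is covered by the
    spanning label to its left, so that label is carried forward.
    """
    filled = []
    for row in header_rows:
        last = ""
        out = []
        for label in row:
            last = label or last
            out.append(last)
        filled.append(out)
    width = len(filled[0])
    return [tuple(level[c] for level in filled) for c in range(width)]


def extract(header_rows, row_labels, data):
    """Return (row_path, column_path, value) for every data cell."""
    cols = column_paths(header_rows)
    return [
        ((row_labels[r],), cols[c], data[r][c])
        for r in range(len(data))
        for c in range(len(cols))
    ]


# Hypothetical table: two header rows, two data rows.
header_rows = [
    ["2016", "", "2017", ""],
    ["Male", "Female", "Male", "Female"],
]
row_labels = ["Albany", "Troy"]
data = [
    [10, 12, 11, 13],
    [7, 8, 9, 6],
]

cells = extract(header_rows, row_labels, data)
for row_path, col_path, value in cells:
    print(row_path, col_path, value)
```

Each resulting tuple is already close to a relational row or a labeled fact, which is one way to read the abstract's remark that header paths can be transformed into other structured representations.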


Adding Escience Assets To The Data Web, Herbert H. Van De Sompel, Carl Lagoze, Michael L. Nelson, Simeon Warner, Robert Sanderson, Pete Johnston, Apr 2009

Computer Science Faculty Publications

Aggregations of Web resources are increasingly important in scholarship as it adopts new methods that are data-centric, collaborative, and network-based. The same notion of aggregations of resources is common to the mashed-up, socially networked information environment of Web 2.0. We present a mechanism to identify and describe aggregations of Web resources that has resulted from the Open Archives Initiative - Object Reuse and Exchange (OAI-ORE) project. The OAI-ORE specifications are based on the principles of the Architecture of the World Wide Web, the Semantic Web, and the Linked Data effort. Therefore, their incorporation into the cyberinfrastructure that supports eScholarship will …