Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Brigham Young University

Theses/Dissertations

2003

Data extraction

Full-Text Articles in Physical Sciences and Mathematics

Schema Matching And Data Extraction Over HTML Tables, Cui Tao Sep 2003

Theses and Dissertations

Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem for the case of mostly structured data in the form of HTML tables, based on document-independent extraction ontologies. The solution entails elements of table location and table understanding, data integration, and wrapper creation. Table location and understanding allow us to locate the table of interest, recognize attributes and values, pair attributes with values, and form records. Data-integration techniques allow us to match source records …


Ontology-Based Extraction Of RDF Data From The World Wide Web, Timothy Adam Chartrand Mar 2003

Theses and Dissertations

The simplicity and proliferation of the World Wide Web (WWW) have taken the availability of information to an unprecedented level. The next generation of the Web, the Semantic Web, seeks to make information more usable by machines by introducing a more rigorous structure based on ontologies. One hindrance to the Semantic Web is the lack of existing semantically marked-up data. Until there is a critical mass of Semantic Web data, few people will develop and use Semantic Web applications. This project helps promote the Semantic Web by providing content. We apply existing information-extraction techniques, in particular, the BYU ontology-based data-extraction …