Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Brigham Young University

Theses/Dissertations

2005

Data extraction

Full-Text Articles in Physical Sciences and Mathematics

Generating Data-Extraction Ontologies By Example, Yuanqiu Zhou Nov 2005

Theses and Dissertations

Ontology-based data extraction is a resilient approach to extracting data from the web. A major limitation of this approach is that ontology experts must manually develop and maintain data-extraction ontologies. This limitation prevents ordinary users who have little knowledge of conceptual models from making use of the approach. In this thesis we have designed and implemented a general framework, OntoByE, to generate data-extraction ontologies semi-automatically from a small set of examples collected by users. Experimental evidence shows that, with the assistance of a limited amount of prior knowledge, OntoByE is capable of interacting with users to generate data-extraction ontologies for domains of interest to …


A Framework For Extraction Plans And Heuristics In An Ontology-Based Data-Extraction System, Alan E. Wessman Jan 2005

Theses and Dissertations

Extraction of information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that ontologies may be used to achieve a high degree of accuracy in data extraction while maintaining resiliency in the face of document changes. Ontologies do not, however, diminish the complexity of a data-extraction system. As research in the field progresses, the need for a modular data-extraction system that de-couples the various functional processes involved continues to grow.

In this thesis we propose a framework for such a system. The nature of the framework allows new algorithms and ideas …