Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Brigham Young University

Theses/Dissertations

Information extraction

Articles 1 - 5 of 5

Full-Text Articles in Entire DC Network

FROntIER: A Framework For Extracting And Organizing Biographical Facts In Historical Documents, Joseph Park Jan 2015

Theses and Dissertations

The tasks of entity recognition through ontological commitment, fact extraction and organization with respect to a target schema, and entity deduplication have all been examined in recent years, and systems exist that can perform each individual task. A framework combining all these tasks, however, is still needed to accomplish the goal of automatically extracting and organizing biographical facts about persons found in historical documents into disambiguated entity records. We introduce FROntIER (Fact Recognizer for Ontologies with Inference and Entity Resolution) as the framework to recognize and extract facts using an ontology and organize facts of interest through inferring implicit facts …


Scalable Detection And Extraction Of Data In Lists In OCRed Text For Ontology Population Using Semi-Supervised And Unsupervised Active Wrapper Induction, Thomas L. Packer Oct 2014

Theses and Dissertations

Lists of records in machine-printed documents contain much useful information. As one example, the thousands of family history books scanned, OCRed, and placed on-line by FamilySearch.org probably contain hundreds of millions of fact assertions about people, places, family relationships, and life events. Data like this cannot be fully utilized until a person or process locates the data in the document text, extracts it, and structures it with respect to an ontology or database schema. Yet, in the family history industry and other industries, data in lists goes largely unused because no known approach adequately addresses all of the costs, challenges, …


Automatic Extraction From And Reasoning About Genealogical Records: A Prototype, Charla Jean Woodbury Jun 2010

Theses and Dissertations

Family history research on the web is increasing in popularity, and many competing genealogical websites host large amounts of data-rich, unstructured, primary genealogical records. It is labor-intensive, however, even after making these records machine-readable, for humans to make these records easily searchable. What we need are computer tools that can automatically produce indices and databases from these genealogical records and can automatically identify individuals and events, determine relationships, and put families together. We propose here a possible solution—specialized ontologies, built specifically for extracting information from primary genealogical records, with expert logic and rules to infer genealogical facts and assemble relationship …


A Framework For Extraction Plans And Heuristics In An Ontology-Based Data-Extraction System, Alan E. Wessman Jan 2005

Theses and Dissertations

Extraction of information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that ontologies may be used to achieve a high degree of accuracy in data extraction while maintaining resiliency in the face of document changes. Ontologies do not, however, diminish the complexity of a data-extraction system. As research in the field progresses, the need for a modular data-extraction system that de-couples the various functional processes involved continues to grow.

In this thesis we propose a framework for such a system. The nature of the framework allows new algorithms and ideas …


Ontology-Based Extraction Of RDF Data From The World Wide Web, Timothy Adam Chartrand Mar 2003

Theses and Dissertations

The simplicity and proliferation of the World Wide Web (WWW) have taken the availability of information to an unprecedented level. The next generation of the Web, the Semantic Web, seeks to make information more usable by machines by introducing a more rigorous structure based on ontologies. One hindrance to the Semantic Web is the lack of existing semantically marked-up data. Until there is a critical mass of Semantic Web data, few people will develop and use Semantic Web applications. This project helps promote the Semantic Web by providing content. We apply existing information-extraction techniques, in particular, the BYU ontology-based data-extraction …