Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Brigham Young University

Information extraction

Full-Text Articles in Entire DC Network

FROntIER: A Framework For Extracting And Organizing Biographical Facts In Historical Documents, Joseph Park Jan 2015

Theses and Dissertations

The tasks of entity recognition through ontological commitment, fact extraction and organization with respect to a target schema, and entity deduplication have all been examined in recent years, and systems exist that can perform each individual task. A framework combining all these tasks, however, is still needed to accomplish the goal of automatically extracting and organizing biographical facts about persons found in historical documents into disambiguated entity records. We introduce FROntIER (Fact Recognizer for Ontologies with Inference and Entity Resolution) as the framework to recognize and extract facts using an ontology and organize facts of interest through inferring implicit facts …


Scalable Detection And Extraction Of Data In Lists In OCRed Text For Ontology Population Using Semi-Supervised And Unsupervised Active Wrapper Induction, Thomas L. Packer Oct 2014

Theses and Dissertations

Lists of records in machine-printed documents contain much useful information. As one example, the thousands of family history books scanned, OCRed, and placed on-line by FamilySearch.org probably contain hundreds of millions of fact assertions about people, places, family relationships, and life events. Data like this cannot be fully utilized until a person or process locates the data in the document text, extracts it, and structures it with respect to an ontology or database schema. Yet, in the family history industry and other industries, data in lists goes largely unused because no known approach adequately addresses all of the costs, challenges, …


Automatic Extraction From And Reasoning About Genealogical Records: A Prototype, Charla Jean Woodbury Jun 2010

Theses and Dissertations

Family history research on the web is increasing in popularity, and many competing genealogical websites host large amounts of data-rich, unstructured, primary genealogical records. Even after these records are made machine-readable, however, it is labor-intensive for humans to make them easily searchable. What we need are computer tools that can automatically produce indices and databases from these genealogical records and can automatically identify individuals and events, determine relationships, and put families together. We propose here a possible solution: specialized ontologies, built specifically for extracting information from primary genealogical records, with expert logic and rules to infer genealogical facts and assemble relationship …


A Framework For Extraction Plans And Heuristics In An Ontology-Based Data-Extraction System, Alan E. Wessman Jan 2005

Theses and Dissertations

Extraction of information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that ontologies may be used to achieve a high degree of accuracy in data extraction while maintaining resiliency in the face of document changes. Ontologies do not, however, diminish the complexity of a data-extraction system. As research in the field progresses, the need for a modular data-extraction system that de-couples the various functional processes involved continues to grow.

In this thesis we propose a framework for such a system. The nature of the framework allows new algorithms and ideas …


Ontology-Based Extraction Of RDF Data From The World Wide Web, Timothy Adam Chartrand Mar 2003

Theses and Dissertations

The simplicity and proliferation of the World Wide Web (WWW) have taken the availability of information to an unprecedented level. The next generation of the Web, the Semantic Web, seeks to make information more usable by machines by introducing a more rigorous structure based on ontologies. One hindrance to the Semantic Web is the lack of existing semantically marked-up data. Until there is a critical mass of Semantic Web data, few people will develop and use Semantic Web applications. This project helps promote the Semantic Web by providing content. We apply existing information-extraction techniques, in particular, the BYU ontology-based data-extraction …


Peppering Knowledge Sources With SALT: Boosting Conceptual Content For Ontology Generation, Deryle W. Lonsdale, Yihong Ding, David W. Embley, Alan Melby Jan 2002

Faculty Publications

This paper describes work done to explore the common ground between two different ongoing research projects: the standardization of lexical and terminological resources, and the use of conceptual ontologies for information extraction and data integration. Specifically, this paper explores improving the generation of extraction ontologies through use of a comprehensive terminology database that has been represented in a standardized format for easy tool-based implementation. We show how, via the successful integration of these two distinct efforts, it is possible to leverage large-scale terminological and conceptual information having relationship-rich semantic resources in order to reformulate, match, and merge retrieved information of …