Physical Sciences and Mathematics | Open Access Articles

Scalable Detection And Extraction Of Data In Lists In OCRed Text For Ontology Population Using Semi-Supervised And Unsupervised Active Wrapper Induction, Thomas L. Packer Oct 2014

Theses and Dissertations

Lists of records in machine-printed documents contain much useful information. As one example, the thousands of family history books scanned, OCRed, and placed on-line by FamilySearch.org probably contain hundreds of millions of fact assertions about people, places, family relationships, and life events. Data like this cannot be fully utilized until a person or process locates the data in the document text, extracts it, and structures it with respect to an ontology or database schema. Yet, in the family history industry and other industries, data in lists goes largely unused because no known approach adequately addresses all of the costs, challenges, …

Automatic Extraction From And Reasoning About Genealogical Records: A Prototype, Charla Jean Woodbury Jun 2010

Theses and Dissertations

Family history research on the web is increasing in popularity, and many competing genealogical websites host large amounts of data-rich, unstructured, primary genealogical records. It is labor-intensive, however, even after making these records machine-readable, for humans to make these records easily searchable. What we need are computer tools that can automatically produce indices and databases from these genealogical records and can automatically identify individuals and events, determine relationships, and put families together. We propose here a possible solution—specialized ontologies, built specifically for extracting information from primary genealogical records, with expert logic and rules to infer genealogical facts and assemble relationship …

Automating Mini-Ontology Generation From Canonical Tables, Stephen G. Lynn Apr 2008

Theses and Dissertations

In this thesis work we develop and test MOGO (a Mini-Ontology GeneratOr). MOGO automates the generation of mini-ontologies from canonicalized tables of data. This will help anyone trying to organize large amounts of existing data into a more searchable and accessible form. By using a number of different heuristic rules for selecting, enhancing, and modifying ontology elements, MOGO allows users to automatically, semi-automatically, or manually generate conceptual mini-ontologies from canonicalized tables of data. Ideally, MOGO operates fully automatically while allowing users to intervene to direct and correct when necessary so that they can always satisfactorily complete the translation of canonicalized …

A Tool To Support Ontology Creation Based On Incremental Mini-Ontology Merging, Zonghui Lian Mar 2008

Theses and Dissertations

This thesis addresses the problem of tool support for semi-automatic ontology mapping and merging. Solving this problem contributes to ontology creation and evolution by relieving users from tedious and time-consuming work. This thesis shows that a tool can be built that will take a “mini-ontology” and a “growing ontology” as input and make it possible to produce manually, semi-automatically, or automatically an extended growing ontology as output. Characteristics of this tool include: (1) a graphical, interactive user interface with features that will allow users to map and merge ontologies, and (2) a framework supporting pluggable, semi-automatic, and automatic mapping and …

Ontology-Based Free-Form Query Processing For The Semantic Web, Mark S. Vickers Jun 2006

Theses and Dissertations

With the onset of the semantic web, the problem of making semantic content effectively searchable for the general public emerges. Demanding an understanding of ontologies or familiarity with a new query language would likely frustrate semantic web users and prevent widespread success. Given this need, this thesis describes AskOntos, which is a system that uses extraction ontologies to convert conjunctive, free-form queries into structured queries for semantically annotated web pages. AskOntos then executes these structured queries and provides answers as tables of extracted values. In the experiments conducted, AskOntos was able to translate queries with a precision of 88% and a …

Generating Data-Extraction Ontologies By Example, Yuanqiu Zhou Nov 2005

Theses and Dissertations

Ontology-based data extraction is a resilient web data-extraction approach. A major limitation of this approach is that ontology experts must manually develop and maintain data-extraction ontologies. This limitation prevents ordinary users who have little knowledge of conceptual models from making use of this resilient approach. In this thesis we have designed and implemented a general framework, OntoByE, to generate data-extraction ontologies semi-automatically through a small set of examples collected by users. Experimental evidence shows that, with the assistance of a limited amount of prior knowledge, OntoByE is capable of interacting with users to generate data-extraction ontologies for domains of interest to …

A Framework For Extraction Plans And Heuristics In An Ontology-Based Data-Extraction System, Alan E. Wessman Jan 2005

Theses and Dissertations

Extraction of information from semi-structured or unstructured documents, such as Web pages, is a useful yet complex task. Research has demonstrated that ontologies may be used to achieve a high degree of accuracy in data extraction while maintaining resiliency in the face of document changes. Ontologies do not, however, diminish the complexity of a data-extraction system. As research in the field progresses, the need for a modular data-extraction system that de-couples the various functional processes involved continues to grow.

In this thesis we propose a framework for such a system. The nature of the framework allows new algorithms and ideas …

Automated Agent Ontology Creation For Distributed Databases, Austin A. Bartolo Mar 2004

Theses and Dissertations

In distributed database environments, the combination of resources from multiple sources requiring different interfaces is a universal problem. The current solution requires an expert to generate an ontology, or mapping, which contains all interconnections between the various fields in the databases. This research proposes the application of software agents in automating the ontology creation for distributed database environments with minimal communication. The automatic creation of a domain ontology alleviates the need for experts to manually map one database to other databases in the environment. Using several combined comparison methods, these agents communicate and negotiate similarities between information sources and retain …

Schema Matching And Data Extraction Over HTML Tables, Cui Tao Sep 2003

Theses and Dissertations

Data on the Web in HTML tables is mostly structured, but we usually do not know the structure in advance. Thus, we cannot directly query for data of interest. We propose a solution to this problem for the case of mostly structured data in the form of HTML tables, based on document-independent extraction ontologies. The solution entails elements of table location and table understanding, data integration, and wrapper creation. Table location and understanding allows us to locate the table of interest, recognize attributes and values, pair attributes with values, and form records. Data-integration techniques allow us to match source records …
