Open Access. Powered by Scholars. Published by Universities.®
![Digital Commons Network](http://assets.bepress.com/20200205/img/dcn/DCsunburst.png)
Physical Sciences and Mathematics Commons™
Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 1 of 1
Full-Text Articles in Physical Sciences and Mathematics
Scalable Detection And Extraction Of Data In Lists In Ocred Text For Ontology Population Using Semi-Supervised And Unsupervised Active Wrapper Induction, Thomas L. Packer
Scalable Detection And Extraction Of Data In Lists In Ocred Text For Ontology Population Using Semi-Supervised And Unsupervised Active Wrapper Induction, Thomas L. Packer
Theses and Dissertations
Lists of records in machine-printed documents contain much useful information. As one example, the thousands of family history books scanned, OCRed, and placed on-line by FamilySearch.org probably contain hundreds of millions of fact assertions about people, places, family relationships, and life events. Data like this cannot be fully utilized until a person or process locates the data in the document text, extracts it, and structures it with respect to an ontology or database schema. Yet, in the family history industry and other industries, data in lists goes largely unused because no known approach adequately addresses all of the costs, challenges, …