Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 6 of 6

Full-Text Articles in Physical Sciences and Mathematics

Information Behaviors Of Nuclear Scientists At Korea Atomic Energy Research Institute, Youngchoon Chun, Jiho Yi, Jung-Ran Park, Sangki Choi Oct 2015

Journal of East Asian Libraries

The goal of the study was to analyze the information use behaviors of researchers in the science and technology domain. A survey and interviews were conducted targeting nuclear scientists at the Korea Atomic Energy Research Institute. Study results indicate that the nuclear scientists mainly use the Institute library/information center and Internet portal/search engines during information acquisition. Easy access to information, accuracy, currency and cost are the most critical factors in selecting and obtaining information. The most frequently used database for executing research is the Institute’s electronic library (NUCLIS21) followed by the Citation Index SCOPUS. The results of the study indicate …


How Well Does Multiple OCR Error Correction Generalize?, William B. Lund, Eric K. Ringger, Daniel D. Walker Jan 2014

Faculty Publications

As the digitization of historical documents, such as newspapers, becomes more common, the need of the archive patron for accurate digital text from those documents increases. Building on our earlier work, the contributions of this paper are: 1. demonstrating the applicability of novel methods for correcting optical character recognition (OCR) on disparate data sets, including a new synthetic training set, 2. enhancing the correction algorithm with novel features, and 3. assessing the data requirements of the correction learning method. First, we correct errors using conditional random fields (CRF) trained on synthetic training data sets in order to demonstrate the …


Building An Access Database For Cookstove Research, Margaret L. Weddle Aug 2013

Student Works

This paper takes the reader through the thought process and the actual instructions for creating a Microsoft Access database, or for using the one provided with this paper. Instructions for using the HBLL resources Compendex and RefWorks are also covered. While this work was built specifically for cookstove research, it could be adapted to any research that requires maintaining a record of the journal articles being used. Building a database proved to be time-consuming and difficult work, but once done, Access provides an easy way to work with …


A Synthetic Document Image Dataset For Developing And Evaluating Historical Document Processing Methods, Daniel Walker, William Lund, Eric Ringger Jan 2012

Faculty Publications

Document images accompanied by OCR output text and ground truth transcriptions are useful for developing and evaluating document recognition and processing methods, especially for historical document images. Additionally, research into improving the performance of such methods often requires further annotation of training and test data (e.g., topical document labels). However, transcribing and labeling historical documents is expensive. As a result, existing real-world document image datasets with such accompanying resources are rare and often relatively small. We introduce synthetic document image datasets of varying levels of noise that have been created from standard (English) text corpora using an existing document degradation …


Evaluating Models Of Latent Document Semantics In The Presence Of OCR Errors, Daniel D. Walker, William B. Lund, Eric K. Ringger Jan 2010

Faculty Publications

Models of latent document semantics such as the mixture of multinomials model and Latent Dirichlet Allocation have received substantial attention for their ability to discover topical semantics in large collections of text. In an effort to apply such models to noisy optical character recognition (OCR) text output, we endeavor to understand the effect that character-level noise can have on unsupervised topic modeling. We show the effects both with document-level topic analysis (document clustering) and with word-level topic analysis (LDA) on both synthetic and real-world OCR data. As expected, experimental results show that performance declines as word error rates increase. Common …


A Sophisticated Library Search Strategy Using Folksonomies And Similarity Matching, William Lund, Yiu-Kai D. Ng, Maria Soledad Pera Jul 2009

Faculty Publications

Libraries, private and public, offer valuable resources to library patrons. As of today the only way to locate information archived exclusively in libraries is through their catalogs. Library patrons, however, often find it difficult to formulate a proper query, which requires using specific keywords assigned to different fields of desired library catalog records, to obtain relevant results. These improperly formulated queries often yield irrelevant results or no results at all. This negative experience in dealing with existing library systems turns library patrons away from library catalogs; instead, they rely on Web search engines to perform their searches first and upon …