Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

34,936 Full-Text Articles 35,375 Authors 10,216,521 Downloads 307 Institutions

All Articles in Computer Sciences

34,936 full-text articles. Page 1053 of 1054.

Nearest Neighbor Based Collection OCR, Pramod Sankar K., C. V. Jawahar, R. Manmatha 2009 University of Massachusetts - Amherst

R. Manmatha

Conventional optical character recognition (OCR) systems operate on individual characters and words, and do not normally exploit document or collection context. We describe a Collection OCR which takes advantage of the fact that multiple examples of the same word (often in the same font) may occur in a document or collection. The idea here is that an OCR or a reCAPTCHA like process generates a partial set of recognized words. In the second stage, a nearest neighbor algorithm compares the remaining word-images to those already recognized and propagates labels from the nearest neighbors. It is shown that by using an ...
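
The second stage described above can be illustrated with a minimal sketch. It assumes each word-image has already been reduced to a fixed-length feature vector and that Euclidean distance is used for comparison; the abstract specifies neither. The names `propagate_labels` and `max_distance` are hypothetical and do not come from the paper.

```python
import math


def propagate_labels(labeled, unlabeled, max_distance=None):
    """Assign each unlabeled vector the word of its nearest labeled neighbor.

    labeled:      list of (feature_vector, word) pairs from the first stage
    unlabeled:    list of feature vectors for the remaining word-images
    max_distance: hypothetical cutoff; matches farther than this get no label
    Returns a list of (word_or_None, distance) in the order of `unlabeled`.
    """
    results = []
    for vec in unlabeled:
        best_word, best_dist = None, math.inf
        # Brute-force search over every already-recognized word-image.
        for ref_vec, word in labeled:
            dist = math.dist(vec, ref_vec)  # Euclidean distance (assumed metric)
            if dist < best_dist:
                best_word, best_dist = word, dist
        # Refuse to propagate a label when the nearest neighbor is too far away.
        if max_distance is not None and best_dist > max_distance:
            best_word = None
        results.append((best_word, best_dist))
    return results
```

The distance cutoff is a design choice of this sketch: without it, every word-image receives a label even when nothing similar has been recognized yet.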


Improving State-Of-The-Art OCR Through High-Precision Document-Specific Modeling, Andrew Kae, Gary B. Huang, Carl Doersch, Erik G. Learned-Miller 2009 University of Massachusetts - Amherst

Andrew Kae

Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not scanned at high resolution. Many current approaches rely on stored font models that are vulnerable to cases in which the document is noisy or is written in a font dissimilar to the stored fonts. We address these problems by learning character models directly from the document itself, rather than using pre-stored font models. This method has had some success in the past, but we are able to achieve substantial improvement in error reduction through a novel method for creating nearly error-free document-specific training data and ...


Information Cost Tradeoffs For Augmented Index And Streaming Language Recognition, Amit Chakrabarti, Graham Cormode, Ranganath Kondapally, Andrew McGregor 2009 University of Massachusetts - Amherst

Andrew McGregor

This paper makes three main contributions to the theory of communication complexity and stream computation. First, we present new bounds on the information complexity of AUGMENTED-INDEX. In contrast to analogous results for INDEX by Jain, Radhakrishnan and Sen [J. ACM, 2009], we have to overcome the significant technical challenge that protocols for AUGMENTED-INDEX may violate the “rectangle property” due to the inherent input sharing. Second, we use these bounds to resolve an open problem of Magniez, Mathieu and Nayak [STOC, 2010] that asked about the multi-pass complexity of recognizing Dyck languages. This results in a natural separation between the standard ...


The Genomics Education Partnership: Successful Integration Of Research Into Laboratory Classes At A Diverse Group Of Undergraduate Institutions, Elizabeth Shoop, et al 2009 Macalester College

Elizabeth Shoop

No abstract provided.


Optimizing Semantic Coherence In Topic Models, D. Mimno, H. Wallach, E. Talley, M. Leenders, Andrew McCallum 2009 University of Massachusetts - Amherst

Andrew McCallum

Large organizations often face the critical challenge of sharing information and maintaining connections between disparate subunits. Tools for automated analysis of document collections, such as topic models, can provide an important means for communication. The value of topic modeling is in its ability to discover interpretable, coherent themes from unstructured document sets, yet it is not unusual to find semantic mismatches that substantially reduce user confidence. In this paper, we first present an expert-driven topic annotation study, undertaken in order to obtain an annotated set of baseline topics and their distinguishing characteristics. We then present a metric for detecting poor-quality ...


Resource-Bounded Information Extraction: Acquiring Missing Feature Values On Demand, Pallika Kanani, Andrew McCallum, Shaohan Hu 2009 University of Massachusetts - Amherst

Andrew McCallum

We present a general framework for the task of extracting specific information "on demand" from a large corpus such as the Web under resource constraints. Given a database with missing or uncertain information, the proposed system automatically formulates queries, issues them to a search interface, selects a subset of the documents, extracts the required information from them, and fills the missing values in the original database. We also exploit inherent dependency within the data to obtain useful information with fewer computational resources. We build such a system in the citation database domain that extracts the missing publication years using limited resources ...


Rollout Sampling Policy Iteration For Decentralized POMDPs, Feng Wu, Shlomo Zilberstein, Xiaoping Chen 2009 University of Massachusetts - Amherst

Shlomo Zilberstein

We present decentralized rollout sampling policy iteration (DecRSPI)--a new algorithm for multiagent decision problems formalized as DEC-POMDPs. DecRSPI is designed to improve scalability and tackle problems that lack an explicit model. The algorithm uses Monte-Carlo methods to generate a sample of reachable belief states. Then it computes a joint policy for each belief state based on the rollout estimations. A new policy representation allows us to represent solutions compactly. The key benefits of the algorithm are its linear time complexity over the number of agents, its bounded memory usage and good solution quality. It can solve larger problems that ...


Improving State-Of-The-Art OCR Through High-Precision Document-Specific Modeling, Andrew Kae, Gary B. Huang, Carl Doersch, Erik G. Learned-Miller 2009 University of Massachusetts - Amherst

Erik G Learned-Miller

Optical character recognition (OCR) remains a difficult problem for noisy documents or documents not scanned at high resolution. Many current approaches rely on stored font models that are vulnerable to cases in which the document is noisy or is written in a font dissimilar to the stored fonts. We address these problems by learning character models directly from the document itself, rather than using pre-stored font models. This method has had some success in the past, but we are able to achieve substantial improvement in error reduction through a novel method for creating nearly error-free document-specific training data and ...


Constructing Skill Trees For Reinforcement Learning Agents From Demonstration Trajectories, George Konidaris, Scott Kuindersma, Andrew Barto, Roderic Grupen 2009 University of Massachusetts - Amherst

Roderic Grupen

We introduce CST, an algorithm for constructing skill trees from demonstration trajectories in continuous reinforcement learning domains. CST uses a change-point detection method to segment each trajectory into a skill chain by detecting a change of appropriate abstraction, or that a segment is too complex to model as a single skill. The skill chains from each trajectory are then merged to form a skill tree. We demonstrate that CST constructs an appropriate skill tree that can be further refined through learning in a challenging continuous domain, and that it can be used to segment demonstration trajectories on a mobile manipulator ...


Learning From A Single Demonstration: Motion Planning With Skill Segmentation, Scott Kuindersma, George Konidaris, Roderic Grupen, Andrew Barto 2009 University of Massachusetts - Amherst

Roderic Grupen

We propose an approach to control learning from demonstration that first segments demonstration trajectories to identify subgoals to solve the overall task. Using this approach, we show that a mobile robot is able to solve a combined navigation and manipulation task robustly after observing only a single successful trajectory.


Joint Feature Selection And Classification For Taxonomic Problems Within Fish Species Complexes, Yixin Chen, Shuqing Huang, Huimin Chen, Henry L. Bart 2009 University of Mississippi

Huimin Chen

It is estimated that 90% of the world’s species are yet to be discovered and described. The main reason for the slow pace of new species description is that the science of taxonomy can be very laborious. To formally describe a new species, taxonomists have to manually gather and analyze data from large numbers of specimens and identify the smallest subset of external body characters that uniquely diagnose the new species as distinct from all its known relatives. In this paper, we present an automated feature selection and classification scheme using logistic regression with controlled false discovery rate to ...


Using Description Logics For The Provision Of Context-Driven Content Adaptation Services, Stephen J.H. Yang, Jia Zhang, Jeff J.S. Huang, Jeffrey J.P. Tsai 2009 Carnegie Mellon University, Silicon Valley

Jia Zhang

No abstract provided.


Constraint-Driven Rank-Based Learning For Information Extraction, Sameer Singh, Limin Yao, Sebastian Riedel, Andrew McCallum 2009 University of Massachusetts - Amherst

Andrew McCallum

Most learning algorithms for factor graphs require complete inference over the dataset or an instance before making an update to the parameters. SampleRank is a rank-based learning framework that alleviates this problem by updating the parameters during inference. Most semi-supervised learning algorithms also rely on complete inference, i.e. calculating expectations or MAP configurations. We extend the SampleRank framework to semi-supervised learning, avoiding these inference bottlenecks. Different approaches for incorporating unlabeled data and prior knowledge into this framework are explored. We evaluated our method on a standard information extraction dataset. Our approach outperforms the supervised method significantly and ...


Modeling Relations And Their Mentions Without Labeled Text, Sebastian Riedel, Limin Yao, Andrew McCallum 2009 University of Massachusetts - Amherst

Andrew McCallum

Several recent works on relation extraction have applied the distant supervision paradigm: instead of relying on annotated text to learn how to predict relations, they employ existing knowledge bases (KBs) as a source of supervision. Crucially, these approaches are trained on the assumption that each sentence which mentions the two related entities is an expression of the given relation. Here we argue that this leads to noisy patterns that hurt precision, in particular if the knowledge base is not directly related to the text we are working with. We present a novel approach to distant supervision that can alleviate ...


A Novel Peer-To-Peer SMS Security Solution Using A Hybrid Technique Of NTRU And AES-Rijndael, Miss Laiha Mat Kiah 2009 University of Malaya

Miss Laiha Mat Kiah

Short message service (SMS) is a very popular and easy-to-use communications technology for mobile phone devices. Originally, this service was not designed to transmit secured data, so security was not an important issue during its design. Yet today, it is sometimes used to exchange sensitive information between communicating parties. This paper proposes an alternative solution that provides peer-to-peer SMS security and guarantees confidentiality, authentication, integrity and non-repudiation security services. A hybrid cryptographic scheme has been used which combines the NTRU and AES-Rijndael algorithms to achieve more robust functionality. For implementation, a mobile information device ...


High Performance Computing Instrumentation And Research Productivity In U.S. Universities, Linh B. Ngo, Amy W. Apon, Stanley Ahalt, Vijay Dantuluri, Constantin Gurdgiev, Moez Limayem, Michael Stealey 2009 Clemson University

Linh B Ngo

This paper studies the relationship between investments in High-Performance Computing (HPC) instrumentation and research competitiveness. Measures of institutional HPC investment are computed from readily available data in the Top 500 list, which has been published twice a year since 1993 and ranks the 500 fastest computers in the world at the time of publication. The institutions studied include US doctoral-granting institutions that fall into the very high or high research rankings according to the Carnegie Foundation classifications and additional institutions that have had entries in the Top 500 list. Research competitiveness is derived from federal funding data ...


Designing Digital Library Of Perso-Arabic Script: An Experiment., Nadim Akhtar Khan, Rosy Jan, Sheikh Shazia 2009 University of Kashmir

Nadim Akhtar Khan

The Greenstone digital library software is a comprehensive system for building and distributing digital library collections, providing a new way of organizing and publishing information on the Internet. The paper describes how multilingual digital library collections can be created and customized using various features available in the Greenstone Digital Library Software. It is an attempt towards creating and managing digital library collections in different scripts for M.Phil. and Ph.D. theses submitted to the University of Kashmir in Arabic, Persian, Kashmiri and Urdu.


KNoor: Knowledge Repository Open Network., S M. Shafi, Nadim Akhtar Khan, Rosy Jan 2009 University of Kashmir, Srinagar, India, 190006

Nadim Akhtar Khan

The paper discusses KNoor (Knowledge Repository Open Network), which aims at harvesting and aggregating the scholarly products emanating from research and scientific institutions of Jammu & Kashmir. The paper highlights its genesis, significance, cooperation and various modules of access, with screenshots, as a unique cooperative, multilingual repository of research papers, ETDs and conference proceedings of three premier institutes of the valley (University of Kashmir, Sher-e-Kashmir Institute of Medical Sciences, Sher-e-Kashmir University of Agricultural Sciences) in the first phase. It also discusses challenges and lessons learnt in the process with a view to help other institutions to advance the mission of Open ...


Integrating Software Assurance Into The Software Development Life Cycle (SDLC), Maurice Dawson, Darrell N. Burrell, Emad Rahim, Stephen Brewster 2009 Oklahoma State University - Main Campus

Maurice Dawson

This article examines the integration of secure coding practices into the overall Software Development Life Cycle (SDLC). Also detailed is a proposed methodology for integrating software assurance into the Department of Defense Information Assurance Certification & Accreditation Process (DIACAP). This method for integrating software assurance helps in properly securing the application layer as that is where more than half of the vulnerabilities lie in a system.


Reverse Engineering For Mobile Systems Forensics With Ares, John Tuttle, Robert J. Walls, Erik G. Learned-Miller, Brian Neil Levine 2009 University of Massachusetts - Amherst

Erik G Learned-Miller

We present Ares, a reverse engineering technique for assisting in the analysis of data recovered for the investigation of mobile and embedded systems. The focus of investigations into insider activity is most often on the data stored on the insider’s computers and digital devices (call logs, email messaging, calendar entries, text messages, and browser history) rather than on the status of the system’s security. Ares is novel in that it uses a data-driven approach that incorporates natural language processing techniques to infer the layout of input data that has been created according to some unknown specification. While some other reverse ...

