Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 5 of 5

Full-Text Articles in Physical Sciences and Mathematics

A Hidden Markov Model For Alphabet-Soup Word Recognition, Shaolei Feng, Nicholas Howe, R. Manmatha Jan 2008

R. Manmatha

Recent work on the "alphabet soup" paradigm has demonstrated effective segmentation-free character-based recognition of cursive handwritten historical text documents. The approach first uses a joint boosting technique to detect potential characters (the alphabet soup). A second stage uses a dynamic programming algorithm to recover the correct sequence of characters. Despite experimental success, the ad hoc dynamic programming method previously lacked theoretical justification. This paper puts the method on a sounder footing by recasting the dynamic programming as inference on an ensemble of hidden Markov models (HMMs). Although some work has questioned the use of score outputs from classifiers like …
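The dynamic-programming recovery of a character sequence described above is closely related to Viterbi decoding in an HMM. As a generic illustration only (not the paper's actual model: the states, observations, and probabilities below are invented toy values), the following sketch recovers the most likely hidden state sequence from a discrete HMM:

```python
# Generic Viterbi decoding for a discrete HMM. All parameters passed in
# are toy values for illustration, not those of any published model.

def viterbi(obs, states, start_p, trans_p, emit_p):
    """Return the most probable hidden state path for `obs`."""
    # V[t][s]: probability of the best path ending in state s at time t.
    V = [{s: start_p[s] * emit_p[s][obs[0]] for s in states}]
    back = [{}]
    for t in range(1, len(obs)):
        V.append({})
        back.append({})
        for s in states:
            # Dynamic programming step: best predecessor for state s.
            prob, prev = max(
                (V[t - 1][p] * trans_p[p][s] * emit_p[s][obs[t]], p)
                for p in states
            )
            V[t][s] = prob
            back[t][s] = prev
    # Trace back from the best final state to recover the path.
    last = max(V[-1], key=V[-1].get)
    path = [last]
    for t in range(len(obs) - 1, 0, -1):
        last = back[t][last]
        path.append(last)
    return list(reversed(path))
```

In the alphabet-soup setting, the hidden states would correspond to character hypotheses and the observations to detector outputs; this toy version uses opaque state and observation labels.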


Distributed Image Search In Camera Sensor Networks, Tingxin Yan, Deepak Ganesan, R. Manmatha Jan 2008

R. Manmatha

Recent advances in sensor networks permit the use of a large number of relatively inexpensive distributed computational nodes with camera sensors linked in a network and possibly linked to one or more central servers. We argue that the full potential of such a distributed system can be realized if it is designed as a distributed search engine where images from different sensors can be captured, stored, searched and queried. However, unlike traditional image search engines that are focused on resource-rich situations, the resource limitations of camera sensor networks in terms of energy, bandwidth, computational power, and memory capacity present …


Bayesian Modeling Of Dependency Trees Using Hierarchical Pitman-Yor Priors, Hanna Wallach, Charles Sutton, Andrew McCallum Jan 2008

Hanna M. Wallach

In this paper, we introduce two hierarchical Bayesian dependency parsing models. First, we show that a classic dependency parser can be substantially improved by (a) using a hierarchical Pitman-Yor process prior over the distribution over dependents of a word, and (b) sampling the model hyperparameters. Second, we present a parsing model in which latent state variables mediate the relationships between words and their dependents. The model clusters dependencies into states using a similar approach to that used by Bayesian topic models when clustering words into topics. The inferred states have a syntactic character, and lead to modestly improved parse accuracy …
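For readers unfamiliar with the Pitman-Yor process used as a prior here, its two-parameter Chinese restaurant process representation can be sketched generically. The code below is a toy illustration with invented parameter values, not the paper's parser; the discount parameter is what distinguishes the Pitman-Yor process from the Dirichlet process and gives it power-law tail behaviour:

```python
import random

def pitman_yor_crp(n, discount, concentration, seed=0):
    """Sample table assignments for n customers from the two-parameter
    Chinese restaurant process underlying the Pitman-Yor process.
    Requires 0 <= discount < 1 and concentration > -discount."""
    rng = random.Random(seed)
    counts = []   # number of customers at each occupied table
    seating = []  # table index chosen by each customer in turn
    for i in range(n):
        total = i + concentration
        # Existing table k is chosen with weight (counts[k] - discount);
        # a new table with weight (concentration + discount * num_tables),
        # so the new-table probability grows with the number of tables.
        weights = [(c - discount) / total for c in counts]
        weights.append((concentration + discount * len(counts)) / total)
        table = rng.choices(range(len(counts) + 1), weights=weights)[0]
        if table == len(counts):
            counts.append(1)   # open a new table
        else:
            counts[table] += 1
        seating.append(table)
    return seating, counts
```

In the parsing model, "customers" would correspond to dependency events and "tables" to reused dependent types; the hierarchical version ties such restaurants together across conditioning contexts.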


Intelligent Email: Aiding Users With AI, Mark Dredze, Hanna Wallach, Danny Puller, Tova Brooks, Josh Carroll, Joshua Magarick, John Blitzer, Fernando Pereira Jan 2008

Hanna M. Wallach

Email occupies a central role in the modern workplace. This has led to a vast increase in the number of email messages that users are expected to handle daily. Furthermore, email is no longer simply a tool for asynchronous online communication---email is now used for task management and personal archiving, as well as both synchronous and asynchronous online communication. This explosion can lead to "email overload"---many users are overwhelmed by the large quantity of information in their mailboxes. In the human--computer interaction community, there has been much research on tackling email overload. Recently, similar efforts have emerged in the artificial intelligence (AI) …


Gibbs Sampling For Logistic Normal Topic Models With Graph-Based Priors, David Mimno, Hanna Wallach, Andrew McCallum Jan 2008

Hanna M. Wallach

Previous work on probabilistic topic models has either focused on models with relatively simple conjugate priors that support Gibbs sampling or models with non-conjugate priors that typically require variational inference. Gibbs sampling is more accurate than variational inference and better supports the construction of composite models. We present a method for Gibbs sampling in non-conjugate logistic normal topic models, and demonstrate it on a new class of topic models with arbitrary graph-structured priors that reflect the complex relationships commonly found in document collections, while retaining simple, robust inference.
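To illustrate the conjugate, Gibbs-friendly baseline that the paper's non-conjugate logistic normal models extend, here is a minimal collapsed Gibbs sampler for a Dirichlet-conjugate (LDA-style) topic model. This is a toy sketch with invented hyperparameter values, not the paper's sampler: conjugacy is what makes each token's conditional distribution available in closed form below.

```python
import random

def lda_gibbs(docs, n_topics, alpha=0.1, beta=0.01, iters=50, seed=0):
    """Collapsed Gibbs sampling for an LDA-style topic model with
    conjugate Dirichlet priors. `docs` is a list of documents, each a
    list of integer word ids. Returns per-token topic assignments."""
    rng = random.Random(seed)
    vocab = 1 + max(w for d in docs for w in d)
    ndk = [[0] * n_topics for _ in docs]           # doc-topic counts
    nkw = [[0] * vocab for _ in range(n_topics)]   # topic-word counts
    nk = [0] * n_topics                            # topic totals
    z = []
    # Random initialisation of topic assignments.
    for d, doc in enumerate(docs):
        zd = []
        for w in doc:
            k = rng.randrange(n_topics)
            zd.append(k)
            ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
        z.append(zd)
    for _ in range(iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token's current assignment from the counts.
                k = z[d][i]
                ndk[d][k] -= 1; nkw[k][w] -= 1; nk[k] -= 1
                # Conjugacy yields a closed-form conditional per topic.
                weights = [
                    (ndk[d][t] + alpha)
                    * (nkw[t][w] + beta) / (nk[t] + vocab * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                z[d][i] = k
                ndk[d][k] += 1; nkw[k][w] += 1; nk[k] += 1
    return z
```

A logistic normal prior breaks this conjugacy, which is why the paper needs a dedicated Gibbs method rather than the simple count-ratio conditional used here.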