Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 7 of 7

Full-Text Articles in Physical Sciences and Mathematics

Automatic Segmentation And Indexing Of Specialized Databases, Madirakshi Das, R. Manmatha Jan 2002

R. Manmatha

The aim of this work is to index images by color in domain-specific databases, using colors computed only from the object of interest rather than from the whole image. The main problem in this task is segmenting the region of interest from the background. Viewing segmentation as a figure/ground segregation problem leads to a new approach: successful elimination of the background leaves the figure, or object of interest. The background elements are eliminated using general observations that hold for any photograph with a single, prominent object of interest. First, we form a hypothesis about possible background …


Modeling Score Distributions For Meta Search, R. Manmatha, T. Rath, F. Feng Jan 2002

R. Manmatha

In this paper, the score distributions of a number of text search engines are modeled. It is shown empirically that the score distributions on a per-query basis may be modeled using an exponential distribution for the set of non-relevant documents and a normal distribution for the set of relevant documents. Experiments show that this model fits TREC-3 and TREC-4 data for a wide variety of search engines, including INQUERY (a probabilistic search engine), SMART (a vector space engine), and search engines based on latent semantic indexing and language modeling. The model also works when search engines index other …
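As a rough illustration of the model described in this abstract, the sketch below treats a query's scores as a two-component mixture: an exponential density for non-relevant documents and a normal density for relevant ones. Applying Bayes' rule to turn a score into a probability of relevance is one natural use of such a mixture and is shown here as an assumption, not as the paper's stated procedure. The function names, parameter names, and numeric values are all hypothetical.

```python
import math


def exponential_pdf(score, rate):
    """Density of an exponential distribution (assumed model for non-relevant scores)."""
    return rate * math.exp(-rate * score) if score >= 0 else 0.0


def normal_pdf(score, mean, std):
    """Density of a normal distribution (assumed model for relevant scores)."""
    z = (score - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def posterior_relevance(score, prior_rel, rate, mean, std):
    """Bayes' rule on the two-component mixture: P(relevant | score)."""
    rel = prior_rel * normal_pdf(score, mean, std)
    non_rel = (1.0 - prior_rel) * exponential_pdf(score, rate)
    total = rel + non_rel
    return rel / total if total > 0 else 0.0


# Hypothetical per-query parameters, chosen only for illustration.
params = dict(prior_rel=0.1, rate=5.0, mean=0.8, std=0.1)
high = posterior_relevance(0.8, **params)
low = posterior_relevance(0.1, **params)
print(f"P(rel | score=0.8) = {high:.3f}")
print(f"P(rel | score=0.1) = {low:.3g}")
```

In practice the mixture parameters would have to be estimated separately for each query; the values above are placeholders.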


Features For Word Spotting In Historical Manuscripts, Toni M. Rath, R. Manmatha Dec 2001

R. Manmatha

In the transition from traditional to digital libraries, the large number of existing handwritten manuscripts poses a great challenge. Easy access to such collections requires an index, which is currently created manually at great cost. Because automatic handwriting recognizers fail on historical manuscripts, the word spotting technique has been developed: the words in a collection are matched as images and grouped into clusters which contain all instances of the same word. By annotating "interesting clusters," an index that links words to the locations where they occur can be built automatically.

Due to the noise in historical documents, selecting the …


Word Image Matching Using Dynamic Time Warping, Toni M. Rath, R. Manmatha Dec 2001

R. Manmatha

Libraries and other institutions are interested in providing access to scanned versions of their large collections of handwritten historical manuscripts on the web or on CD-ROMs. Providing convenient access to a collection requires an index, which is manually created at great labour and expense. Since current handwriting recognizers do not perform well on historical documents, a technique called word spotting has been developed. It addresses the need for indexing single-author handwritten historical manuscripts in a new way: word images are matched to form clusters which contain occurrences of the same word throughout a collection. By annotating "interesting" clusters, an index …


A Formal Approach To Score Normalization For Metasearch, H. Sever, R. Manmatha Dec 2001

R. Manmatha

Meta-search, or the combination of the outputs of different search engines in response to a query, has been shown to improve performance. Since the scores produced by different search engines are not comparable, researchers have often decomposed the metasearch problem into a score normalization step followed by a combination step. Combination has been studied by many researchers. While appropriate normalization can affect performance, most of the normalization schemes suggested are ad hoc in nature.

In this paper, we propose a formal approach to normalizing scores for meta-search by taking the distributions of the scores into account. Recently, it has been …


A Critical Examination Of TDT's Cost Function, R. Manmatha, Ao Feng, James Allan Dec 2001

R. Manmatha

Topic Detection and Tracking (TDT) tasks are evaluated using a cost function. The standard TDT cost function assumes a constant probability of relevance P(rel) across all topics. In practice, P(rel) varies widely across topics. We argue using both theoretical and experimental evidence that the cost function should be modified to account for the varying P(rel).
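To make the role of P(rel) concrete, the sketch below uses the commonly cited form of the TDT detection cost, C_det = C_miss · P_miss · P(rel) + C_fa · P_fa · (1 − P(rel)). The abstract does not reproduce the formula, so this form, the function name, and every numeric value are assumptions chosen for illustration. The example only shows that the same miss and false-alarm rates produce different costs under different values of P(rel).

```python
def detection_cost(p_miss, p_fa, p_rel, c_miss, c_fa):
    """Assumed form of the TDT detection cost for one topic.

    p_miss: probability of missing a relevant (on-topic) story
    p_fa:   probability of a false alarm on a non-relevant story
    p_rel:  prior probability that a story is relevant, P(rel)
    c_miss, c_fa: costs assigned to a miss and to a false alarm
    """
    return c_miss * p_miss * p_rel + c_fa * p_fa * (1.0 - p_rel)


# Hypothetical system performance and cost weights.
p_miss, p_fa = 0.10, 0.01
c_miss, c_fa = 1.0, 0.1

# Cost under a single constant P(rel) versus hypothetical per-topic values.
constant_cost = detection_cost(p_miss, p_fa, 0.02, c_miss, c_fa)
per_topic_costs = {
    p_rel: detection_cost(p_miss, p_fa, p_rel, c_miss, c_fa)
    for p_rel in (0.001, 0.02, 0.2)
}

print(f"constant P(rel)=0.02: {constant_cost:.5f}")
for p_rel, cost in per_topic_costs.items():
    print(f"P(rel)={p_rel}: {cost:.5f}")
```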


Indexing For A Digital Library Of George Washington's Manuscripts: A Study Of Word Matching Techniques, T. M. Rath, S. Kane, A. Lehman, E. Partridge, R. Manmatha Dec 2001

R. Manmatha

In a multimedia world, one would like electronic access to all kinds of information. But a lot of important information still exists only on paper, and it is a challenge to efficiently access or navigate this information even if it is scanned in. The previously proposed "word spotting" idea is an approach for accessing and navigating a collection of handwritten documents available as images, using an index automatically generated by matching words as pictures. The most difficult task in solving this problem is the matching of word images. The quality of the aged documents and the variations in handwriting make …