Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 4 of 4

Full-Text Articles in Physical Sciences and Mathematics

Combining Text And Audio-Visual Features In Video Indexing, Shih-Fu Chang, R. Manmatha, Tat-Seng Chua, Dec 2004

R. Manmatha

We discuss the opportunities, state of the art, and open research issues in using multi-modal features in video indexing. Specifically, we focus on how imperfect text data obtained by automatic speech recognition (ASR) may be used to help solve challenging problems, such as story segmentation, concept detection, retrieval, and topic clustering. We review the frameworks and machine learning techniques that are used to fuse the text features with audio-visual features. Case studies showing promising performance will be described, primarily in the broadcast news video domain.


Boosted Decision Trees For Word Recognition In Handwritten Document Retrieval, Nicholas R. Howe, Toni M. Rath, R. Manmatha, Dec 2004

R. Manmatha

Recognition and retrieval of historical handwritten material is an unsolved problem. We propose a novel approach to recognizing and retrieving handwritten manuscripts, based upon word image classification as a key step. Decision trees with normalized pixels as features form the basis of a highly accurate AdaBoost classifier, trained on a corpus of word images that have been resized and sampled at a pyramid of resolutions. To stem problems from the highly skewed distribution of class frequencies, word classes with very few training samples are augmented with stochastically altered versions of the originals. This increases recognition performance substantially. On a standard …
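As a rough illustration of the boosting idea named in the abstract above, the following minimal sketch runs AdaBoost over depth-1 decision trees (stumps) that threshold a single normalized pixel value. It is not the authors' implementation: the binary +1/-1 labels, the four-pixel toy "images", the number of rounds, and all function names are assumptions made for this example. The multi-resolution sampling and the stochastic augmentation of rare classes are not shown.

```python
import math

def train_stump(X, y, w):
    """Return (weighted_error, feature, threshold, sign) for the best one-pixel test."""
    best = None
    for j in range(len(X[0])):
        for thr in sorted({x[j] for x in X}):
            for sign in (1, -1):
                # Sum the weights of the samples this candidate stump misclassifies.
                err = sum(wi for x, yi, wi in zip(X, y, w)
                          if (sign if x[j] > thr else -sign) != yi)
                if best is None or err < best[0]:
                    best = (err, j, thr, sign)
    return best

def stump_predict(stump, x):
    _, j, thr, sign = stump
    return sign if x[j] > thr else -sign

def adaboost(X, y, rounds=10):
    """Train a weighted ensemble of stumps; labels must be +1 or -1."""
    n = len(X)
    w = [1.0 / n] * n
    ensemble = []
    for _ in range(rounds):
        stump = train_stump(X, y, w)
        err = max(stump[0], 1e-10)  # avoid division by zero on a perfect stump
        if err >= 0.5:
            break  # no weak learner better than chance is left
        alpha = 0.5 * math.log((1.0 - err) / err)
        ensemble.append((alpha, stump))
        # Increase the weight of misclassified samples, then renormalize.
        w = [wi * math.exp(-alpha * yi * stump_predict(stump, x))
             for x, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
    return ensemble

def predict(ensemble, x):
    score = sum(alpha * stump_predict(stump, x) for alpha, stump in ensemble)
    return 1 if score >= 0 else -1

# Hypothetical toy data: four-pixel word images with intensities in [0, 1].
X_toy = [[0.9, 0.1, 0.8, 0.2], [0.8, 0.2, 0.9, 0.1],
         [0.1, 0.9, 0.2, 0.8], [0.2, 0.8, 0.1, 0.9]]
y_toy = [1, 1, -1, -1]
ensemble = adaboost(X_toy, y_toy)
```

Calling `predict(ensemble, image)` on a new pixel vector returns the class label of the weighted vote. A multi-class word recognizer would need a multi-class boosting variant or a one-vs-rest wrapper, which this sketch omits.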


Joint Visual-Text Modeling For Automatic Retrieval Of Multimedia Documents, G. Iyengar, P. Duygulu, S. Feng, P. Ircing, S. P. Khudanpur, D. Klakow, M. R. Krause, R. Manmatha, H. J. Nock, D. Petkova, B. Pytlik, P. Virga, Dec 2004

R. Manmatha

In this paper we describe our approach for jointly modeling the text part and the visual part of multimedia documents for the purpose of information retrieval (IR). In the prevalent state-of-the-art systems, the norm is a late combination of two independent systems: one analyzes just the text part of such documents, and the other analyzes the visual part without leveraging any knowledge acquired in the text processing. Such systems rarely exceed the performance of any single modality (i.e., text or video) in information retrieval tasks. Our experiments indicate that allowing a rich interaction between the modalities results in significant improvement …


Classification Models For Historical Manuscript Recognition, S. L. Feng, R. Manmatha, Dec 2004

R. Manmatha

This paper investigates different machine learning models for the historical handwritten manuscript recognition problem. In particular, we test and compare support vector machines, conditional maximum entropy models, and Naive Bayes with kernel density estimates, and explore their behaviors and properties when solving this problem. We focus on whole-word recognition to avoid character segmentation, which is difficult with degraded handwritten documents. Our results on a publicly available standard dataset of 20 pages of George Washington's manuscripts show that Naive Bayes with Gaussian kernel density estimates significantly outperforms the other models and prior work using hidden …
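To make the best-performing model named in the abstract above more concrete, here is a minimal sketch of a Naive Bayes classifier whose per-feature class-conditional densities are Gaussian kernel density estimates. It is an illustration under stated assumptions, not the paper's method: the class name `KDENaiveBayes`, the fixed bandwidth of 0.1, the two-dimensional toy features, and the word labels are all hypothetical.

```python
import math
from collections import defaultdict

def gaussian_kde(value, samples, bandwidth):
    """Average of Gaussian kernels centered on each training sample."""
    norm = 1.0 / (bandwidth * math.sqrt(2.0 * math.pi))
    return sum(norm * math.exp(-0.5 * ((value - s) / bandwidth) ** 2)
               for s in samples) / len(samples)

class KDENaiveBayes:
    def __init__(self, bandwidth=0.1):
        self.bandwidth = bandwidth  # assumed fixed; in practice it would be tuned
        self.by_class = defaultdict(list)
        self.n = 0

    def fit(self, X, y):
        # Store the training vectors of each class; KDE is non-parametric.
        for x, label in zip(X, y):
            self.by_class[label].append(x)
            self.n += 1
        return self

    def log_score(self, x, label):
        """Log prior plus the sum of per-feature log densities (unnormalized)."""
        rows = self.by_class[label]
        score = math.log(len(rows) / self.n)
        for j, v in enumerate(x):
            density = gaussian_kde(v, [r[j] for r in rows], self.bandwidth)
            score += math.log(max(density, 1e-300))  # guard against log(0)
        return score

    def predict(self, x):
        return max(self.by_class, key=lambda c: self.log_score(x, c))

# Hypothetical toy data: two scalar features per word image.
X_toy = [[0.20, 0.30], [0.25, 0.35], [0.80, 0.70], [0.75, 0.65]]
y_toy = ["the", "the", "letter", "letter"]
model = KDENaiveBayes(bandwidth=0.1).fit(X_toy, y_toy)
```

Calling `model.predict([0.22, 0.31])` returns the word class with the highest score. The independence assumption across features is what makes this "naive": each feature contributes its own one-dimensional density estimate.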