Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 14 of 14

Full-Text Articles in Physical Sciences and Mathematics

Experimenting Vireo-374: Bag-Of-Visual-Words And Visual-Based Ontology For Semantic Video Indexing And Search, Chong-Wah Ngo, Yu-Gang Jiang, Xiaoyong Wei, Feng Wang, Wanlei Zhao, Hung-Khoon Tan, Xiao Wu Nov 2007

Research Collection School Of Computing and Information Systems

In this paper, we present our approaches to, and results for, high-level feature extraction and automatic video search in TRECVID-2007.


Om-Based Video Shot Retrieval By One-To-One Matching, Yuxin Peng, Chong-Wah Ngo, Jianguo Xiao Oct 2007

Research Collection School Of Computing and Information Systems

This paper proposes a new approach for shot-based retrieval by optimal matching (OM), which provides an effective mechanism for the similarity measure and ranking of shots by one-to-one matching. In the proposed approach, a weighted bipartite graph is constructed to model the color similarity between two shots. Then OM based on the Kuhn-Munkres algorithm is employed to compute the maximum weight of the constructed bipartite graph as the shot similarity value by one-to-one matching among frames. To improve the speed of OM, two improved algorithms are also proposed: bipartite graph construction based on subshots and bipartite graph construction based on …
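The one-to-one matching idea in this abstract can be illustrated with a small sketch. It is not the authors' implementation: the function names are hypothetical, frames are assumed to be normalized color histograms compared by histogram intersection, the final normalization is an assumption, and a brute-force search over permutations stands in for the Kuhn-Munkres algorithm (it gives the same maximum weight, but is only practical for a handful of frames).

```python
from itertools import permutations


def histogram_intersection(h1, h2):
    """Similarity of two normalized color histograms (assumed frame feature)."""
    return sum(min(a, b) for a, b in zip(h1, h2))


def shot_similarity(shot_a, shot_b):
    """Maximum-weight one-to-one matching between the frames of two shots.

    Brute force over permutations is used here as a stand-in for the
    Kuhn-Munkres algorithm; it is only feasible for very short shots.
    """
    if not shot_a or not shot_b:
        return 0.0
    # Make shot_a the shorter shot so every one of its frames gets a partner.
    if len(shot_a) > len(shot_b):
        shot_a, shot_b = shot_b, shot_a
    # Edge weights of the bipartite graph: frame-to-frame color similarity.
    weights = [[histogram_intersection(fa, fb) for fb in shot_b] for fa in shot_a]
    best = 0.0
    for cols in permutations(range(len(shot_b)), len(shot_a)):
        total = sum(weights[i][j] for i, j in enumerate(cols))
        best = max(best, total)
    # Dividing by the number of matched frames is an assumption of this sketch.
    return best / len(shot_a)
```

For example, two shots whose frames are the same two histograms in opposite order score 1.0, because the matching is free to pair each frame with its counterpart regardless of position.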


Evaluating Bag-Of-Visual-Words Representations In Scene Classification, Jun Yang, Yu-Gang Jiang, Alexander G. Hauptmann, Chong-Wah Ngo Sep 2007

Research Collection School Of Computing and Information Systems

Based on keypoints extracted as salient image patches, an image can be described as a “bag of visual words”, and this representation has been used in scene classification. The choice of dimension, selection, and weighting of visual words in this representation is crucial to the classification performance but has not been thoroughly studied in previous work. Given the analogy between this representation and the bag-of-words representation of text documents, we apply techniques used in text categorization, including term weighting, stop word removal, and feature selection, to generate image representations that differ in the dimension, selection, and weighting of visual words. The …
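As a rough illustration of borrowing term weighting from text categorization, the sketch below applies a plain tf-idf weighting to visual-word counts. The function name, the input layout (each image as a list of visual-word ids, one per quantized keypoint), and the particular tf-idf variant are assumptions made for the example; the abstract does not say which weighting schemes were evaluated.

```python
import math
from collections import Counter


def tfidf_vectors(images, vocab_size):
    """Weight visual-word histograms with tf-idf.

    images: list of lists of visual-word ids in range(vocab_size).
    Returns one dense vector of length vocab_size per image.
    """
    n_images = len(images)
    counts = [Counter(words) for words in images]
    # Document frequency: in how many images each visual word occurs.
    doc_freq = Counter()
    for c in counts:
        doc_freq.update(c.keys())
    vectors = []
    for c in counts:
        total = sum(c.values())
        vec = [0.0] * vocab_size
        for word, k in c.items():
            tf = k / total
            idf = math.log(n_images / doc_freq[word])
            vec[word] = tf * idf
        vectors.append(vec)
    return vectors
```

A visual word that occurs in every image receives an idf of zero, which has an effect similar to stop word removal in text.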


Practical Elimination Of Near-Duplicates From Web Video Search, Xiao Wu, Alexander G. Hauptmann, Chong-Wah Ngo Sep 2007

Research Collection School Of Computing and Information Systems

Current web video search results rely exclusively on text keywords or user-supplied tags. A search for a typical popular video often returns many duplicate and near-duplicate videos in the top results. This paper outlines ways to cluster and filter out the near-duplicate videos using a hierarchical approach. Initial triage is performed using fast signatures derived from color histograms. Only when a video cannot be clearly classified as novel or near-duplicate using global signatures do we apply a more expensive local-feature-based near-duplicate detection, which provides very accurate duplicate analysis through more costly computation. The results of 24 queries in a data …
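The two-stage triage described here might look roughly like the following sketch. The thresholds `low` and `high`, the L1 distance between signatures, and the `local_check` callback (a stand-in for the local-feature detector) are all hypothetical; the abstract does not give the actual distance measure or cut-off values.

```python
def l1_distance(sig_a, sig_b):
    """L1 distance between two global color-histogram signatures."""
    return sum(abs(a - b) for a, b in zip(sig_a, sig_b))


def classify(candidate_sig, seed_sig, local_check, low=0.1, high=0.6):
    """Label a candidate video relative to a seed video.

    low/high are illustrative thresholds, not values from the paper.
    local_check is a zero-argument callable standing in for the expensive
    local-feature comparison; it is invoked only for ambiguous cases.
    """
    d = l1_distance(candidate_sig, seed_sig)
    if d <= low:
        return "near-duplicate"  # cheap signature is decisive
    if d >= high:
        return "novel"  # cheap signature is decisive
    # Ambiguous zone: fall back to the costly local-feature analysis.
    return "near-duplicate" if local_check() else "novel"
```

The design point is that the costly comparison runs only for candidates whose global signature falls between the two thresholds.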


Ontology-Enriched Semantic Space For Video Search, Xiao-Yong Wei, Chong-Wah Ngo Sep 2007

Research Collection School Of Computing and Information Systems

Multimedia-based ontology construction and reasoning have recently been recognized as two important issues in video search, particularly for bridging the semantic gap. The lack of coincidence between low-level features and user expectation makes concept-based ontology reasoning an attractive mid-level framework for interpreting high-level semantics. In this paper, we propose a novel model, namely ontology-enriched semantic space (OSS), to provide a computable platform for modeling and reasoning concepts in a linear space. OSS opens up the possibility of answering conceptual questions such as a high coverage of semantic space with a minimal set of concepts, and the set of concepts to be developed for …


Rushes Video Summarization By Object And Event Understanding, Feng Wang, Chong-Wah Ngo Sep 2007

Research Collection School Of Computing and Information Systems

This paper explores a variety of visual and audio analysis techniques in selecting the most representative video clips for rushes summarization at TRECVID 2007. These techniques include object detection, camera motion estimation, keypoint matching and tracking, audio classification and speech recognition. Our system is composed of two major steps. First, based on video structuring, we filter undesirable shots and minimize the inter-shot redundancy by repetitive shot detection. Second, a representability measure is proposed to model the presence of objects and four audio-visual events: motion activity of objects, camera motion, scene changes, and speech content, in a video clip. The video …


Near-Duplicate Keyframe Identification With Interest Point Matching And Pattern Learning, Wan-Lei Zhao, Chong-Wah Ngo, Hung-Khoon Tan, Xiao Wu Aug 2007

Research Collection School Of Computing and Information Systems

This paper proposes a new approach for near-duplicate keyframe (NDK) identification by matching, filtering and learning of local interest points (LIPs) with PCA-SIFT descriptors. The issues of matching reliability, filtering efficiency and learning flexibility are exploited in novel ways to explore the potential of LIP-based retrieval and detection. In matching, we propose a one-to-one symmetric matching (OOS) algorithm which is found to be highly reliable for NDK identification, due to its capability in excluding false LIP matches compared with other matching strategies. For rapid filtering, we address two issues: speed efficiency and search effectiveness, to support OOS with a new index …
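The abstract does not spell out the OOS algorithm, so the sketch below rests on an assumption: it treats "one-to-one symmetric" as mutual nearest-neighbour matching, where a pair of interest points is kept only if each is the other's nearest neighbour. Function names are hypothetical, and short lists of floats stand in for PCA-SIFT descriptors.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two descriptors."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def nearest(point, others):
    """Index of the descriptor in `others` closest to `point`."""
    return min(range(len(others)), key=lambda j: squared_distance(point, others[j]))


def symmetric_matches(desc_a, desc_b):
    """Keep (i, j) only if b[j] is nearest to a[i] and a[i] is nearest to b[j].

    This mutual nearest-neighbour rule is an assumed reading of
    'one-to-one symmetric matching', not a confirmed description of OOS.
    """
    if not desc_a or not desc_b:
        return []
    a_to_b = [nearest(p, desc_b) for p in desc_a]
    b_to_a = [nearest(q, desc_a) for q in desc_b]
    return [(i, j) for i, j in enumerate(a_to_b) if b_to_a[j] == i]
```

Under this rule a point can appear in at most one match, so a descriptor that merely happens to lie near an already well-matched point is dropped rather than producing a second, likely false, match.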


Towards Optimal Bag-Of-Features For Object Categorization And Semantic Video Retrieval, Yu-Gang Jiang, Chong-Wah Ngo, Jun Yang Jul 2007

Research Collection School Of Computing and Information Systems

Bag-of-features (BoF) derived from local keypoints has recently appeared promising for object and scene classification. Whether BoF can naturally survive challenges such as the reliability and scalability of visual classification, nevertheless, remains uncertain due to various implementation choices. In this paper, we evaluate various factors which govern the performance of BoF. The factors include the choices of detector, kernel, vocabulary size and weighting scheme. We offer some practical insights on how to optimize the performance by choosing a good keypoint detector and kernel. For the weighting scheme, we propose a novel soft-weighting method to assess the significance of a visual word …


Efficient Near-Duplicate Keyframe Retrieval With Visual Language Models, Xiao Wu, Wan-Lei Zhao, Chong-Wah Ngo Jul 2007

Research Collection School Of Computing and Information Systems

Near-duplicate keyframe retrieval is a critical task for video similarity measurement, video threading and tracking. In this paper, instead of using expensive point-to-point matching on keypoints, we investigate visual language models built on visual keywords to speed up near-duplicate keyframe retrieval. The main idea is to estimate a visual language model on visual keywords for each keyframe and compare keyframes by the likelihood of their visual language models. Experiments on a subset of the TRECVID-2004 video corpus show that visual language models built on visual keywords demonstrate promising performance for near-duplicate keyframe retrieval, which greatly speed up the retrieval …
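To make the likelihood comparison concrete, here is a minimal sketch under stated assumptions: each keyframe is a list of visual-keyword ids, the language model is a unigram model with additive smoothing (`alpha` is a hypothetical parameter), and a query keyframe is scored by the log-likelihood of its keywords under a candidate's model. The abstract does not say which model order or smoothing was used.

```python
import math
from collections import Counter


def unigram_model(words, vocab_size, alpha=1.0):
    """Smoothed unigram distribution over visual keywords for one keyframe.

    Additive smoothing with alpha is an assumption of this sketch; it keeps
    unseen keywords from receiving zero probability.
    """
    counts = Counter(words)
    total = len(words) + alpha * vocab_size
    return [(counts[w] + alpha) / total for w in range(vocab_size)]


def log_likelihood(query_words, model):
    """Log-likelihood of the query keyframe's keywords under a model."""
    return sum(math.log(model[w]) for w in query_words)
```

Ranking candidates by this score needs only one pass over the query's keywords per candidate, instead of a point-to-point comparison of keypoints.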


Moving-Object Detection, Association, And Selection In Home Videos, Zailiang Pan, Chong-Wah Ngo Feb 2007

Research Collection School Of Computing and Information Systems

Due to the prevalence of digital video camcorders, home videos have become an important part of life-logs of personal experiences. To enable efficient video parsing, a critical step is to automatically extract objects, events and scene characteristics present in videos. This paper addresses the problem of extracting objects from home videos. Automatic detection of objects is a classical yet difficult vision problem, particularly for videos with complex scenes and unrestricted domains. Compared with edited and surveillance videos, home videos captured in uncontrolled environments are usually coupled with several notable features such as shaking artifacts, irregular motions, and arbitrary settings. These …


Lecture Video Enhancement And Editing By Integrating Posture, Gesture, And Text, Feng Wang, Chong-Wah Ngo, Ting-Chuen Pong Feb 2007

Research Collection School Of Computing and Information Systems

This paper describes a novel framework for automatic lecture video editing by gesture, posture, and video text recognition. In content analysis, the trajectory of hand movement is tracked and the intentional gestures are automatically extracted for recognition. In addition, head pose is estimated through overcoming the difficulties due to the complex lighting conditions in classrooms. The aim of recognition is to characterize the flow of lecturing with a series of regional focuses depicted by human postures and gestures. The regions of interest (ROIs) in videos are semantically structured with text recognition and the aid of external documents. By tracing the …


Mining Multiple Visual Appearances Of Semantics For Image Annotation, Hung-Khoon Tan, Chong-Wah Ngo Jan 2007

Research Collection School Of Computing and Information Systems

This paper investigates the problem of learning the visual semantics of keyword categories for automatic image annotation. Supervised learning algorithms which learn only a single concept point of a category are limited in their effectiveness for image annotation. We propose to use data mining techniques to mine multiple concepts, where each concept may consist of one or more visual parts, to capture the diverse visual appearances of a single keyword category. For training, we use the Apriori principle to efficiently mine a set of frequent blobsets to capture the semantics of a rich and diverse visual category. Each concept is …
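The Apriori principle mentioned here (every subset of a frequent set must itself be frequent) can be sketched as follows. This is a generic textbook Apriori, not the authors' training procedure: each training image is assumed to be a set of blob ids, and `min_support` is a hypothetical absolute count threshold.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Mine frequent blobsets.

    transactions: iterable of sets of blob ids (one set per training image).
    min_support: minimum number of images a blobset must occur in.
    Returns a dict mapping each frequent frozenset to its support count.
    """
    transactions = [frozenset(t) for t in transactions]
    items = {i for t in transactions for i in t}
    current = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while current:
        # Count the support of every candidate of size k.
        counts = {c: sum(1 for t in transactions if c <= t) for c in current}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: merge frequent sets into candidates of size k.
        candidates = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step (Apriori principle): all (k-1)-subsets must be frequent.
        current = {
            c for c in candidates
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent
```

The prune step is what keeps the search tractable: a candidate blobset is counted against the images only if all of its smaller subsets already passed the support threshold.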


Introduction: Special Issue For The Selected Papers In The Fourth International Conference On Intelligent Multimedia Computing And Networking (Immcn) 2005, Chong-Wah Ngo, Hong-Va Leong Jan 2007

Research Collection School Of Computing and Information Systems

This special issue introduces seven papers selected from IMMCN 2005, covering a wide range of emerging topics in the multimedia field. These papers received high scores and good comments from the reviewers in their respective areas of intelligent and next-generation networking, technology and application, multimedia coding, content analysis and retrieval. The seven papers were extended to 20 pages and then went through another review process before final publication. In this issue, we have two papers on video streaming, two papers on multimedia applications, one paper on video coding, and two papers on image and video retrieval.


Image Segmentation Using Multi-Coloured Active Illumination, Tze Ki (Xu Shuqi) Koh, Nicholas Miles, Barrie Hayes-Gill Jan 2007

Research Collection School Of Computing and Information Systems

In this paper, the use of active illumination is extended to image segmentation, specifically in the case of overlapping particles. This work is based on Multi-Flash Imaging (MFI), originally developed by Mitsubishi Electric Labs, to detect depth discontinuities. Illuminations of different wavelengths are projected from multiple positions, providing additional information about a scene compared to conventional segmentation techniques. Shadows are used to identify true object edges. The identification of non-occluded particles is made possible by exploiting the fact that shadows are cast on underlying particles. Implementation issues such as selecting the appropriate colour model and number of illuminations are …