Physical Sciences and Mathematics | Open Access Articles

VIREO/DVMM At TRECVID 2009: High-Level Feature Extraction, Automatic Video Search, And Content-Based Copy Detection, Chong-Wah Ngo, Yu-Gang Jiang, Xiao-Yong Wei, Wanlei Zhao, Yang Liu, Jun Wang, Shiai Zhu, Shih-Fu Chang

Research Collection School Of Computing and Information Systems

This paper presents an overview and comparative analysis of our systems designed for three TRECVID 2009 tasks: high-level feature extraction, automatic search, and content-based copy detection.

Semantic Context Transfer Across Heterogeneous Sources For Domain Adaptive Video Search, Yu-Gang Jiang, Chong-Wah Ngo, Shih-Fu Chang

Research Collection School Of Computing and Information Systems

Automatic video search based on semantic concept detectors has recently received significant attention. Since the number of available detectors is much smaller than the size of the human vocabulary, one major challenge is to select appropriate detectors to respond to user queries. In this paper, we propose a novel approach that leverages heterogeneous knowledge sources for domain adaptive video search. First, instead of utilizing WordNet as most existing works do, we exploit the context information associated with Flickr images to estimate query-detector similarity. The resulting measurement, named Flickr context similarity (FCS), reflects the co-occurrence statistics of words in image context rather than textual …

Localizing Volumetric Motion For Action Recognition In Realistic Videos, Xiao Wu, Chong-Wah Ngo, Jintao Li, Yongdong Zhang

Research Collection School Of Computing and Information Systems

This paper presents a novel motion localization approach for recognizing actions and events in real videos. Examples include StandUp and Kiss in Hollywood movies. The challenge can be attributed to the large visual and motion variations imposed by realistic action poses. Previous works mainly focus on learning from descriptors of cuboids around space-time interest points (STIP) to characterize actions. The size, shape and space-time position of the cuboids are fixed without considering the underlying motion dynamics. This often results in a large set of fragmented cuboids that fail to capture the long-term dynamic properties of realistic actions. This paper proposes the detection …

Towards Google Challenge: Combining Contextual And Social Information For Web Video Categorization, Xiao Wu, Wan-Lei Zhao, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

Web video categorization is a fundamental task for web video search. In this paper, we explore the Google challenge from a new perspective by combining contextual and social information in the setting of the social web. The semantic meaning of text (title and tags), video relevance from related videos, and user interest induced from user videos are integrated to robustly determine the video category. Experiments on YouTube videos demonstrate the effectiveness of the proposed solution. The approach achieves a 60% improvement over traditional text-based classifiers.

Distribution-Based Concept Selection For Concept-Based Video Retrieval, Juan Cao, Hongfang Jing, Chong-Wah Ngo, Yongdong Zhang

Research Collection School Of Computing and Information Systems

Query-to-concept mapping is one of the keys to concept-based video retrieval. Conventional approaches try to find concepts that are likely to co-occur in the relevant shots from lexical or statistical aspects. However, a high probability of co-occurrence alone cannot ensure that a concept is effective at distinguishing the relevant shots from the irrelevant ones. In this paper, we propose distribution-based concept selection (DBCS) for query-to-concept mapping by analyzing concept score distributions within and between the relevant and irrelevant sets. In view of the imbalance between relevant and irrelevant examples, two variants of DBCS are proposed respectively by considering the two-sided and one-sided …

Scalable Detection Of Partial Near-Duplicate Videos By Visual-Temporal Consistency, Hung-Khoon Tan, Chong-Wah Ngo, Richang Hong, Tat-Seng Chua

Research Collection School Of Computing and Information Systems

Following the exponential growth of social media, there now exist huge repositories of videos online, and among them are large numbers of near-duplicate videos. Most existing techniques either focus on the fast retrieval of full copies or near-duplicates, or consider localization in a heuristic manner. This paper considers the scalable detection and localization of partial near-duplicate videos by jointly considering visual similarity and temporal consistency. Temporal constraints are embedded into a network structure as directed edges. Through the structure, partial alignment is converted into a network flow problem for which highly efficient solutions exist. To precisely …

Domain Adaptive Semantic Diffusion For Large Scale Context-Based Video Annotation, Yu-Gang Jiang, Jun Wang, Shih-Fu Chang, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

Learning to cope with domain change is known to be a challenging problem in many real-world applications. This paper proposes a novel and efficient approach, named domain adaptive semantic diffusion (DASD), to exploit semantic context while considering the domain-shift-of-context for large-scale video concept annotation. Starting with a large set of concept detectors, the proposed DASD refines the initial annotation results using a graph diffusion technique, which preserves the consistency and smoothness of the annotation over a semantic graph. Different from existing graph learning methods, which capture relations among data samples, the semantic graph treats concepts as nodes and the …

A Latent Model For Visual Disambiguation Of Keyword-Based Image Search, Kong-Wah Wan, Ah-Hwee Tan, Joo-Hwee Lim, Liang-Tien Chia, Sujoy Roy

Research Collection School Of Computing and Information Systems

The problem of polysemy in keyword-based image search arises mainly from the inherent ambiguity in user queries. We propose a latent-model-based approach that resolves user search ambiguity by allowing sense-specific diversity in search results. Given a query keyword and the images retrieved by issuing the query to an image search engine, we first learn a latent visual sense model of these polysemous images. Next, we use Wikipedia to disambiguate the word sense of the original query, and issue these Wiki-senses as new queries to retrieve sense-specific images. A sense-specific image classifier is then learnt by combining …

Localized Matching Using Earth Mover's Distance Towards Discovery Of Common Patterns From Small Image Samples, Hung-Khoon Tan, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

This paper proposes a new approach for the discovery of common patterns in a small set of images by region matching. The issues of feature robustness, matching robustness and noise artifacts are addressed to explore the potential of using regions as the basic matching unit. We employ the many-to-many (M2M) matching strategy, specifically with the Earth Mover's Distance (EMD), to increase resilience towards the structural inconsistency caused by improper region segmentation. However, the matching pattern of M2M is dispersed and unregulated in nature, which makes it challenging to mine a common pattern while identifying the underlying transformation. To avoid …

A Bayesian Approach Integrating Regional And Global Features For Image Semantic Learning, Luong-Dong Nguyen, Ghim-Eng Yap, Ying Liu, Ah-Hwee Tan, Liang-Tien Chia, Joo-Hwee Lim

Research Collection School Of Computing and Information Systems

In content-based image retrieval, the “semantic gap” between visual image features and user semantics makes it hard to predict abstract image categories from low-level features. We present a hybrid system that integrates global features (G-features) and region features (R-features) for predicting image semantics. As an intermediary between image features and categories, we introduce the notion of mid-level concepts, which enables us to predict an image’s category in three steps. First, a G-prediction system uses G-features to predict the probability of each category for an image. Simultaneously, an R-prediction system analyzes R-features to identify the probabilities of mid-level concepts in that …

Exploring Inter-Concept Relationship With Context Space For Semantic Video Indexing, Xiao-Yong Wei, Yu-Gang Jiang, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

Semantic concept detectors are often individually and independently developed. Using peripherally related concepts to leverage the power of joint detection, which is referred to as context-based concept fusion (CBCF), has been a focus of study in recent years. This paper proposes the construction of a context space and the exploration of the space for CBCF. Context space considers the global consistency of concept relationships, addresses the problem of missing annotation, and is extensible for cross-domain contextual fusion. The space is linear and can be built by modeling the inter-concept relationship through annotation provided by either manual labeling or machine …

Large-Scale Near-Duplicate Web Video Search: Challenge And Opportunity, Wan-Lei Zhao, Song Tan, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

The massive amount of near-duplicate and duplicate web videos has presented both a challenge and an opportunity to multimedia computing. On one hand, browsing videos on the Internet becomes highly inefficient because of the need to repeatedly fast-forward through videos of similar content. On the other hand, the tremendous amount of near-duplicate content also makes some traditionally difficult vision tasks simple. For example, annotating pictures can be as simple as recycling the tags of Internet images retrieved from image search engines. Such tasks, whether eliminating or recycling near-duplicates, can usually be achieved by the nearest neighbor search of …

A Revisit Of Generative Model For Automatic Image Annotation Using Markov Random Fields, Yu Xiang, Xiangdong Zhou, Tat-Seng Chua, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

Much research effort on Automatic Image Annotation (AIA) has focused on the generative model, owing to its well-formed theory and its competitive performance compared with many well-designed and sophisticated methods. However, when semantic context is considered for annotation, the model suffers from weak learning ability. This is mainly due to the lack of parameter setting and of an appropriate learning strategy for characterizing the semantic context in the traditional generative model. In this paper, we present a new approach based on Multiple Markov Random Fields (MRF) for semantic context modeling and learning. Differing from previous MRF-related AIA approaches, we …

Visual Word Proximity And Linguistics For Semantic Video Indexing And Near-Duplicate Retrieval, Yu-Gang Jiang, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

Bag-of-visual-words (BoW) has recently become a popular representation to describe video and image content. Most existing approaches, nevertheless, neglect inter-word relatedness and measure similarity by bin-to-bin comparison of visual words in histograms. In this paper, we explore the linguistic and ontological aspects of visual words for video analysis. Two approaches, soft-weighting and constraint-based earth mover’s distance (CEMD), are proposed to model different aspects of visual word linguistics and proximity. In soft-weighting, visual words are cleverly weighted such that the linguistic meaning of words is taken into account for bin-to-bin histogram comparison. In CEMD, a cross-bin matching algorithm is formulated such …

Real-Time Near-Duplicate Elimination For Web Video Search With Content And Context, Xiao Wu, Chong-Wah Ngo, Alexander G. Hauptmann

Research Collection School Of Computing and Information Systems

With the exponential growth of social media, there exist huge numbers of near-duplicate web videos, ranging from simple formatting changes to complex mixtures of different editing effects. In addition to the abundant video content, the social web provides rich sets of context information associated with web videos, such as thumbnail images and time duration. At the same time, the popularity of Web 2.0 demands timely responses to user queries. To balance speed and accuracy, in this paper we combine the contextual information from time duration, number of views, and thumbnail images with the content analysis derived …

Scale-Rotation Invariant Pattern Entropy For Keypoint-Based Near-Duplicate Detection, Wan-Lei Zhao, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

Near-duplicate (ND) detection has recently emerged as a timely issue and is regarded as a powerful tool for various emerging applications. In the Web 2.0 environment in particular, the identification of near-duplicates enables tasks such as copyright enforcement, news topic tracking, and image and video search. In this paper, we describe an algorithm, namely Scale-Rotation invariant Pattern Entropy (SR-PE), for the detection of near-duplicates in large-scale video corpora. SR-PE is a novel pattern evaluation technique capable of measuring the spatial regularity of matching patterns formed by local keypoints. More importantly, the coherency of patterns and the perception of visual similarity, under the scenario …
