Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 30 of 50
Full-Text Articles in Physical Sciences and Mathematics
Sharper Generalisation Bounds For Pairwise Learning, Yunwen Lei, Antoine Ledent, Marius Kloft
Research Collection School Of Computing and Information Systems
Pairwise learning refers to learning tasks with loss functions depending on a pair of training examples, which includes ranking and metric learning as specific examples. Recently, there has been an increasing amount of attention on the generalization analysis of pairwise learning to understand its practical behavior. However, the existing stability analysis provides suboptimal high-probability generalization bounds. In this paper, we provide a refined stability analysis by developing generalization bounds which can be √n-times faster than the existing results, where n is the sample size. This implies excess risk bounds of the order O(n^(-1/2)) (up to a logarithmic factor) for both …
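To make the setting concrete: a pairwise loss is evaluated on pairs of training examples rather than on single ones. The sketch below of a pairwise hinge loss for ranking/metric learning is a generic illustration only; the margin formulation is a common choice, not taken from the paper:

```python
import itertools

def pairwise_hinge_loss(scores, labels, margin=1.0):
    """Average hinge loss over all ordered pairs (i, j) with labels[i] > labels[j].

    A pairwise loss depends on two examples at a time, unlike the pointwise
    losses used in standard classification or regression.
    """
    losses = []
    for i, j in itertools.permutations(range(len(scores)), 2):
        if labels[i] > labels[j]:  # example i should be ranked above example j
            losses.append(max(0.0, margin - (scores[i] - scores[j])))
    return sum(losses) / len(losses) if losses else 0.0

# A sample of size n yields O(n^2) mutually dependent pairs, which is exactly
# what makes the generalization analysis harder than in the pointwise case.
```

A well-separated pair such as `pairwise_hinge_loss([2.0, 0.0], [1, 0])` incurs zero loss, while tied scores with distinct labels incur the full margin.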
A Study Of Multi-Task And Region-Wise Deep Learning For Food Ingredient Recognition, Jingjing Chen, Bin Zhu, Chong-Wah Ngo, Tat-Seng Chua, Yu-Gang Jiang
Food recognition has attracted considerable research attention for its importance in health-related applications. Existing approaches mostly focus on categorizing food by dish name, while ignoring the underlying ingredient composition. In reality, two dishes with the same name do not necessarily share the exact list of ingredients, so dishes under the same food category are not necessarily equal in nutritional content. Nevertheless, due to the limited datasets available with ingredient labels, the problem of ingredient recognition is often overlooked. Furthermore, as the number of ingredients is expected to be much less than the number of food categories, …
Exploring And Evaluating Attributes, Values, And Structures For Entity Alignment, Zhiyuan Liu, Yixin Cao, Liangming Pan, Juanzi Li, Zhiyuan Liu, Tat-Seng Chua
Entity alignment (EA) aims at building a unified Knowledge Graph (KG) of rich content by linking the equivalent entities from various KGs. GNN-based EA methods present promising performance by modeling the KG structure defined by relation triples. However, attribute triples can also provide crucial alignment signal but have not been well explored yet. In this paper, we propose to utilize an attributed value encoder and partition the KG into subgraphs to model the various types of attribute triples efficiently. Besides, the performances of current EA methods are overestimated because of the name-bias of existing EA datasets. To make an objective …
Global Context Aware Convolutions For 3d Point Cloud Understanding, Zhiyuan Zhang, Binh-Son Hua, Wei Chen, Yibin Tian, Sai-Kit Yeung
Recent advances in deep learning for 3D point clouds have shown great promise in scene understanding tasks, thanks to the introduction of convolution operators that consume 3D point clouds directly in a neural network. Point cloud data, however, can have arbitrary rotations, especially when acquired from 3D scanning. Recent works show that it is possible to design point cloud convolutions with a rotation invariance property, but such methods generally do not perform as well as translation-invariant-only convolutions. We found that a key reason is that, compared to point coordinates, the rotation-invariant features consumed by point cloud convolutions are not as distinctive. …
Cost-Sensitive Deep Forest For Price Prediction, Chao Ma, Zhenbing Liu, Zhiguang Cao, Wen Song, Jie Zhang, Weiliang Zeng
For many real-world applications, predicting a price range is more practical and desirable than predicting a concrete value, in which case price prediction can be regarded as a classification problem. Although deep forest is recognized as the best solution to many classification problems, a crucial issue limits its direct application to price prediction: it treats all misclassifications equally, no matter how far they are from the real classes, since their impacts on accuracy are the same. This is unreasonable for price prediction, as a misclassification should be as close to the real price range as possible …
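The cost-sensitivity issue can be illustrated with a distance-aware cost matrix, where misclassifying a price into a far-away range costs more than a near miss. The linear cost and the five price ranges below are hypothetical choices for illustration, not the paper's actual design:

```python
def distance_cost_matrix(n_classes):
    """cost[i][j] grows with how far predicted range j is from true range i."""
    return [[abs(i - j) for j in range(n_classes)] for i in range(n_classes)]

def expected_cost(true_class, class_probs, cost):
    """Expected misclassification cost of a probabilistic prediction."""
    return sum(p * cost[true_class][j] for j, p in enumerate(class_probs))

cost = distance_cost_matrix(5)  # five hypothetical price ranges

# A near miss (mass on the adjacent range) is penalized less than a far one,
# unlike plain 0/1 accuracy, which treats both mistakes identically.
near = expected_cost(2, [0.0, 0.5, 0.5, 0.0, 0.0], cost)  # 0.5
far = expected_cost(2, [0.5, 0.0, 0.5, 0.0, 0.0], cost)   # 1.0
```

Training against such a cost matrix pushes errors toward neighboring price ranges, which is the behavior the abstract argues plain classification lacks.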
Tangi: Tangible Proxies For Embodied Object Exploration And Manipulation In Virtual Reality, Martin Feick, Scott Bateman, Anthony Tang, Anthony Tang
Exploring and manipulating complex virtual objects is challenging due to limitations of conventional controllers and free-hand interaction techniques. We present the TanGi toolkit, which enables novices to rapidly build physical proxy objects using Composable Shape Primitives. TanGi also provides Manipulators allowing users to build objects including movable parts, making them suitable for rich object exploration and manipulation in VR. With a set of different use cases and applications, we show the capabilities of the TanGi toolkit and evaluate its use. In a study with 16 participants, we demonstrate that novices can quickly build physical proxy objects using the Composable Shape …
Fakepolisher: Making Deepfakes More Detection-Evasive By Shallow Reconstruction, Yihao Huang, Felix Juefei-Xu, Run Wang, Qing Guo, Lei Ma, Xiaofei Xie, Jianwen Li, Weikai Miao, Yang Liu, Geguang Pu
At this moment, GAN-based image generation methods are still imperfect: their upsampling designs leave certain artifact patterns in the synthesized images. Such artifact patterns can be easily exploited (by recent methods) to tell real and GAN-synthesized images apart. However, the existing detection methods put much emphasis on these artifact patterns, and can become futile if such artifact patterns are reduced. Towards reducing the artifacts in synthesized images, in this paper we devise a simple yet powerful approach termed FakePolisher that performs shallow reconstruction of fake images through a learned linear dictionary, intending to effectively and …
Compact Bilinear Augmented Query Structured Attention For Sport Highlights Classification, Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Qing Liu, Xiaojun Hu
Understanding fine-grained activities, such as sport highlights, is a problem that has been overlooked and receives considerably less research attention. Potential reasons include the absence of dedicated fine-grained action benchmark datasets, a research preference for classifying general supercategorical activities, and the large visual similarity between fine-grained actions. To tackle these, we collect and manually annotate two sport highlights datasets, i.e., Basketball8 & Soccer-10, for fine-grained action classification. Sample clips in the datasets are annotated with professional sub-categorical actions such as “dunk” and “goalkeeping”. We also propose a Compact Bilinear Augmented Query Structured Attention (CBA-QSA) module and stack it on top of general three-dimensional …
Cross-Domain Cross-Modal Food Transfer, Bin Zhu, Chong-Wah Ngo, Jingjing Chen
Recent works on cross-modal image-to-recipe retrieval pave a new way to scale up food recognition. By learning a joint space between food images and recipes, food recognition boils down to a retrieval problem of evaluating the similarity of embedded features. The major drawback, nevertheless, is the difficulty of applying an already-trained model to recognize dishes from cuisines unknown to the model. In general, model updating with new training examples, in the form of image-recipe pairs, is required to adapt a model to the new cooking styles of a cuisine. Nevertheless, in practice, acquiring a sufficient number of image-recipe pairs …
Gesture Enhanced Comprehension Of Ambiguous Human-To-Robot Instructions, Weerakoon Mudiyanselage Dulanga Kaveesha Weerakoon, Vigneshwaran Subbaraju, Nipuni Karumpulli, Minh Anh Tuan Tran, Qianli Xu, U-Xuan Tan, Joo Hwee Lim, Archan Misra
This work demonstrates the feasibility and benefits of using pointing gestures, a naturally-generated additional input modality, to improve the multi-modal comprehension accuracy of human instructions to robotic agents for collaborative tasks. We present M2Gestic, a system that combines neural-based text parsing with a novel knowledge-graph traversal mechanism, over a multi-modal input of vision, natural language text and pointing. Via multiple studies related to a benchmark table top manipulation task, we show that (a) M2Gestic can achieve close-to-human performance in reasoning over unambiguous verbal instructions, and (b) incorporating pointing input (even with its inherent location uncertainty) in M2Gestic results in a significant …
Activity River: Visualizing Planned And Logged Personal Activities For Reflection, Bon Adriel Aseniero, Charles Perin, Wesley Willett, Anthony Tang, Sheelagh Carpendale
We present Activity River, a personal visualization tool which enables individuals to plan, log, and reflect on their self-defined activities. We are interested in supporting this type of reflective practice as prior work has shown that reflection can help people plan and manage their time effectively. Hence, we designed Activity River based on five design goals (visualize historical and contextual data, facilitate comparison of goals and achievements, engage viewers with delightful visuals, support authorship, and enable flexible planning and logging) which we distilled from the Information Visualization and Human-Computer Interaction literature. To explore our approach's strengths and limitations, we conducted …
Deeprhythm: Exposing Deepfakes With Attentional Visual Heartbeat Rhythms, Hua Qi, Qing Guo, Felix Juefei-Xu, Xiaofei Xie, Lei Ma, Wei Feng, Yang Liu, Jianjun Zhao
As GAN-based face image and video generation techniques, widely known as DeepFakes, have become increasingly mature and realistic, there is a pressing demand for effective DeepFake detectors. Motivated by the fact that remote visual photoplethysmography (PPG) is made possible by monitoring the minuscule periodic changes of skin color due to blood pumping through the face, we conjecture that the normal heartbeat rhythms found in real face videos will be disrupted or even entirely broken in a DeepFake video, making them a potentially powerful indicator for DeepFake detection. In this work, we propose DeepRhythm, a DeepFake …
The Virtual Reality Questionnaire Toolkit, Martin Feick, Niko Kleer, Anthony Tang, Anthony Tang
In this work, we present the VRQuestionnaireToolkit, which enables the research community to easily collect subjective measures within virtual reality (VR). We contribute a highly customizable and reusable open-source toolkit which can be integrated into existing VR projects rapidly. The toolkit comes with a pre-installed set of standard questionnaires such as NASA TLX, SSQ and the SUS Presence questionnaire. Our system aims to lower the entry barrier to using questionnaires in VR and to significantly reduce the development time and cost needed to run pre-, in-between- and post-study questionnaires.
Zoomwalls: Dynamic Walls That Simulate Haptic Infrastructure For Room-Scale Vr World, Yan Yixian, Kazuki Takashima, Anthony Tang, Takayuki Tanno, Kazuyuki Fujita, Yoshifumi Kitamura
We focus on the problem of simulating the haptic infrastructure of a virtual environment (i.e. walls, doors). Our approach relies on multiple ZoomWalls---autonomous robotic encounter-type haptic wall-shaped props---that coordinate to provide haptic feedback for room-scale virtual reality. Based on a user's movement through the physical space, ZoomWall props are coordinated through a predict-and-dispatch architecture to provide just-in-time haptic feedback for objects the user is about to touch. To refine our system, we conducted simulation studies of different prediction algorithms, which helped us to refine our algorithmic approach to realize the physical ZoomWall prototype. Finally, we evaluated our system through a …
Interpretable Embedding For Ad-Hoc Video Search, Jiaxin Wu, Chong-Wah Ngo
Answering queries with semantic concepts has long been the mainstream approach for video search. Recently, its performance has been surpassed by the concept-free approach, which embeds queries and videos in a joint space. Nevertheless, the embedded features, as well as the search results, are not interpretable, hindering subsequent steps in video browsing and query reformulation. This paper integrates feature embedding and concept interpretation into a neural network for unified dual-task learning. In this way, an embedding is associated with a list of semantic concepts as an interpretation of the video content. This paper empirically demonstrates that, by using either the embedding features or …
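The dual-task idea, in which one network outputs both a joint-space embedding for retrieval and a list of semantic concepts for interpretation, can be sketched as two heads over a shared feature extractor. All dimensions, weights, and concept names below are invented for illustration and are not the paper's architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def shared_features(x, W_shared):
    return np.tanh(x @ W_shared)          # shared representation

def embed_head(h, W_embed):
    return h @ W_embed                    # joint-space embedding (retrieval task)

def concept_head(h, W_concept, names):
    logits = h @ W_concept                # per-concept scores (interpretation task)
    probs = 1 / (1 + np.exp(-logits))     # sigmoid: concepts are not exclusive
    return {n: float(p) for n, p in zip(names, probs)}

x = rng.normal(size=16)                   # a video/query feature vector (toy)
W_shared = rng.normal(size=(16, 8))
W_embed = rng.normal(size=(8, 4))
W_concept = rng.normal(size=(8, 3))

h = shared_features(x, W_shared)
embedding = embed_head(h, W_embed)        # used for similarity search
concepts = concept_head(h, W_concept, ["person", "outdoor", "vehicle"])
```

Because both heads read the same representation, each embedding comes paired with concept scores that can serve as a human-readable interpretation of the match.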
Person-Level Action Recognition In Complex Events Via Tsd-Tsm Networks, Yanbin Hao, Zi-Niu Liu, Hao Zhang, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo
The task of person-level action recognition in complex events aims to densely detect pedestrians and individually predict their actions from surveillance videos. In this paper, we present a simple yet efficient pipeline for this task, referred to as TSD-TSM networks. Firstly, we adopt the TSD detector for the pedestrian localization on each single keyframe. Secondly, we generate the sequential ROIs for a person proposal by replicating the adjusted bounding box coordinates around the keyframe. Particularly, we propose to conduct straddling expansion and region squaring on the original bounding box of a person proposal to widen the potential space of motion …
Knowledge Enhanced Neural Fashion Trend Forecasting, Yunshan Ma, Yujuan Ding, Xun Yang, Lizi Liao, Wai Keung Wong, Tat-Seng Chua
Fashion trend forecasting is a crucial task for both academia and industry. Although some efforts have been devoted to tackling this challenging task, they only studied limited fashion elements with highly seasonal or simple patterns, which could hardly reveal the real fashion trends. Towards insightful fashion trend forecasting, this work focuses on investigating fine-grained fashion element trends for specific user groups. We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram with extracted time series fashion element records and user information. Furthermore, to effectively model the time series data of fashion elements with rather complex patterns, we propose …
Multi-Modal Cooking Workflow Construction For Food Recipes, Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yugang Jiang, Tat-Seng Chua
Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, existing efforts rely on hand-crafted features to extract the workflow graph from recipes, due to the lack of large-scale labeled datasets. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps …
Visualization Research Lab At Hkust, Yong Wang
HKUST VisLab (http://vis.cse.ust.hk/) is one of the leading research labs in the field of data visualization and human-computer interaction worldwide. The lab is dedicated to conducting cutting-edge research on data visualization and human-computer interaction to facilitate data exploration and analytics in various application domains, including E-learning, urban computing, social media and industry 4.0. Starting from its foundation by Prof. Huamin Qu in August 2004, the mission of HKUST VisLab is to build an excellent visualization research center and foster data visualization research and talent cultivation in Asia, as there were very few visualization researchers in Asia around 2004.
Weakly Paired Multi-Domain Image Translation, M.Y. Zhang, Zhiwu Huang, D.P. Paudel, J. Thoma, Gool L. Van
In this paper, we aim at studying the new problem of weakly paired multi-domain image translation. To this end, we collect a dataset that contains weakly paired images from multiple domains. Two images are considered to be weakly paired if they are captured from nearby locations and share an overlapping field of view. These images are possibly captured by two asynchronous cameras—often resulting in images from separate domains, e.g. summer and winter. Major motivations for using weakly paired images are: (i) performance improvement towards that of paired data; (ii) cheap labels and abundant data availability. For the first time in …
Tenet: Triple Excitation Network For Video Salient Object Detection, Sucheng Ren, Chu Han, Xin Yang, Guoqiang Han, Shengfeng He
In this paper, we propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD) from three aspects: spatial, temporal, and online excitations. These excitation mechanisms are designed following the spirit of curriculum learning, and aim to reduce learning ambiguities at the beginning of training by selectively exciting feature activations using ground truth. We then gradually reduce the weight of the ground-truth excitations by a curriculum rate and replace them with a curriculum complementary map for better and faster convergence. In particular, the spatial excitation strengthens feature activations for clear object …
Improving Event Detection Via Open-Domain Event Trigger Knowledge, Meihan Tong, Bin Xu, Shuai Wang, Yixin Cao, Lei Hou, Juanzi Li, Jun Xie
Event Detection (ED) is a fundamental task in automatically structuring texts. Due to the small scale of training data, previous methods perform poorly on unseen/sparsely labeled trigger words and are prone to overfitting densely labeled trigger words. To address this issue, we propose a novel Enrichment Knowledge Distillation (EKD) model that leverages external open-domain trigger knowledge to reduce the in-built bias toward frequent trigger words in annotations. Experiments on the benchmark ACE2005 show that our model outperforms nine strong baselines and is especially effective for unseen/sparsely labeled trigger words. The source code is released at https://github.com/shuaiwa16/ekd.git.
Deep Learning Of Facial Embeddings And Facial Landmark Points For The Detection Of Academic Emotions, Hua Leong Fwa
Automatic emotion recognition is an actively researched area as emotion plays a pivotal role in effective human communications. Equipping a computer to understand and respond to human emotions has potential applications in many fields including education, medicine, transport and hospitality. In a classroom or online learning context, the basic emotions do not occur frequently and do not influence the learning process itself. The academic emotions such as engagement, frustration, confusion and boredom are the ones which are pivotal to sustaining the motivation of learners. In this study, we evaluated the use of deep learning on FaceNet embeddings and facial landmark …
Tree-Augmented Cross-Modal Encoding For Complex-Query Video Retrieval, Xun Yang, Jianfeng Dong, Yixin Cao, Xun Wang, Meng Wang, Tat-Seng Chua
The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm for retrieval with simple queries, which is usually ineffective for complex queries that carry far richer semantics. Recently, the embedding-based paradigm has emerged as a popular approach. It aims to map queries and videos into a shared embedding space where semantically similar texts and videos are much closer to each other. Despite its simplicity, it forgoes the exploitation of the syntactic structure of text queries, making it suboptimal for modeling complex queries. To facilitate …
Expertise Style Transfer: A New Task Towards Better Communication Between Experts And Laymen, Yixin Cao, Ruihao Shui, Liangming Pan, Min-Yen Kan, Zhiyuan Lu, Tat-Seng Chua
The curse of knowledge can impede communication between experts and laymen. We propose a new task of expertise style transfer and contribute a manually annotated dataset with the goal of alleviating such cognitive biases. Solving this task not only simplifies the professional language, but also improves the accuracy and expertise level of laymen descriptions using simple words. This is a challenging task, unaddressed in previous work, as it requires the models to have expert intelligence in order to modify text with a deep understanding of domain knowledge and structures. We establish the benchmark performance of five state-of-the-art models for style …
Ntire 2020 Challenge On Video Quality Mapping: Methods And Results, D. Fuoli, Zhiwu Huang, M. Danelljan, R. Timofte, H. Wang, L. Jin, D. Su, J. Liu, J. Lee, M. Kudelski, L. Bala, D. Hryboy, M. Mozejko, M. Li, S. Li, B. Pang, C. Lu, Li C., He D., Li F.
This paper reviews the NTIRE 2020 challenge on video quality mapping (VQM), which addresses the issue of quality mapping from a source video domain to a target video domain. The challenge includes both a supervised track (track 1) and a weakly-supervised track (track 2) over two benchmark datasets. In particular, track 1 offers a new Internet video benchmark, requiring algorithms to learn the map from more compressed videos to less compressed videos in a supervised training manner. In track 2, algorithms are required to learn the quality mapping from one device to another when their quality varies substantially and weakly-aligned video pairs …
Don't Hit Me! Glass Detection In Real-World Scenes, Haiyang Mei, Xin Yang, Yang Wang, Yuanyuan Liu, Shengfeng He, Qiang Zhang, Xiaopeng Wei, Rynson W.H. Lau
Glass is very common in our daily life. Existing computer vision systems neglect it, which may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects/scenes can appear behind the glass, and the content within the glass region is typically similar to what lies behind it. In this paper, we propose the important problem of detecting glass from a single RGB image. To address this problem, we construct a large-scale glass detection dataset (GDD) and design a glass detection network, called GDNet, …
Visual Commonsense R-Cnn, Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., using Faster R-CNN), like any other unsupervised feature learning methods (e.g., word2vec), the proxy training objective of VC R-CNN is to predict the contextual objects of a region. However, they are fundamentally different: the prediction of VC R-CNN is by using causal intervention: P(Y|do(X)), while others are by using the conventional likelihood: P(Y|X). This is also …
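The difference between the conventional likelihood P(Y|X) and the intervention P(Y|do(X)) can be seen in a tiny discrete toy model: the backdoor adjustment weights the confounder z by its prior P(z) rather than its posterior P(z|x). The probabilities below are invented purely for illustration and are unrelated to the paper's data:

```python
# Toy confounded model: z influences both x and y (z -> x, z -> y).
p_z = {0: 0.5, 1: 0.5}                                     # prior over confounder
p_x_given_z = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.1, 1: 0.9}}   # p_x_given_z[z][x]
p_y_given_xz = {(0, 0): 0.2, (0, 1): 0.4,                  # key (x, z) -> P(y=1)
                (1, 0): 0.6, (1, 1): 0.8}

def p_y_given_x(x):
    """Conventional likelihood: z is weighted by its posterior P(z|x)."""
    joint = {z: p_x_given_z[z][x] * p_z[z] for z in p_z}
    total = sum(joint.values())
    return sum(p_y_given_xz[(x, z)] * joint[z] / total for z in p_z)

def p_y_given_do_x(x):
    """Backdoor adjustment: z is weighted by its prior P(z)."""
    return sum(p_y_given_xz[(x, z)] * p_z[z] for z in p_z)

# The two quantities disagree whenever z confounds x and y, which is the
# spurious-correlation effect that conditioning on do(X) removes.
```

Here P(Y=1|X=1) ≈ 0.78 while P(Y=1|do(X=1)) = 0.70: observing x=1 also makes z=1 more likely, inflating the likelihood beyond the true causal effect.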
Mnemonics Training: Multi-Class Incremental Learning Without Forgetting, Yaoyao Liu, Yuting Su, An-An Liu, Bernt Schiele, Qianru Sun
Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off to effectively learning new concepts without catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep around a few examples of the previous concepts but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimizations, i.e., model-level and …
Visual Commonsense Representation Learning Via Causal Inference, Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun
We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., using Faster R-CNN), like any other unsupervised feature learning methods (e.g., word2vec), the proxy training objective of VC R-CNN is to predict the contextual objects of a region. However, they are fundamentally different: the prediction of VC R-CNN is by using causal intervention: P(Y|do(X)), while others are by using the conventional likelihood: P(Y|X). We extensively apply …