Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 50

Full-Text Articles in Physical Sciences and Mathematics

Sharper Generalisation Bounds For Pairwise Learning, Yunwen Lei, Antoine Ledent, Marius Kloft Dec 2020

Research Collection School Of Computing and Information Systems

Pairwise learning refers to learning tasks with loss functions depending on a pair of training examples, with ranking and metric learning as specific examples. Recently, there has been increasing attention to the generalization analysis of pairwise learning as a way to understand its practical behavior. However, the existing stability analysis provides suboptimal high-probability generalization bounds. In this paper, we provide a refined stability analysis by developing generalization bounds that can be $\sqrt{n}$ times faster than the existing results, where $n$ is the sample size. This implies excess risk bounds of the order $O(n^{-1/2})$ (up to a logarithmic factor) for both …
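For reference, pairwise learning is usually formalized with a loss $\ell(w; z, z')$ evaluated on a pair of examples. A common generic formulation of the empirical risk, population risk, and excess risk over a sample $S = \{z_1, \dots, z_n\}$ is sketched below; this is standard notation assumed here, not necessarily the exact notation used in the paper.

```latex
R_S(w) = \frac{1}{n(n-1)} \sum_{i=1}^{n} \sum_{j \neq i} \ell(w; z_i, z_j),
\qquad
R(w) = \mathbb{E}_{z, z'}\big[\ell(w; z, z')\big],
\qquad
\text{excess risk} = R(w_S) - \inf_{w} R(w).
```

Here $w_S$ denotes the model learned from $S$; the $O(n^{-1/2})$ rate mentioned in the abstract concerns bounds on this excess risk.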


A Study Of Multi-Task And Region-Wise Deep Learning For Food Ingredient Recognition, Jingjing Chen, Bin Zhu, Chong-Wah Ngo, Tat-Seng Chua, Yu-Gang Jiang Dec 2020

Research Collection School Of Computing and Information Systems

Food recognition has attracted considerable research attention because of its importance for health-related applications. Existing approaches mostly focus on categorizing food by dish name, while ignoring the underlying ingredient composition. In reality, two dishes with the same name do not necessarily share the exact list of ingredients. Therefore, dishes in the same food category are not necessarily equal in nutrition content. Nevertheless, due to the limited availability of datasets with ingredient labels, the problem of ingredient recognition is often overlooked. Furthermore, as the number of ingredients is expected to be much smaller than the number of food categories, …


Exploring And Evaluating Attributes, Values, And Structures For Entity Alignment, Zhiyuan Liu, Yixin Cao, Liangming Pan, Juanzi Li, Zhiyuan Liu, Tat-Seng Chua Nov 2020

Research Collection School Of Computing and Information Systems

Entity alignment (EA) aims to build a unified Knowledge Graph (KG) of rich content by linking equivalent entities across various KGs. GNN-based EA methods show promising performance by modeling the KG structure defined by relation triples. However, attribute triples can also provide crucial alignment signals but have not yet been well explored. In this paper, we propose to utilize an attributed value encoder and to partition the KG into subgraphs in order to model the various types of attribute triples efficiently. In addition, the performance of current EA methods is overestimated because of the name bias of existing EA datasets. To make an objective …


Global Context Aware Convolutions For 3D Point Cloud Understanding, Zhiyuan Zhang, Binh-Son Hua, Wei Chen, Yibin Tian, Sai-Kit Yeung Nov 2020

Research Collection School Of Computing and Information Systems

Recent advances in deep learning for 3D point clouds have shown great promise in scene understanding tasks, thanks to convolution operators that consume 3D point clouds directly in a neural network. Point cloud data, however, can have arbitrary rotations, especially data acquired by 3D scanning. Recent works show that it is possible to design point cloud convolutions with a rotation-invariance property, but such methods generally do not perform as well as convolutions that are only translation-invariant. We found that a key reason is that the rotation-invariant features consumed by point cloud convolution are less distinctive than point coordinates. …


Cost-Sensitive Deep Forest For Price Prediction, Chao Ma, Zhenbing Liu, Zhiguang Cao, Wen Song, Jie Zhang, Weiliang Zeng Nov 2020

Research Collection School Of Computing and Information Systems

For many real-world applications, predicting a price range is more practical and desirable than predicting a concrete value. In this case, price prediction can be regarded as a classification problem. Although deep forest is recognized as the best solution to many classification problems, a crucial issue limits its direct application to price prediction: it treats all misclassifications equally, no matter how far they are from the true classes, since their impact on accuracy is the same. This is unreasonable for price prediction, as a misclassification should be as close to the true price range as possible …
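The argument that distant misclassifications should cost more than near misses can be illustrated with a minimum-expected-cost decision rule. This is a hedged sketch, not the paper's cost-sensitive deep forest: it assumes the price ranges are ordered class indices, uses the absolute index distance as a hypothetical cost, and takes class probabilities from any classifier. The helper names `distance_cost_matrix` and `min_expected_cost_class` are hypothetical.

```python
from typing import List


def distance_cost_matrix(num_classes: int) -> List[List[float]]:
    """cost[i][j]: penalty for predicting price range j when the true range is i.

    Hypothetical choice: the absolute distance between ordered range indices,
    so predicting an adjacent range costs less than predicting a distant one.
    """
    return [[float(abs(i - j)) for j in range(num_classes)] for i in range(num_classes)]


def min_expected_cost_class(probs: List[float], cost: List[List[float]]) -> int:
    """Return the range index with the lowest expected cost under `probs`."""
    n = len(probs)
    # expected[j] = sum over true ranges i of P(range = i) * cost[i][j]
    expected = [sum(probs[i] * cost[i][j] for i in range(n)) for j in range(n)]
    return min(range(n), key=lambda j: expected[j])


# Class probabilities over five ordered price ranges (illustrative numbers only).
probs = [0.4, 0.0, 0.3, 0.0, 0.3]
argmax_class = max(range(len(probs)), key=lambda j: probs[j])
cost_aware_class = min_expected_cost_class(probs, distance_cost_matrix(len(probs)))
print(argmax_class, cost_aware_class)  # 0 2
```

With these numbers the most probable range is 0, but the cost-aware rule picks range 2, which minimizes the expected distance to the true range.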


TanGi: Tangible Proxies For Embodied Object Exploration And Manipulation In Virtual Reality, Martin Feick, Scott Bateman, Anthony Tang, Anthony Tang Nov 2020

Research Collection School Of Computing and Information Systems

Exploring and manipulating complex virtual objects is challenging due to the limitations of conventional controllers and free-hand interaction techniques. We present the TanGi toolkit, which enables novices to rapidly build physical proxy objects using Composable Shape Primitives. TanGi also provides Manipulators that allow users to build objects with movable parts, making them suitable for rich object exploration and manipulation in VR. Through a set of use cases and applications, we show the capabilities of the TanGi toolkit and evaluate its use. In a study with 16 participants, we demonstrate that novices can quickly build physical proxy objects using the Composable Shape …


FakePolisher: Making Deepfakes More Detection-Evasive By Shallow Reconstruction, Yihao Huang, Felix Juefei-Xu, Run Wang, Qing Guo, Lei Ma, Xiaofei Xie, Jianwen Li, Weikai Miao, Yang Liu, Geguang Pu Oct 2020

Research Collection School Of Computing and Information Systems

GAN-based image generation methods are still imperfect: their upsampling designs tend to leave certain artifact patterns in the synthesized images. Recent methods can easily exploit such artifact patterns to distinguish real images from GAN-synthesized ones. However, existing detection methods place heavy emphasis on these artifact patterns and can become futile if the artifacts are reduced. Towards reducing the artifacts in synthesized images, in this paper we devise a simple yet powerful approach termed FakePolisher that performs shallow reconstruction of fake images through a learned linear dictionary, intending to effectively and …


Compact Bilinear Augmented Query Structured Attention For Sport Highlights Classification, Yanbin Hao, Hao Zhang, Chong-Wah Ngo, Qing Liu, Xiaojun Hu Oct 2020

Research Collection School Of Computing and Information Systems

Understanding fine-grained activities, such as sport highlights, is an overlooked problem that has received considerably less research attention. Potential reasons include the absence of dedicated fine-grained action benchmark datasets, a research preference for general super-categorical activity classification, and the challenge posed by large visual similarities between fine-grained actions. To tackle these issues, we collect and manually annotate two sport highlights datasets, Basketball8 and Soccer-10, for fine-grained action classification. Sample clips in the datasets are annotated with professional sub-categorical actions such as “dunk” and “goalkeeping”. We also propose a Compact Bilinear Augmented Query Structured Attention (CBA-QSA) module and stack it on top of general three-dimensional …


Cross-Domain Cross-Modal Food Transfer, Bin Zhu, Chong-Wah Ngo, Jingjing Chen Oct 2020

Research Collection School Of Computing and Information Systems

Recent works in cross-modal image-to-recipe retrieval pave a new way to scale up food recognition. By learning a joint space for food images and recipes, food recognition is reduced to a retrieval problem that evaluates the similarity of embedded features. The major drawback, nevertheless, is the difficulty of applying an already-trained model to recognize dishes from cuisines unknown to the model. In general, adapting a model to new cooking styles in a cuisine requires updating it with new training examples in the form of image-recipe pairs. In practice, however, acquiring a sufficient number of image-recipe pairs …


Gesture Enhanced Comprehension Of Ambiguous Human-To-Robot Instructions, Weerakoon Mudiyanselage Dulanga Kaveesha Weerakoon, Vigneshwaran Subbaraju, Nipuni Karumpulli, Minh Anh Tuan Tran, Qianli Xu, U-Xuan Tan, Joo Hwee Lim, Archan Misra Oct 2020

Research Collection School Of Computing and Information Systems

This work demonstrates the feasibility and benefits of using pointing gestures, a naturally generated additional input modality, to improve the multi-modal comprehension accuracy of human instructions to robotic agents for collaborative tasks. We present M2Gestic, a system that combines neural-based text parsing with a novel knowledge-graph traversal mechanism over a multi-modal input of vision, natural language text, and pointing. Via multiple studies related to a benchmark tabletop manipulation task, we show that (a) M2Gestic can achieve close-to-human performance in reasoning over unambiguous verbal instructions, and (b) incorporating pointing input (even with its inherent location uncertainty) in M2Gestic results in a significant …


Activity River: Visualizing Planned And Logged Personal Activities For Reflection, Bon Adriel Aseniero, Charles Perin, Wesley Willett, Anthony Tang, Sheelagh Carpendale Oct 2020

Research Collection School Of Computing and Information Systems

We present Activity River, a personal visualization tool that enables individuals to plan, log, and reflect on their self-defined activities. We are interested in supporting this type of reflective practice because prior work has shown that reflection can help people plan and manage their time effectively. Hence, we designed Activity River based on five design goals, which we distilled from the Information Visualization and Human-Computer Interaction literature: visualize historical and contextual data, facilitate comparison of goals and achievements, engage viewers with delightful visuals, support authorship, and enable flexible planning and logging. To explore our approach's strengths and limitations, we conducted …


DeepRhythm: Exposing Deepfakes With Attentional Visual Heartbeat Rhythms, Hua Qi, Qing Guo, Felix Juefei-Xu, Xiaofei Xie, Lei Ma, Wei Feng, Yang Liu, Jianjun Zhao Oct 2020

Research Collection School Of Computing and Information Systems

As GAN-based face image and video generation techniques, widely known as DeepFakes, have become increasingly mature and realistic, there is a pressing demand for effective DeepFake detectors. Motivated by the fact that remote visual photoplethysmography (PPG) is made possible by monitoring the minuscule periodic changes of skin color due to blood pumping through the face, we conjecture that the normal heartbeat rhythms found in real face videos will be disrupted or even entirely broken in a DeepFake video, making this disruption a potentially powerful indicator for DeepFake detection. In this work, we propose DeepRhythm, a DeepFake …


The Virtual Reality Questionnaire Toolkit, Martin Feick, Niko Kleer, Anthony Tang, Anthony Tang Oct 2020

Research Collection School Of Computing and Information Systems

In this work, we present the VRQuestionnaireToolkit, which enables the research community to easily collect subjective measures within virtual reality (VR). We contribute a highly customizable and reusable open-source toolkit that can be rapidly integrated into existing VR projects. The toolkit comes with a pre-installed set of standard questionnaires such as NASA TLX, SSQ, and the SUS Presence questionnaire. Our system aims to lower the entry barrier to using questionnaires in VR and to significantly reduce the development time and cost needed to run pre-study, in-between, and post-study questionnaires.


ZoomWalls: Dynamic Walls That Simulate Haptic Infrastructure For Room-Scale VR World, Yan Yixian, Kazuki Takashima, Anthony Tang, Takayuki Tanno, Kazuyuki Fujita, Yoshifumi Kitamura Oct 2020

Research Collection School Of Computing and Information Systems

We focus on the problem of simulating the haptic infrastructure of a virtual environment (e.g., walls and doors). Our approach relies on multiple ZoomWalls (autonomous robotic encounter-type haptic wall-shaped props) that coordinate to provide haptic feedback for room-scale virtual reality. Based on a user's movement through the physical space, ZoomWall props are coordinated through a predict-and-dispatch architecture to provide just-in-time haptic feedback for objects the user is about to touch. We conducted simulation studies of different prediction algorithms, which helped us refine our algorithmic approach before realizing the physical ZoomWall prototype. Finally, we evaluated our system through a …


Interpretable Embedding For Ad-Hoc Video Search, Jiaxin Wu, Chong-Wah Ngo Oct 2020

Research Collection School Of Computing and Information Systems

Answering queries with semantic concepts has long been the mainstream approach to video search. Only recently has its performance been surpassed by the concept-free approach, which embeds queries and videos in a joint space. Nevertheless, the embedded features, as well as the search results, are not interpretable, hindering subsequent steps in video browsing and query reformulation. This paper integrates feature embedding and concept interpretation into a neural network for unified dual-task learning. In this way, an embedding is associated with a list of semantic concepts as an interpretation of video content. This paper empirically demonstrates that, by using either the embedding features or …


Person-Level Action Recognition In Complex Events Via TSD-TSM Networks, Yanbin Hao, Zi-Niu Liu, Hao Zhang, Bin Zhu, Jingjing Chen, Yu-Gang Jiang, Chong-Wah Ngo Oct 2020

Research Collection School Of Computing and Information Systems

The task of person-level action recognition in complex events aims to densely detect pedestrians and individually predict their actions from surveillance videos. In this paper, we present a simple yet efficient pipeline for this task, referred to as TSD-TSM networks. First, we adopt the TSD detector for pedestrian localization on each keyframe. Second, we generate the sequential ROIs for a person proposal by replicating the adjusted bounding box coordinates around the keyframe. In particular, we propose to conduct straddling expansion and region squaring on the original bounding box of a person proposal to widen the potential space of motion …


Knowledge Enhanced Neural Fashion Trend Forecasting, Yunshan Ma, Yujuan Ding, Xun Yang, Lizi Liao, Wai Keung Wong, Tat-Seng Chua Oct 2020

Research Collection School Of Computing and Information Systems

Fashion trend forecasting is a crucial task for both academia and industry. Although some efforts have been devoted to this challenging task, they studied only a limited set of fashion elements with highly seasonal or simple patterns, which can hardly reveal real fashion trends. Towards insightful fashion trend forecasting, this work investigates fine-grained fashion element trends for specific user groups. We first contribute a large-scale fashion trend dataset (FIT) collected from Instagram, with extracted time series of fashion element records and user information. Furthermore, to effectively model the time series data of fashion elements with rather complex patterns, we propose …


Multi-Modal Cooking Workflow Construction For Food Recipes, Liangming Pan, Jingjing Chen, Jianlong Wu, Shaoteng Liu, Chong-Wah Ngo, Min-Yen Kan, Yugang Jiang, Tat-Seng Chua Oct 2020

Research Collection School Of Computing and Information Systems

Understanding a food recipe requires anticipating the implicit causal effects of cooking actions, so that the recipe can be converted into a graph describing its temporal workflow. This is a non-trivial task that involves common-sense reasoning. However, due to the lack of large-scale labeled datasets, existing efforts rely on hand-crafted features to extract the workflow graph from recipes. Moreover, they fail to utilize the cooking images, which constitute an important part of food recipes. In this paper, we build MM-ReS, the first large-scale dataset for cooking workflow construction, consisting of 9,850 recipes with human-labeled workflow graphs. Cooking steps …


Visualization Research Lab At HKUST, Yong Wang Sep 2020

Research Collection School Of Computing and Information Systems

HKUST VisLab (http://vis.cse.ust.hk/) is one of the world's leading research labs in data visualization and human-computer interaction. The lab is dedicated to cutting-edge research on data visualization and human-computer interaction that facilitates data exploration and analytics in various application domains, including e-learning, urban computing, social media, and Industry 4.0. Since its founding by Prof. Huamin Qu in August 2004, HKUST VisLab's mission has been to build an excellent visualization research center and to foster data visualization research and talent cultivation in Asia, as there were very few visualization researchers in Asia around 2004.


Weakly Paired Multi-Domain Image Translation, M.Y. Zhang, Zhiwu Huang, D.P. Paudel, J. Thoma, L. Van Gool Sep 2020

Research Collection School Of Computing and Information Systems

In this paper, we study the new problem of weakly paired multi-domain image translation. To this end, we collect a dataset that contains weakly paired images from multiple domains. Two images are considered weakly paired if they are captured from nearby locations and share an overlapping field of view. These images may be captured by two asynchronous cameras, often resulting in images from separate domains, e.g., summer and winter. The major motivations for using weakly paired images are: (i) performance improvement towards that of paired data; (ii) cheap labels and abundant data availability. For the first time in …


TENet: Triple Excitation Network For Video Salient Object Detection, Sucheng Ren, Chu Han, Xin Yang, Guoqiang Han, Shengfeng He Aug 2020

Research Collection School Of Computing and Information Systems

In this paper, we propose a simple yet effective approach, named Triple Excitation Network, to reinforce the training of video salient object detection (VSOD) from three aspects: spatial, temporal, and online excitations. These excitation mechanisms are designed following the spirit of curriculum learning and aim to reduce learning ambiguities at the beginning of training by selectively exciting feature activations using ground truth. Then we gradually reduce the weight of the ground truth excitations by a curriculum rate and replace them with a curriculum complementary map for better and faster convergence. In particular, the spatial excitation strengthens feature activations for clear object …


Improving Event Detection Via Open-Domain Event Trigger Knowledge, Meihan Tong, Bin Xu, Shuai Wang, Yixin Cao, Lei Hou, Juanzi Li, Jun Xie Jul 2020

Research Collection School Of Computing and Information Systems

Event Detection (ED) is a fundamental task in automatically structuring texts. Due to the small scale of training data, previous methods perform poorly on unseen or sparsely labeled trigger words and are prone to overfitting densely labeled trigger words. To address this issue, we propose a novel Enrichment Knowledge Distillation (EKD) model that leverages external open-domain trigger knowledge to reduce the in-built biases toward frequent trigger words in annotations. Experiments on the ACE2005 benchmark show that our model outperforms nine strong baselines and is especially effective for unseen or sparsely labeled trigger words. The source code is released at https://github.com/shuaiwa16/ekd.git.


Deep Learning Of Facial Embeddings And Facial Landmark Points For The Detection Of Academic Emotions, Hua Leong Fwa Jul 2020

Research Collection School Of Computing and Information Systems

Automatic emotion recognition is an actively researched area, as emotion plays a pivotal role in effective human communication. Equipping a computer to understand and respond to human emotions has potential applications in many fields, including education, medicine, transport, and hospitality. In a classroom or online learning context, the basic emotions do not occur frequently and do not influence the learning process itself. Academic emotions such as engagement, frustration, confusion, and boredom are the ones pivotal to sustaining the motivation of learners. In this study, we evaluated the use of deep learning on FaceNet embeddings and facial landmark …


Tree-Augmented Cross-Modal Encoding For Complex-Query Video Retrieval, Xun Yang, Jianfeng Dong, Yixin Cao, Xun Wang, Meng Wang, Tat-Seng Chua Jul 2020

Research Collection School Of Computing and Information Systems

The rapid growth of user-generated videos on the Internet has intensified the need for text-based video retrieval systems. Traditional methods mainly favor the concept-based paradigm for retrieval with simple queries, which is usually ineffective for complex queries that carry far richer semantics. Recently, the embedding-based paradigm has emerged as a popular approach. It aims to map queries and videos into a shared embedding space where semantically similar texts and videos are much closer to each other. Despite its simplicity, it forgoes the exploitation of the syntactic structure of text queries, making it suboptimal for modeling complex queries. To facilitate …


Expertise Style Transfer: A New Task Towards Better Communication Between Experts And Laymen, Yixin Cao, Ruihao Shui, Liangming Pan, Min-Yen Kan, Zhiyuan Lu, Tat-Seng Chua Jul 2020

Research Collection School Of Computing and Information Systems

The curse of knowledge can impede communication between experts and laymen. We propose a new task of expertise style transfer and contribute a manually annotated dataset with the goal of alleviating such cognitive biases. Solving this task not only simplifies the professional language, but also improves the accuracy and expertise level of laymen descriptions using simple words. This is a challenging task, unaddressed in previous work, as it requires the models to have expert intelligence in order to modify text with a deep understanding of domain knowledge and structures. We establish the benchmark performance of five state-of-the-art models for style …


NTIRE 2020 Challenge On Video Quality Mapping: Methods And Results, D. Fuoli, Zhiwu Huang, M. Danelljan, R. Timofte, H. Wang, L. Jin, D. Su, J. Liu, J. Lee, M. Kudelski, L. Bala, D. Hryboy, M. Mozejko, M. Li, S. Li, B. Pang, C. Lu, C. Li, D. He, F. Li Jun 2020

Research Collection School Of Computing and Information Systems

This paper reviews the NTIRE 2020 challenge on video quality mapping (VQM), which addresses the problem of quality mapping from a source video domain to a target video domain. The challenge includes both a supervised track (track 1) and a weakly supervised track (track 2) for two benchmark datasets. In particular, track 1 offers a new Internet video benchmark, requiring algorithms to learn the mapping from more compressed videos to less compressed videos in a supervised training manner. In track 2, algorithms are required to learn the quality mapping from one device to another when their quality varies substantially and weakly aligned video pairs …


Don't Hit Me! Glass Detection In Real-World Scenes, Haiyang Mei, Xin Yang, Yang Wang, Yuanyuan Liu, Shengfeng He, Qiang Zhang, Xiaopeng Wei, Rynson W.H. Lau Jun 2020

Research Collection School Of Computing and Information Systems

Glass is very common in daily life. Existing computer vision systems neglect it, which may have severe consequences, e.g., a robot may crash into a glass wall. However, sensing the presence of glass is not straightforward. The key challenge is that arbitrary objects or scenes can appear behind the glass, and the content within the glass region is typically similar to the content behind it. In this paper, we address the important problem of detecting glass from a single RGB image. To address this problem, we construct a large-scale glass detection dataset (GDD) and design a glass detection network, called GDNet, …


Visual Commonsense R-CNN, Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun Jun 2020

Research Collection School Of Computing and Information Systems

We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., using Faster R-CNN), the proxy training objective of VC R-CNN, like that of other unsupervised feature learning methods (e.g., word2vec), is to predict the contextual objects of a region. However, they are fundamentally different: VC R-CNN predicts using causal intervention, P(Y|do(X)), whereas the others use the conventional likelihood, P(Y|X). This is also …
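The contrast between the two objectives can be made concrete with the standard backdoor adjustment from causal inference, which applies when a confounder $Z$ satisfies the backdoor criterion. This textbook identity is included only as background; the abstract does not specify how VC R-CNN estimates the intervention.

```latex
P(Y \mid do(X)) = \sum_{z} P(Y \mid X, z)\, P(z)
\qquad \text{versus} \qquad
P(Y \mid X) = \sum_{z} P(Y \mid X, z)\, P(z \mid X)
```

The two expressions differ only in how $z$ is weighted: the intervention uses the prior $P(z)$, so the correlation between the confounder and $X$ no longer biases the prediction.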


Mnemonics Training: Multi-Class Incremental Learning Without Forgetting, Yaoyao Liu, Yuting Su, An-An Liu, Bernt Schiele, Qianru Sun Jun 2020

Research Collection School Of Computing and Information Systems

Multi-Class Incremental Learning (MCIL) aims to learn new concepts by incrementally updating a model trained on previous concepts. However, there is an inherent trade-off between effectively learning new concepts and avoiding catastrophic forgetting of previous ones. To alleviate this issue, it has been proposed to keep a few examples of the previous concepts, but the effectiveness of this approach heavily depends on the representativeness of these examples. This paper proposes a novel and automatic framework, which we call mnemonics, where we parameterize exemplars and make them optimizable in an end-to-end manner. We train the framework through bilevel optimizations, i.e., model-level and …


Visual Commonsense Representation Learning Via Causal Inference, Tan Wang, Jianqiang Huang, Hanwang Zhang, Qianru Sun Jun 2020

Research Collection School Of Computing and Information Systems

We present a novel unsupervised feature representation learning method, Visual Commonsense Region-based Convolutional Neural Network (VC R-CNN), to serve as an improved visual region encoder for high-level tasks such as captioning and VQA. Given a set of detected object regions in an image (e.g., using Faster R-CNN), the proxy training objective of VC R-CNN, like that of other unsupervised feature learning methods (e.g., word2vec), is to predict the contextual objects of a region. However, they are fundamentally different: VC R-CNN predicts using causal intervention, P(Y|do(X)), whereas the others use the conventional likelihood, P(Y|X). We extensively apply …