Full-Text Articles in Graphics and Human Computer Interfaces

Triadic Temporal-Semantic Alignment For Weakly-Supervised Video Moment Retrieval, Jin Liu, Jialong Xie, Fengyu Zhou, Shengfeng He Dec 2024

Research Collection School Of Computing and Information Systems

Video Moment Retrieval (VMR) aims to identify specific event moments within untrimmed videos based on natural language queries. Existing VMR methods have been criticized for relying heavily on moment annotation bias rather than true multi-modal alignment reasoning. Weakly supervised VMR approaches inherently overcome this issue by training without precise temporal location information. However, they struggle with fine-grained semantic alignment and often yield multiple speculative predictions with prolonged video spans. In this paper, we take a step forward in the context of weakly supervised VMR by proposing a triadic temporal-semantic alignment model. Our proposed approach augments weak supervision by comprehensively addressing …


Granular3D: Delving Into Multi-Granularity 3D Scene Graph Prediction, Kaixiang Huang, Jingru Yang, Jin Wang, Shengfeng He, Zhan Wang, Haiyan He, Qifeng Zhang, Guodong Lu Sep 2024

Research Collection School Of Computing and Information Systems

This paper addresses the significant challenges in 3D Semantic Scene Graph (3DSSG) prediction, essential for understanding complex 3D environments. Traditional approaches, primarily using PointNet and Graph Convolutional Networks, struggle with effectively extracting multi-grained features from intricate 3D scenes, largely due to a focus on global scene processing and single-scale feature extraction. To overcome these limitations, we introduce Granular3D, a novel approach that shifts the focus towards multi-granularity analysis by predicting relation triplets from specific sub-scenes. One key is the Adaptive Instance Enveloping Method (AIEM), which establishes an approximate envelope structure around irregular instances, providing shape-adaptive local point cloud sampling, thereby …


Nonfactoid Question Answering As Query-Focused Summarization With Graph-Enhanced Multihop Inference, Yang Deng, Wenxuan Zhang, Weiwen Xu, Ying Shen, Wai Lam Aug 2024

Research Collection School Of Computing and Information Systems

Nonfactoid question answering (QA) is one of the most extensive yet challenging applications and research areas in natural language processing (NLP). Existing methods fall short of handling the long-distance and complex semantic relations between the question and the document sentences. In this work, we propose a novel query-focused summarization method, namely a graph-enhanced multihop query-focused summarizer (GMQS), to tackle the nonfactoid QA problem. Specifically, we leverage graph-enhanced reasoning techniques to elaborate the multihop inference process in nonfactoid QA. Three types of graphs with different semantic relations, namely semantic relevance, topical coherence, and coreference linking, are constructed for explicitly capturing the …


G²Face: High-Fidelity Reversible Face Anonymization Via Generative And Geometric Priors, Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He Aug 2024

Research Collection School Of Computing and Information Systems

Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent manipulation in pre-trained GANs can lead to changes in ID-irrelevant attributes, adversely affecting data utility due to GAN inversion inaccuracies. This paper introduces G²Face, which leverages both generative and geometric priors to enhance identity manipulation, achieving high-quality reversible face anonymization without compromising data utility. We utilize a 3D face …


Certified Robust Accuracy Of Neural Networks Are Bounded Due To Bayes Errors, Ruihan Zhang, Jun Sun Jul 2024

Research Collection School Of Computing and Information Systems

Adversarial examples pose a security threat to many critical systems built on neural networks. While certified training improves robustness, it also decreases accuracy noticeably. Despite various proposals for addressing this issue, the significant accuracy drop remains. More importantly, it is not clear whether there is a certain fundamental limit on achieving robustness whilst maintaining accuracy. In this work, we offer a novel perspective based on Bayes errors. By adopting Bayes error to robustness analysis, we investigate the limit of certified robust accuracy, taking into account data distribution uncertainties. We first show that the accuracy inevitably decreases in the pursuit of …


Jigsaw: Edge-Based Streaming Perception Over Spatially Overlapped Multi-Camera Deployments, Ila Gokarn, Yigong Hu, Tarek Abdelzaher, Archan Misra Jul 2024

Research Collection School Of Computing and Information Systems

We present JIGSAW, a novel system that performs edge-based streaming perception over multiple video streams, while additionally factoring in the redundancy offered by the spatial overlap often exhibited in urban, multi-camera deployments. To assure high streaming throughput, JIGSAW extracts and spatially multiplexes multiple regions-of-interest from different camera frames into a smaller canvas frame. Moreover, to ensure that perception stays abreast of evolving object kinematics, JIGSAW includes a utility-based weighted scheduler to preferentially prioritize and even skip object-specific tiles extracted from an incoming stream of camera frames. Using the CityflowV2 traffic surveillance dataset, we show that JIGSAW can simultaneously process 25 …


How People Prompt Generative AI To Create Interactive VR Scenes, Setareh Aghel Manesh, Tianyi Zhang, Yuki Onishi, Kotaro Hara, Scott Bateman, Jiannan Li, Anthony Tang Jul 2024

Research Collection School Of Computing and Information Systems

Generative AI tools can provide people with the ability to create virtual environments and scenes with natural language prompts. Yet, how people will formulate such prompts is unclear, particularly when they inhabit the environment that they are designing. For instance, it is likely that a person might say, "Put a chair here," while pointing at a location. If such linguistic and embodied features are common to people's prompts, we need to tune models to accommodate them. In this work, we present a Wizard of Oz elicitation study with 22 participants, where we studied people's implicit expectations when verbally prompting such programming …


Hierarchical Damage Correlations For Old Photo Restoration, Weiwei Cai, Xuemiao Xu, Jiajia Xu, Huaidong Zhang, Haoxin Yang, Kun Zhang, Shengfeng He Jul 2024

Research Collection School Of Computing and Information Systems

Restoring old photographs can preserve cherished memories. Previous methods handled diverse damages within the same network structure, which proved impractical. In addition, these methods cannot exploit correlations among artifacts, especially in scratches versus patch-misses issues. Hence, a tailored network is particularly crucial. In light of this, we propose a unified framework consisting of two key components: ScratchNet and PatchNet. In detail, ScratchNet employs the parallel Multi-scale Partial Convolution Module to effectively repair scratches, learning from multi-scale local receptive fields. In contrast, the patch-misses necessitate the network to emphasize global information. To this end, we incorporate a transformer-based encoder and decoder …


Let’s Think Outside The Box: Exploring Leap-Of-Thought In Large Language Models With Multimodal Humor Generation, Shanshan Zhong, Zhongzhan Huang, Shanghua Gao, Wushao Wen, Liang Lin, Marinka Zitnik, Pan Zhou Jun 2024

Research Collection School Of Computing and Information Systems

Chain-of-Thought (CoT) [2, 3] guides large language models (LLMs) to reason step-by-step, and can motivate their logical reasoning ability. While effective for logical tasks, CoT is not conducive to creative problem-solving which often requires out-of-box thoughts and is crucial for innovation advancements. In this paper, we explore the Leap-of-Thought (LoT) abilities within LLMs — a nonsequential, creative paradigm involving strong associations and knowledge leaps. To this end, we study LLMs on the popular Oogiri game which needs participants to have good creativity and strong associative thinking for responding unexpectedly and humorously to the given image, text, or both, and thus …


Few-Shot Learner Parameterization By Diffusion Time-Steps, Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Qianru Sun Jun 2024

Research Collection School Of Computing and Information Systems

Even when using large multi-modal foundation models, few-shot learning is still challenging—if there is no proper inductive bias, it is nearly impossible to keep the nuanced class attributes while removing the visually prominent attributes that spuriously correlate with class labels. To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent. Building on this, we propose Time-step Few-shot (TiF) …


Diffusion Time-Step Curriculum For One Image To 3D Generation, Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo Hwee Lim, Hanwang Zhang Jun 2024

Research Collection School Of Computing and Information Systems

Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a single image. It leverages pretrained 2D diffusion models as a teacher to guide the reconstruction of student 3D models. Despite their remarkable success, SDS-based methods often encounter geometric artifacts and texture saturation. We find that the crux is the overlooked indiscriminate treatment of diffusion time-steps during optimization: it unreasonably treats the student-teacher knowledge distillation as equal at all time-steps and thus entangles coarse-grained and fine-grained modeling. Therefore, we propose the Diffusion Time-step Curriculum one-image-to-3D pipeline (DTC123), which involves both …


Consistent3D: Towards Consistent High-Fidelity Text-To-3D Generation With Deterministic Sampling Prior, Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang Jun 2024

Research Collection School Of Computing and Information Systems

Score distillation sampling (SDS) and its variants have greatly boosted the development of text-to-3D generation, but remain vulnerable to geometry collapse and poor textures. To solve this issue, we first analyze SDS in depth and find that its distillation sampling process corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample, which then serves as guidance to optimize a 3D model. However, the randomness in SDE sampling often leads to a diverse and unpredictable sample which is not always less noisy, and thus is …


Improving Interpretable Embeddings For Ad-Hoc Video Search With Generative Captions And Multi-Word Concept Bank, Jiaxin Wu, Chong-Wah Ngo, Wing-Kwong Chan Jun 2024

Research Collection School Of Computing and Information Systems

Aligning a user query and video clips in cross-modal latent space and that with semantic concepts are two mainstream approaches for ad-hoc video search (AVS). However, the effectiveness of existing approaches is bottlenecked by the small sizes of available video-text datasets and the low quality of concept banks, which results in the failures of unseen queries and the out-of-vocabulary problem. This paper addresses these two problems by constructing a new dataset and developing a multi-word concept bank. Specifically, capitalizing on a generative model, we construct a new dataset consisting of 7 million generated text and video pairs for pre-training. To …


VIOLET: Visual Analytics For Explainable Quantum Neural Networks, Shaolun Ruan, Zhiding Liang, Qiang Guan, Paul Robert Griffin, Xiaolin Wen, Yanna Lin, Yong Wang Jun 2024

Research Collection School Of Computing and Information Systems

With the rapid development of Quantum Machine Learning, quantum neural networks (QNNs) have experienced great advancement in the past few years, harnessing the advantages of quantum computing to significantly speed up classical machine learning tasks. Despite their increasing popularity, quantum neural networks are quite counter-intuitive and difficult to understand, due to the unique quantum-specific layers (e.g., data encoding and measurement) in their architecture. This prevents QNN users and researchers from effectively understanding their inner workings and exploring the model training status. To fill the research gap, we propose VIOLET, a novel visual analytics approach to improve the explainability …


InceptionNeXt: When Inception Meets ConvNeXt, Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang Jun 2024

Research Collection School Of Computing and Information Systems

Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance, as in the remarkable work ConvNeXt, which employs 7×7 depthwise convolution. Although such a depthwise operator only consumes a few FLOPs, it largely harms model efficiency on powerful computing devices due to its high memory access costs. For example, ConvNeXt-T has similar FLOPs to ResNet-50 but only achieves ∼60% of its throughput when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation, which …
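
As a rough illustration of why a depthwise operator consumes only a few FLOPs, the sketch below counts multiply-accumulate operations (MACs) for a 7×7 depthwise convolution and for a dense 7×7 convolution on the same feature map. The 56×56 map size, the 96 channels, and the helper names are illustrative assumptions, not figures from the abstract. The count says nothing about memory access cost, which is the bottleneck the abstract points to.

```python
def depthwise_conv_macs(height, width, channels, kernel):
    """MACs for a depthwise conv: one kernel x kernel filter per channel."""
    return height * width * channels * kernel * kernel


def dense_conv_macs(height, width, in_channels, out_channels, kernel):
    """MACs for a standard conv: every output channel reads every input channel."""
    return height * width * in_channels * out_channels * kernel * kernel


# Hypothetical feature map: 56 x 56 spatial size, 96 channels, stride 1, same padding.
dw = depthwise_conv_macs(56, 56, 96, 7)
dense = dense_conv_macs(56, 56, 96, 96, 7)
ratio = dense // dw  # equals the channel count when in_channels == out_channels
```

Under these assumptions the depthwise layer needs a factor of `channels` fewer MACs than the dense one, so a low FLOP count alone does not predict throughput.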


JollyGesture: Exploring Dual-Purpose Gestures In VR Presentations, Gun Woo Warren Park, Anthony Tang, Fanny Chevalier Jun 2024

Research Collection School Of Computing and Information Systems

Virtual reality (VR) offers new opportunities for presenters to use expressive body language to engage their audience. Yet, most VR presentation systems have adopted control mechanisms that mimic those found in face-to-face presentation systems. We explore the use of gestures that have dual-purpose: first, for the audience, a communicative purpose; second, for the presenter, a control purpose to alter content in slides. To support presenters, we provide guidance on what gestures are available and their effects. We realize our design approach in JollyGesture, a VR technology probe that recognizes dual-purpose gestures in a presentation scenario. We evaluate our approach through …


Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames, Ning Han, Xun Yang, Ee-Peng Lim, Hao Chen, Qianru Sun Jun 2024

Research Collection School Of Computing and Information Systems

Cross-modal video retrieval aims to retrieve semantically relevant videos when given a textual query, and is one of the fundamental multimedia tasks. Most top-performing methods primarily leverage Vision Transformer (ViT) to extract video features [1]-[3]. However, they suffer from the high computational complexity of ViT, especially when encoding long videos. A common and simple solution is to uniformly sample a small number (e.g., 4 or 8) of frames from the target video (instead of using the whole video) as ViT inputs. The number of frames has a strong influence on the performance of ViT, e.g., using 8 frames yields better …
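
The uniform sampling baseline mentioned above can be sketched as follows. This is a minimal illustration, assuming one common convention (take the centre frame of each equal-length segment); the function name is hypothetical and the abstract does not specify which convention the cited methods use.

```python
def uniform_frame_indices(num_frames, num_samples):
    """Return frame indices spread uniformly over a video.

    The video is split into num_samples equal-length segments and the
    centre frame of each segment is selected.
    """
    if num_frames <= 0 or num_samples <= 0:
        raise ValueError("num_frames and num_samples must be positive")
    if num_samples >= num_frames:
        # Fewer frames than requested samples: use every frame once.
        return list(range(num_frames))
    segment = num_frames / num_samples
    return [int(segment * (i + 0.5)) for i in range(num_samples)]


indices_4 = uniform_frame_indices(100, 4)
indices_short = uniform_frame_indices(3, 8)
```

Only the selected frames would then be passed to the ViT encoder, so the encoding cost grows with `num_samples` rather than with the video length.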


MultiGPrompt For Multi-Task Pre-Training And Prompting On Graphs, Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang May 2024

Research Collection School Of Computing and Information Systems

Graph Neural Networks (GNNs) have emerged as a mainstream technique for graph representation learning. However, their efficacy within an end-to-end supervised framework is significantly tied to the availability of task-specific labels. To mitigate labeling costs and enhance robustness in few-shot settings, pre-training on self-supervised tasks has emerged as a promising method, while prompting has been proposed to further narrow the objective gap between pretext and downstream tasks. Although there has been some initial exploration of prompt-based learning on graphs, they primarily leverage a single pretext task, resulting in a limited subset of general knowledge that could be learned from the …


VAID: Indexing View Designs In Visual Analytics System, Lu Ying, Aoyu Wu, Haotian Li, Zikun Deng, Ji Lan, Jiang Wu, Yong Wang, Huamin Qu, Dazhen Deng, Yingcai Wu May 2024

Research Collection School Of Computing and Information Systems

Visual analytics (VA) systems have been widely used in various application domains. However, VA systems are complex in design, which imposes a serious problem: although the academic community constantly designs and implements new designs, the designs are difficult to query, understand, and refer to by subsequent designers. To mark a major step forward in tackling this problem, we index VA designs in an expressive and accessible way, transforming the designs into a structured format. We first conducted a workshop study with VA designers to learn user requirements for understanding and retrieving professional designs in VA systems. Thereafter, we came up …


Exploring Diffusion Time-Steps For Unsupervised Representation Learning, Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang May 2024

Research Collection School Of Computing and Information Systems

Representation learning is all about discovering the hidden modular attributes that generate the data faithfully. We explore the potential of Denoising Diffusion Probabilistic Model (DM) in unsupervised learning of the modular attributes. We build a theoretical framework that connects the diffusion time-steps and the hidden attributes, which serves as an effective inductive bias for unsupervised learning. Specifically, the forward diffusion process incrementally adds Gaussian noise to samples at each time-step, which essentially collapses different samples into similar ones by losing attributes, e.g., fine-grained attributes such as texture are lost with less noise added (i.e., early time-steps), while coarse-grained ones such …
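
The forward process described above can be sketched with the standard closed-form noising step, x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε with ε drawn from a standard Gaussian. This is a minimal sketch, not the authors' code: the linear beta schedule, its endpoints, and all function names are assumptions chosen for illustration.

```python
import math
import random


def linear_betas(num_steps, beta_start=1e-4, beta_end=0.02):
    """Assumed linear noise schedule; requires num_steps >= 2."""
    step = (beta_end - beta_start) / (num_steps - 1)
    return [beta_start + i * step for i in range(num_steps)]


def alpha_bars(betas):
    """Cumulative products of (1 - beta_t): the fraction of signal variance kept."""
    out, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        out.append(prod)
    return out


def diffuse(x0, t, abar, rng):
    """Sample x_t from q(x_t | x_0) for a list of floats x0 (t is 0-indexed)."""
    a = abar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * rng.gauss(0.0, 1.0) for x in x0]


abar = alpha_bars(linear_betas(1000))
rng = random.Random(0)
x_early = diffuse([1.0, 2.0, 3.0], 10, abar, rng)   # little noise added
x_late = diffuse([1.0, 2.0, 3.0], 999, abar, rng)   # almost pure noise
```

Because the retained signal weight shrinks monotonically with t, two samples that differ only in a low-contrast detail become hard to tell apart at smaller t than samples that differ in a high-contrast one, which matches the intuition the abstract gives for fine-grained versus coarse-grained attributes.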


Diffusion-Based Negative Sampling On Graphs For Link Prediction, Yuan Fang, Yuan Fang May 2024

Research Collection School Of Computing and Information Systems

Link prediction is a fundamental task for graph analysis with important applications on the Web, such as social network analysis and recommendation systems. Modern graph link prediction methods often employ a contrastive approach to learn robust node representations, where negative sampling is pivotal. Typical negative sampling methods aim to retrieve hard examples based on either predefined heuristics or automatic adversarial approaches, which might be inflexible or difficult to control. Furthermore, in the context of link prediction, most previous methods sample negative nodes from existing substructures of the graph, missing out on potentially more optimal samples in the latent space. …


The Impact Of Avatar Completeness On Embodiment And The Detectability Of Hand Redirection In Virtual Reality, Martin Feick, Andre Zenner, Simon Seibert, Anthony Tang, Antonio Krüger May 2024

Research Collection School Of Computing and Information Systems

To enhance interactions in VR, many techniques introduce offsets between the virtual and real-world position of users’ hands. Nevertheless, such hand redirection (HR) techniques are only effective as long as they go unnoticed by users and do not disrupt the VR experience. While several studies consider how much unnoticeable redirection can be applied, these focus on mid-air floating hands that are disconnected from users’ bodies. Increasingly, VR avatars are embodied as being directly connected with the user’s body, which provides more visual cue anchoring, and may therefore reduce the unnoticeable redirection threshold. In this work, we studied more complete avatars and their effect …


SwapVid: Integrating Video Viewing And Document Exploration With Direct Manipulation, Taichi Murakami, Kazuyuki Fujita, Kotaro Hara, Kazuki Takashima, Yoshifumi Kitamura May 2024

Research Collection School Of Computing and Information Systems

Videos accompanied by documents (document-based videos) enable presenters to share content beyond videos and audiences to use them for detailed content comprehension. However, concurrently exploring multiple channels of information could be taxing. We propose SwapVid, a novel interface for viewing and exploring document-based videos. SwapVid seamlessly integrates a video and a document into a single view and lets the content behave as both a video and a document; it adaptively switches a document-based video to act as a video or a document upon direct manipulation (e.g., scrolling the document, manipulating the video timeline). We conducted a user study with twenty participants, comparing SwapVid …


Coca: Improving And Explaining Graph Neural Network-Based Vulnerability Detection Systems, Sicong Cao, Xiaobing Sun, Xiaoxue Wu, David Lo, Lili Bo, Bin Li, Wei Liu Apr 2024

Research Collection School Of Computing and Information Systems

Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, the lack of explainability poses a critical challenge to deploying black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of the detection model by providing a set of crucial statements positively contributing to its predictions. Unfortunately, due to the weakly-robust detection models and suboptimal explanation strategy, they risk revealing spurious correlations and suffer from redundancy issues. In this paper, we propose Coca, a general framework aiming to 1) enhance the robustness of existing GNN-based vulnerability detection models to …


TranSiam: Aggregating Multi-Modal Visual Features With Locality For Medical Image Segmentation, Xuejian Li, Shiqiang Ma, Junhai Xu, Jijun Tang, Shengfeng He, Fei Guo Mar 2024

Research Collection School Of Computing and Information Systems

Automatic segmentation of medical images plays an important role in the diagnosis of diseases. On single-modal data, convolutional neural networks have demonstrated satisfactory performance. However, multi-modal data encompasses a greater amount of information than single-modal data. Multi-modal data can be effectively used to improve the segmentation accuracy of regions of interest by analyzing both spatial and temporal information. In this study, we propose a dual-path segmentation model for multi-modal medical images, named TranSiam. Taking into account that there is a significant diversity between the different modalities, TranSiam employs two parallel CNNs to extract the features which are specific to …


Iterative Graph Self-Distillation, Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric Xing Mar 2024

Research Collection School Of Computing and Information Systems

Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs. To address this, we propose a method called Iterative Graph Self-Distillation (IGSD) which learns graph-level representation in an unsupervised manner through instance discrimination using a self-supervised contrastive learning approach. IGSD involves a teacher-student distillation process that uses graph diffusion augmentations and constructs the teacher model using an exponential moving average of the student model. The intuition behind IGSD is to predict the teacher network representation of the graph pairs under different augmented views. As a natural extension, we also apply IGSD to semi-supervised scenarios by …
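
The teacher construction mentioned above, an exponential moving average (EMA) of the student, can be sketched on plain parameter lists. This is a generic EMA update under assumed names and an assumed decay value; it is not taken from the IGSD implementation, and the graph encoders, augmentations, and contrastive loss are left out.

```python
def ema_update(teacher, student, decay=0.99):
    """Return new teacher parameters as an EMA of the student parameters.

    teacher and student are equal-length lists of floats. A larger decay
    makes the teacher change more slowly.
    """
    return [decay * t + (1.0 - decay) * s for t, s in zip(teacher, student)]


teacher = [0.0, 1.0]
student = [1.0, 1.0]
for _ in range(3):
    # In training, the student would be updated by gradient descent between
    # these calls; it is held fixed here to keep the example small.
    teacher = ema_update(teacher, student, decay=0.5)
```

The teacher receives no gradients of its own in this scheme; it only tracks a smoothed copy of the student.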


Test-Time Augmentation For 3D Point Cloud Classification And Segmentation, Tuan-Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh-Son Hua, Sai-Kit Yeung Mar 2024

Research Collection School Of Computing and Information Systems

Data augmentation is a powerful technique to enhance the performance of a deep learning task but has received less attention in 3D deep learning. It is well known that when 3D shapes are sparsely represented with low point density, the performance of the downstream tasks drops significantly. This work explores test-time augmentation (TTA) for 3D point clouds. We are inspired by the recent revolution of learning implicit representation and point cloud upsampling, which can produce high-quality 3D surface reconstruction and proximity-to-surface, respectively. Our idea is to leverage the implicit field reconstruction or point cloud upsampling techniques as a systematic way …


Towards Understanding Convergence And Generalization Of AdamW, Pan Zhou, Xingyu Xie, Zhouchen Lin, Shuicheng Yan Mar 2024

Research Collection School Of Computing and Information Systems

AdamW modifies Adam by adding a decoupled weight decay to decay network weights per training iteration. For adaptive algorithms, this decoupled weight decay does not affect specific optimization steps, and differs from the widely used ℓ2-regularizer which changes optimization steps via changing the first- and second-order gradient moments. Despite AdamW's great practical success, analyses of its convergence behavior and of its generalization improvement over Adam and ℓ2-regularized Adam (ℓ2-Adam) remain absent. To solve this issue, we prove the convergence of AdamW and justify its generalization advantages over Adam and ℓ2-Adam. Specifically, AdamW provably converges but minimizes a dynamically regularized loss that …
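
The difference between decoupled weight decay and the ℓ2-regularizer can be made concrete with a single scalar update step. The sketch below is a simplified illustration with assumed hyperparameter values and a hypothetical function name, not the paper's analysis or any library's implementation.

```python
import math


def adam_step(w, g, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8,
              weight_decay=0.0, decoupled=True):
    """One scalar Adam update with decoupled (AdamW) or l2-style weight decay."""
    if not decoupled:
        # l2-regularized Adam: the decay term is added to the gradient, so it
        # also changes the first- and second-order moment estimates below.
        g = g + weight_decay * w
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)  # bias correction, t starts at 1
    v_hat = v / (1 - beta2 ** t)
    w_new = w - lr * m_hat / (math.sqrt(v_hat) + eps)
    if decoupled:
        # AdamW: the decay acts directly on the weight, outside the adaptive
        # step, so the moments never see it.
        w_new = w_new - lr * weight_decay * w
    return w_new, m, v


# Zero loss gradient isolates the effect of the decay term.
w_adamw, m_adamw, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1,
                                weight_decay=0.1, decoupled=True)
w_l2, m_l2, _ = adam_step(1.0, 0.0, 0.0, 0.0, t=1, lr=0.1,
                          weight_decay=0.1, decoupled=False)
```

With a zero loss gradient, the decoupled variant shrinks the weight by `lr * weight_decay * w` and leaves the moments at zero, while the ℓ2 variant feeds the decay through the adaptive normalization and takes a step of roughly `lr` regardless of the decay strength.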


FoodMask: Real-Time Food Instance Counting, Segmentation And Recognition, Huu-Thanh Nguyen, Yu Cao, Chong-Wah Ngo, Wing-Kwong Chan Feb 2024

Research Collection School Of Computing and Information Systems

Food computing has long been studied and deployed to several applications. Understanding a food image at the instance level, including recognition, counting and segmentation, is essential to quantifying nutrition and calorie consumption. Nevertheless, existing techniques are limited to either category-specific instance detection, which does not reflect precisely the instance size at the pixel level, or category-agnostic instance segmentation, which is insufficient for dish recognition. This paper presents a compact and fast multi-task network, namely FoodMask, for clustering-based food instance counting, segmentation and recognition. The network learns a semantic space simultaneously encoding food category distribution and instance height at pixel basis. …


HGPrompt: Bridging Homogeneous And Heterogeneous Graphs For Few-Shot Prompt Learning, Xingtong Yu, Yuan Fang, Zemin Liu, Xinming Zhang Feb 2024

Research Collection School Of Computing and Information Systems

Graph neural networks (GNNs) and heterogeneous graph neural networks (HGNNs) are prominent techniques for homogeneous and heterogeneous graph representation learning, yet their performance in an end-to-end supervised framework greatly depends on the availability of task-specific supervision. To reduce the labeling cost, pre-training on self-supervised pretext tasks has become a popular paradigm, but there is often a gap between the pre-trained model and downstream tasks, stemming from the divergence in their objectives. To bridge the gap, prompt learning has risen as a promising direction especially in few-shot settings, without the need to fully fine-tune the pre-trained model. While there has been …