Open Access. Powered by Scholars. Published by Universities.®
Graphics and Human Computer Interfaces Commons™
- Keyword
- Canvas-based Processing (2)
- Feature extraction (2)
- Food recognition (2)
- Prompting (2)
- Virtual reality (2)
- 3D point cloud (1)
- 3D semantic scene graph prediction (1)
- 3D-aware GANs (1)
- Ad-hoc video search (1)
- Adaptation models (1)
- Adaptive gradient algorithms (1)
- Analysis of AdamW (1)
- Annotated Portfolio (1)
- Attention mechanism (1)
- Audio-visual speech recognition (1)
- Avatar embodiment (1)
- Categorization (1)
- Causal Inference (1)
- Codes (1)
- Computational modeling (1)
- Concept bank construction (1)
- Context modeling (1)
- Contrastive Learning (1)
- Contrastive learning (1)
- Convergence of AdamW (1)
- Cross-Modal (1)
- Cross-modal fusion (1)
- Crowd Counting (1)
- Data Structure (1)
- Data privacy (1)
Articles 1 - 30 of 45
Full-Text Articles in Graphics and Human Computer Interfaces
Triadic Temporal-Semantic Alignment For Weakly-Supervised Video Moment Retrieval, Jin Liu, Jialong Xie, Fengyu Zhou, Shengfeng He
Research Collection School Of Computing and Information Systems
Video Moment Retrieval (VMR) aims to identify specific event moments within untrimmed videos based on natural language queries. Existing VMR methods have been criticized for relying heavily on moment annotation bias rather than true multi-modal alignment reasoning. Weakly supervised VMR approaches inherently overcome this issue by training without precise temporal location information. However, they struggle with fine-grained semantic alignment and often yield multiple speculative predictions with prolonged video spans. In this paper, we take a step forward in the context of weakly supervised VMR by proposing a triadic temporal-semantic alignment model. Our proposed approach augments weak supervision by comprehensively addressing …
Granular3d: Delving Into Multi-Granularity 3d Scene Graph Prediction, Kaixiang Huang, Jingru Yang, Jin Wang, Shengfeng He, Zhan Wang, Haiyan He, Qifeng Zhang, Guodong Lu
Research Collection School Of Computing and Information Systems
This paper addresses the significant challenges in 3D Semantic Scene Graph (3DSSG) prediction, essential for understanding complex 3D environments. Traditional approaches, primarily using PointNet and Graph Convolutional Networks, struggle with effectively extracting multi-grained features from intricate 3D scenes, largely due to a focus on global scene processing and single-scale feature extraction. To overcome these limitations, we introduce Granular3D, a novel approach that shifts the focus towards multi-granularity analysis by predicting relation triplets from specific sub-scenes. One key component is the Adaptive Instance Enveloping Method (AIEM), which establishes an approximate envelope structure around irregular instances, providing shape-adaptive local point cloud sampling, thereby …
Nonfactoid Question Answering As Query-Focused Summarization With Graph-Enhanced Multihop Inference, Yang Deng, Wenxuan Zhang, Weiwen Xu, Ying Shen, Wai Lam
Research Collection School Of Computing and Information Systems
Nonfactoid question answering (QA) is one of the most extensive yet challenging applications and research areas in natural language processing (NLP). Existing methods fall short of handling the long-distance and complex semantic relations between the question and the document sentences. In this work, we propose a novel query-focused summarization method, namely a graph-enhanced multihop query-focused summarizer (GMQS), to tackle the nonfactoid QA problem. Specifically, we leverage graph-enhanced reasoning techniques to elaborate the multihop inference process in nonfactoid QA. Three types of graphs with different semantic relations, namely semantic relevance, topical coherence, and coreference linking, are constructed for explicitly capturing the …
G2face: High-Fidelity Reversible Face Anonymization Via Generative And Geometric Priors, Haoxin Yang, Xuemiao Xu, Cheng Xu, Huaidong Zhang, Jing Qin, Yi Wang, Pheng-Ann Heng, Shengfeng He
Research Collection School Of Computing and Information Systems
Reversible face anonymization, unlike traditional face pixelization, seeks to replace sensitive identity information in facial images with synthesized alternatives, preserving privacy without sacrificing image clarity. Traditional methods, such as encoder-decoder networks, often result in significant loss of facial details due to their limited learning capacity. Additionally, relying on latent manipulation in pre-trained GANs can lead to changes in ID-irrelevant attributes, adversely affecting data utility due to GAN inversion inaccuracies. This paper introduces G2Face, which leverages both generative and geometric priors to enhance identity manipulation, achieving high-quality reversible face anonymization without compromising data utility. We utilize a 3D face …
Certified Robust Accuracy Of Neural Networks Are Bounded Due To Bayes Errors, Ruihan Zhang, Jun Sun
Research Collection School Of Computing and Information Systems
Adversarial examples pose a security threat to many critical systems built on neural networks. While certified training improves robustness, it also decreases accuracy noticeably. Despite various proposals for addressing this issue, the significant accuracy drop remains. More importantly, it is not clear whether there is a certain fundamental limit on achieving robustness whilst maintaining accuracy. In this work, we offer a novel perspective based on Bayes errors. By adopting Bayes error to robustness analysis, we investigate the limit of certified robust accuracy, taking into account data distribution uncertainties. We first show that the accuracy inevitably decreases in the pursuit of …
Jigsaw: Edge-Based Streaming Perception Over Spatially Overlapped Multi-Camera Deployments, Ila Gokarn, Yigong Hu, Tarek Abdelzaher, Archan Misra
Research Collection School Of Computing and Information Systems
We present JIGSAW, a novel system that performs edge-based streaming perception over multiple video streams, while additionally factoring in the redundancy offered by the spatial overlap often exhibited in urban, multi-camera deployments. To assure high streaming throughput, JIGSAW extracts and spatially multiplexes multiple regions-of-interest from different camera frames into a smaller canvas frame. Moreover, to ensure that perception stays abreast of evolving object kinematics, JIGSAW includes a utility-based weighted scheduler to preferentially prioritize and even skip object-specific tiles extracted from an incoming stream of camera frames. Using the CityflowV2 traffic surveillance dataset, we show that JIGSAW can simultaneously process 25 …
How People Prompt Generative Ai To Create Interactive Vr Scenes, Setareh Aghel Manesh, Tianyi Zhang, Yuki Onishi, Kotaro Hara, Scott Bateman, Jiannan Li, Anthony Tang
Research Collection School Of Computing and Information Systems
Generative AI tools can provide people with the ability to create virtual environments and scenes with natural language prompts. Yet, how people will formulate such prompts is unclear, particularly when they inhabit the environment that they are designing. For instance, it is likely that a person might say, "Put a chair here," while pointing at a location. If such linguistic and embodied features are common to people's prompts, we need to tune models to accommodate them. In this work, we present a Wizard of Oz elicitation study with 22 participants, where we studied people's implicit expectations when verbally prompting such programming …
Hierarchical Damage Correlations For Old Photo Restoration, Weiwei Cai, Xuemiao Xu, Jiajia Xu, Huaidong Zhang, Haoxin Yang, Kun Zhang, Shengfeng He
Research Collection School Of Computing and Information Systems
Restoring old photographs can preserve cherished memories. Previous methods handled diverse damages within the same network structure, which proved impractical. In addition, these methods cannot exploit correlations among artifacts, especially between scratches and patch-misses. Hence, a tailored network is particularly crucial. In light of this, we propose a unified framework consisting of two key components: ScratchNet and PatchNet. In detail, ScratchNet employs a parallel Multi-scale Partial Convolution Module to effectively repair scratches, learning from multi-scale local receptive fields. In contrast, patch-misses necessitate that the network emphasize global information. To this end, we incorporate a transformer-based encoder and decoder …
Let’S Think Outside The Box: Exploring Leap-Of-Thought In Large Language Models With Multimodal Humor Generation, Shanshan Zhong, Zhongzhan Huang, Shanghua Gao, Wushao Wen, Liang Lin, Marinka Zitnik, Pan Zhou
Research Collection School Of Computing and Information Systems
Chain-of-Thought (CoT) [2, 3] guides large language models (LLMs) to reason step-by-step, and can motivate their logical reasoning ability. While effective for logical tasks, CoT is not conducive to creative problem-solving, which often requires out-of-the-box thinking and is crucial for innovation. In this paper, we explore the Leap-of-Thought (LoT) abilities within LLMs: a non-sequential, creative paradigm involving strong associations and knowledge leaps. To this end, we study LLMs on the popular Oogiri game, which requires participants to have good creativity and strong associative thinking to respond unexpectedly and humorously to a given image, text, or both, and thus …
Few-Shot Learner Parameterization By Diffusion Time-Steps, Zhongqi Yue, Pan Zhou, Richang Hong, Hanwang Zhang, Sun Qianru
Research Collection School Of Computing and Information Systems
Even when using large multi-modal foundation models, few-shot learning is still challenging—if there is no proper inductive bias, it is nearly impossible to keep the nuanced class attributes while removing the visually prominent attributes that spuriously correlate with class labels. To this end, we find an inductive bias that the time-steps of a Diffusion Model (DM) can isolate the nuanced class attributes, i.e., as the forward diffusion adds noise to an image at each time-step, nuanced attributes are usually lost at an earlier time-step than the spurious attributes that are visually prominent. Building on this, we propose Time-step Few-shot (TiF) …
Diffusion Time-Step Curriculum For One Image To 3d Generation, Xuanyu Yi, Zike Wu, Qingshan Xu, Pan Zhou, Joo Hwee Lim, Hanwang Zhang
Research Collection School Of Computing and Information Systems
Score distillation sampling (SDS) has been widely adopted to overcome the absence of unseen views in reconstructing 3D objects from a single image. It leverages pretrained 2D diffusion models as teachers to guide the reconstruction of student 3D models. Despite their remarkable success, SDS-based methods often encounter geometric artifacts and texture saturation. We find that the crux is the overlooked indiscriminate treatment of diffusion time-steps during optimization: it unreasonably treats the student-teacher knowledge distillation as equal at all time-steps and thus entangles coarse-grained and fine-grained modeling. Therefore, we propose the Diffusion Time-step Curriculum one-image-to-3D pipeline (DTC123), which involves both …
Consistent3d: Towards Consistent High-Fidelity Text-To-3d Generation With Deterministic Sampling Prior, Zike Wu, Pan Zhou, Xuanyu Yi, Xiaoding Yuan, Hanwang Zhang
Research Collection School Of Computing and Information Systems
Score distillation sampling (SDS) and its variants have greatly boosted the development of text-to-3D generation, but remain vulnerable to geometry collapse and poor textures. To solve this issue, we first analyze SDS in depth and find that its distillation sampling process in fact corresponds to the trajectory sampling of a stochastic differential equation (SDE): SDS samples along an SDE trajectory to yield a less noisy sample, which then serves as guidance to optimize a 3D model. However, the randomness in SDE sampling often leads to a diverse and unpredictable sample which is not always less noisy, and thus is …
Improving Interpretable Embeddings For Ad-Hoc Video Search With Generative Captions And Multi-Word Concept Bank, Jiaxin Wu, Chong-Wah Ngo, Wing-Kwong Chan
Research Collection School Of Computing and Information Systems
Aligning a user query with video clips in a cross-modal latent space, and aligning both with semantic concepts, are the two mainstream approaches for ad-hoc video search (AVS). However, the effectiveness of existing approaches is bottlenecked by the small sizes of available video-text datasets and the low quality of concept banks, which results in failures on unseen queries and the out-of-vocabulary problem. This paper addresses these two problems by constructing a new dataset and developing a multi-word concept bank. Specifically, capitalizing on a generative model, we construct a new dataset consisting of 7 million generated text and video pairs for pre-training. To …
Violet: Visual Analytics For Explainable Quantum Neural Networks, Shaolun Ruan, Zhiding Liang, Qiang Guan, Paul Robert Griffin, Xiaolin Wen, Yanna Lin, Yong Wang
Research Collection School Of Computing and Information Systems
With the rapid development of Quantum Machine Learning, quantum neural networks (QNNs) have experienced great advancement in the past few years, harnessing the advantages of quantum computing to significantly speed up classical machine learning tasks. Despite their increasing popularity, quantum neural networks are quite counter-intuitive and difficult to understand, due to the unique quantum-specific layers (e.g., data encoding and measurement) in their architecture. This prevents QNN users and researchers from effectively understanding their inner workings and exploring the model training status. To fill the research gap, we propose VIOLET, a novel visual analytics approach to improve the explainability …
Inceptionnext: When Inception Meets Convnext, Weihao Yu, Pan Zhou, Shuicheng Yan, Xinchao Wang
Research Collection School Of Computing and Information Systems
Inspired by the long-range modeling ability of ViTs, large-kernel convolutions have recently been widely studied and adopted to enlarge the receptive field and improve model performance, as in the remarkable ConvNeXt, which employs 7×7 depthwise convolution. Although such a depthwise operator consumes only a few FLOPs, it largely harms model efficiency on powerful computing devices due to high memory access costs. For example, ConvNeXt-T has FLOPs similar to ResNet-50's but achieves only ∼60% of its throughput when trained on A100 GPUs with full precision. Although reducing the kernel size of ConvNeXt can improve speed, it results in significant performance degradation, which …
Jollygesture: Exploring Dual-Purpose Gestures In Vr Presentations, Gun Woo Warren Park, Anthony Tang, Fanny Chevalier
Research Collection School Of Computing and Information Systems
Virtual reality (VR) offers new opportunities for presenters to use expressive body language to engage their audience. Yet, most VR presentation systems have adopted control mechanisms that mimic those found in face-to-face presentation systems. We explore the use of gestures that have dual-purpose: first, for the audience, a communicative purpose; second, for the presenter, a control purpose to alter content in slides. To support presenters, we provide guidance on what gestures are available and their effects. We realize our design approach in JollyGesture, a VR technology probe that recognizes dual-purpose gestures in a presentation scenario. We evaluate our approach through …
Efficient Cross-Modal Video Retrieval With Meta-Optimized Frames, Ning Han, Xun Yang, Ee-Peng Lim, Hao Chen, Qianru Sun
Research Collection School Of Computing and Information Systems
Cross-modal video retrieval aims to retrieve semantically relevant videos when given a textual query, and is one of the fundamental multimedia tasks. Most top-performing methods primarily leverage Vision Transformer (ViT) to extract video features [1]-[3]. However, they suffer from the high computational complexity of ViT, especially when encoding long videos. A common and simple solution is to uniformly sample a small number (e.g., 4 or 8) of frames from the target video (instead of using the whole video) as ViT inputs. The number of frames has a strong influence on the performance of ViT, e.g., using 8 frames yields better …
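The uniform frame sampling the abstract describes can be sketched in a few lines. This is a generic illustration of the common heuristic (segment midpoints), with a hypothetical helper name, not the paper's code:

```python
import numpy as np

def sample_frame_indices(num_video_frames: int, num_samples: int = 8) -> np.ndarray:
    """Uniformly sample frame indices across a video.

    Take the midpoint of each of `num_samples` equal-length segments so
    the sampled frames span the whole video.
    """
    edges = np.linspace(0, num_video_frames, num_samples + 1)
    midpoints = (edges[:-1] + edges[1:]) / 2.0
    return midpoints.astype(int)

# e.g. reduce an untrimmed 300-frame video to 8 ViT input frames
indices = sample_frame_indices(300, 8)
```

Only the sampled frames are encoded by the ViT, which is what makes the frame budget the dominant cost knob the abstract refers to.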
Multigprompt For Multi-Task Pre-Training And Prompting On Graphs, Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhang
Research Collection School Of Computing and Information Systems
Graph Neural Networks (GNNs) have emerged as a mainstream technique for graph representation learning. However, their efficacy within an end-to-end supervised framework is significantly tied to the availability of task-specific labels. To mitigate labeling costs and enhance robustness in few-shot settings, pre-training on self-supervised tasks has emerged as a promising method, while prompting has been proposed to further narrow the objective gap between pretext and downstream tasks. Although there has been some initial exploration of prompt-based learning on graphs, they primarily leverage a single pretext task, resulting in a limited subset of general knowledge that could be learned from the …
Vaid: Indexing View Designs In Visual Analytics System, Lu Ying, Aoyu Wu, Haotian Li, Zikun Deng, Ji Lan, Jiang Wu, Yong Wang, Huamin Qu, Dazhen Deng, Yingcai Wu
Research Collection School Of Computing and Information Systems
Visual analytics (VA) systems have been widely used in various application domains. However, VA systems are complex in design, which poses a serious problem: although the academic community constantly designs and implements new designs, those designs are difficult to query, understand, and refer to by subsequent designers. To mark a major step forward in tackling this problem, we index VA designs in an expressive and accessible way, transforming the designs into a structured format. We first conducted a workshop study with VA designers to learn user requirements for understanding and retrieving professional designs in VA systems. Thereafter, we came up …
Exploring Diffusion Time-Steps For Unsupervised Representation Learning, Zhongqi Yue, Jiankun Wang, Qianru Sun, Lei Ji, Eric I-Chao Chang, Hanwang Zhang
Research Collection School Of Computing and Information Systems
Representation learning is all about discovering the hidden modular attributes that generate the data faithfully. We explore the potential of Denoising Diffusion Probabilistic Model (DM) in unsupervised learning of the modular attributes. We build a theoretical framework that connects the diffusion time-steps and the hidden attributes, which serves as an effective inductive bias for unsupervised learning. Specifically, the forward diffusion process incrementally adds Gaussian noise to samples at each time-step, which essentially collapses different samples into similar ones by losing attributes, e.g., fine-grained attributes such as texture are lost with less noise added (i.e., early time-steps), while coarse-grained ones such …
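The forward process the abstract relies on is the standard DDPM one. As a minimal sketch (the schedule values T and the beta range are conventional assumed choices, not the paper's):

```python
import numpy as np

# Standard DDPM-style forward process: incrementally add Gaussian noise,
# so more attributes are lost at larger time-steps t.
T = 1000
betas = np.linspace(1e-4, 0.02, T)       # linear noise schedule
alpha_bars = np.cumprod(1.0 - betas)     # cumulative signal-retention factor

def q_sample(x0: np.ndarray, t: int, rng=None) -> np.ndarray:
    """Sample x_t ~ q(x_t | x0); larger t keeps less of x0."""
    rng = rng or np.random.default_rng(0)
    eps = rng.standard_normal(x0.shape)
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps
```

Because `alpha_bars` decreases monotonically in t, fine-grained attributes vanish at early time-steps and coarse-grained ones survive longer, which is the inductive bias the abstract describes.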
Diffusion-Based Negative Sampling On Graphs For Link Prediction, Yuan Fang
Research Collection School Of Computing and Information Systems
Link prediction is a fundamental task for graph analysis with important applications on the Web, such as social network analysis and recommendation systems, etc. Modern graph link prediction methods often employ a contrastive approach to learn robust node representations, where negative sampling is pivotal. Typical negative sampling methods aim to retrieve hard examples based on either predefined heuristics or automatic adversarial approaches, which might be inflexible or difficult to control. Furthermore, in the context of link prediction, most previous methods sample negative nodes from existing substructures of the graph, missing out on potentially more optimal samples in the latent space. …
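For context, the uniform negative-sampling baseline that such methods improve on can be sketched as follows (a hypothetical helper illustrating the baseline, not the paper's diffusion-based method):

```python
import numpy as np

def sample_negative_edges(edges, num_nodes, num_neg, rng=None):
    """Uniform negative sampling for link prediction: draw node pairs
    that are not existing edges in the (undirected) graph."""
    rng = rng or np.random.default_rng(0)
    existing = set(map(tuple, edges))
    negatives = []
    while len(negatives) < num_neg:
        u, v = rng.integers(0, num_nodes, size=2)
        if u != v and (u, v) not in existing and (v, u) not in existing:
            negatives.append((int(u), int(v)))
    return negatives
```

The abstract's critique applies here: uniform sampling only ever draws from existing nodes and substructures, so harder or more informative negatives in the latent space are never reached.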
The Impact Of Avatar Completeness On Embodiment And The Detectability Of Hand Redirection In Virtual Reality, Martin Feick, Andre Zenner, Simon Seibert, Anthony Tang, Antonio Krüger
Research Collection School Of Computing and Information Systems
To enhance interactions in VR, many techniques introduce offsets between the virtual and real-world positions of users’ hands. Nevertheless, such hand redirection (HR) techniques are only effective as long as they go unnoticed by users, not disrupting the VR experience. While several studies consider how much unnoticeable redirection can be applied, these focus on mid-air floating hands that are disconnected from users’ bodies. Increasingly, VR avatars are embodied as directly connected with the user’s body, which provides more visual anchoring cues and may therefore reduce the threshold at which redirection becomes noticeable. In this work, we studied more complete avatars and their effect …
Swapvid: Integrating Video Viewing And Document Exploration With Direct Manipulation, Taichi Murakami, Kazuyuki Fujita, Kotaro Hara, Kazuki Takashima, Yoshifumi Kitamura
Research Collection School Of Computing and Information Systems
Videos accompanied by documents—document-based videos—enable presenters to share content beyond the video itself and the audience to use it for detailed content comprehension. However, concurrently exploring multiple channels of information can be taxing. We propose SwapVid, a novel interface for viewing and exploring document-based videos. SwapVid seamlessly integrates a video and a document into a single view and lets the content behave as both a video and a document; it adaptively switches a document-based video to act as a video or a document upon direct manipulation (e.g., scrolling the document, manipulating the video timeline). We conducted a user study with twenty participants, comparing SwapVid …
Coca: Improving And Explaining Graph Neural Network-Based Vulnerability Detection Systems, Sicong Cao, Xiaobing Sun, Xiaoxue Wu, David Lo, Lili Bo, Bin Li, Wei Liu
Research Collection School Of Computing and Information Systems
Recently, Graph Neural Network (GNN)-based vulnerability detection systems have achieved remarkable success. However, the lack of explainability poses a critical challenge to deploying black-box models in security-related domains. For this reason, several approaches have been proposed to explain the decision logic of the detection model by providing a set of crucial statements positively contributing to its predictions. Unfortunately, due to weakly-robust detection models and suboptimal explanation strategies, they risk revealing spurious correlations and suffer from redundancy. In this paper, we propose Coca, a general framework aiming to 1) enhance the robustness of existing GNN-based vulnerability detection models to …
Transiam: Aggregating Multi-Modal Visual Features With Locality For Medical Image Segmentation, Xuejian Li, Shiqiang Ma, Junhai Xu, Jijun Tang, Shengfeng He, Fei Guo
Research Collection School Of Computing and Information Systems
Automatic segmentation of medical images plays an important role in the diagnosis of diseases. On single-modal data, convolutional neural networks have demonstrated satisfactory performance. However, multi-modal data encompasses more information than single-modal data, and can be used effectively to improve the segmentation accuracy of regions of interest by analyzing both spatial and temporal information. In this study, we propose a dual-path segmentation model for multi-modal medical images, named TranSiam. Taking into account that there is significant diversity between the different modalities, TranSiam employs two parallel CNNs to extract the features which are specific to …
Iterative Graph Self-Distillation, Hanlin Zhang, Shuai Lin, Weiyang Liu, Pan Zhou, Jian Tang, Xiaodan Liang, Eric Xing
Research Collection School Of Computing and Information Systems
Recently, there has been increasing interest in the challenge of how to discriminatively vectorize graphs. To address this, we propose a method called Iterative Graph Self-Distillation (IGSD) which learns graph-level representation in an unsupervised manner through instance discrimination using a self-supervised contrastive learning approach. IGSD involves a teacher-student distillation process that uses graph diffusion augmentations and constructs the teacher model using an exponential moving average of the student model. The intuition behind IGSD is to predict the teacher network representation of the graph pairs under different augmented views. As a natural extension, we also apply IGSD to semi-supervised scenarios by …
Test-Time Augmentation For 3d Point Cloud Classification And Segmentation, Tuan-Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh-Son Hua, Sai-Kit Yeung
Research Collection School Of Computing and Information Systems
Data augmentation is a powerful technique to enhance the performance of a deep learning task but has received less attention in 3D deep learning. It is well known that when 3D shapes are sparsely represented with low point density, the performance of the downstream tasks drops significantly. This work explores test-time augmentation (TTA) for 3D point clouds. We are inspired by the recent revolution of learning implicit representation and point cloud upsampling, which can produce high-quality 3D surface reconstruction and proximity-to-surface, respectively. Our idea is to leverage the implicit field reconstruction or point cloud upsampling techniques as a systematic way …
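A generic test-time-augmentation loop of the kind the abstract builds on averages predictions over transformed copies of the input. In this sketch the transform is a random z-axis rotation, and `model` is a hypothetical stand-in for any trained classifier mapping a point cloud to class probabilities:

```python
import numpy as np

def random_z_rotation(points: np.ndarray, rng) -> np.ndarray:
    """Rotate an (N, 3) point cloud about the z-axis, a typical TTA transform."""
    theta = rng.uniform(0.0, 2.0 * np.pi)
    c, s = np.cos(theta), np.sin(theta)
    rot = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
    return points @ rot.T

def tta_predict(model, points, num_augments=8, seed=0):
    """Average class probabilities over augmented copies of one input."""
    rng = np.random.default_rng(seed)
    preds = [model(random_z_rotation(points, rng)) for _ in range(num_augments)]
    return np.mean(preds, axis=0)
```

The paper's contribution is in how the augmented copies are generated (via implicit-field reconstruction or upsampling) rather than in this averaging step, which is the standard part.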
Towards Understanding Convergence And Generalization Of Adamw, Pan Zhou, Xingyu Xie, Zhouchen Lin, Shuicheng Yan
Research Collection School Of Computing and Information Systems
AdamW modifies Adam by adding a decoupled weight decay that shrinks network weights at each training iteration. For adaptive algorithms, this decoupled weight decay does not affect specific optimization steps, unlike the widely used ℓ2-regularizer, which changes the optimization steps by changing the first- and second-order gradient moments. Despite its great practical success, AdamW's convergence behavior and its generalization improvement over Adam and ℓ2-regularized Adam (ℓ2-Adam) remain largely unexplained. To address this, we prove the convergence of AdamW and justify its generalization advantages over Adam and ℓ2-Adam. Specifically, AdamW provably converges but minimizes a dynamically regularized loss that …
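The decoupling the abstract describes can be made concrete with a side-by-side sketch of the two update rules (a minimal NumPy illustration of the well-known updates, not the paper's analysis):

```python
import numpy as np

def adamw_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """One AdamW update: the decay term wd*w acts directly on the weights
    and never enters the gradient moments (decoupled decay)."""
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    w = w - lr * (m_hat / (np.sqrt(v_hat) + eps) + wd * w)
    return w, m, v

def l2_adam_step(w, g, m, v, t, lr=1e-3, b1=0.9, b2=0.999, eps=1e-8, wd=0.01):
    """Adam with an l2-regularizer: wd*w is folded into the gradient, so
    the decay is rescaled by the adaptive second moment."""
    g = g + wd * w
    m = b1 * m + (1 - b1) * g
    v = b2 * v + (1 - b2) * g * g
    m_hat, v_hat = m / (1 - b1 ** t), v / (1 - b2 ** t)
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v
```

With a zero gradient the two rules already diverge: AdamW shrinks the weights by exactly lr*wd per step, while ℓ2-Adam's shrinkage is distorted by the moment normalization, which is the distinction the abstract turns on.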
Foodmask: Real-Time Food Instance Counting, Segmentation And Recognition, Huu-Thanh Nguyen, Yu Cao, Chong-Wah Ngo, Wing-Kwong Chan
Research Collection School Of Computing and Information Systems
Food computing has long been studied and deployed in several applications. Understanding a food image at the instance level, including recognition, counting and segmentation, is essential to quantifying nutrition and calorie consumption. Nevertheless, existing techniques are limited to either category-specific instance detection, which does not precisely reflect instance size at the pixel level, or category-agnostic instance segmentation, which is insufficient for dish recognition. This paper presents a compact and fast multi-task network, namely FoodMask, for clustering-based food instance counting, segmentation and recognition. The network learns a semantic space that simultaneously encodes food category distribution and instance height on a per-pixel basis. …
Hgprompt: Bridging Homogeneous And Heterogeneous Graphs For Few-Shot Prompt Learning, Xingtong Yu, Yuan Fang, Zemin Liu, Xinming Zhang
Research Collection School Of Computing and Information Systems
Graph neural networks (GNNs) and heterogeneous graph neural networks (HGNNs) are prominent techniques for homogeneous and heterogeneous graph representation learning, yet their performance in an end-to-end supervised framework greatly depends on the availability of task-specific supervision. To reduce the labeling cost, pre-training on self-supervised pretext tasks has become a popular paradigm, but there is often a gap between the pre-trained model and downstream tasks, stemming from the divergence in their objectives. To bridge the gap, prompt learning has risen as a promising direction especially in few-shot settings, without the need to fully fine-tune the pre-trained model. While there has been …