Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 30 of 603

Full-Text Articles in Computer Sciences

Granular3d: Delving Into Multi-Granularity 3d Scene Graph Prediction, Kaixiang Huang, Jingru Yang, Jin Wang, Shengfeng He, Zhan Wang, Haiyan He, Qifeng Zhang, Guodong Lu Sep 2024

Granular3d: Delving Into Multi-Granularity 3d Scene Graph Prediction, Kaixiang Huang, Jingru Yang, Jin Wang, Shengfeng He, Zhan Wang, Haiyan He, Qifeng Zhang, Guodong Lu

Research Collection School Of Computing and Information Systems

This paper addresses the significant challenges in 3D Semantic Scene Graph (3DSSG) prediction, essential for understanding complex 3D environments. Traditional approaches, primarily using PointNet and Graph Convolutional Networks, struggle with effectively extracting multi-grained features from intricate 3D scenes, largely due to a focus on global scene processing and single-scale feature extraction. To overcome these limitations, we introduce Granular3D, a novel approach that shifts the focus towards multi-granularity analysis by predicting relation triplets from specific sub-scenes. One key is the Adaptive Instance Enveloping Method (AIEM), which establishes an approximate envelope structure around irregular instances, providing shape-adaptive local point cloud sampling, thereby …


Hierarchical Damage Correlations For Old Photo Restoration, Weiwei Cai, Xuemiao Xu, Jiajia Xu, Huaidong Zhang, Haoxin Yang, Kun Zhang, Shengfeng He Jul 2024

Hierarchical Damage Correlations For Old Photo Restoration, Weiwei Cai, Xuemiao Xu, Jiajia Xu, Huaidong Zhang, Haoxin Yang, Kun Zhang, Shengfeng He

Research Collection School Of Computing and Information Systems

Restoring old photographs can preserve cherished memories. Previous methods handled diverse damages within the same network structure, which proved impractical. In addition, these methods cannot exploit correlations among artifacts, especially in scratches versus patch-misses issues. Hence, a tailored network is particularly crucial. In light of this, we propose a unified framework consisting of two key components: ScratchNet and PatchNet. In detail, ScratchNet employs the parallel Multi-scale Partial Convolution Module to effectively repair scratches, learning from multi-scale local receptive fields. In contrast, the patch-misses necessitate the network to emphasize global information. To this end, we incorporate a transformer-based encoder and decoder …


Diffusion-Based Negative Sampling On Graphs For Link Prediction, Yuan Fang, Yuan Fang May 2024

Diffusion-Based Negative Sampling On Graphs For Link Prediction, Yuan Fang, Yuan Fang

Research Collection School Of Computing and Information Systems

Link prediction is a fundamental task for graph analysis with important applications on the Web, such as social network analysis and recommendation systems, etc. Modern graph link prediction methods often employ a contrastive approach to learn robust node representations, where negative sampling is pivotal. Typical negative sampling methods aim to retrieve hard examples based on either predefined heuristics or automatic adversarial approaches, which might be inflexible or difficult to control. Furthermore, in the context of link prediction, most previous methods sample negative nodes from existing substructures of the graph, missing out on potentially more optimal samples in the latent space. …


The Impact Of Avatar Completeness On Embodiment And The Detectability Of Hand Redirection In Virtual Reality, Martin Feick, Andre Zenner, Simon Seibert, Anthony Tang, Antonio Krüger May 2024

The Impact Of Avatar Completeness On Embodiment And The Detectability Of Hand Redirection In Virtual Reality, Martin Feick, Andre Zenner, Simon Seibert, Anthony Tang, Antonio Krüger

Research Collection School Of Computing and Information Systems

To enhance interactions in VR, many techniques introduce offsets between the virtual and real-world position of users’ hands. Nevertheless, such hand redirection (HR) techniques are only effective as long as they go unnoticed by users—not disrupting the VR experience. While several studies consider how much unnoticeable redirection can be applied, these focus on mid-air floating hands that are disconnected from users’ bodies. Increasingly, VR avatars are embodied as being directly connected with the user’s body, which provide more visual cue anchoring, and may therefore reduce the unnoticeable redirection threshold. In this work, we studied more complete avatars and their effect …


Swapvid: Integrating Video Viewing And Document Exploration With Direct Manipulation, Taichi Murakami, Kazuyuki Fujita, Kotaro Hara, Kazuki Takashima, Yoshifumi Kitamura May 2024

Swapvid: Integrating Video Viewing And Document Exploration With Direct Manipulation, Taichi Murakami, Kazuyuki Fujita, Kotaro Hara, Kazuki Takashima, Yoshifumi Kitamura

Research Collection School Of Computing and Information Systems

Videos accompanied by documents—document-based videos—enable presenters to share contents beyond videos and audience to use them for detailed content comprehension. However, concurrently exploring multiple channels of information could be taxing. We propose SwapVid, a novel interface for viewing and exploring document-based videos. SwapVid seamlessly integrates a video and a document into a single view and lets the content behaves as both video and a document; it adaptively switches a document-based video to act as a video or a document upon direct manipulation (e.g., scrolling the document, manipulating the video timeline). We conducted a user study with twenty participants, comparing SwapVid …


Vaid: Indexing View Designs In Visual Analytics System, Lu Ying, Aoyu Wu, Haotian Li, Zikun Deng, Ji Lan, Jiang Wu, Yong Wang, Huamin Qu, Dazhen Deng, Yingcai Wu May 2024

Vaid: Indexing View Designs In Visual Analytics System, Lu Ying, Aoyu Wu, Haotian Li, Zikun Deng, Ji Lan, Jiang Wu, Yong Wang, Huamin Qu, Dazhen Deng, Yingcai Wu

Research Collection School Of Computing and Information Systems

Visual analytics (VA) systems have been widely used in various application domains. However, VA systems are complex in design, which imposes a serious problem: although the academic community constantly designs and implements new designs, the designs are difficult to query, understand, and refer to by subsequent designers. To mark a major step forward in tackling this problem, we index VA designs in an expressive and accessible way, transforming the designs into a structured format. We first conducted a workshop study with VA designers to learn user requirements for understanding and retrieving professional designs in VA systems. Thereafter, we came up …


Multigprompt For Multi-Task Pre-Training And Prompting On Graphs, Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhan May 2024

Multigprompt For Multi-Task Pre-Training And Prompting On Graphs, Xingtong Yu, Chang Zhou, Yuan Fang, Xinming Zhan

Research Collection School Of Computing and Information Systems

Graph Neural Networks (GNNs) have emerged as a mainstream technique for graph representation learning. However, their efficacy within an end-to-end supervised framework is significantly tied to the availability of task-specific labels. To mitigate labeling costs and enhance robustness in few-shot settings, pre-training on self-supervised tasks has emerged as a promising method, while prompting has been proposed to further narrow the objective gap between pretext and downstream tasks. Although there has been some initial exploration of prompt-based learning on graphs, they primarily leverage a single pretext task, resulting in a limited subset of general knowledge that could be learned from the …


Test-Time Augmentation For 3d Point Cloud Classification And Segmentation, Tuan-Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh-Son Hua, Sai-Kit Yeung Mar 2024

Test-Time Augmentation For 3d Point Cloud Classification And Segmentation, Tuan-Anh Vu, Srinjay Sarkar, Zhiyuan Zhang, Binh-Son Hua, Sai-Kit Yeung

Research Collection School Of Computing and Information Systems

Data augmentation is a powerful technique to enhance the performance of a deep learning task but has received less attention in 3D deep learning. It is well known that when 3D shapes are sparsely represented with low point density, the performance of the downstream tasks drops significantly. This work explores test-time augmentation (TTA) for 3D point clouds. We are inspired by the recent revolution of learning implicit representation and point cloud upsampling, which can produce high-quality 3D surface reconstruction and proximity-to-surface, respectively. Our idea is to leverage the implicit field reconstruction or point cloud upsampling techniques as a systematic way …


Transiam: Aggregating Multi-Modal Visual Features With Locality For Medical Image Segmentation, Xuejian Li, Shiqiang Ma, Junhai Xu, Jijun Tang, Shengfeng He, Fei Guo Mar 2024

Transiam: Aggregating Multi-Modal Visual Features With Locality For Medical Image Segmentation, Xuejian Li, Shiqiang Ma, Junhai Xu, Jijun Tang, Shengfeng He, Fei Guo

Research Collection School Of Computing and Information Systems

Automatic segmentation of medical images plays an important role in the diagnosis of diseases. On single-modal data, convolutional neural networks have demonstrated satisfactory performance. However, multi-modal data encompasses a greater amount of information rather than single-modal data. Multi-modal data can be effectively used to improve the segmentation accuracy of regions of interest by analyzing both spatial and temporal information. In this study, we propose a dual-path segmentation model for multi-modal medical images, named TranSiam. Taking into account that there is a significant diversity between the different modalities, TranSiam employs two parallel CNNs to extract the features which are specific to …


Leveraging Llms And Generative Models For Interactive Known-Item Video Search, Zhixin Ma, Jiaxin Wu, Chong-Wah Ngo Feb 2024

Leveraging Llms And Generative Models For Interactive Known-Item Video Search, Zhixin Ma, Jiaxin Wu, Chong-Wah Ngo

Research Collection School Of Computing and Information Systems

While embedding techniques such as CLIP have considerably boosted search performance, user strategies in interactive video search still largely operate on a trial-and-error basis. Users are often required to manually adjust their queries and carefully inspect the search results, which greatly rely on the users’ capability and proficiency. Recent advancements in large language models (LLMs) and generative models offer promising avenues for enhancing interactivity in video retrieval and reducing the personal bias in query interpretation, particularly in the known-item search. Specifically, LLMs can expand and diversify the semantics of the queries while avoiding grammar mistakes or the language barrier. In …


M3sa: Multimodal Sentiment Analysis Based On Multi-Scale Feature Extraction And Multi-Task Learning, Changkai Lin, Hongju Cheng, Qiang Rao, Yang Yang Feb 2024

M3sa: Multimodal Sentiment Analysis Based On Multi-Scale Feature Extraction And Multi-Task Learning, Changkai Lin, Hongju Cheng, Qiang Rao, Yang Yang

Research Collection School Of Computing and Information Systems

Sentiment analysis plays an indispensable part in human-computer interaction. Multimodal sentiment analysis can overcome the shortcomings of unimodal sentiment analysis by fusing multimodal data. However, how to extracte improved feature representations and how to execute effective modality fusion are two crucial problems in multimodal sentiment analysis. Traditional work uses simple sub-models for feature extraction, and they ignore features of different scales and fuse different modalities of data equally, making it easier to incorporate extraneous information and affect analysis accuracy. In this paper, we propose a Multimodal Sentiment Analysis model based on Multi-scale feature extraction and Multi-task learning (M 3 SA). …


Catnet: Cross-Modal Fusion For Audio-Visual Speech Recognition, Xingmei Wang, Jianchen Mi, Boquan Li, Yixu Zhao, Jiaxiang Meng Feb 2024

Catnet: Cross-Modal Fusion For Audio-Visual Speech Recognition, Xingmei Wang, Jianchen Mi, Boquan Li, Yixu Zhao, Jiaxiang Meng

Research Collection School Of Computing and Information Systems

Automatic speech recognition (ASR) is a typical pattern recognition technology that converts human speeches into texts. With the aid of advanced deep learning models, the performance of speech recognition is significantly improved. Especially, the emerging Audio–Visual Speech Recognition (AVSR) methods achieve satisfactory performance by combining audio-modal and visual-modal information. However, various complex environments, especially noises, limit the effectiveness of existing methods. In response to the noisy problem, in this paper, we propose a novel cross-modal audio–visual speech recognition model, named CATNet. First, we devise a cross-modal bidirectional fusion model to analyze the close relationship between audio and visual modalities. Second, …


Foodmask: Real-Time Food Instance Counting, Segmentation And Recognition, Huu-Thanh Nguyen, Yu Cao, Chong-Wah Ngo, Wing-Kwong Chan Feb 2024

Foodmask: Real-Time Food Instance Counting, Segmentation And Recognition, Huu-Thanh Nguyen, Yu Cao, Chong-Wah Ngo, Wing-Kwong Chan

Research Collection School Of Computing and Information Systems

Food computing has long been studied and deployed to several applications. Understanding a food image at the instance level, including recognition, counting and segmentation, is essential to quantifying nutrition and calorie consumption. Nevertheless, existing techniques are limited to either category-specific instance detection, which does not reflect precisely the instance size at the pixel level, or category-agnostic instance segmentation, which is insufficient for dish recognition. This paper presents a compact and fast multi-task network, namely FoodMask, for clustering-based food instance counting, segmentation and recognition. The network learns a semantic space simultaneously encoding food category distribution and instance height at pixel basis. …


Hgprompt: Bridging Homogeneous And Heterogeneous Graphs For Few-Shot Prompt Learning, Xingtong Yu, Yuan Fang, Zemin Liu, Xinming Zhang Feb 2024

Hgprompt: Bridging Homogeneous And Heterogeneous Graphs For Few-Shot Prompt Learning, Xingtong Yu, Yuan Fang, Zemin Liu, Xinming Zhang

Research Collection School Of Computing and Information Systems

Graph neural networks (GNNs) and heterogeneous graph neural networks (HGNNs) are prominent techniques for homogeneous and heterogeneous graph representation learning, yet their performance in an end-to-end supervised framework greatly depends on the availability of task-specific supervision. To reduce the labeling cost, pre-training on selfsupervised pretext tasks has become a popular paradigm, but there is often a gap between the pre-trained model and downstream tasks, stemming from the divergence in their objectives. To bridge the gap, prompt learning has risen as a promising direction especially in few-shot settings, without the need to fully fine-tune the pre-trained model. While there has been …


Delving Into Multimodal Prompting For Fine-Grained Visual Classification, Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li Feb 2024

Delving Into Multimodal Prompting For Fine-Grained Visual Classification, Xin Jiang, Hao Tang, Junyao Gao, Xiaoyu Du, Shengfeng He, Zechao Li

Research Collection School Of Computing and Information Systems

Fine-grained visual classification (FGVC) involves categorizing fine subdivisions within a broader category, which poses challenges due to subtle inter-class discrepancies and large intra-class variations. However, prevailing approaches primarily focus on uni-modal visual concepts. Recent advancements in pre-trained vision-language models have demonstrated remarkable performance in various high-level vision tasks, yet the applicability of such models to FGVC tasks remains uncertain. In this paper, we aim to fully exploit the capabilities of cross-modal description to tackle FGVC tasks and propose a novel multimodal prompting solution, denoted as MP-FGVC, based on the contrastive language-image pertaining (CLIP) model. Our MP-FGVC comprises a multimodal prompts …


Simple Image-Level Classification Improves Open-Vocabulary Object Detection, Ruohuan Fang, Guansong Pang, Xiao Bai Feb 2024

Simple Image-Level Classification Improves Open-Vocabulary Object Detection, Ruohuan Fang, Guansong Pang, Xiao Bai

Research Collection School Of Computing and Information Systems

Open-Vocabulary Object Detection (OVOD) aims to detect novel objects beyond a given set of base categories on which the detection model is trained. Recent OVOD methods focus on adapting the image-level pre-trained vision-language models (VLMs), such as CLIP, to a region-level object detection task via, eg., region-level knowledge distillation, regional prompt learning, or region-text pre-training, to expand the detection vocabulary. These methods have demonstrated remarkable performance in recognizing regional visual concepts, but they are weak in exploiting the VLMs' powerful global scene understanding ability learned from the billion-scale image-level text descriptions. This limits their capability in detecting hard objects of …


Trust: The Feature That Vending Machines And Atms Share, But Simplygo Lacks, Sun Sun Lim Jan 2024

Trust: The Feature That Vending Machines And Atms Share, But Simplygo Lacks, Sun Sun Lim

Research Collection College of Integrative Studies

The article discussed the intricacies of trust in the SimplyGo debacle and highlighted how the design of physical interfaces like vending machines and ATMs and digital interfaces from apps like Grab, Parking.sg and ShopBack have critical features to instil trust. People need to be reassured that their transactions have proceeded as they should, and thay have not been short-changed.


Tracking People Across Ultra Populated Indoor Spaces By Matching Unreliable Wi-Fi Signals With Disconnected Video Feeds, Quang Hai Truong, Dheryta Jaisinghani, Shubham Jain, Arunesh Sinha, Jeong Gil Ko, Rajesh Krishna Balan Jan 2024

Tracking People Across Ultra Populated Indoor Spaces By Matching Unreliable Wi-Fi Signals With Disconnected Video Feeds, Quang Hai Truong, Dheryta Jaisinghani, Shubham Jain, Arunesh Sinha, Jeong Gil Ko, Rajesh Krishna Balan

Research Collection School Of Computing and Information Systems

Tracking in dense indoor environments where several thousands of people move around is an extremely challenging problem. In this paper, we present a system — DenseTrack for tracking people in such environments. DenseTrack leverages data from the sensing modalities that are already present in these environments — Wi-Fi (from enterprise network deployments) and Video (from surveillance cameras). We combine Wi-Fi information with video data to overcome the individual errors induced by these modalities. More precisely, the locations derived from video are used to overcome the localization errors inherent in using Wi-Fi signals where precise Wi-Fi MAC IDs are used to …


Instant3d: Instant Text-To-3d Generation, Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, Xiangyu Xu Jan 2024

Instant3d: Instant Text-To-3d Generation, Ming Li, Pan Zhou, Jia-Wei Liu, Jussi Keppo, Min Lin, Shuicheng Yan, Xiangyu Xu

Research Collection School Of Computing and Information Systems

Text-to-3D generation has attracted much attention from the computer vision community. Existing methods mainly optimize a neural field from scratch for each text prompt, relying on heavy and repetitive training cost which impedes their practical deployment. In this paper, we propose a novel framework for fast text-to-3D generation, dubbed Instant3D. Once trained, Instant3D is able to create a 3D object for an unseen text prompt in less than one second with a single run of a feedforward network. We achieve this remarkable speed by devising a new network that directly constructs a 3D triplane from a text prompt. The core …


Dynamic Meta-Path Guided Temporal Heterogeneous Graph Neural Networks, Yugang Ji, Chuan Shi, Yuan Fang Jan 2024

Dynamic Meta-Path Guided Temporal Heterogeneous Graph Neural Networks, Yugang Ji, Chuan Shi, Yuan Fang

Research Collection School Of Computing and Information Systems

Graph Neural Networks (GNNs) have become the de facto standard for representation learning on topological graphs, which usually derive effective node representations via message passing from neighborhoods. Although GNNs have achieved great success, previous models are mostly confined to static and homogeneous graphs. However, there are multiple dynamic interactions between different-typed nodes in real-world scenarios like academic networks and e-commerce platforms, forming temporal heterogeneous graphs (THGs). Limited work has been done for representation learning on THGs and the challenges are in two aspects. First, there are abundant dynamic semantics between nodes while traditional techniques like meta-paths can only capture static …


Efficient Unsupervised Video Hashing With Contextual Modeling And Structural Controlling, Jingru Duan, Yanbin Hao, Bin Zhu, Lechao Cheng, Pengyuan Zhou, Xiang Wang Jan 2024

Efficient Unsupervised Video Hashing With Contextual Modeling And Structural Controlling, Jingru Duan, Yanbin Hao, Bin Zhu, Lechao Cheng, Pengyuan Zhou, Xiang Wang

Research Collection School Of Computing and Information Systems

The most important effect of the video hashing technique is to support fast retrieval, which is benefiting from the high efficiency of binary calculation. Current video hash approaches are thus mainly targeted at learning compact binary codes to represent video content accurately. However, they may overlook the generation efficiency for hash codes, i.e., designing lightweight neural networks. This paper proposes an method, which is not only for computing compact hash codes but also for designing a lightweight deep model. Specifically, we present an MLP-based model, where the video tensor is split into several groups and multiple axial contexts are explored …


Glance To Count: Learning To Rank With Anchors For Weakly-Supervised Crowd Counting, Zheng Xiong, Liangyu Chai, Wenxi Liu, Yongtuo Liu, Sucheng Ren, Shengfeng He Jan 2024

Glance To Count: Learning To Rank With Anchors For Weakly-Supervised Crowd Counting, Zheng Xiong, Liangyu Chai, Wenxi Liu, Yongtuo Liu, Sucheng Ren, Shengfeng He

Research Collection School Of Computing and Information Systems

Crowd image is arguably one of the most laborious data to annotate. In this paper, we devote to reduce the massive demand of densely labeled crowd data, and propose a novel weakly-supervised setting, in which we leverage the binary ranking of two images with highcontrast crowd counts as training guidance. To enable training under this new setting, we convert the crowd count regression problem to a ranking potential prediction problem. In particular, we tailor a Siamese Ranking Network that predicts the potential scores of two images indicating the ordering of the counts. Hence, the ultimate goal is to assign appropriate …


Predicting Viral Rumors And Vulnerable Users With Graph-Based Neural Multi-Task Learning For Infodemic Surveillance, Xuan Zhang, Wei Gao Jan 2024

Predicting Viral Rumors And Vulnerable Users With Graph-Based Neural Multi-Task Learning For Infodemic Surveillance, Xuan Zhang, Wei Gao

Research Collection School Of Computing and Information Systems

In the age of the infodemic, it is crucial to have tools for effectively monitoring the spread of rampant rumors that can quickly go viral, as well as identifying vulnerable users who may be more susceptible to spreading such misinformation. This proactive approach allows for timely preventive measures to be taken, mitigating the negative impact of false information on society. We propose a novel approach to predict viral rumors and vulnerable users using a unified graph neural network model. We pre-train network-based user embeddings and leverage a cross-attention mechanism between users and posts, together with a community-enhanced vulnerability propagation (CVP) …


Learning An Interpretable Stylized Subspace For 3d-Aware Animatable Artforms, Chenxi Zheng, Bangzhen Liu, Xuemiao Xu, Huaidong Zhang, Shengfeng He Jan 2024

Learning An Interpretable Stylized Subspace For 3d-Aware Animatable Artforms, Chenxi Zheng, Bangzhen Liu, Xuemiao Xu, Huaidong Zhang, Shengfeng He

Research Collection School Of Computing and Information Systems

Throughout history, static paintings have captivated viewers within display frames, yet the possibility of making these masterpieces vividly interactive remains intriguing. This research paper introduces 3DArtmator, a novel approach that aims to represent artforms in a highly interpretable stylized space, enabling 3D-aware animatable reconstruction and editing. Our rationale is to transfer the interpretability and 3D controllability of the latent space in a 3D-aware GAN to a stylized sub-space of a customized GAN, revitalizing the original artforms. To this end, the proposed two-stage optimization framework of 3DArtmator begins with discovering an anchor in the original latent space that accurately mimics the …


Mermaid: A Dataset And Framework For Multimodal Meme Semantic Understanding, Shaun Toh, Adriel Kuek, Wen Haw Chong, Roy Ka Wei Lee Dec 2023

Mermaid: A Dataset And Framework For Multimodal Meme Semantic Understanding, Shaun Toh, Adriel Kuek, Wen Haw Chong, Roy Ka Wei Lee

Research Collection School Of Computing and Information Systems

Memes are widely used to convey cultural and societal issues and have a significant impact on public opinion. However, little work has been done on understanding and explaining the semantics expressed in multimodal memes. To fill this research gap, we introduce MERMAID, a dataset consisting of 3,633 memes annotated with their entities and relations, and propose a novel MERF pipeline that extracts entities and their relationships in memes. Our framework combines state-of-the-art techniques from natural language processing and computer vision to extract text and image features and infer relationships between entities in memes. We evaluate the proposed framework on a …


Self-Supervised Pseudo Multi-Class Pre-Training For Unsupervised Anomaly Detection And Segmentation In Medical Images, Yu Tian, Fengbei Liu, Guansong Pang, Yuanhong Chen, Yuyuan Liu, Johan W. Verjans, Rajvinder Singh, Gustavo Carneiro Dec 2023

Self-Supervised Pseudo Multi-Class Pre-Training For Unsupervised Anomaly Detection And Segmentation In Medical Images, Yu Tian, Fengbei Liu, Guansong Pang, Yuanhong Chen, Yuyuan Liu, Johan W. Verjans, Rajvinder Singh, Gustavo Carneiro

Research Collection School Of Computing and Information Systems

Unsupervised anomaly detection (UAD) methods are trained with normal (or healthy) images only, but during testing, they are able to classify normal and abnormal (or disease) images. UAD is an important medical image analysis (MIA) method to be applied in disease screening problems because the training sets available for those problems usually contain only normal images. However, the exclusive reliance on normal images may result in the learning of ineffective low-dimensional image representations that are not sensitive enough to detect and segment unseen abnormal lesions of varying size, appearance, and shape. Pre-training UAD methods with self-supervised learning, based on computer …


Mrim: Lightweight Saliency-Based Mixed-Resolution Imaging For Low-Power Pervasive Vision, Jiyan Wu, Vithurson Subasharan, Minh Anh Tuan Tran, Kasun Pramuditha Gamlath, Archan Misra Dec 2023

Mrim: Lightweight Saliency-Based Mixed-Resolution Imaging For Low-Power Pervasive Vision, Jiyan Wu, Vithurson Subasharan, Minh Anh Tuan Tran, Kasun Pramuditha Gamlath, Archan Misra

Research Collection School Of Computing and Information Systems

While many pervasive computing applications increasingly utilize real-time context extracted from a vision sensing infrastructure, the high energy overhead of DNN-based vision sensing pipelines remains a challenge for sustainable in-the-wild deployment. One common approach to reducing such energy overheads is the capture and transmission of lower-resolution images to an edge node (where the DNN inferencing task is executed), but this results in an accuracy-vs-energy tradeoff, as the DNN inference accuracy typically degrades with a drop in resolution. In this work, we introduce MRIM, a simple but effective framework to tackle this tradeoff. Under MRIM, the vision sensor platform first executes …


Graph Contrastive Learning With Stable And Scalable Spectral Encoding, Deyu Bo, Yuan Fang, Yang Liu, Chuan Shi Dec 2023

Graph Contrastive Learning With Stable And Scalable Spectral Encoding, Deyu Bo, Yuan Fang, Yang Liu, Chuan Shi

Research Collection School Of Computing and Information Systems

Graph contrastive learning (GCL) aims to learn representations by capturing the agreements between different graph views. Traditional GCL methods generate views in the spatial domain, but it has been recently discovered that the spectral domain also plays a vital role in complementing spatial views. However, existing spectral-based graph views either ignore the eigenvectors that encode valuable positional information, or suffer from high complexity when trying to address the instability of spectral features. To tackle these challenges, we first design an informative, stable, and scalable spectral encoder, termed EigenMLP, to learn effective representations from the spectral features. Theoretically, EigenMLP is invariant …


Video Sentiment Analysis For Child Safety, Yee Sen Tan, Nicole Anne Huiying Teo, Ezekiel En Zhe Ghe, Jolie Zhi Yi Fong, Zhaoxia Wang Dec 2023

Video Sentiment Analysis For Child Safety, Yee Sen Tan, Nicole Anne Huiying Teo, Ezekiel En Zhe Ghe, Jolie Zhi Yi Fong, Zhaoxia Wang

Research Collection School Of Computing and Information Systems

The proliferation of online video content underscores the critical need for effective sentiment analysis, particularly in safeguarding children from potentially harmful material. This research addresses this concern by presenting a multimodal analysis method for assessing video sentiment, categorizing it as either positive (child-friendly) or negative (potentially harmful). This method leverages three key components: text analysis, facial expression analysis, and audio analysis, including music mood analysis, resulting in a comprehensive sentiment assessment. Our evaluation results validate the effectiveness of this approach, making significant contributions to the field of video sentiment analysis and bolstering child safety measures. This research serves as a …


Pro-Cap: Leveraging A Frozen Vision-Language Model For Hateful Meme Detection, Rui Cao, Ming Shan Hee, Adriel Kuek, Wen Haw Chong, Roy Ka-Wei Lee, Jing Jiang Nov 2023

Pro-Cap: Leveraging A Frozen Vision-Language Model For Hateful Meme Detection, Rui Cao, Ming Shan Hee, Adriel Kuek, Wen Haw Chong, Roy Ka-Wei Lee, Jing Jiang

Research Collection School Of Computing and Information Systems

Hateful meme detection is a challenging multimodal task that requires comprehension of both vision and language, as well as cross-modal interactions. Recent studies have tried to fine-tune pre-trained vision-language models (PVLMs) for this task. However, with increasing model sizes, it becomes important to leverage powerful PVLMs more efficiently, rather than simply fine-tuning them. Recently, researchers have attempted to convert meme images into textual captions and prompt language models for predictions. This approach has shown good performance but suffers from non-informative image captions. Considering the two factors mentioned above, we propose a probing-based captioning approach to leverage PVLMs in a zero-shot …