Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Artificial Intelligence and Robotics

Research Collection School Of Computing and Information Systems

2022

Vision transformer

Articles 1 - 2 of 2

Full-Text Articles in Physical Sciences and Mathematics

Wave-Vit: Unifying Wavelet And Transformers For Visual Representation Learning, Ting Yao, Yingwei Pan, Yehao Li, Chong-Wah Ngo, Tao Mei Oct 2022

Research Collection School Of Computing and Information Systems

The multi-scale Vision Transformer (ViT) has emerged as a powerful backbone for computer vision tasks, but the self-attention computation in the Transformer scales quadratically with the number of input patches. Existing solutions therefore commonly apply down-sampling operations (e.g., average pooling) to the keys/values to dramatically reduce the computational cost. In this work, we argue that such an over-aggressive down-sampling design is not invertible and inevitably drops information, especially the high-frequency components of objects (e.g., texture details). Motivated by wavelet theory, we construct a new Wavelet Vision Transformer (Wave-ViT) that formulates invertible down-sampling with wavelet transforms and self-attention learning in a unified way. …
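
The contrast the abstract draws between average pooling and wavelet-based down-sampling can be illustrated with a small sketch. This is not the authors' implementation: the choice of the Haar wavelet, the function names (avg_pool2, haar_dwt2, haar_idwt2), and the toy 2x2 inputs are all assumptions made here for illustration, and the subband names LL/LH/HL/HH follow one of several conventions. The sketch shows that two different patches can have the same 2x2 average, while a one-level Haar transform keeps the detail coefficients needed to reconstruct each patch exactly.

```python
# Illustrative sketch only (assumed Haar wavelet, hypothetical helper names).

def avg_pool2(x):
    """2x2 average pooling with stride 2 on an even-sized 2D list."""
    return [
        [(x[i][j] + x[i][j + 1] + x[i + 1][j] + x[i + 1][j + 1]) / 4
         for j in range(0, len(x[0]), 2)]
        for i in range(0, len(x), 2)
    ]


def haar_dwt2(x):
    """One-level 2D Haar transform; returns four half-resolution subbands."""
    ll, lh, hl, hh = [], [], [], []
    for i in range(0, len(x), 2):
        r_ll, r_lh, r_hl, r_hh = [], [], [], []
        for j in range(0, len(x[0]), 2):
            a, b = x[i][j], x[i][j + 1]
            c, d = x[i + 1][j], x[i + 1][j + 1]
            r_ll.append((a + b + c + d) / 2)  # low-pass (2x the block average)
            r_lh.append((a - b + c - d) / 2)  # difference across columns
            r_hl.append((a + b - c - d) / 2)  # difference across rows
            r_hh.append((a - b - c + d) / 2)  # diagonal difference
        ll.append(r_ll)
        lh.append(r_lh)
        hl.append(r_hl)
        hh.append(r_hh)
    return ll, lh, hl, hh


def haar_idwt2(ll, lh, hl, hh):
    """Inverse of haar_dwt2: rebuilds the full-resolution 2D list."""
    out = []
    for i in range(len(ll)):
        top, bottom = [], []
        for j in range(len(ll[0])):
            s, p, q, r = ll[i][j], lh[i][j], hl[i][j], hh[i][j]
            top += [(s + p + q + r) / 2, (s - p + q - r) / 2]
            bottom += [(s + p - q - r) / 2, (s - p - q + r) / 2]
        out += [top, bottom]
    return out


flat = [[2, 2], [2, 2]]      # no texture
textured = [[4, 0], [0, 4]]  # checkerboard texture, same mean

# Average pooling maps both patches to the same value, so it cannot be undone.
print(avg_pool2(flat), avg_pool2(textured))

# The Haar subbands differ, and the inverse transform recovers the input.
print(haar_dwt2(textured))
print(haar_idwt2(*haar_dwt2(textured)))
```

The four subbands together hold as many values as the input, so nothing is discarded by the transform itself; each subband is half the resolution in each dimension, which is what makes it usable as a down-sampling step.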


Contrastive Transformer-Based Multiple Instance Learning For Weakly Supervised Polyp Frame Detection, Tian Yu, Guansong Pang, Fengbei Liu, Yuyuan Liu, Chong Wang, Yuanhong Chen, Johan Verjans, Gustavo Carneiro Sep 2022

Research Collection School Of Computing and Information Systems

Current polyp detection methods from colonoscopy videos use exclusively normal (i.e., healthy) training images, which i) ignore the importance of temporal information in consecutive video frames, and ii) lack knowledge about the polyps. Consequently, they often have high detection errors, especially on challenging polyp cases (e.g., small, flat, or partially visible polyps). In this work, we formulate polyp detection as a weakly-supervised anomaly detection task that uses video-level labelled training data to detect frame-level polyps. In particular, we propose a novel convolutional transformer-based multiple instance learning method designed to identify abnormal frames (i.e., frames with polyps) from anomalous videos (i.e., …