Open Access. Powered by Scholars. Published by Universities.®

Databases and Information Systems Commons

Information Security

Research Collection School Of Computing and Information Systems

Series

2022

Efficient learning and inference

Full-Text Articles in Databases and Information Systems

Shunted Self-Attention Via Multi-Scale Token Aggregation, Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang, Jun 2022

Research Collection School Of Computing and Information Systems

Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks, thanks to their competence in modeling long-range dependencies among image patches or tokens via self-attention. These models, however, usually assign similar receptive fields to every token feature within each layer. Such a constraint inevitably limits the ability of each self-attention layer to capture multi-scale features, thereby degrading performance on images that contain multiple objects of different scales. To address this issue, we propose a novel and generic strategy, termed shunted self-attention (SSA), that allows ViTs to model the attentions at hybrid scales per …
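The abstract is cut off before it describes the mechanism, so the following is only a rough sketch of the general idea suggested by the title: letting different attention heads look at tokens aggregated at different scales. It is not the authors' implementation. Everything in it is an assumption made for illustration: the function names (`avg_pool_tokens`, `attend`, `multi_scale_attention`), the use of average pooling over a 1-D token sequence as the aggregation step, the omission of learned query/key/value projections, and the choice of one head per pooling rate.

```python
import math


def avg_pool_tokens(tokens, rate):
    """Merge every `rate` consecutive tokens into one by averaging.

    Assumption: average pooling stands in for whatever aggregation the
    paper actually uses; the abstract does not say.
    """
    pooled = []
    for start in range(0, len(tokens), rate):
        group = tokens[start:start + rate]
        dim = len(group[0])
        pooled.append([sum(t[d] for t in group) / len(group) for d in range(dim)])
    return pooled


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attend(queries, keys, values):
    """Plain scaled dot-product attention, without learned projections."""
    scale = 1.0 / math.sqrt(len(queries[0]))
    out = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
        out.append([sum(w * v[d] for w, v in zip(weights, values))
                    for d in range(len(values[0]))])
    return out


def multi_scale_attention(tokens, rates):
    """Hypothetical multi-scale attention: one head per pooling rate.

    Every head uses the full-resolution tokens as queries but attends to
    keys/values pooled at its own rate, so heads with larger rates see a
    coarser view of the sequence. Head outputs are concatenated per token.
    """
    per_head = []
    for rate in rates:
        pooled = avg_pool_tokens(tokens, rate)
        per_head.append(attend(tokens, pooled, pooled))
    return [sum((head[i] for head in per_head), []) for i in range(len(tokens))]
```

For example, calling `multi_scale_attention(tokens, rates=(1, 4))` on 8 tokens of dimension 2 returns 8 output tokens of dimension 4: the first two features come from a head attending over all 8 tokens, the last two from a head attending over 2 coarse, pooled tokens.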