Open Access. Powered by Scholars. Published by Universities.®
Databases and Information Systems Commons™
Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 1 of 1
Full-Text Articles in Databases and Information Systems
Shunted Self-Attention Via Multi-Scale Token Aggregation, Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang
Shunted Self-Attention Via Multi-Scale Token Aggregation, Sucheng Ren, Daquan Zhou, Shengfeng He, Jiashi Feng, Xinchao Wang
Research Collection School Of Computing and Information Systems
Recent Vision Transformer (ViT) models have demonstrated encouraging results across various computer vision tasks, thanks to its competence in modeling long-range dependencies of image patches or tokens via self-attention. These models, however, usually designate the similar receptive fields of each token feature within each layer. Such a constraint inevitably limits the ability of each self-attention layer in capturing multi-scale features, thereby leading to performance degradation in handling images with multiple objects of different scales. To address this issue, we propose a novel and generic strategy, termed shunted selfattention (SSA), that allows ViTs to model the attentions at hybrid scales per …