Physical Sciences and Mathematics Commons

Articles 1 - 21 of 21

Full-Text Articles in Physical Sciences and Mathematics

Spatio-Temporal Relation Modeling For Few-Shot Action Recognition, Anirudh Thatipelli, Sanath Narayan, Salman Hameed Khan, Rao Muhammad Anwer, Fahad Shahbaz Khan, Bernard Ghanem Dec 2021

Computer Vision Faculty Publications

We propose a novel few-shot action recognition framework, STRM, which enhances class-specific feature discriminability while simultaneously learning higher-order temporal representations. The focus of our approach is a novel spatio-temporal enrichment module that aggregates spatial and temporal contexts with dedicated local patch-level and global frame-level feature enrichment sub-modules. Local patch-level enrichment captures the appearance-based characteristics of actions. On the other hand, global frame-level enrichment explicitly encodes the broad temporal context, thereby capturing the relevant object features over time. The resulting spatio-temporally enriched representations are then utilized to learn the relational matching between query and support action sub-sequences. We further introduce a …
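
As a rough illustration of the enrichment idea described above (not the authors' released code), the sketch below applies self-attention over the patches of each frame for local enrichment and over per-frame descriptors for global temporal enrichment; the tensor shapes and module sizes are assumptions.

```python
import torch
import torch.nn as nn

class SpatioTemporalEnrichment(nn.Module):
    """Illustrative sketch: enrich video features locally over the patches of each
    frame and globally over the sequence of frames."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.patch_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.frame_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (T, N, D) = frames, patches per frame, feature dimension
        # Local patch-level enrichment: attention over the N patches of each frame.
        local, _ = self.patch_attn(x, x, x)                          # (T, N, D)
        # Global frame-level enrichment: attention over the T frame descriptors.
        frames = local.mean(dim=1, keepdim=True).transpose(0, 1)     # (1, T, D)
        global_ctx, _ = self.frame_attn(frames, frames, frames)      # (1, T, D)
        # Broadcast the temporal context back to every patch.
        return local + global_ctx.transpose(0, 1)                    # (T, N, D)

feats = torch.randn(8, 16, 256)          # 8 frames, 16 patches, 256-d features (assumed)
enriched = SpatioTemporalEnrichment(256)(feats)
print(enriched.shape)                    # torch.Size([8, 16, 256])
```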


OW-DETR: Open-World Detection Transformer, Akshita Gupta, Sanath Narayan, K.J. Joseph, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah Dec 2021

Computer Vision Faculty Publications

Open-world object detection (OWOD) is a challenging computer vision problem, where the task is to detect a known set of object categories while simultaneously identifying unknown objects. Additionally, the model must incrementally learn new classes that become known in the next training episodes. Distinct from standard object detection, the OWOD setting poses significant challenges for generating quality candidate proposals on potentially unknown objects, separating the unknown objects from the background and detecting diverse unknown objects. Here, we introduce a novel end-to-end transformer-based framework, OW-DETR, for open-world object detection. The proposed OW-DETR comprises three dedicated components, namely attention-driven pseudo-labeling, novelty classification …
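
The attention-driven pseudo-labeling component can be pictured as ranking unmatched proposals by how strongly the backbone activates inside them; the sketch below is an assumed, simplified version of that selection step, not the OW-DETR implementation.

```python
import torch

def pseudo_label_unknowns(act_map, boxes, known_mask, k=5):
    """Illustrative sketch: rank candidate boxes by mean backbone activation inside
    the box and keep the top-k unmatched ones as 'unknown' pseudo-labels.
    act_map: (H, W) activation magnitude; boxes: (M, 4) as x1, y1, x2, y2 in pixels;
    known_mask: (M,) True for boxes already matched to known-class ground truth."""
    scores = []
    for (x1, y1, x2, y2) in boxes.round().long():
        region = act_map[y1:y2, x1:x2]
        scores.append(region.mean() if region.numel() else act_map.new_tensor(0.0))
    scores = torch.stack(scores)
    scores[known_mask] = float("-inf")               # never relabel known objects
    top = scores.topk(min(k, len(scores))).indices
    return boxes[top]                                # pseudo-ground-truth for the 'unknown' class

act = torch.rand(64, 64)
cand = torch.tensor([[4., 4., 20., 20.], [30., 30., 60., 60.], [0., 0., 10., 10.]])
print(pseudo_label_unknowns(act, cand, torch.tensor([True, False, False]), k=1))
```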


Deep Learning Predicts EBV Status In Gastric Cancer Based On Spatial Patterns Of Lymphocyte Infiltration, Baoyi Zhang, Kevin Yao, Min Xu, Jia Wu, Chao Cheng Nov 2021

Computer Vision Faculty Publications

EBV infection occurs in around 10% of gastric cancer cases and represents a distinct subtype, characterized by a unique mutation profile, hypermethylation, and overexpression of PD-L1. Moreover, EBV-positive gastric cancer tends to have higher immune infiltration and a better prognosis. EBV infection status in gastric cancer is most commonly determined using PCR and in situ hybridization, but these methods require good nucleic acid preservation. Detection of EBV status from histopathology images may complement PCR and in situ hybridization as a first step of EBV infection assessment. Here, we developed a deep learning-based algorithm to directly predict EBV infection …


Multi-Modal Transformers Excel At Class-Agnostic Object Detection, Muhammad Maaz, Hanoona Bangalath Rasheed, Salman Hameed Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang Nov 2021

Computer Vision Faculty Publications

What constitutes an object? This has been a longstanding question in computer vision. To this end, numerous learning-free and learning-based approaches have been developed to score objectness. However, they generally do not scale well across new domains and for unseen objects. In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. To bridge this gap, we explore recent Multi-modal Vision Transformers (MViT) that have been trained with aligned image-text pairs. Our extensive experiments across various domains and novel objects show the state-of-the-art performance of MViTs in localizing generic objects in images. Based on …


Restormer: Efficient Transformer For High-Resolution Image Restoration, Syed Waqas Zamir, Aditya Arora, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Ming-Hsuan Yang Nov 2021

Computer Vision Faculty Publications

Since convolutional neural networks (CNNs) perform well at learning generalizable image priors from large-scale data, these models have been extensively applied to image restoration and related tasks. Recently, another class of neural architectures, Transformers, has shown significant performance gains on natural language and high-level vision tasks. While the Transformer model mitigates the shortcomings of CNNs (i.e., a limited receptive field and inadaptability to input content), its computational complexity grows quadratically with the spatial resolution, making it infeasible to apply to most image restoration tasks involving high-resolution images. In this work, we propose an efficient Transformer model by making several key …
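
A quick back-of-the-envelope calculation shows why the quadratic cost matters for restoration: the spatial attention matrix has (H·W)² entries, so its memory grows with the fourth power of the image side.

```python
# Illustrative arithmetic: why vanilla spatial self-attention is infeasible at high resolution.
for side in (128, 256, 512, 1024):
    tokens = side * side                     # one token per pixel
    attn_entries = tokens ** 2               # full (H*W) x (H*W) attention matrix
    gib = attn_entries * 4 / 2**30           # float32, a single head, a single layer
    print(f"{side}x{side}: {tokens} tokens, attention matrix ~ {gib:,.1f} GiB")
```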


Tensor Pooling-Driven Instance Segmentation Framework For Baggage Threat Recognition, Taimur Hassan, Samet Akçay, Mohammed Bennamoun, Salman Khan, Naoufel Werghi Sep 2021

Computer Vision Faculty Publications

Automated systems designed for screening contraband items from X-ray imagery still face difficulties with high clutter, concealment, and extreme occlusion. In this paper, we address this challenge using a novel multi-scale contour instance segmentation framework that effectively identifies cluttered contraband items within baggage X-ray scans. Unlike standard models that employ region-based or keypoint-based techniques to generate multiple boxes around objects, we propose to derive proposals according to the hierarchy of the regions defined by the contours. The proposed framework is rigorously validated on three public datasets, dubbed GDXray, SIXray, and OPIXray, where it outperforms the state-of-the-art …
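
A minimal sketch of contour-driven proposal generation (using OpenCV 4 and assumed thresholding choices, not the paper's exact pipeline) looks roughly as follows.

```python
import cv2
import numpy as np

def contour_proposals(xray_gray):
    """Illustrative sketch: derive box proposals from the hierarchy of contour regions
    instead of dense anchor boxes."""
    # Suppress noise, then binarise so that object silhouettes form closed contours.
    blurred = cv2.GaussianBlur(xray_gray, (5, 5), 0)
    _, binary = cv2.threshold(blurred, 0, 255, cv2.THRESH_BINARY_INV + cv2.THRESH_OTSU)
    contours, hierarchy = cv2.findContours(binary, cv2.RETR_TREE, cv2.CHAIN_APPROX_SIMPLE)
    # One proposal per contour; the hierarchy links nested (occluded/contained) regions.
    boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 50.0]
    return boxes, hierarchy

img = np.full((128, 128), 255, np.uint8)        # bright background, as in X-ray scans
cv2.rectangle(img, (20, 20), (80, 80), 40, -1)  # synthetic dark 'object'
print(contour_proposals(img)[0])                # [(x, y, w, h)] proposals
```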


Discriminative Region-Based Multi-Label Zero-Shot Learning, Sanath Narayan, Akshita Gupta, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Mubarak Shah Aug 2021

Computer Vision Faculty Publications

Multi-label zero-shot learning (ZSL) is a more realistic counterpart of standard single-label ZSL, since several objects can co-exist in a natural image. However, the occurrence of multiple objects complicates the reasoning and requires region-specific processing of visual features to preserve their contextual cues. We note that the best existing multi-label ZSL method takes a shared approach towards attending to region features, with a common set of attention maps for all the classes. Such shared maps lead to diffused attention, which does not discriminatively focus on relevant locations when the number of classes is large. Moreover, mapping spatially-pooled visual features to …
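
The contrast between shared and class-specific attention can be sketched as follows: each class embedding attends to the image regions separately, producing its own attention map and pooled feature. The shapes and the dot-product scoring below are assumptions for illustration.

```python
import torch
import torch.nn.functional as F

def class_specific_pooling(region_feats, class_embeds):
    """Illustrative sketch: per-class attention over image regions, instead of one
    shared attention map for all classes.
    region_feats: (R, D) features of R regions; class_embeds: (C, D) class embeddings."""
    logits = class_embeds @ region_feats.t()            # (C, R) region relevance per class
    attn = F.softmax(logits, dim=1)                     # each class attends to its own regions
    class_feats = attn @ region_feats                   # (C, D) class-specific pooled features
    scores = (class_feats * class_embeds).sum(dim=1)    # (C,) compatibility scores
    return scores, attn

regions = torch.randn(49, 300)     # e.g. a 7x7 grid of 300-d region features
classes = torch.randn(20, 300)     # 20 class (attribute/word) embeddings
scores, attn = class_specific_pooling(regions, classes)
print(scores.shape, attn.shape)    # torch.Size([20]) torch.Size([20, 49])
```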


Unsupervised Anomaly Instance Segmentation For Baggage Threat Recognition, Taimur Hassan, Samet Akçay, Mohammed Bennamoun, Salman Khan, Naoufel Werghi Jul 2021

Computer Vision Faculty Publications

Identifying potential threats concealed within baggage is of prime concern for security staff. Many researchers have developed frameworks that can automatically detect baggage threats from security X-ray scans. However, to the best of our knowledge, all of these frameworks require extensive training on large-scale and well-annotated datasets, which are hard to procure in the real world, especially for rarely seen contraband items. This paper presents a novel unsupervised anomaly instance segmentation framework that recognizes baggage threats in X-ray scans as anomalies without requiring any ground truth labels. Furthermore, thanks to its stylization capacity, the framework is …


Structured Latent Embeddings For Recognizing Unseen Classes In Unseen Domains, Shivam Chandhok, Sanath Narayan, Hisham Cholakkal, Rao Muhammad Anwer, Vineeth N. Balasubramanian, Fahad Shahbaz Khan, Ling Shao Jul 2021

Computer Vision Faculty Publications

The need to address the scarcity of task-specific annotated data has resulted in concerted efforts in recent years for specific settings such as zero-shot learning (ZSL) and domain generalization (DG), to separately address the issues of semantic shift and domain shift, respectively. However, real-world applications often do not have constrained settings and necessitate handling unseen classes in unseen domains – a setting called Zero-shot Domain Generalization, which presents the issues of domain and semantic shifts simultaneously. In this work, we propose a novel approach that learns domain-agnostic structured latent embeddings by projecting images from different domains as well as class-specific …


P2V-RCNN: Point To Voxel Feature Learning For 3D Object Detection From Point Clouds, Jiale Li, Yu Sun, Shujie Luo, Ziqi Zhu, Hang Dai, Andrey S. Krylov, Yong Ding, Ling Shao Jul 2021

Computer Vision Faculty Publications

The most recent 3D object detectors for point clouds rely on the coarse voxel-based representation rather than the accurate point-based representation due to a higher box recall in the voxel-based Region Proposal Network (RPN). However, the detection accuracy is severely restricted by the information loss of pose details in the voxels. Different from considering the point cloud as voxel or point representation only, we propose a point-to-voxel feature learning approach to voxelize the point cloud with both the point-wise semantic and local spatial features, which maintains the voxel-wise features to build the high-recall voxel-based RPN and also provides the accurate …
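
A simplified picture of the point-to-voxel step, with assumed grid parameters and mean pooling, is to scatter point-wise features into a voxel grid that a voxel-based RPN can consume.

```python
import torch

def points_to_voxels(points, point_feats, voxel_size, grid_range):
    """Illustrative sketch: pool point-wise features into a voxel grid by averaging,
    so a voxel-based RPN can consume features computed per point.
    points: (N, 3) xyz; point_feats: (N, D); grid_range: (xmin, ymin, zmin, xmax, ymax, zmax)."""
    mins = torch.tensor(grid_range[:3], dtype=torch.float32)
    maxs = torch.tensor(grid_range[3:], dtype=torch.float32)
    dims = ((maxs - mins) / voxel_size).long().tolist()              # grid size [X, Y, Z]
    idx3 = ((points - mins) / voxel_size).long()
    idx3 = torch.minimum(idx3.clamp(min=0), torch.tensor(dims) - 1)  # keep indices inside the grid
    flat = (idx3[:, 0] * dims[1] + idx3[:, 1]) * dims[2] + idx3[:, 2]
    num_voxels = dims[0] * dims[1] * dims[2]
    voxels = torch.zeros(num_voxels, point_feats.shape[1])
    counts = torch.zeros(num_voxels, 1)
    voxels.index_add_(0, flat, point_feats)                          # sum features per voxel
    counts.index_add_(0, flat, torch.ones(len(points), 1))           # count points per voxel
    return (voxels / counts.clamp(min=1)).view(*dims, -1)            # mean feature per voxel

pts = torch.rand(1000, 3) * 40.0                 # synthetic points in a 40 m cube (assumed)
feats = torch.randn(1000, 16)
grid = points_to_voxels(pts, feats, voxel_size=1.0, grid_range=(0, 0, 0, 40, 40, 40))
print(grid.shape)                                # torch.Size([40, 40, 40, 16])
```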


Edge Detail Analysis Of Wear Particles, Mohammad Shakeel Laghari, Ahmed Hassan, Mubashir Noman Jul 2021

Computer Vision Faculty Publications

Tribology is the study of wear particles that are generated in all machines with interacting mechanical parts. Particles are separated from the surfaces due to friction and relative motion. These microscopic particles vary in certain characteristics of size, quantity, composition, and morphology. Wear particles, or wear debris, are categorized by six morphological attributes: shape, edge details, texture, color, size, and thickness ratio. Particles can be identified with the help of some or all of these attributes; however, only edge detail analysis is considered in this paper. The objective is to classify these particles in a coherent way based on …


Towards Open World Object Detection, K. J. Joseph, Salman Khan, Fahad Shahbaz Khan, Vineeth N. Balasubramanian May 2021

Computer Vision Faculty Publications

Humans have a natural instinct to identify unknown object instances in their environments. The intrinsic curiosity about these unknown instances aids in learning about them when the corresponding knowledge is eventually available. This motivates us to propose a novel computer vision problem called 'Open World Object Detection', where a model is tasked to: 1) identify objects that have not been introduced to it as 'unknown', without explicit supervision to do so, and 2) incrementally learn these identified unknown categories without forgetting previously learned classes, when the corresponding labels are progressively received. We formulate the problem, introduce a strong evaluation protocol …


Exploring Complementary Strengths Of Invariant And Equivariant Representations For Few-Shot Learning, Mamshad Nayeem Rizve, Salman Khan, Fahad Shahbaz Khan, Mubarak Shah Apr 2021

Computer Vision Faculty Publications

In many real-world problems, collecting a large number of labeled samples is infeasible. Few-shot learning (FSL) is the dominant approach to address this issue, where the objective is to quickly adapt to novel categories in the presence of a limited number of samples. FSL tasks have been predominantly solved by leveraging ideas from gradient-based meta-learning and metric learning approaches. However, recent works have demonstrated the significance of powerful feature representations with a simple embedding network that can outperform existing sophisticated FSL algorithms. In this work, we build on this insight and propose a novel training mechanism that simultaneously enforces equivariance …
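
One way to picture the two objectives (a hedged sketch, not the paper's training code): an invariance term pulls the features of transformed copies of an image together, while an equivariance term requires the applied transform to remain predictable from the feature.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class InvEquivHeads(nn.Module):
    """Illustrative sketch: one term makes features invariant to a set of input
    transforms (e.g. the four 90-degree rotations), the other makes them equivariant
    by predicting which transform was applied."""
    def __init__(self, feat_dim=128, num_transforms=4):
        super().__init__()
        self.transform_classifier = nn.Linear(feat_dim, num_transforms)

    def forward(self, feats, transform_ids):
        # feats: (B, K, D) features of the same B images under K transforms
        # Invariance: features of all transformed copies should agree with their mean.
        anchor = feats.mean(dim=1, keepdim=True).detach()
        inv_loss = F.mse_loss(feats, anchor.expand_as(feats))
        # Equivariance: the transform must remain recoverable from the feature.
        logits = self.transform_classifier(feats.flatten(0, 1))          # (B*K, K)
        eq_loss = F.cross_entropy(logits, transform_ids.flatten())
        return inv_loss, eq_loss

feats = torch.randn(8, 4, 128)             # 8 images x 4 rotations (assumed shapes)
t_ids = torch.arange(4).repeat(8, 1)       # which rotation produced each copy
inv, eq = InvEquivHeads()(feats, t_ids)
print(float(inv), float(eq))
```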


Handwriting Transformers, Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan, Mubarak A. Shah Apr 2021

Computer Vision Faculty Publications

We propose a novel transformer-based styled handwritten text image generation approach, HWT, that strives to learn both style-content entanglement and global and local writing style patterns. The proposed HWT captures the long- and short-range relationships within the style examples through a self-attention mechanism, thereby encoding both global and local style patterns. Further, the proposed transformer-based HWT comprises an encoder-decoder attention that enables style-content entanglement by gathering the style representation of each query character. To the best of our knowledge, we are the first to introduce a transformer-based generative network for styled handwritten text generation. Our proposed HWT …
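
The described encoder-decoder attention can be illustrated with a single cross-attention call in which the character (content) queries gather style information from the encoded style examples; the dimensions below are assumptions.

```python
import torch
import torch.nn as nn

# Illustrative sketch: each character of the target text (the "content" queries) gathers
# its own style representation from the encoded style examples via cross-attention.
dim, heads = 256, 4
cross_attn = nn.MultiheadAttention(dim, heads, batch_first=True)

style_tokens = torch.randn(1, 50, dim)     # encoded patches of a writer's style examples
char_queries = torch.randn(1, 12, dim)     # embeddings of the 12 characters to render
styled_chars, attn_weights = cross_attn(char_queries, style_tokens, style_tokens)
print(styled_chars.shape)                  # torch.Size([1, 12, 256]): one styled vector per character
```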


Learning To Fuse Asymmetric Feature Maps In Siamese Trackers, Wencheng Han, Xingping Dong, Fahad Shahbaz Khan, Ling Shao, Jianbing Shen Mar 2021

Computer Vision Faculty Publications

Recently, Siamese-based trackers have achieved promising performance in visual tracking. Most recent Siamese-based trackers typically employ a depth-wise cross-correlation (DW-XCorr) to obtain multi-channel correlation information from the two feature maps (target and search region). However, DW-XCorr has several limitations within Siamese-based tracking: it can easily be fooled by distractors, has fewer activated channels and provides weak discrimination of object boundaries. Further, DW-XCorr is a handcrafted parameter-free module and cannot fully benefit from offline learning on large-scale data. We propose a learnable module, called the asymmetric convolution (ACM), which learns to better capture the semantic correlation information in offline training on …
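
For reference, depth-wise cross-correlation itself is commonly implemented as a grouped convolution in which the template features act as per-channel kernels; a sketch with assumed feature shapes:

```python
import torch
import torch.nn.functional as F

def dw_xcorr(search, target):
    """Illustrative sketch of depth-wise cross-correlation (DW-XCorr): the target
    (template) feature map is used as a per-channel kernel slid over the search region.
    search: (B, C, Hs, Ws); target: (B, C, Ht, Wt) with Ht <= Hs, Wt <= Ws."""
    b, c, h, w = search.shape
    # Fold the batch into the channel dimension so each sample gets its own kernel.
    search = search.view(1, b * c, h, w)
    kernel = target.reshape(b * c, 1, *target.shape[2:])
    out = F.conv2d(search, kernel, groups=b * c)      # per-channel correlation map
    return out.view(b, c, out.shape[2], out.shape[3])

search = torch.randn(2, 256, 31, 31)     # search-region features
target = torch.randn(2, 256, 7, 7)       # template (target) features
print(dw_xcorr(search, target).shape)    # torch.Size([2, 256, 25, 25])
```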


Deep Gaussian Processes For Few-Shot Segmentation, Joakim Johnander, Johan Edstedt, Martin Danelljan, Michael Felsberg, Fahad Shahbaz Khan Mar 2021

Computer Vision Faculty Publications

Few-shot segmentation is a challenging task, requiring the extraction of a generalizable representation from only a few annotated samples, in order to segment novel query images. A common approach is to model each class with a single prototype. While conceptually simple, these methods suffer when the target appearance distribution is multi-modal or not linearly separable in feature space. To tackle this issue, we propose a few-shot learner formulation based on Gaussian process (GP) regression. Through the expressivity of the GP, our approach is capable of modeling complex appearance distributions in the deep feature space. The GP provides a principled way …
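
The core GP step can be sketched as standard GP regression from support-pixel features to their mask labels, evaluated at the query pixels; the RBF kernel and noise level below are illustrative assumptions.

```python
import torch

def rbf_kernel(a, b, lengthscale=1.0):
    d2 = torch.cdist(a, b) ** 2
    return torch.exp(-0.5 * d2 / lengthscale ** 2)

def gp_posterior_mean(support_feats, support_labels, query_feats, noise=1e-2):
    """Illustrative sketch: GP regression from support-pixel features to their
    foreground/background labels, evaluated at the query pixels.
    support_feats: (N, D); support_labels: (N, 1) in {0, 1}; query_feats: (M, D)."""
    K = rbf_kernel(support_feats, support_feats)
    K_star = rbf_kernel(query_feats, support_feats)
    # Posterior mean: K_* (K + sigma^2 I)^-1 y
    alpha = torch.linalg.solve(K + noise * torch.eye(len(support_feats)), support_labels)
    return K_star @ alpha                     # (M, 1) soft foreground score per query pixel

sup = torch.randn(200, 64)                # features of 200 annotated support pixels (assumed)
lab = (torch.rand(200, 1) > 0.5).float()  # their foreground/background labels
qry = torch.randn(500, 64)                # features of 500 query pixels
print(gp_posterior_mean(sup, lab, qry).shape)   # torch.Size([500, 1])
```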


On Generating Transferable Targeted Perturbations, Muzammal Naseer, Salman Khan, Munawar Hayat, Fahad Shahbaz Khan, Fatih Porikli Mar 2021

Computer Vision Faculty Publications

While the untargeted black-box transferability of adversarial perturbations has been extensively studied before, changing an unseen model's decisions to a specific 'targeted' class remains a challenging feat. In this paper, we propose a new generative approach for highly transferable targeted perturbations (TTP). We note that the existing methods are less suitable for this task due to their reliance on class-boundary information that changes from one model to another, thus reducing transferability. In contrast, our approach matches the perturbed image 'distribution' with that of the target class, leading to high targeted transferability rates. To this end, we propose a new objective …
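
One simple way to express "matching the perturbed image distribution with the target class" is a symmetric KL divergence between the model's predictions on perturbed source images and on real target-class images; this is a hedged sketch of such an objective, not the paper's full loss.

```python
import torch
import torch.nn.functional as F

def distribution_matching_loss(model, perturbed, target_images):
    """Illustrative sketch: align the output distribution of perturbed source images
    with that of real target-class images, rather than pushing single samples across
    a class boundary."""
    p = F.log_softmax(model(perturbed), dim=1)        # predictions on perturbed source images
    q = F.log_softmax(model(target_images), dim=1)    # predictions on real target-class images
    # Symmetric KL divergence between the two predicted distributions.
    return (F.kl_div(p, q.exp(), reduction="batchmean")
            + F.kl_div(q, p.exp(), reduction="batchmean"))

model = torch.nn.Sequential(torch.nn.Flatten(), torch.nn.Linear(3 * 32 * 32, 10))  # toy classifier
src = torch.randn(8, 3, 32, 32)            # perturbed source images (e.g. from a generator)
tgt = torch.randn(8, 3, 32, 32)            # real images of the chosen target class
print(float(distribution_matching_loss(model, src, tgt)))
```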


Orthogonal Projection Loss, Kanchana Ranasinghe, Muzammal Naseer, Munawar Hayat, Salman Khan, Fahad Shahbaz Khan Mar 2021

Computer Vision Faculty Publications

Deep neural networks have achieved remarkable performance on a range of classification tasks, with softmax cross-entropy (CE) loss emerging as the de facto objective function. The CE loss encourages features of a class to have a higher projection score on the true class vector compared to the negative classes. However, this is a relative constraint and does not explicitly force different class features to be well-separated. Motivated by the observation that ground-truth class representations in CE loss are orthogonal (one-hot encoded vectors), we develop a novel loss function termed 'Orthogonal Projection Loss' (OPL) which imposes orthogonality in the feature space. OPL augments …
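
An OPL-style term can be sketched directly from the description above: within a mini-batch, make same-class features co-linear and different-class features orthogonal. The batch-wise formulation and the weighting factor below are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def orthogonal_projection_loss(features, labels, gamma=0.5):
    """Illustrative sketch of an OPL-style objective: features of the same class should
    be co-linear (cosine similarity -> 1), features of different classes orthogonal
    (cosine similarity -> 0)."""
    f = F.normalize(features, dim=1)
    sim = f @ f.t()                                      # (B, B) pairwise cosine similarities
    same = labels.unsqueeze(0) == labels.unsqueeze(1)
    eye = torch.eye(len(labels), dtype=torch.bool)
    pos = sim[same & ~eye].mean() if (same & ~eye).any() else sim.new_tensor(1.0)
    neg = sim[~same].abs().mean() if (~same).any() else sim.new_tensor(0.0)
    return (1.0 - pos) + gamma * neg          # used alongside the usual cross-entropy loss

feats = torch.randn(16, 128, requires_grad=True)
labels = torch.randint(0, 4, (16,))
loss = orthogonal_projection_loss(feats, labels)
loss.backward()
print(float(loss))
```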


Efficient CNN Building Blocks For Encrypted Data, Nayna Jain, Karthik Nandakumar, Nalini K. Ratha, Sharath U. Pankanti, Uttam Kumar Jan 2021

Computer Vision Faculty Publications

Machine learning on encrypted data can address the concerns related to privacy and legality of sharing sensitive data with untrustworthy service providers, while leveraging their resources to facilitate extraction of valuable insights from otherwise non-shareable data. Fully Homomorphic Encryption (FHE) is a promising technique to enable machine learning and inferencing while providing strict guarantees against information leakage. Since deep convolutional neural networks (CNNs) have become the machine learning tool of choice in many applications, several attempts have been made to harness CNNs to extract insights from encrypted data. However, existing works focus only on ensuring data security and ignore security …
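
A common way to make CNN building blocks FHE-friendly is to avoid comparisons altogether, e.g., a squared activation instead of ReLU and average pooling instead of max pooling. The plaintext sketch below only illustrates that design constraint and does not perform any encryption.

```python
import torch
import torch.nn as nn

class FHEFriendlyBlock(nn.Module):
    """Illustrative sketch: a CNN building block using only additions and multiplications
    (polynomial operations), which levelled homomorphic encryption schemes can evaluate.
    Comparisons are avoided: ReLU -> square activation, max-pool -> average pooling."""
    def __init__(self, in_ch, out_ch):
        super().__init__()
        self.conv = nn.Conv2d(in_ch, out_ch, kernel_size=3, padding=1)
        self.pool = nn.AvgPool2d(2)

    def forward(self, x):
        x = self.conv(x)       # convolution = additions and multiplications only
        x = x * x              # square activation instead of ReLU (no comparison needed)
        return self.pool(x)    # average pooling instead of max pooling

x = torch.randn(1, 1, 28, 28)
print(FHEFriendlyBlock(1, 8)(x).shape)    # torch.Size([1, 8, 14, 14])
```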


Generative Multi-Label Zero-Shot Learning, Akshita Gupta, Sanath Narayan, Salman Khan, Fahad Shahbaz Khan, Ling Shao, Joost Van De Weijer Jan 2021

Computer Vision Faculty Publications

Multi-label zero-shot learning strives to classify images into multiple unseen categories for which no data is available during training. The test samples can additionally contain seen categories in the generalized variant. Existing approaches rely on learning either shared or label-specific attention from the seen classes. Nevertheless, computing reliable attention maps for unseen classes during inference in a multi-label setting is still a challenge. In contrast, state-of-the-art single-label generative adversarial network (GAN) based approaches learn to directly synthesize the class-specific visual features from the corresponding class attribute embeddings. However, synthesizing multi-label features from GANs is still unexplored in the context of …
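
The single-label generative recipe referred to above can be sketched as a conditional generator that maps a class attribute embedding plus noise to a visual feature vector; the layer sizes are assumptions.

```python
import torch
import torch.nn as nn

class AttributeConditionedGenerator(nn.Module):
    """Illustrative sketch: synthesise class-specific visual features from a class
    attribute/embedding vector plus noise, so a classifier for unseen classes can be
    trained on generated features."""
    def __init__(self, attr_dim=300, noise_dim=64, feat_dim=2048):
        super().__init__()
        self.noise_dim = noise_dim
        self.net = nn.Sequential(
            nn.Linear(attr_dim + noise_dim, 1024),
            nn.LeakyReLU(0.2),
            nn.Linear(1024, feat_dim),
            nn.ReLU(),                      # post-ReLU CNN features are non-negative
        )

    def forward(self, attrs):
        noise = torch.randn(attrs.shape[0], self.noise_dim, device=attrs.device)
        return self.net(torch.cat([attrs, noise], dim=1))

unseen_attrs = torch.randn(5, 300)                  # embeddings of 5 unseen classes (assumed)
fake_feats = AttributeConditionedGenerator()(unseen_attrs)
print(fake_feats.shape)                             # torch.Size([5, 2048])
```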


Low Light Image Enhancement Via Global And Local Context Modeling, Aditya Arora, Muhammad Haris, Syed Waqas Zamir, Munawar Hayat, Fahad Shahbaz Khan, Ling Shao, Ming-Hsuan Yang Jan 2021

Computer Vision Faculty Publications

Images captured under low-light conditions manifest poor visibility and lack contrast and color vividness. Compared to conventional approaches, deep convolutional neural networks (CNNs) perform well in enhancing images. However, being solely reliant on confined fixed primitives to model dependencies, existing data-driven deep models do not exploit the contexts at various spatial scales to address low-light image enhancement. These contexts can be crucial for several image enhancement tasks, e.g., local and global contrast, brightness, and color corrections, which require cues from both local and global spatial extents. To this end, we introduce a context-aware deep network for low-light image enhancement. First, …
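
The global-plus-local idea can be sketched as fusing ordinary convolutional (local) features with a globally pooled context vector that is broadcast back over the spatial grid; the channel sizes are assumptions.

```python
import torch
import torch.nn as nn

class GlobalLocalContext(nn.Module):
    """Illustrative sketch: fuse local convolutional features with a global context
    vector (from global average pooling), so corrections such as overall brightness
    can draw on the whole image rather than a fixed local receptive field."""
    def __init__(self, ch=64):
        super().__init__()
        self.local = nn.Conv2d(ch, ch, kernel_size=3, padding=1)
        self.global_fc = nn.Linear(ch, ch)

    def forward(self, x):
        local = self.local(x)                                        # local spatial context
        g = self.global_fc(x.mean(dim=(2, 3)))                       # global image statistics
        return local + g.unsqueeze(-1).unsqueeze(-1)                 # broadcast global context

feats = torch.randn(1, 64, 128, 128)
print(GlobalLocalContext()(feats).shape)     # torch.Size([1, 64, 128, 128])
```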