Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 13 of 13

Full-Text Articles in Entire DC Network

Ifseg: Image-Free Semantic Segmentation Via Vision-Language Model, Sukmin Yun, Seong Hyeon Park, Paul Hongsuck Seo, Jinwoo Shin Aug 2023

Machine Learning Faculty Publications

Vision-language (VL) pre-training has recently gained much attention for its transferability and flexibility in novel concepts (e.g., cross-modality transfer) across various visual tasks. However, VL-driven segmentation has been under-explored, and the existing approaches still have the burden of acquiring additional training images or even segmentation annotations to adapt a VL model to downstream segmentation tasks. In this paper, we introduce a novel image-free segmentation task where the goal is to perform semantic segmentation given only a set of the target semantic categories, but without any task-specific images and annotations. To tackle this challenging task, our proposed method, coined IFSeg, generates …
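The image-free idea above, building segmentation training data purely from a list of category names, can be sketched as follows. The grid construction, the toy embeddings, and all names here are illustrative assumptions for the general technique, not the authors' implementation:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical category word embeddings (stand-ins for a VL model's token embeddings).
categories = ["sky", "road", "car"]
embed_dim = 8
word_embeddings = {c: rng.normal(size=embed_dim) for c in categories}

def make_artificial_sample(height=4, width=4):
    """Build an image-free training sample: a 2D grid of category word
    embeddings serves as the 'image', and the category indices at each
    grid cell serve as the segmentation labels."""
    label_map = rng.integers(0, len(categories), size=(height, width))
    token_map = np.stack(
        [[word_embeddings[categories[i]] for i in row] for row in label_map]
    )
    return token_map, label_map

tokens, labels = make_artificial_sample()
print(tokens.shape)  # (4, 4, 8)
print(labels.shape)  # (4, 4)
```

A segmenter trained on such word-token grids can then be applied to real image tokens at test time, which is the cross-modality transfer the abstract refers to.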


Team Thesyllogist At Semeval-2023 Task 3: Language-Agnostic Framing Detection In Multi-Lingual Online News: A Zero-Shot Transfer Approach, Osama Mohammed Afzal, Preslav Nakov Jul 2023

Natural Language Processing Faculty Publications

We describe our system for SemEval-2023 Task 3 Subtask 2, which is on detecting the frames used in a news article in a multi-lingual setup. We propose a multi-lingual approach based on machine translation of the input, followed by an English prediction model. Our system demonstrated good zero-shot transfer capability, achieving micro-F1 scores of 53% for Greek (4th on the leaderboard) and 56.1% for Georgian (3rd on the leaderboard), without any prior training on translated data for these languages. Moreover, our system achieved comparable performance on seven other languages, including German, English, French, Russian, Italian, Polish, and Spanish. Our results demonstrate …
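The translate-then-classify pipeline described above can be sketched as below; the translator and the frame classifier here are toy stand-ins for the real MT system and the English prediction model:

```python
# Minimal sketch of a translate-then-classify zero-shot pipeline.
# Both helpers are hypothetical stand-ins, not the authors' system.

def translate_to_english(text: str, source_lang: str) -> str:
    # Stand-in for a real machine-translation system.
    toy_dictionary = {("Hallo Welt", "de"): "Hello world"}
    return toy_dictionary.get((text, source_lang), text)

def predict_frames(english_text: str) -> list:
    # Stand-in for the English-only multi-label frame classifier.
    return ["Economic"] if "economy" in english_text.lower() else ["Other"]

def classify_multilingual(text: str, lang: str) -> list:
    # Zero-shot transfer: translate first, then apply the English model.
    return predict_frames(text if lang == "en" else translate_to_english(text, lang))

print(classify_multilingual("The economy is growing", "en"))  # ['Economic']
```

The design choice is that only one classifier (in English) ever needs training; new languages are handled entirely by the translation front end.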


Understanding Masked Autoencoders Via Hierarchical Latent Variable Models, Lingjing Kong, Martin Q. Ma, Guangyi Chen, Eric P. Xing, Yuejie Chi, Louis Philippe Morency, Kun Zhang Jun 2023

Machine Learning Faculty Publications

Masked autoencoder (MAE), a simple and effective self-supervised learning framework based on the reconstruction of masked image regions, has recently achieved prominent success in a variety of vision tasks. Despite the emergence of intriguing empirical observations on MAE, a theoretically principled understanding is still lacking. In this work, we formally characterize and justify existing empirical insights and provide theoretical guarantees of MAE. We formulate the underlying data-generating process as a hierarchical latent variable model, and show that under reasonable assumptions, MAE provably identifies a set of latent variables in the hierarchical model, explaining why MAE can extract high-level information from …
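The MAE setup the abstract analyzes, randomly masking most image patches and reconstructing them, can be sketched as follows (an illustrative stand-in on random arrays, not the authors' code):

```python
import numpy as np

rng = np.random.default_rng(0)

def random_mask(patches, mask_ratio=0.75):
    """Randomly split patch indices into visible and masked sets, as in MAE.
    The encoder sees only the visible patches; the decoder is trained to
    reconstruct the masked ones."""
    num_patches = patches.shape[0]
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)
    masked_idx, visible_idx = perm[:num_masked], perm[num_masked:]
    return patches[visible_idx], patches[masked_idx], visible_idx, masked_idx

# 16 patches of dimension 4 (a stand-in for a patchified image).
patches = rng.normal(size=(16, 4))
visible, target, _, _ = random_mask(patches)
print(visible.shape, target.shape)  # (4, 4) (12, 4)
```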


Gconet+: A Stronger Group Collaborative Co-Salient Object Detector, Peng Zheng, Huazhu Fu, Deng Ping Fan, Qi Fan, Jie Qin, Yu Wing Tai, Chi Keung Tang, Luc Van Gool Apr 2023

Machine Learning Faculty Publications

In this paper, we present a novel end-to-end group collaborative learning network, termed GCoNet+, which can effectively and efficiently (250 fps) identify co-salient objects in natural scenes. The proposed GCoNet+ achieves the new state-of-the-art performance for co-salient object detection (CoSOD) through mining consensus representations based on the following two essential criteria: 1) intra-group compactness to better formulate the consistency among co-salient objects by capturing their inherent shared attributes using our novel group affinity module (GAM); 2) inter-group separability to effectively suppress the influence of noisy objects on the output by introducing our new group collaborating module (GCM) conditioning on the …


Intermediate Prototype Mining Transformer For Few-Shot Semantic Segmentation, Yuanwei Liu, Nian Liu, Xiwen Yao, Junwei Han Dec 2022

Computer Vision Faculty Publications

Few-shot semantic segmentation aims to segment the target objects in a query image under the condition of a few annotated support images. Most previous works strive to mine more effective category information from the support set to match the corresponding objects in the query. However, they all ignore the category information gap between query and support images. If the objects in them show large intra-class diversity, forcibly migrating the category information from the support to the query is ineffective. To solve this problem, we are the first to introduce an intermediate prototype for mining both deterministic category information from the support and adaptive …
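The standard support-to-query matching that this line of work builds on can be sketched as below: a category prototype is pooled from the support features inside the object mask, then matched against every query location. This is a generic FSS building block (masked average pooling plus cosine matching), not the paper's intermediate-prototype transformer:

```python
import numpy as np

def masked_average_prototype(support_feat, support_mask):
    """Masked average pooling: average the support features inside the
    object mask to obtain a category prototype."""
    mask = support_mask.astype(float)[..., None]           # (H, W, 1)
    return (support_feat * mask).sum(axis=(0, 1)) / mask.sum()

def match_query(query_feat, prototype):
    """Cosine similarity between each query location and the prototype."""
    q = query_feat / np.linalg.norm(query_feat, axis=-1, keepdims=True)
    p = prototype / np.linalg.norm(prototype)
    return q @ p                                            # (H, W) similarity map

rng = np.random.default_rng(0)
support_feat = rng.normal(size=(8, 8, 16))
support_mask = np.zeros((8, 8))
support_mask[2:6, 2:6] = 1                                  # toy object region
proto = masked_average_prototype(support_feat, support_mask)
sim = match_query(rng.normal(size=(8, 8, 16)), proto)
print(proto.shape, sim.shape)  # (16,) (8, 8)
```

The abstract's point is precisely that this direct migration of the support prototype fails under large intra-class diversity, motivating an intermediate prototype.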


Face Pyramid Vision Transformer, Khawar Islam, Muhammad Zaigham Zaheer, Arif Mahmood Nov 2022

Computer Vision Faculty Publications

A novel Face Pyramid Vision Transformer (FPVT) is proposed to learn discriminative multi-scale facial representations for face recognition and verification. In FPVT, Face Spatial Reduction Attention (FSRA) and Face Dimensionality Reduction (FDR) layers are employed to make the feature maps compact, thus reducing the computations. An Improved Patch Embedding (IPE) algorithm is proposed to exploit the benefits of CNNs in ViTs (e.g., shared weights, local context, and receptive fields) to model features ranging from lower-level edges to higher-level semantic primitives. Within the FPVT framework, a Convolutional Feed-Forward Network (CFFN) is proposed that extracts locality information to learn low-level facial information. The proposed FPVT …


Transresnet: Integrating The Strengths Of Vits And Cnns For High Resolution Medical Image Segmentation Via Feature Grafting, Muhammad Hamza Sharif, Dmitry Demidov, Asif Hanif, Mohammad Yaqub, Min Xu Nov 2022

Computer Vision Faculty Publications

High-resolution images are preferable in the medical imaging domain as they significantly improve the diagnostic capability of the underlying method. In particular, high resolution helps substantially in improving automatic image segmentation. However, most of the existing deep learning-based techniques for medical image segmentation are optimized for input images having small spatial dimensions and perform poorly on high-resolution images. To address this shortcoming, we propose a parallel-in-branch architecture called TransResNet, which incorporates Transformer and CNN in a parallel manner to extract features from multi-resolution images independently. In TransResNet, we introduce a Cross Grafting Module (CGM), which generates the grafted features, enriched in both …


Cmr3d: Contextualized Multi-Stage Refinement For 3d Object Detection, Dhanalaxmi Gaddam, Jean Lahoud, Fahad Shahbaz Khan, Rao Anwer, Hisham Cholakkal Sep 2022

Computer Vision Faculty Publications

Existing deep learning-based 3D object detectors typically rely on the appearance of individual objects and do not explicitly pay attention to the rich contextual information of the scene. In this work, we propose the Contextualized Multi-Stage Refinement for 3D Object Detection (CMR3D) framework, which takes a 3D scene as input and strives to explicitly integrate useful contextual information of the scene at multiple levels to predict a set of object bounding boxes along with their corresponding semantic labels. To this end, we propose to utilize a context enhancement network that captures the contextual information at different levels of granularity followed by a …


Dynamic Prototype Convolution Network For Few-Shot Semantic Segmentation, Jie Liu, Yanqi Bao, Guo-Sen Xie, Huan Xiong, Jan-Jakob Sonke, Efstratios Gavves Jun 2022

Machine Learning Faculty Publications

The key challenge for few-shot semantic segmentation (FSS) is how to tailor a desirable interaction among support and query features and/or their prototypes, under the episodic training scenario. Most existing FSS methods implement such support/query interactions by solely leveraging plain operations - e.g., cosine similarity and feature concatenation - for segmenting the query objects. However, these interaction approaches usually cannot well capture the intrinsic object details in the query images that are widely encountered in FSS, e.g., if the query object to be segmented has holes and slots, inaccurate segmentation almost always happens. To this end, we propose a dynamic …
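The dynamic-prototype idea the title points to can be sketched as below: instead of a single similarity map, each prototype acts as a 1x1 convolution kernel correlated over the query feature map. This is a generic sketch of the prototype-as-kernel pattern, with illustrative names and shapes, not the authors' exact module:

```python
import numpy as np

def prototype_convolution(query_feat, kernel_protos):
    """Correlate query features with prototype-derived kernels: each
    prototype is applied as a 1x1 convolution filter over the query
    feature map, producing one response map per prototype."""
    # query_feat: (H, W, C); kernel_protos: (K, C) -> responses: (H, W, K)
    return np.einsum("hwc,kc->hwk", query_feat, kernel_protos)

rng = np.random.default_rng(0)
query = rng.normal(size=(8, 8, 16))   # toy query feature map
protos = rng.normal(size=(3, 16))     # toy prototype kernels
response = prototype_convolution(query, protos)
print(response.shape)  # (8, 8, 3)
```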


Cocoa: Context-Conditional Adaptation For Recognizing Unseen Classes In Unseen Domains, Puneet Mangla, Shivam Chandhok, Vineeth N. Balasubramanian, Fahad Shahbaz Khan Feb 2022

Computer Vision Faculty Publications

Recent progress towards designing models that can generalize to unseen domains (i.e., domain generalization) or unseen classes (i.e., zero-shot learning) has sparked interest in building models that can tackle both domain shift and semantic shift simultaneously (i.e., zero-shot domain generalization). For models to generalize to unseen classes in unseen domains, it is crucial to learn feature representations that preserve class-level (domain-invariant) as well as domain-specific information. Motivated by the success of generative zero-shot approaches, we propose a feature generative framework integrated with a COntext COnditional Adaptive (COCOA) Batch-Normalization layer to seamlessly integrate class-level semantic and domain-specific information. The generated visual features …
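A context-conditional normalization layer of the kind named above can be sketched as follows: features are standardized, then scaled and shifted by parameters predicted from a context embedding. All names and shapes here are illustrative assumptions about the general pattern, not the COCOA layer itself:

```python
import numpy as np

def context_conditional_norm(features, context, gamma_w, beta_w, eps=1e-5):
    """Standardize features over the batch, then apply a per-channel scale
    and shift predicted from a context (e.g., domain) embedding."""
    mean = features.mean(axis=0, keepdims=True)
    var = features.var(axis=0, keepdims=True)
    normalized = (features - mean) / np.sqrt(var + eps)
    gamma = context @ gamma_w   # context-conditioned per-channel scale
    beta = context @ beta_w     # context-conditioned per-channel shift
    return gamma * normalized + beta

rng = np.random.default_rng(0)
feats = rng.normal(size=(32, 16))   # batch of 32 feature vectors, 16 channels
ctx = rng.normal(size=(8,))         # toy context embedding
out = context_conditional_norm(
    feats, ctx, rng.normal(size=(8, 16)), rng.normal(size=(8, 16))
)
print(out.shape)  # (32, 16)
```

Conditioning the affine parameters, rather than the statistics, lets one shared backbone adapt its features per domain with a small context network.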


Transformnet: Self-Supervised Representation Learning Through Predicting Geometric Transformations, Hashim Sayed, Muhammad Ali Feb 2022

Student Publications

Deep neural networks require a large amount of training data, while in the real world there is a scarcity of data available for training purposes. To resolve this issue, unsupervised methods are used for training with limited data. In this report, we describe an unsupervised semantic feature learning approach based on recognizing the geometric transformation applied to the input data. The basic concept of our approach is that if someone is unaware of the objects in the images, he/she would not be able to quantitatively predict the geometric transformation that was applied to them. This self-supervised scheme is based …
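The pretext task described above can be sketched in a few lines: rotate each image by a random multiple of 90 degrees, and use the rotation index as a free label for the network to predict. A toy sketch on a random array (the classic rotation-prediction setup, with illustrative names):

```python
import numpy as np

rng = np.random.default_rng(0)

def make_rotation_task(image):
    """Self-supervised pretext task: rotate the image by 0/90/180/270
    degrees; the rotation index is the label the network must predict,
    so no human annotation is needed."""
    label = int(rng.integers(0, 4))
    return np.rot90(image, k=label), label

image = rng.normal(size=(4, 4))   # stand-in for a real image
rotated, label = make_rotation_task(image)
print(rotated.shape, label in range(4))  # (4, 4) True
```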


Multi-Modal Transformers Excel At Class-Agnostic Object Detection, Muhammad Maaz, Hanoona Bangalath Rasheed, Salman Hameed Khan, Fahad Shahbaz Khan, Rao Muhammad Anwer, Ming-Hsuan Yang Nov 2021

Computer Vision Faculty Publications

What constitutes an object? This has been a longstanding question in computer vision. Towards this goal, numerous learning-free and learning-based approaches have been developed to score objectness. However, they generally do not scale well across new domains and for unseen objects. In this paper, we advocate that existing methods lack a top-down supervision signal governed by human-understandable semantics. To bridge this gap, we explore recent Multi-modal Vision Transformers (MViT) that have been trained with aligned image-text pairs. Our extensive experiments across various domains and novel objects show the state-of-the-art performance of MViTs to localize generic objects in images. Based on …
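Using an aligned image-text model to localize generic objects reduces, at its core, to scoring region embeddings against a text-query embedding. The sketch below shows that scoring step with random stand-in embeddings; the shapes and names are illustrative assumptions, not the MViT interface:

```python
import numpy as np

def score_proposals(region_embeds, text_embed):
    """Rank class-agnostic region proposals by cosine similarity to a
    generic text query embedding (e.g., a phrase like 'all objects')."""
    r = region_embeds / np.linalg.norm(region_embeds, axis=1, keepdims=True)
    t = text_embed / np.linalg.norm(text_embed)
    return r @ t   # one similarity score per proposal

rng = np.random.default_rng(0)
regions = rng.normal(size=(5, 32))   # 5 toy proposal embeddings
query = rng.normal(size=(32,))       # toy text-query embedding
scores = score_proposals(regions, query)
print(scores.shape)  # (5,)
```

The human-understandable text query is the top-down supervision signal the abstract argues existing objectness methods lack.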


Human Parsing Based Texture Transfer From Single Image To 3d Human Via Cross-View Consistency, Fang Zhao, Shengcai Liao, Kaihao Zhang, Ling Shao Dec 2020

Machine Learning Faculty Publications

This paper proposes a human parsing based texture transfer model via cross-view consistency learning to generate the texture of a 3D human body from a single image. We use the semantic parsing of the human body as input, providing both shape and pose information to reduce the appearance variation of the human image and preserve the spatial distribution of semantic parts. Meanwhile, in order to improve the prediction of textures for invisible parts, we explicitly enforce consistency across different views of the same subject by exchanging the textures predicted by the two views to render images during training. The perceptual loss …