Full-Text Articles in Entire DC Network
CoNIC Challenge: Pushing The Frontiers Of Nuclear Detection, Segmentation, Classification And Counting, Simon Graham, Quoc Dang Vu, Mostafa Jahanifar, Martin Weigert, Uwe Schmidt, Wenhua Zhang, Jun Zhang, Sen Yang, Jinxi Xiang, Xiyue Wang, Josef Lorenz Rumberger, Elias Baumann, Peter Hirsch, Lihao Liu, Chenyang Hong, Angelica I. Aviles-Rivero, Ayushi Jain, Heeyoung Ahn, Yiyu Hong, Hussam Azzuni, Min Xu
Computer Vision Faculty Publications
Nuclear detection, segmentation and morphometric profiling are essential in helping us further understand the relationship between histology and patient outcome. To drive innovation in this area, we set up a community-wide challenge using the largest available dataset of its kind to assess nuclear segmentation and cellular composition. Our challenge, named CoNIC, stimulated the development of reproducible algorithms for cellular recognition with real-time result inspection on public leaderboards. We conducted an extensive post-challenge analysis based on the top-performing models using 1,658 whole-slide images of colon tissue. With around 700 million detected nuclei per model, associated features were used for dysplasia grading and …
MGMT Promoter Methylation Status Prediction Using MRI Scans? An Extensive Experimental Evaluation Of Deep Learning Models, Numan Saeed, Muhammad Ridzuan, Hussain Alasmawi, Ikboljon Sobirov, Mohammad Yaqub
Computer Vision Faculty Publications
The number of studies on deep learning for medical diagnosis is expanding, and these systems are often claimed to outperform clinicians. However, only a few systems have shown medical efficacy. From this perspective, we examine a wide range of deep learning algorithms for the assessment of glioblastoma, a common and lethal brain tumor in older adults. Surgery, chemotherapy, and radiation are the standard treatments for glioblastoma patients. The methylation status of the MGMT promoter, a specific genetic sequence found in the tumor, affects chemotherapy's effectiveness. MGMT promoter methylation improves chemotherapy response and survival in several cancers. MGMT promoter …
Hybrid Flexible (HyFlex) Learning Space Design And Implementation At Graduate Level: An Iterative Process, David Santandreu Calonge, Mark Thompson, Leisa Hassock, Mohammad Yaqub
Computer Vision Faculty Publications
This paper investigates the process of designing HyFlex classrooms at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), a graduate-level research university located in Abu Dhabi, United Arab Emirates, underpinned by the application of the EDUCAUSE Learning Space Rating System (LSRS). This investigation takes the form of a case study and specifically focuses on the rationale, planning, design, and technology behind the implementation of the flexible HyFlex spaces as deployed in several classroom environments at MBZUAI. The performance of each iteration was assessed with LSRS V3. The findings should make an important contribution to the field of HyFlex learning spaces and technology-enhanced classroom design …
Smart Street Light Control: A Review On Methods, Innovations, And Extended Applications, Fouad Agramelal, Mohamed Sadik, Youssef Moubarak, Saad Abouzahir
Computer Vision Faculty Publications
As urbanization increases, streetlights have become significant consumers of electrical power, making it imperative to develop effective control methods for sustainability. This paper offers a comprehensive review of control methods for smart streetlight systems, setting itself apart by introducing a novel light scheme framework that provides a structured classification of various light control patterns, thus filling an existing gap in the literature. Unlike previous studies, this work examines the technical specifics of individual research papers and methodologies, ranging from basic to advanced control methods like computer vision and deep learning, while also assessing the energy consumption associated with each …
Metaverse Key Requirements And Platforms Survey, Akbobek Abilkaiyrkyzy, Ahmed Elhagry, Fedwa Laamarti, Abdulmotaleb El Saddik
Computer Vision Faculty Publications
The growing interest in the metaverse has led to an abundance of platforms, each with its own unique features and limitations. This paper's objective is two-fold. First, we aim to provide an objective analysis of the requirements that metaverse platforms need to fulfill. We survey a broad set of criteria including interoperability, immersiveness, persistence, multimodal and social interaction, scalability, level of openness, configurability, market access, security, and blockchain integration, among others. Second, we review a wide range of existing metaverse platforms, and we critically evaluate their ability to meet the requirements listed. We identify their limitations, which must be …
DTITD: An Intelligent Insider Threat Detection Framework Based On Digital Twin And Self-Attention Based Deep Learning Models, Zhi Qiang Wang, Abdulmotaleb El Saddik
Computer Vision Faculty Publications
Recent statistics and studies show that the losses caused by insider threats are much higher than those caused by external attacks. More and more organizations are investing in or purchasing insider threat detection systems to prevent insider risks. However, the accurate and timely detection of insider threats faces significant challenges. In this study, we propose an intelligent insider threat detection framework based on Digital Twins and self-attention-based deep learning models. First, this paper introduces insider threats and the challenges in detecting them. Then it presents recent related work on insider threat detection and the limitations of that work. Next, …
Burstormer: Burst Image Restoration And Enhancement Transformer, Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming Hsuan Yang
Computer Vision Faculty Publications
On a shutter press, modern handheld cameras capture multiple images in rapid succession and merge them to generate a single image. However, individual frames in a burst are misaligned due to inevitable motion and contain multiple degradations. The challenge is to properly align the successive image shots and merge their complementary information to achieve high-quality outputs. Towards this direction, we propose Burstormer: a novel transformer-based architecture for burst image restoration and enhancement. Compared with existing works, our approach exploits multi-scale local and non-local features to achieve improved alignment and feature fusion. Our key idea is to enable inter-frame communication …
CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup Via Adversarial Latent Search, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar
Computer Vision Faculty Publications
The success of deep-learning-based face recognition systems has given rise to serious privacy concerns due to their ability to enable unauthorized tracking of users in the digital world. Existing methods for enhancing privacy fail to generate 'naturalistic' images that can protect facial privacy without compromising user experience. We propose a novel two-step approach for facial privacy protection that relies on finding adversarial latent codes in the low-dimensional manifold of a pretrained generative model. The first step inverts the given face image into the latent space and finetunes the generative model to achieve an accurate reconstruction of the …
Multiclass Confidence And Localization Calibration For Object Detection, Bimsara Pathiraja, Malitha Gunawardhana, Muhammad Haris Khan
Computer Vision Faculty Publications
Although deep neural networks (DNNs) achieve high predictive accuracy across many challenging computer vision problems, recent studies suggest that they tend to make over-confident predictions, rendering them poorly calibrated. Most existing attempts to improve DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, few if any attempts have been made to study the calibration of object detection methods, which occupy a pivotal place in vision-based, security-sensitive, and safety-critical applications. In this paper, we propose a new train-time technique for calibrating modern object detection methods. It is capable of jointly calibrating multiclass confidence and …
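Miscalibration of this kind is commonly quantified with the expected calibration error (ECE): predictions are grouped into confidence bins, and the gap between mean confidence and accuracy in each bin is averaged, weighted by bin size. The sketch below is a generic ECE computation, not the train-time calibration technique proposed in the paper. For detection it assumes each detection has already been marked correct or incorrect (for example, by matching it to a ground-truth box above some IoU threshold); the function name and the `n_bins` default are illustrative choices.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Binned ECE: size-weighted mean of |accuracy - mean confidence| per bin.

    confidences: predicted scores in [0, 1].
    correct: matching booleans (assumed: detection matched a ground truth).
    """
    if len(confidences) != len(correct):
        raise ValueError("confidences and correct must have the same length")
    n = len(confidences)
    bins = [[] for _ in range(n_bins)]
    for conf, ok in zip(confidences, correct):
        # Equal-width bins; a confidence of exactly 1.0 goes into the last bin.
        idx = min(int(conf * n_bins), n_bins - 1)
        bins[idx].append((conf, ok))
    ece = 0.0
    for b in bins:
        if not b:
            continue  # empty bins contribute nothing
        avg_conf = sum(c for c, _ in b) / len(b)
        accuracy = sum(1 for _, ok in b if ok) / len(b)
        ece += (len(b) / n) * abs(accuracy - avg_conf)
    return ece
```

With `confidences=[0.95, 0.9, 0.92, 0.6]` and `correct=[True, False, True, True]`, three predictions land in the top bin (mean confidence about 0.923, accuracy 2/3) and one in the 0.6 bin, giving an ECE of 0.2925: the model is over-confident in exactly the way the abstract describes.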
3D-Aware Multi-Class Image-To-Image Translation With NeRFs, Senmao Li, Joost Van De Weijer, Yaxing Wang, Fahad Shahbaz Khan, Meiqin Liu, Jian Yang
Computer Vision Faculty Publications
Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However, no prior work investigates 3D-aware GANs for 3D-consistent multiclass image-to-image (3D-aware I2I) translation. Naively using 2D-I2I translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass I2I translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware I2I translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture, which preserves view consistency, we …
Discriminative Co-Saliency And Background Mining Transformer For Co-Salient Object Detection, Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan
Computer Vision Faculty Publications
Most previous co-salient object detection works focus on extracting co-salient cues by mining the consistency relations across images while ignoring the explicit exploration of background regions. In this paper, we propose a Discriminative co-saliency and background Mining Transformer framework (DMT) based on several economical multi-grained correlation modules to explicitly mine both co-saliency and background information and effectively model their discrimination. Specifically, we first propose a region-to-region correlation module for introducing inter-image relations to pixel-wise segmentation features while maintaining computational efficiency. Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation …
Dynamic Graph Enhanced Contrastive Learning For Chest X-Ray Report Generation, Mingjie Li, Bingqian Lin, Zicong Chen, Haokun Lin, Xiaodan Liang, Xiaojun Chang
Computer Vision Faculty Publications
Automatic radiology reporting has great clinical potential to relieve radiologists from heavy workloads and improve diagnosis interpretation. Recently, researchers have enhanced data-driven neural networks with medical knowledge graphs to eliminate the severe visual and textual bias in this task. The structures of such graphs are built from the clinical dependencies among disease topic tags derived from general knowledge, and they usually do not update during training. Consequently, the fixed graphs cannot guarantee the most appropriate scope of knowledge, which limits their effectiveness. To address this limitation, we propose a knowledge graph with Dynamic structure and nodes …
3D Semantic Segmentation In The Wild: Learning Generalized Models For Adverse-Condition Point Clouds, Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing
Computer Vision Faculty Publications
Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model has been largely neglected, as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and enables the study of 3DSS under various adverse weather conditions. We study all-weather 3DSS modeling under two setups: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalizable 3DSS that learns all-weather 3DSS models from normal-weather data. Our studies reveal …
Vision Language Navigation With Knowledge-Driven Environmental Dreamer, Fengda Zhu, Vincent C.S. Lee, Xiaojun Chang, Xiaodan Liang
Computer Vision Faculty Publications
Vision-language navigation (VLN) requires an agent to perceive visual observations in a house scene and navigate step by step following natural language instructions. Due to the high cost of data annotation and collection, current VLN datasets provide limited instruction-trajectory samples. Learning vision-language alignment for VLN from limited data is challenging since visual observations and language instructions are both complex and diverse. Previous works only generate augmented data based on the original scenes and fail to generate data samples from unseen scenes, which limits the generalization ability of the navigation agent. In this paper, we introduce the Knowledge-driven Environmental Dreamer (KED), a …
Prompt-Based Tuning Of Transformer Models For Multi-Center Medical Image Segmentation Of Head And Neck Cancer, Numan Saeed, Muhammad Ridzuan, Roba Al Majzoub, Mohammad Yaqub
Computer Vision Faculty Publications
Medical image segmentation is a vital healthcare endeavor requiring precise and efficient models for appropriate diagnosis and treatment. Vision transformer (ViT)-based segmentation models have shown strong performance on this task. However, to build a powerful backbone, the self-attention block of ViT requires large-scale pre-training data. The prevailing way of adapting pre-trained models entails updating all or some of the backbone parameters. This paper proposes a novel fine-tuning strategy for adapting a pretrained transformer-based segmentation model to data from a new medical center. This method introduces a small number of learnable parameters, termed prompts, into the input space (less than …
Towards Enabling Haptic Communications Over 6G: Issues And Challenges, Muhammad Awais, Fasih Ullah Khan, Muhammad Zafar, Muhammad Mudassar, Muhammad Zaigham Zaheer, Khalid Mehmood Cheema, Muhammad Kamran, Woo Sung Jung
Computer Vision Faculty Publications
This research paper provides a comprehensive overview of the challenges and potential solutions related to enabling haptic communication over the Tactile Internet (TI) in the context of 6G networks. The increasing demand for multimedia services and the proliferation of devices have left radio resources scarce, making their efficient allocation for Device-to-Device (D2D)-assisted haptic communications challenging. Ultra-low latency, security, and energy efficiency are crucial requirements for enabling haptic communication over TI. The paper explores various methodologies, technologies, and frameworks that can facilitate haptic communication, including backscatter communications (BsC), non-orthogonal multiple access (NOMA), and software-defined networks. Additionally, it discusses the potential of …
Class-Independent Regularization For Learning With Noisy Labels, Rumeng Yi, Dayan Guan, Yaping Huang, Shijian Lu
Computer Vision Faculty Publications
Training deep neural networks (DNNs) with noisy labels often leads to poorly generalized models, as DNNs tend to memorize the noisy labels during training. Various strategies have been developed for improving sample selection precision and mitigating the noisy label memorization issue. However, most existing works adopt a class-dependent softmax classifier that is vulnerable to noisy labels because it entangles the classification of multi-class features. This paper presents a class-independent regularization (CIR) method that can effectively alleviate the negative impact of noisy labels in DNN training. CIR regularizes the class-dependent softmax classifier by introducing multi-binary classifiers each of which takes care of …
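The contrast the abstract draws between a class-dependent softmax classifier and multiple binary classifiers can be illustrated with two losses over the same logits: softmax cross-entropy couples all classes through one shared normalizer, whereas a one-vs-rest binary cross-entropy scores each class with an independent sigmoid. This is a minimal sketch under the assumption that the regularizer is a weighted one-vs-rest term added to the softmax loss; the weight `lam` and all function names are hypothetical, and the paper's actual CIR formulation may differ.

```python
import math


def softmax_cross_entropy(logits, label):
    """Softmax CE: every class competes through a shared log-sum-exp normalizer."""
    m = max(logits)  # subtract the max for numerical stability
    log_z = m + math.log(sum(math.exp(z - m) for z in logits))
    return log_z - logits[label]


def one_vs_rest_bce(logits, label):
    """Mean binary CE of independent sigmoid 'is this class k?' decisions."""
    total = 0.0
    for k, z in enumerate(logits):
        target = 1.0 if k == label else 0.0
        # Stable BCE-with-logits: max(z, 0) - z*y + log(1 + exp(-|z|)).
        total += max(z, 0.0) - z * target + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


def regularized_loss(logits, label, lam=0.5):
    """Hypothetical combination: softmax CE plus a weighted one-vs-rest term."""
    return softmax_cross_entropy(logits, label) + lam * one_vs_rest_bce(logits, label)
```

The design point is that in `one_vs_rest_bce` the term for class k depends only on logit k, so a wrong label pushes on one binary decision rather than on the whole softmax distribution.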
GraphPrompt: Graph-Based Prompt Templates For Biomedical Synonym Prediction, Hanwen Xu, Jiayou Zhang, Zhirui Wang, Shizhuo Zhang, Megh Bhalerao, Yucong Liu, Dawei Zhu, Sheng Wang
Computer Vision Faculty Publications
As biomedical datasets expand, the same category may be labeled with different terms, which makes curating these terms tedious and onerous. Automatically mapping synonymous terms onto ontologies is therefore desirable; we call this the biomedical synonym prediction task. Unlike biomedical concept normalization (BCN), synonym prediction cannot draw on contextual clues, making it essential to extract graph features from the ontology. We introduce an expert-curated dataset, OBO-syn, encompassing 70 different types of concepts and 2 million curated concept-term pairs for evaluating synonym prediction methods. We find that BCN methods perform weakly on this task for …
Person Image Synthesis Via Denoising Diffusion Model, Ankan Kumar Bhunia, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Jorma Laaksonen, Mubarak Shah, Fahad Shahbaz Khan
Computer Vision Faculty Publications
The pose-guided person image generation task requires synthesizing photorealistic images of humans in arbitrary poses. Existing approaches either use generative adversarial networks, which do not necessarily maintain realistic textures, or require dense correspondences, which struggle to handle complex deformations and severe occlusions. In this work, we show how denoising diffusion models can be applied for high-fidelity person image synthesis with strong sample diversity and enhanced mode coverage of the learnt data distribution. Our proposed Person Image Diffusion Model (PIDM) decomposes the complex transfer problem into a series of simpler forward-backward denoising steps. This helps in learning plausible source-to-target transformation trajectories …
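Denoising diffusion models of the kind the abstract mentions are usually built on a fixed forward noising process that the network learns to invert step by step. As a sketch of the standard DDPM-style forward process (an assumption: the abstract does not state PIDM's noise schedule), a noisy sample at step t can be drawn in closed form as sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise, where alpha_bar_t is the running product of (1 - beta_s). The linear schedule from 1e-4 to 0.02 over 1000 steps is the common DDPM default, used here only for illustration, and the function names are hypothetical.

```python
import math
import random


def linear_beta_schedule(T, beta_start=1e-4, beta_end=0.02):
    """Noise variances beta_1..beta_T increasing linearly (common DDPM default)."""
    return [beta_start + (beta_end - beta_start) * t / (T - 1) for t in range(T)]


def cumulative_alphas(betas):
    """alpha_bar_t = prod_{s<=t} (1 - beta_s): share of signal variance kept at step t."""
    out, prod = [], 1.0
    for b in betas:
        prod *= 1.0 - b
        out.append(prod)
    return out


def q_sample(x0, t, alpha_bar, noise):
    """Jump straight to step t: sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * noise."""
    a = alpha_bar[t]
    return [math.sqrt(a) * x + math.sqrt(1.0 - a) * e for x, e in zip(x0, noise)]


if __name__ == "__main__":
    alpha_bar = cumulative_alphas(linear_beta_schedule(1000))
    x0 = [0.5, -0.2, 0.9]  # toy "image" of three pixel values
    noise = [random.gauss(0.0, 1.0) for _ in x0]
    print(q_sample(x0, 999, alpha_bar, noise))  # almost pure noise at the last step
```

The reverse (generative) direction, where a network predicts and removes the noise one step at a time, is what the paper conditions on the source image and target pose; that part is not sketched here.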
Fine-Tuned CLIP Models Are Efficient Video Learners, Hanoona Rasheed, Muhammad Uzair Khattak, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Computer Vision Faculty Publications
Large-scale multi-modal training with image-text pairs imparts strong generalization to the CLIP model. Since training at a similar scale for videos is infeasible, recent approaches focus on effectively transferring image-based CLIP to the video domain. In this pursuit, new parametric modules are added to learn temporal information and inter-frame relationships, which require meticulous design efforts. Furthermore, when the resulting models are learned on videos, they tend to overfit the given task distribution and generalize poorly. This raises the following question: How can image-level CLIP representations be effectively transferred to videos? In this work, we show that a …
Digital Twin Haptic Robotic Arms: Towards Handshakes In The Metaverse, Mohd Faisal, Fedwa Laamarti, Abdulmotaleb El Saddik
Computer Vision Faculty Publications
More daily interactions are happening in the digital world of the metaverse. Providing individuals with means to perform a handshake during these interactions can enhance the overall user experience. In this paper, we put forward the design and implementation of two right-handed underactuated Digital Twin robotic arms to mediate the physical handshake interaction between two individuals. This allows them to perform a handshake while they are in separate locations. The experimental findings are very promising as our evaluation shows that the participants were highly interested in using our system to shake hands with their loved ones when they are physically …
Suitability Of SDN And MEC To Facilitate Digital Twin Communication Over LTE-A, Hikmat Adhami, Mohammad Alja'afreh, Mohamed Hoda, Jiaqi Zhao, Yong Zhou, Abdulmotaleb Elsaddik
Computer Vision Faculty Publications
Haptics is the modality that complements traditional multimedia (i.e., audiovisual) and drives the next wave of innovation, in which Internet data streams can be exchanged to enable remote skill and control applications. This will require ultra-low latency and ultra-high reliability to evolve the mobile experience into the era of Digital Twin and Tactile Internet. While the 5th generation of mobile networks is not yet widely deployed, Long-Term Evolution Advanced (LTE-A) latency remains much higher than the 1 ms requirement for the Tactile Internet and therefore for the Digital Twin. This work investigates an interesting solution based on the incorporation of Software-defined …
Transformer-Based Feature Fusion Approach For Multimodal Visual Sentiment Recognition Using Tweets In The Wild, Fatimah Alzamzami, Abdulmotaleb El Saddik
Computer Vision Faculty Publications
We present an image-based real-time sentiment analysis system that can be used to recognize in-the-wild sentiment expressions on online social networks (OSN). The system applies the recently proposed transformer architecture to OSN big data to extract emotion and sentiment features using three types of images: images containing faces, images containing text, and images containing neither faces nor text. We build three separate models, one for each type of image, and then fuse all the models to learn the online sentiment behavior. Our proposed methodology combines a supervised two-stage training approach with a threshold-moving method, which is crucial for the data imbalance …
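Threshold moving, which the abstract pairs with two-stage training to handle class imbalance, means replacing the default 0.5 decision threshold with one tuned on held-out data. Below is a minimal binary sketch that assumes F1 on a validation set as the selection criterion; the paper's actual criterion and its multi-class handling are not stated in the abstract, and the function names and toy scores are hypothetical.

```python
def f1_at_threshold(scores, labels, threshold):
    """F1 of the positive class when predicting 1 for every score >= threshold."""
    tp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 1)
    fp = sum(1 for s, y in zip(scores, labels) if s >= threshold and y == 0)
    fn = sum(1 for s, y in zip(scores, labels) if s < threshold and y == 1)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def best_threshold(scores, labels):
    """Pick the threshold maximizing validation F1, trying each observed score."""
    candidates = sorted(set(scores))
    return max(candidates, key=lambda t: f1_at_threshold(scores, labels, t))
```

On a skewed toy set such as `scores=[0.05, 0.1, 0.2, 0.3, 0.4]` with `labels=[0, 0, 0, 1, 1]`, the default threshold of 0.5 predicts no positives at all (F1 of 0.0), while `best_threshold` returns 0.3, which separates the classes perfectly.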
TC-Net: A Modest & Lightweight Emotion Recognition System Using Temporal Convolution Network, Muhammad Ishaq, Mustaqeem Khan, Soonil Kwon
Computer Vision Faculty Publications
Speech signals play an essential role in communication and provide an efficient way to exchange information between humans and machines. Speech Emotion Recognition (SER) is one of the critical sources for human evaluation, which is applicable in many real-world applications such as healthcare, call centers, robotics, safety, and virtual reality. This work develops a novel TCN-based emotion recognition system that processes speech signals through a spatial-temporal convolution network to recognize the speaker's emotional state. The authors designed a Temporal Convolutional Network (TCN) core block to capture long-term dependencies in speech signals and then feed these temporal cues to a dense network …
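The long-term dependencies a TCN captures come from stacking dilated causal convolutions: each output depends only on current and past inputs, and doubling the dilation at each layer grows the receptive field exponentially with depth. The sketch below shows one such convolution in plain Python plus the receptive-field arithmetic. It is a generic TCN building block under these assumptions, not the specific TC-Net core block; the function names are illustrative, and the receptive-field helper assumes one convolution per layer (residual TCN blocks often use two).

```python
def dilated_causal_conv1d(x, kernel, dilation):
    """y[t] = sum_i kernel[i] * x[t - i*dilation]; inputs before t = 0 count as zero."""
    y = []
    for t in range(len(x)):
        acc = 0.0
        for i, w in enumerate(kernel):
            j = t - i * dilation  # kernel[0] sees the current step; later taps look further back
            if j >= 0:
                acc += w * x[j]
        y.append(acc)
    return y


def receptive_field(kernel_size, dilations):
    """Number of time steps visible to one output of a stack of causal convolutions."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)
```

For example, four layers with kernel size 3 and dilations 1, 2, 4, 8 give `receptive_field(3, [1, 2, 4, 8]) == 31` frames, whereas four undilated layers would see only 9.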
ARL-Wavelet-BPF Optimization Using PSO Algorithm For Bearing Fault Diagnosis, Muhammad Ahsan, Dariusz Bismor, Muhammad Arslan Manzoor
Computer Vision Faculty Publications
Rotating element bearings are the backbone of every rotating machine. Vibration signals measured from these bearings are used to diagnose the health of the machine, but when the signal-to-noise ratio is low, it is challenging to identify the fault frequency. In this paper, a new method is proposed to enhance the signal-to-noise ratio by applying the Asymmetric Real Laplace wavelet Bandpass Filter (ARL-wavelet-BPF). The Gaussian function of the ARL-wavelet represents an excellent BPF with smooth edges, which helps to minimize ripple effects. The bandwidth and center frequency of the ARL-wavelet-BPF are optimized using the Particle Swarm Optimization (PSO) algorithm. …
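PSO searches a continuous parameter space with a swarm of candidate solutions whose velocities are pulled toward each particle's best-known position and the swarm's best-known position. Below is a minimal global-best PSO sketch over the two filter parameters the abstract names. The objective is a hypothetical stand-in (a quadratic with its minimum at a 3000 Hz center frequency and a 400 Hz bandwidth); the paper would instead score the filtered vibration signal with its own fitness measure. The inertia and acceleration coefficients are common textbook values, not the paper's settings.

```python
import random


def pso(objective, bounds, n_particles=20, n_iters=100, w=0.7, c1=1.5, c2=1.5, seed=0):
    """Minimize objective(params) over a box [(lo, hi), ...] with global-best PSO."""
    rng = random.Random(seed)
    dim = len(bounds)
    pos = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]  # each particle's best position so far
    pbest_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]  # swarm's best position so far
    for _ in range(n_iters):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                # Inertia + pull toward personal best + pull toward global best.
                vel[i][d] = (w * vel[i][d]
                             + c1 * r1 * (pbest[i][d] - pos[i][d])
                             + c2 * r2 * (gbest[d] - pos[i][d]))
                lo, hi = bounds[d]
                pos[i][d] = min(max(pos[i][d] + vel[i][d], lo), hi)  # stay inside the box
            val = objective(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
    return gbest, gbest_val


def toy_objective(params):
    """Hypothetical stand-in for a filter-quality score (lower is better)."""
    center_hz, bandwidth_hz = params
    return ((center_hz - 3000.0) / 1000.0) ** 2 + ((bandwidth_hz - 400.0) / 100.0) ** 2
```

Calling `pso(toy_objective, [(500.0, 5000.0), (50.0, 1000.0)])` returns a (center frequency, bandwidth) pair close to (3000, 400); swapping in a real signal-quality objective is where the method-specific work would go.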
Digital Twin Of Atmospheric Environment: Sensory Data Fusion For High-Resolution PM2.5 Estimation And Action Policies Recommendation, Kudaibergen Abutalip, Anas Al-Lahham, Abdulmotaleb Elsaddik
Computer Vision Faculty Publications
Particulate matter smaller than 2.5 microns (PM2.5) is one of the main pollutants and has considerable detrimental effects on human health. Estimating its concentration levels with ground monitors is inefficient for several reasons. In this study, we build a digital twin (DT) of an atmospheric environment by fusing remote sensing and observational data. An integral part of the DT pipeline is the presence of feedback that can influence future input data. Estimated PM2.5 values obtained from an ensemble of Random Forest and Gradient Boosting are used to provide recommendations for decreasing the agglomeration levels. A simple optimization problem is formulated for …
Self-Omics: A Self-Supervised Learning Framework For Multi-Omics Cancer Data, Sayed Hashim, Karthik Nandakumar, Mohammad Yaqub
Computer Vision Faculty Publications
We have gained access to vast amounts of multi-omics data thanks to Next Generation Sequencing. However, this data is challenging to analyse because it is high-dimensional and largely unannotated. Lack of annotated data is a significant problem in machine learning, and Self-Supervised Learning (SSL) methods are typically used to deal with limited labelled data. However, few studies use SSL methods to exploit inter-omics relationships in unlabelled multi-omics data. In this work, we develop a novel and efficient pre-training paradigm that consists of various SSL components, including but not limited …
Digital Twin For Railway: A Comprehensive Survey, Sara Ghaboura, Rahatara Ferdousi, Fedwa Laamarti, Chunsheng Yang, Abdulmotaleb El Saddik
Computer Vision Faculty Publications
Digital transformation has been prioritized in the railway industry to bring automation to railway operations, and Digital Twin (DT) technology has recently gained attention as a means to this goal. Contemporary researchers argue that DT can be advantageous across railway applications, from manufacturing and logistics to planning and scheduling. Although the underlying technologies of DT, e.g., modelling, computer vision, and the Internet of Things, have been studied for various railway industry applications, DT itself has been little explored in the context of railways. Thus, in this paper, we aim to understand the state of the art of DT for railway (DTR) for advanced railway systems. Besides, …
MaPLe: Multi-Modal Prompt Learning, Muhammad Uzair Khattak, Hanoona Rasheed, Muhammad Maaz, Salman Khan, Fahad Shahbaz Khan
Computer Vision Faculty Publications
Pre-trained vision-language (V-L) models such as CLIP have shown excellent generalization ability to downstream tasks. However, they are sensitive to the choice of input text prompts and require careful selection of prompt templates to perform well. Inspired by the Natural Language Processing (NLP) literature, recent CLIP adaptation approaches learn prompts as the textual inputs to fine-tune CLIP for downstream tasks. We note that using prompting to adapt representations in a single branch of CLIP (language or vision) is sub-optimal since it does not allow the flexibility to dynamically adjust both representation spaces on a downstream task. In this work, we …
Towards A Machine Learning-Based Digital Twin For Non-Invasive Human Bio-Signal Fusion, Izaldein Al-Zyoud, Fedwa Laamarti, Xiaocong Ma, Diana Tobón, Abdulmotaleb Elsaddik
Computer Vision Faculty Publications
Human bio-signal fusion is considered a critical technological solution that needs to be advanced to enable modern and secure digital health and well-being applications in the metaverse. To support such efforts, we propose a new data-driven digital twin (DT) system to fuse three human physiological bio-signals: heart rate (HR), breathing rate (BR), and blood oxygen saturation level (SpO2). To accomplish this goal, we design a computer vision technology based on the non-invasive photoplethysmography (PPG) technique to extract raw time-series bio-signal data from facial video frames. Then, we implement machine learning (ML) technology to model and measure the bio-signals. We accurately …