Full-Text Articles in Artificial Intelligence and Robotics

MGMT Promoter Methylation Status Prediction Using MRI Scans? An Extensive Experimental Evaluation Of Deep Learning Models, Numan Saeed, Muhammad Ridzuan, Hussain Alasmawi, Ikboljon Sobirov, Mohammad Yaqub Dec 2023

Computer Vision Faculty Publications

The number of studies on deep learning for medical diagnosis is expanding, and these systems are often claimed to outperform clinicians. However, only a few systems have shown medical efficacy. From this perspective, we examine a wide range of deep learning algorithms for the assessment of glioblastoma - a common brain tumor in older adults that is lethal. Surgery, chemotherapy, and radiation are the standard treatments for glioblastoma patients. The methylation status of the MGMT promoter, a specific genetic sequence found in the tumor, affects chemotherapy's effectiveness. MGMT promoter methylation improves chemotherapy response and survival in several cancers. MGMT promoter …


OffensEval 2023: Offensive Language Identification In The Age Of Large Language Models, Marcos Zampieri, Sara Rosenthal, Preslav Nakov, Alphaeus Dmonte, Tharindu Ranasinghe Nov 2023

Natural Language Processing Faculty Publications

The OffensEval shared tasks organized as part of SemEval-2019-2020 were very popular, attracting over 1300 participating teams. The two editions of the shared task helped advance the state of the art in offensive language identification by providing the community with benchmark datasets in Arabic, Danish, English, Greek, and Turkish. The datasets were annotated using the OLID hierarchical taxonomy, which since then has become the de facto standard in general offensive language identification research and was widely used beyond OffensEval. We present a survey of OffensEval and related competitions, and we discuss the main lessons learned. We further evaluate the performance …


Hybrid Flexible (HyFlex) Learning Space Design And Implementation At Graduate Level: An Iterative Process, David Santandreu Calonge, Mark Thompson, Leisa Hassock, Mohammad Yaqub Nov 2023

Computer Vision Faculty Publications

This paper investigates the process of designing HyFlex classrooms at Mohamed Bin Zayed University of Artificial Intelligence (MBZUAI), a graduate-level research university located in Abu Dhabi, United Arab Emirates, underpinned by the application of the EDUCAUSE Learning Space Rating System (LSRS). This investigation takes the form of a case-study and specifically focuses on the rationale, planning, design, and technology behind the implementation of the flexible HyFlex spaces as deployed in several classroom environments at MBZUAI. Iterations’ performance was assessed with the LSRS—V3. The findings should make an important contribution to the field of HyFlex learning spaces and technology-enhanced classroom design …


Preface: Special Issue On NLP Approaches To Offensive Content Online, Marcos Zampieri, Isabelle Augenstein, Siddharth Krishnan, Joshua Melton, Preslav Nakov Nov 2023

Natural Language Processing Faculty Publications

No abstract provided.


Smart Street Light Control: A Review On Methods, Innovations, And Extended Applications, Fouad Agramelal, Mohamed Sadik, Youssef Moubarak, Saad Abouzahir Nov 2023

Computer Vision Faculty Publications

As urbanization increases, streetlights have become significant consumers of electrical power, making it imperative to develop effective control methods for sustainability. This paper offers a comprehensive review on control methods of smart streetlight systems, setting itself apart by introducing a novel light scheme framework that provides a structured classification of various light control patterns, thus filling an existing gap in the literature. Unlike previous studies, this work dives into the technical specifics of individual research papers and methodologies, ranging from basic to advanced control methods like computer vision and deep learning, while also assessing the energy consumption associated with each …


ArTST: Arabic Text And Speech Transformer, Hawau Olamide Toyin, Amirbek Djanibekov, Ajinkya Kulkarni, Hanan Al Darmaki Oct 2023

Natural Language Processing Faculty Publications

We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework, SpeechT5, that was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model for dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data, and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results in …


Text Augmentation For Semantic Frame Induction And Parsing, Saba Anwar, Artem Shelmanov, Nikolay Arefyev, Alexander Panchenko, Chris Biemann Oct 2023

Natural Language Processing Faculty Publications

Semantic frames are formal structures describing situations, actions or events, e.g., Commerce buy, Kidnapping, or Exchange. Each frame provides a set of frame elements or semantic roles corresponding to participants of the situation and lexical units (LUs)—words and phrases that can evoke this particular frame in texts. For example, for the frame Kidnapping, two key roles are Perpetrator and the Victim, and this frame can be evoked with lexical units abduct, kidnap, or snatcher. While formally sound, the scarce availability of semantic frame resources and their limited lexical coverage hinders the wider adoption of frame semantics across languages and domains. …


Yet Another Model For Arabic Dialect Identification, Ajinkya Kulkarni, Hanan Al Darmaki Oct 2023

Natural Language Processing Faculty Publications

In this paper, we describe a spoken Arabic dialect identification (ADI) model that consistently outperforms previously published results on two benchmark datasets: ADI-5 and ADI-17. We explore two architectural variations: ResNet and ECAPA-TDNN, coupled with two types of acoustic features: MFCCs and features extracted from the pre-trained self-supervised model UniSpeech-SAT Large, as well as a fusion of all four variants. We find that individually, the ECAPA-TDNN network outperforms ResNet, and models with UniSpeech-SAT features outperform models with MFCCs by a large margin. Furthermore, a fusion of all four variants consistently outperforms individual models. Our best models outperform previously reported …


Metaverse Key Requirements And Platforms Survey, Akbobek Abilkaiyrkyzy, Ahmed Elhagry, Fedwa Laamarti, Abdulmotaleb El Saddik Oct 2023

Computer Vision Faculty Publications

The growing interest in the metaverse has led to an abundance of platforms, each with its own unique features and limitations. This paper's objective is two-fold. First, we aim at providing an objective analysis of requirements that need to be fulfilled by metaverse platforms. We survey a broad set of criteria including interoperability, immersiveness, persistence, multimodal and social interaction, scalability, level of openness, configurability, market access, security, and blockchain integration, among others. Second, we review a wide range of existing metaverse platforms, and we critically evaluate their ability to meet the requirements listed. We identify their limitations, which must be …


DTITD: An Intelligent Insider Threat Detection Framework Based On Digital Twin And Self-Attention Based Deep Learning Models, Zhi Qiang Wang, Abdulmotaleb El Saddik Oct 2023

Computer Vision Faculty Publications

Recent statistics and studies show that the loss generated by insider threats is much higher than that generated by external attacks. More and more organizations are investing in or purchasing insider threat detection systems to prevent insider risks. However, the accurate and timely detection of insider threats faces significant challenges. In this study, we propose an intelligent insider threat detection framework based on Digital Twins and self-attention-based deep learning models. First, this paper introduces insider threats and the challenges in detecting them. Then this paper presents recent related works on solving insider threat detection problems and their limitations. Next, …


Adapting The Adapters For Code-Switching In Multilingual ASR, Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Al Darmaki Oct 2023

Natural Language Processing Faculty Publications

Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps to improve monolingual performance and avoids some of the drawbacks of multi-lingual modeling on resource-rich languages. However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network. We also …


Overview Of The CLEF-2023 CheckThat! Lab Task 4 On Factuality Of Reporting Of News Media, Preslav Nakov, Firoj Alam, Giovanni Da San Martino, Maram Hasanain, Dilshod Azizov, Rabindra Nath Nandi, Panayotov Panayot Sep 2023

Natural Language Processing Faculty Publications

We present an overview of the CLEF-2023 CheckThat! lab Task 4, which focused on predicting the factuality of reporting of entire news outlets. This is a different level of granularity compared to previous efforts, which focused on fact-checking, where the target is a claim, or fake news detection, where the target is an article. We briefly summarize the participating systems and discuss the dataset, the task, and the evaluation setup. The task attracted a large number of registrations, and eventually five teams made submissions. All participants improved over the baseline by a margin using both deep learning and traditional machine …


Bare-Bones Based Salp Swarm Algorithm For Text Document Clustering, Mohammed Azmi Al-Betar, Ammar Kamal Abasi, Ghazi Al-Naymat, Kamran Arshad, Sharif Naser Makhadmeh Sep 2023

Machine Learning Faculty Publications

Text Document Clustering (TDC) is a challenging optimization problem in unsupervised machine learning and text mining. The Salp Swarm Algorithm (SSA) has been found to be effective in solving complex optimization problems. However, the SSA’s exploitation phase requires improvement to solve the TDC problem effectively. In this paper, we propose a new approach, known as the Bare-Bones Salp Swarm Algorithm (BBSSA), which leverages Gaussian search equations, inverse hyperbolic cosine control strategies, and greedy selection techniques to create new individuals and guide the population towards solving the TDC problem. We evaluated the performance of the BBSSA on six benchmark datasets from …


A Study On Feature Selection Using Multi-Domain Feature Extraction For Automated K-Complex Detection, Yabing Li, Xinglong Dong, Kun Song, Xiangyun Bai, Hongye Li, Fakhreddine Karray Sep 2023

Machine Learning Faculty Publications

Background: K-complex detection plays a significant role in the field of sleep research. However, manual annotation of electroencephalography (EEG) recordings by expert visual inspection is time-consuming and subjective. Therefore, there is a need for automatic detection methods based on classical machine learning algorithms. However, due to the complexity of EEG signals, current feature extraction methods always produce features with low relevance to k-complex detection, which leads to a great loss in detection performance. Hence, finding compact yet effective integrated feature vectors becomes a crucial task in k-complex detection. Method: In this paper, we first extract multi-domain features based …


Disease Progression Modelling Of Alzheimer's Disease Using Probabilistic Principal Components Analysis, Martin Saint-Jalmes, Victor Fedyashov, Daniel Beck, Timothy Baldwin, Noel G. Faux, Pierrick Bourgeat, Jurgen Fripp, Colin L. Masters, Benjamin Goudey Sep 2023

Natural Language Processing Faculty Publications

The recent biological redefinition of Alzheimer's Disease (AD) has spurred the development of statistical models that relate changes in biomarkers with neurodegeneration and worsening condition linked to AD. The ability to measure such changes may facilitate earlier diagnoses for affected individuals and help in monitoring the evolution of their condition. Amongst such statistical tools, disease progression models (DPMs) are quantitative, data-driven methods that specifically attempt to describe the temporal dynamics of biomarkers relevant to AD. Due to the heterogeneous nature of this disease, with patients of similar age experiencing different AD-related changes, a challenge facing longitudinal mixed-effects-based DPMs is the …


Overview Of The CLEF-2023 CheckThat! Lab Task 1 On Check-Worthiness Of Multimodal And Multigenre Content, Firoj Alam, Alberto Barrón-Cedeño, Gullal S. Cheema, Gautam Kishore Shahi, Sherzod Hakimov, Maram Hasanain, Chengkai Li, Rubén Míguez, Hamdy Mubarak, Wajdi Zaghouani, Preslav Nakov Sep 2023

Natural Language Processing Faculty Publications

We present an overview of CheckThat! Lab’s 2023 Task 1, which is part of CLEF-2023. Task 1 asks to determine whether a text item, or a text coupled with an image, is check-worthy. This task places a special emphasis on COVID-19, political debates and transcriptions, and it is conducted in three languages: Arabic, English, and Spanish. A total of 15 teams participated, and most submissions managed to achieve significant improvements over the baselines using Transformer-based models. Out of these, seven teams participated in the multimodal subtask (1A), and 12 teams participated in the Multigenre subtask (1B), collectively submitting 155 official …


Gpachov At Checkthat! 2023: A Diverse Multi-Approach Ensemble For Subjectivity Detection In News Articles, Georgi Pachov, Dimitar Dimitrov, Ivan Koychev, Preslav Nakov Sep 2023

Natural Language Processing Faculty Publications

The wide-spread use of social networks has given rise to subjective, misleading, and even false information on the Internet. Thus, subjectivity detection can play an important role in ensuring the objectiveness and the quality of a piece of information. This paper presents the solution built by the Gpachov team for the CLEF-2023 CheckThat! lab Task 2 on subjectivity detection. Three different research directions are explored. The first one is based on fine-tuning a sentence embeddings encoder model and dimensionality reduction. The second one explores a sample-efficient few-shot learning model. The third one evaluates fine-tuning a multilingual transformer on an altered …


Enriched Pre-Trained Transformers For Joint Slot Filling And Intent Detection, Momchil Hardalov, Ivan Koychev, Preslav Nakov Sep 2023

Natural Language Processing Faculty Publications

Detecting the user's intent and finding the corresponding slots among the utterance's words are important tasks in natural language understanding. Their interconnected nature makes their joint modeling a standard part of training such models. Moreover, data scarceness and specialized vocabularies pose additional challenges. Recently, the advances in pre-trained language models, namely contextualized models such as ELMo and BERT, have revolutionized the field by tapping the potential of training very large models with just a few steps of fine-tuning on a task-specific dataset. Here, we leverage such models, and we design a novel architecture on top of them. Moreover, we propose …


Grammatical Error Correction: A Survey Of The State Of The Art, Christopher Bryant, Zheng Yuan, Muhammad Reza Qorib, Hannan Cao, Hwee Tou Ng, Ted Briscoe Sep 2023

Natural Language Processing Faculty Publications

Grammatical Error Correction (GEC) is the task of automatically detecting and correcting errors in text. The task not only includes the correction of grammatical errors, such as missing prepositions and mismatched subject–verb agreement, but also orthographic and semantic errors, such as misspellings and word choice errors, respectively. The field has seen significant progress in the last decade, motivated in part by a series of five shared tasks, which drove the development of rule-based methods, statistical classifiers, statistical machine translation, and finally neural machine translation systems, which represent the current dominant state of the art. In this survey paper, we condense …


Burstormer: Burst Image Restoration And Enhancement Transformer, Akshay Dudhane, Syed Waqas Zamir, Salman Khan, Fahad Shahbaz Khan, Ming Hsuan Yang Aug 2023

Computer Vision Faculty Publications

On a shutter press, modern handheld cameras capture multiple images in rapid succession and merge them to generate a single image. However, individual frames in a burst are misaligned due to inevitable motions and contain multiple degradations. The challenge is to properly align the successive image shots and merge their complementary information to achieve high-quality outputs. Towards this direction, we propose Burstormer: a novel transformer-based architecture for burst image restoration and enhancement. In comparison to existing works, our approach exploits multi-scale local and non-local features to achieve improved alignment and feature fusion. Our key idea is to enable inter-frame communication …


CLIP2Protect: Protecting Facial Privacy Using Text-Guided Makeup Via Adversarial Latent Search, Fahad Shamshad, Muzammal Naseer, Karthik Nandakumar Aug 2023

Computer Vision Faculty Publications

The success of deep learning based face recognition systems has given rise to serious privacy concerns due to their ability to enable unauthorized tracking of users in the digital world. Existing methods for enhancing privacy fail to generate 'naturalistic' images that can protect facial privacy without compromising user experience. We propose a novel two-step approach for facial privacy protection that relies on finding adversarial latent codes in the low-dimensional manifold of a pretrained generative model. The first step inverts the given face image into the latent space and finetunes the generative model to achieve an accurate reconstruction of the …


Multiclass Confidence And Localization Calibration For Object Detection, Bimsara Pathiraja, Malitha Gunawardhana, Muhammad Haris Khan Aug 2023

Computer Vision Faculty Publications

Although deep neural networks (DNNs) achieve high predictive accuracy across many challenging computer vision problems, recent studies suggest that they tend to make over-confident predictions, rendering them poorly calibrated. Most existing attempts to improve DNN calibration are limited to classification tasks and restricted to calibrating in-domain predictions. Surprisingly, few if any attempts have been made to study the calibration of object detection methods, which occupy a pivotal space in vision-based security-sensitive and safety-critical applications. In this paper, we propose a new train-time technique for calibrating modern object detection methods. It is capable of jointly calibrating multiclass confidence and …


3D-Aware Multi-Class Image-To-Image Translation With NeRFs, Senmao Li, Joost Van De Weijer, Yaxing Wang, Fahad Shahbaz Khan, Meiqin Liu, Jian Yang Aug 2023

Computer Vision Faculty Publications

Recent advances in 3D-aware generative models (3D-aware GANs) combined with Neural Radiance Fields (NeRF) have achieved impressive results. However, no prior works investigate 3D-aware GANs for 3D consistent multiclass image-to-image (3D-aware I2I) translation. Naively using 2D-I2I translation methods suffers from unrealistic shape/identity change. To perform 3D-aware multiclass I2I translation, we decouple this learning process into a multiclass 3D-aware GAN step and a 3D-aware I2I translation step. In the first step, we propose two novel techniques: a new conditional architecture and an effective training strategy. In the second step, based on the well-trained multiclass 3D-aware GAN architecture, that preserves view-consistency, we …


Discriminative Co-Saliency And Background Mining Transformer For Co-Salient Object Detection, Long Li, Junwei Han, Ni Zhang, Nian Liu, Salman Khan, Hisham Cholakkal, Rao Muhammad Anwer, Fahad Shahbaz Khan Aug 2023

Computer Vision Faculty Publications

Most previous co-salient object detection works mainly focus on extracting co-salient cues via mining the consistency relations across images while ignoring explicit exploration of background regions. In this paper, we propose a Discriminative co-saliency and background Mining Transformer framework (DMT) based on several economical multi-grained correlation modules to explicitly mine both co-saliency and background information and effectively model their discrimination. Specifically, we first propose a region-to-region correlation module for introducing inter-image relations to pixel-wise segmentation features while maintaining computational efficiency. Then, we use two types of pre-defined tokens to mine co-saliency and background information via our proposed contrast-induced pixel-to-token correlation …


Dynamic Graph Enhanced Contrastive Learning For Chest X-Ray Report Generation, Mingjie Li, Bingqian Lin, Zicong Chen, Haokun Lin, Xiaodan Liang, Xiaojun Chang Aug 2023

Computer Vision Faculty Publications

Automatic radiology reporting has great clinical potential to relieve radiologists from heavy workloads and improve diagnosis interpretation. Recently, researchers have enhanced data-driven neural networks with medical knowledge graphs to eliminate the severe visual and textual bias in this task. The structures of such graphs are exploited by using the clinical dependencies formed by the disease topic tags via general knowledge and usually do not update during the training process. Consequently, the fixed graphs cannot guarantee the most appropriate scope of knowledge, which limits their effectiveness. To address the limitation, we propose a knowledge graph with Dynamic structure and nodes …


3D Semantic Segmentation In The Wild: Learning Generalized Models For Adverse-Condition Point Clouds, Aoran Xiao, Jiaxing Huang, Weihao Xuan, Ruijie Ren, Kangcheng Liu, Dayan Guan, Abdulmotaleb El Saddik, Shijian Lu, Eric Xing Aug 2023

Computer Vision Faculty Publications

Robust point cloud parsing under all-weather conditions is crucial to level-5 autonomy in autonomous driving. However, how to learn a universal 3D semantic segmentation (3DSS) model is largely neglected as most existing benchmarks are dominated by point clouds captured under normal weather. We introduce SemanticSTF, an adverse-weather point cloud dataset that provides dense point-level annotations and allows the study of 3DSS under various adverse weather conditions. We study all-weather 3DSS modeling under two setups: 1) domain adaptive 3DSS that adapts from normal-weather data to adverse-weather data; 2) domain generalizable 3DSS that learns all-weather 3DSS models from normal-weather data. Our studies reveal …


N-Shot Benchmarking Of Whisper On Diverse Arabic Speech Recognition, Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed Aug 2023

Natural Language Processing Faculty Publications

Whisper, the recently developed multilingual weakly supervised model, is reported to perform well on multiple speech recognition benchmarks in both monolingual and multilingual settings. However, it is not clear how Whisper would fare under diverse conditions even on languages it was evaluated on such as Arabic. In this work, we address this gap by comprehensively evaluating Whisper on several varieties of Arabic speech for the ASR task. Our evaluation covers most publicly available Arabic speech data and is performed under n-shot (zero-, few-, and full) finetuning. We also investigate the robustness of Whisper under completely novel conditions, such as in …


Vision Language Navigation With Knowledge-Driven Environmental Dreamer, Fengda Zhu, Vincent C.S. Lee, Xiaojun Chang, Xiaodan Liang Aug 2023

Computer Vision Faculty Publications

Vision-language navigation (VLN) requires an agent to perceive visual observation in a house scene and navigate step-by-step following natural language instruction. Due to the high cost of data annotation and data collection, current VLN datasets provide limited instruction-trajectory data samples. Learning vision-language alignment for VLN from limited data is challenging since visual observation and language instruction are both complex and diverse. Previous works only generate augmented data based on original scenes while failing to generate data samples from unseen scenes, which limits the generalization ability of the navigation agent. In this paper, we introduce the Knowledge-driven Environmental Dreamer (KED), a …


Reinforcement Learning Approach To Stochastic Vehicle Routing Problem With Correlated Demands, Zangir Iklassov, Ikboljon Sobirov, Ruben Solozabal, Martin Takac Aug 2023

Machine Learning Faculty Publications

We present a novel end-to-end framework for solving the Vehicle Routing Problem with stochastic demands (VRPSD) using Reinforcement Learning (RL). Our formulation incorporates the correlation between stochastic demands through other observable stochastic variables, thereby offering an experimental demonstration of the theoretical premise that non-i.i.d. stochastic demands provide opportunities for improved routing solutions. Our approach bridges the gap in the application of RL to VRPSD and consists of a parameterized stochastic policy optimized using a policy gradient algorithm to generate a sequence of actions that form the solution. Our model outperforms previous state-of-the-art metaheuristics and demonstrates robustness to changes in the …


A Multi-Layer Information Dissemination Model And Interference Optimization Strategy For Communication Networks In Disaster Areas, Yuexia Zhang, Yang Hong, Mohsen Guizani, Sheng Wu, Peiying Zhang, Ruiqi Liu Aug 2023

Machine Learning Faculty Publications

The communication network in disaster areas (CNDA) can disseminate the key disaster information in time and provide basic information support for decision-making and rescuing. Therefore, it is of great significance to study the information dissemination mechanism of CNDA. However, a CNDA is vulnerable to interference, which affects information dissemination and rescuing. To solve this problem, this paper establishes a multi-layer information dissemination model of CNDA (MMND) which models the CNDA from the perspective of degree distribution of nodes. The information dissemination process and equilibrium state in CNDA are analyzed by an improved dynamic dissemination method. Then, the effects of the …