MBZUAI

2023

Automatic speech recognition

Articles 1 - 3 of 3

Full-Text Articles in Artificial Intelligence and Robotics

ArTST: Arabic Text And Speech Transformer, Hawau Olamide Toyin, Amirbek Djanibekov, Ajinkya Kulkarni, Hanan Al Darmaki Oct 2023

Natural Language Processing Faculty Publications

We present ArTST, a pre-trained Arabic text and speech transformer for supporting open-source speech technologies for the Arabic language. The model architecture follows the unified-modal framework, SpeechT5, that was recently released for English, and is focused on Modern Standard Arabic (MSA), with plans to extend the model for dialectal and code-switched Arabic in future editions. We pre-trained the model from scratch on MSA speech and text data, and fine-tuned it for the following tasks: Automatic Speech Recognition (ASR), Text-To-Speech synthesis (TTS), and spoken dialect identification. In our experiments comparing ArTST with SpeechT5, as well as with previously reported results in …


Adapting The Adapters For Code-Switching In Multilingual ASR, Atharva Kulkarni, Ajinkya Kulkarni, Miguel Couceiro, Hanan Al Darmaki Oct 2023

Natural Language Processing Faculty Publications

Recently, large pre-trained multilingual speech models have shown potential in scaling Automatic Speech Recognition (ASR) to many low-resource languages. Some of these models employ language adapters in their formulation, which helps to improve monolingual performance and avoids some of the drawbacks of multilingual modeling on resource-rich languages. However, this formulation restricts the usability of these models on code-switched speech, where two languages are mixed together in the same utterance. In this work, we propose ways to effectively fine-tune such models on code-switched speech, by assimilating information from both language adapters at each language adaptation point in the network. We also …


N-Shot Benchmarking Of Whisper On Diverse Arabic Speech Recognition, Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed Aug 2023

Natural Language Processing Faculty Publications

Whisper, the recently developed multilingual weakly supervised model, is reported to perform well on multiple speech recognition benchmarks in both monolingual and multilingual settings. However, it is not clear how Whisper would fare under diverse conditions, even on languages it was evaluated on, such as Arabic. In this work, we address this gap by comprehensively evaluating Whisper on several varieties of Arabic speech for the ASR task. Our evaluation covers most publicly available Arabic speech data and is performed under n-shot (zero-, few-, and full) fine-tuning. We also investigate the robustness of Whisper under completely novel conditions, such as in …