Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

2024

Articles 1 - 9 of 9

Full-Text Articles in Computational Linguistics

Computational Approaches To Linguistic Challenges In Arabic Speech Recognition, Enas Albasiri Jun 2024

Dissertations, Theses, and Capstone Projects

This dissertation aims to document the linguistic features of Arabic that pose challenges to speech and language technologies, and to advance these technologies by developing state-of-the-art computational tools focusing on automatic speech recognition (ASR), text normalization (TN), and corpus development. TN converts expressions such as numbers, dates, and times (known as semiotic classes) from their written form to their spoken form, such as converting ‘$84.00’ to ‘eighty-four dollars’, while inverse text normalization (ITN) converts verbalized text to its written form. This conversion is an essential preprocessing step for text-to-speech (TTS) and a post-processing step for ASR. Arabic presents a challenge for TN and ITN because one …
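The written-to-spoken conversion described above can be illustrated with a minimal sketch. It is not the dissertation's system: it handles only English dollar amounts below one thousand, and the names `number_to_words` and `normalize_dollars` are hypothetical helpers introduced for this example.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty",
        "sixty", "seventy", "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Spell out an integer in the range 0..999 in English (illustrative only)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


# Matches "$84" or "$84.00"; longer or oddly formatted amounts are left alone.
DOLLARS = re.compile(r"\$(\d{1,3})(?:\.(\d{2}))?(?!\.?\d)")


def normalize_dollars(text: str) -> str:
    """Rewrite written dollar amounts in `text` into their spoken form."""
    def verbalize(match: re.Match) -> str:
        dollars = int(match.group(1))
        cents = int(match.group(2) or 0)
        spoken = number_to_words(dollars)
        spoken += " dollar" if dollars == 1 else " dollars"
        if cents:
            spoken += " and " + number_to_words(cents)
            spoken += " cent" if cents == 1 else " cents"
        return spoken
    return DOLLARS.sub(verbalize, text)


print(normalize_dollars("It costs $84.00 today"))
```

A single regular expression is enough for this one semiotic class in English; a language with richer agreement between numbers and nouns would need considerably more than this sketch provides.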


Uncovering The Mimicry Of Online Review Breadth And Depth And Its Subsequent Effect On Consumer Responses, Andrea Pelaez Martinez Jun 2024

Dissertations, Theses, and Capstone Projects

Word-of-mouth (WOM) in marketing occurs when consumers discuss a company's product or service, or any consumption experience, with their friends, family, and others with whom they have a relationship. With the advent of social media, this phenomenon has expanded rapidly into virtual environments where consumer conversation is enabled through chats, forums, social media posts, and online reviews. In response to this rapid growth of online WOM, academics and practitioners have focused their interest on this phenomenon and its implications for consumers, firms, and society. So far, the evidence of the critical role that online WOM plays in helping consumers make …


Expanding The Corpus Of Vocalized Hebrew Text: Compiling An Unvocalized Text Corpus And Building An Online Interface For Vocalization Annotation, Rachel Shanblatt Bloch Jun 2024

Dissertations, Theses, and Capstone Projects

Written modern Hebrew presents a unique challenge for training computational models for language processing because modern Hebrew text often lacks vocalization. The lack of available vocalized Hebrew data can lead to ambiguity in training these models and generally hinders work on natural language processing problems. The goal of this project is to contribute to the collection of vocalized Hebrew text by collecting and preprocessing a large corpus of unvocalized Hebrew text and building an online annotation tool. The annotation tool allows people to upload unvocalized Hebrew text, to annotate by adding Hebrew vocalization, and to download comma-separated values files of …


Skyler's Lunch, Noah Sherman, Autumn Boone, Hilaria Cruz Apr 2024

LING 590/Internet Language

Our class was studying the use of emojis across different platforms and wanted to explore how stories using emojis could impact young readers. Here, we try to translate the story of Skyler into emoji, providing translations along the way. We replace words completely with emoji, represent phrases with a few emoji, and use additional emoji to make sense of the content, including punctuation. In this book, we explore the character of Skyler, who is a picky eater. But they learn to eat the nutritious food that is good for them. In the end, they even get a reward!


Retórica Intercultural En El Discurso Académico Universitario: Las Funciones Retóricas De La Citación En Los Trabajos De Fin De Máster Escritos En Español Y En Inglés Por Hablantes Nativos Y No Nativos, David Sanchez-Jimenez Feb 2024

Publications and Research

This research stems from an interest in the cultural differences in citation practices in Master's theses written by native Spanish writers (Ee), non-native Filipino writers of Spanish (Fe), native Filipino writers of English (Fi), and American writers of English. A total of thirty-two (32) Master's theses, eight (8) for each group, were analyzed. A quantitative and qualitative methodology was used to study this phenomenon, based on computerized textual analysis of the rhetorical functions of citations, arranged in a typological classification that modified the outline proposed by Petrić in a 2007 article. The results obtained from …


Consonant (De)Gradation In Ingrian?, Andrea M. Harrison Feb 2024

Dissertations, Theses, and Capstone Projects

This paper will present a dual method toward data enrichment for low-resource languages. Using Yoyodyne -- a Fairseq-inspired neural library for small-vocabulary sequence-to-sequence generation -- a morphological generation task was tested across labeled data encompassing multiple stages of enrichment for the low-resource language Ingrian. Due to limitations in the available data for Ingrian, weighted finite-state transducers (WFSTs) were used to generate an expanded vocabulary via HFST's toolkit for Uralic languages, and GiellaLT, a source for FST-driven lexica for low-resource languages. Further stages of experimentation used labeled data from related, higher-resource languages (Finnish, Estonian) to encourage cross-lingual transfer in the interest …


How Do We Learn What We Cannot Say?, Daniel Yakubov Feb 2024

Dissertations, Theses, and Capstone Projects

The contributions of this thesis are two-fold. First, this thesis presents UDTube, an easy-to-use software package developed to perform morphological analysis in a multi-task fashion. This work shows the strong performance of UDTube versus the current state-of-the-art, UDPipe, across eight languages, primarily in the annotation of morphological features. The second contribution of this thesis is an exploration into the study of defectivity. UDTube is used to annotate a large amount of data in Greek and Russian, which is ultimately used to investigate the plausibility of Indirect Negative Evidence (INE), a popular approach to the acquisition of morphological defectivity. The reported …


The Ring Cycle: Journeying Through The Language Of Tolkien’s Third Age With Corpus Linguistics, Michael Livesey Jan 2024

Journal of Tolkien Research

This article explores the journey taken by the One Ring across J.R.R. Tolkien’s Third Age writings. It employs a digital humanities approach to analyse linguistic patterns in Tolkien’s use of the word ring, across The Hobbit and The Lord of the Rings. Specifically, the article employs corpus linguistic methods to track shifts in the quantities and qualities of the Ring’s appearance across these texts. It uses techniques of keyness and collocation analysis to trace transformations in these quantities/qualities, including: a) the Ring’s transition from a central to a peripheral place in the Third Age’s narrative arc; and b) …
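As a rough illustration of what collocation analysis involves, the sketch below counts the words that occur within a fixed window around a node word. It is a simplified stand-in, not the article's procedure: the function name `collocates`, the window size, and the sample sentence are all invented for this example, and no association statistic is computed.

```python
import re
from collections import Counter


def collocates(text: str, node: str, window: int = 2) -> Counter:
    """Count tokens appearing within `window` positions of each `node` occurrence."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token == node:
            start = max(0, i - window)
            # Neighbours to the left and right, excluding the node itself.
            neighbours = tokens[start:i] + tokens[i + 1:i + 1 + window]
            counts.update(neighbours)
    return counts


# Invented sample text, not a quotation from any of the works discussed.
sample = "The ring was found. The ring was lost again."
print(collocates(sample, "ring", window=1))
```

Raw co-occurrence counts like these are usually only a first step; corpus studies typically weight them with an association measure before interpreting them.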


A Computer-Assisted Approach To Lexical Borrowing In Northeast Caucasian Languages, Bonnie Eleanor Wren-Hardin Jan 2024

Theses and Dissertations--Linguistics

The disambiguation of loanwords and cognates can be a challenge, especially in areas where there has been intense language contact over an extended period of time, when the contact is between genetically related languages, and when the number of languages involved is large. Over the past several decades, more and more computational approaches to automatic cognate and borrowing detection have been created in an attempt to ease the load of examining hundreds to thousands of individual lexemes, as well as to determine language family relationships with allegedly greater accuracy. While these methods are not perfect and cannot replace the knowledge or …