Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

233 Full-Text Articles 347 Authors 192,439 Downloads 63 Institutions

All Articles in Computational Linguistics

233 full-text articles. Page 1 of 11.

Computational Approaches To Linguistic Challenges In Arabic Speech Recognition, Enas Albasiri 2024 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

This dissertation aims to document the linguistic features of Arabic that pose challenges to speech and language technologies and to advance these technologies by developing state-of-the-art computational tools focusing on automatic speech recognition (ASR), text normalization (TN), and corpus development. TN converts expressions such as numbers, dates, and times (named semiotic classes) from their written to their spoken domain, such as converting ‘$84.00’ to ‘eighty-four dollars’, while inverse text normalization (ITN) converts verbalized text to its written form. This conversion is an essential preprocessing step for text-to-speech (TTS) and a post-processing step for ASR. Arabic presents a challenge for TN and ITN because one …
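
To make the written-to-spoken conversion concrete, the following is a minimal sketch of text normalization for the currency example quoted in the abstract. It covers English dollar amounts below one thousand only; it is not the dissertation's method and does not address Arabic. The names `number_to_words` and `normalize_currency` are hypothetical and chosen for illustration.

```python
import re

ONES = ["zero", "one", "two", "three", "four", "five", "six", "seven",
        "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
        "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"]
TENS = ["", "", "twenty", "thirty", "forty", "fifty", "sixty", "seventy",
        "eighty", "ninety"]


def number_to_words(n: int) -> str:
    """Verbalize an integer in the range 0..999 (assumed limit of this sketch)."""
    if not 0 <= n <= 999:
        raise ValueError("this sketch only covers 0..999")
    if n < 20:
        return ONES[n]
    if n < 100:
        tens, ones = divmod(n, 10)
        return TENS[tens] + ("-" + ONES[ones] if ones else "")
    hundreds, rest = divmod(n, 100)
    words = ONES[hundreds] + " hundred"
    return words + (" " + number_to_words(rest) if rest else "")


def normalize_currency(text: str) -> str:
    """Replace written dollar amounts such as '$84.00' with a spoken form."""
    def verbalize(match: re.Match) -> str:
        dollars = int(match.group(1))
        cents = int(match.group(2) or 0)
        parts = [number_to_words(dollars) + (" dollar" if dollars == 1 else " dollars")]
        if cents:
            parts.append(number_to_words(cents) + (" cent" if cents == 1 else " cents"))
        return " and ".join(parts)

    # Up to three dollar digits, optional two-digit cents; amounts with more
    # dollar digits are left untouched by the negative lookahead.
    return re.sub(r"\$(\d{1,3})(?:\.(\d{2}))?(?!\d)", verbalize, text)


print(normalize_currency("The ticket costs $84.00 today."))
```

Inverse text normalization would run in the opposite direction, mapping the spoken form back to ‘$84.00’.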


Uncovering The Mimicry Of Online Review Breadth And Depth And Its Subsequent Effect On Consumer Responses, Andrea Pelaez Martinez 2024 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Word-of-mouth (WOM) in marketing occurs when consumers discuss a company's product or service or any consumption experience with their friends, family, and others with whom they have any relationship. With the advent of social media, this phenomenon has expanded rapidly into virtual environments where consumer conversation is enabled through chats, forums, social media posts, and online reviews. In response to this rapid growth of online WOM, academics and practitioners have focused their interest on this phenomenon and its implications for consumers, firms, and society. So far, the evidence of the critical role that online WOM plays in helping consumers make …


Expanding The Corpus Of Vocalized Hebrew Text: Compiling An Unvocalized Text Corpus And Building An Online Interface For Vocalization Annotation, Rachel Shanblatt Bloch 2024 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Written modern Hebrew presents a unique challenge for training computational models for language processing because modern Hebrew text often lacks vocalization. The lack of available vocalized Hebrew data can lead to ambiguity in training these models and generally hinders work on natural language processing problems. The goal of this project is to contribute to the collection of vocalized Hebrew text by collecting and preprocessing a large corpus of unvocalized Hebrew text and building an online annotation tool. The annotation tool allows people to upload unvocalized Hebrew text, to annotate by adding Hebrew vocalization, and to download comma-separated values files of …


Skyler's Lunch, Noah Sherman, Autumn Boone, Hilaria Cruz 2024 University of Louisville

LING 590/Internet Language

Our class was studying the use of emojis across different platforms and wanted to explore how stories using emojis could impact young readers. Here, we try to translate the story of Skyler into emoji, providing translations along the way. We replace words completely with emoji, represent phrases with a few emoji, and use additional emoji to make sense of the content, including punctuation. In this book, we explore the character of Skyler, who is a picky eater. But they learn to eat the nutritious food that is good for them. In the end, they even get a reward!


Retórica Intercultural En El Discurso Académico Universitario: Las Funciones Retóricas De La Citación En Los Trabajos De Fin De Máster Escritos En Español Y En Inglés Por Hablantes Nativos Y No Nativos, David Sanchez-Jimenez 2024 City University of New York (CUNY)

Publications and Research

This research stems from an interest in the cultural differences in citation practices in the academic genre of the master's thesis among native Spanish writers (Ee), non-native Filipino writers of Spanish (Fe), native Filipino writers of English (Fi), and American writers of English. A total of thirty-two (32) master's theses – eight (8) for each group – were analyzed. A quantitative and qualitative methodology was used to study this phenomenon, based on computerized textual analysis of the rhetorical functions of citations, arranged in a typological classification that modified the scheme proposed by Petrić (2007). The results obtained from …


Consonant (De)Gradation In Ingrian?, Andrea M. Harrison 2024 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

This paper will present a dual method toward data enrichment for low-resource languages. Using Yoyodyne -- a Fairseq-inspired neural library for small-vocabulary sequence-to-sequence generation -- a morphological generation task was tested across labeled data encompassing multiple stages of enrichment for the low-resource language Ingrian. Due to limitations in the available data for Ingrian, weighted finite-state transducers (WFSTs) were used to generate an expanded vocabulary via HFST's toolkit for Uralic languages, and GiellaLT, a source for FST-driven lexica for low-resource languages. Further stages of experimentation used labeled data from related, higher-resource languages (Finnish, Estonian) to encourage cross-lingual transfer in the interest …


How Do We Learn What We Cannot Say?, Daniel Yakubov 2024 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

The contributions of this thesis are two-fold. First, this thesis presents UDTube, an easy-to-use software tool developed to perform morphological analysis in a multi-task fashion. This work shows the strong performance of UDTube versus the current state-of-the-art, UDPipe, across eight languages, primarily in the annotation of morphological features. The second contribution of this thesis is an exploration into the study of defectivity. UDTube is used to annotate a large amount of data in Greek and Russian, which is ultimately used to investigate the plausibility of Indirect Negative Evidence (INE), a popular approach to the acquisition of morphological defectivity. The reported …


The Ring Cycle: Journeying Through The Language Of Tolkien’s Third Age With Corpus Linguistics, Michael Livesey 2024 University of Sheffield

Journal of Tolkien Research

This article explores the journey taken by the One Ring across J.R.R. Tolkien’s Third Age writings. It employs a digital humanities approach to analyse linguistic patterns in Tolkien’s use of the word ring, across The Hobbit and The Lord of the Rings. Specifically, the article employs corpus linguistic methods to track shifts in the quantities and qualities of the Ring’s appearance across these texts. It uses techniques of keyness and collocation analysis to trace transformations in these quantities/qualities, including: a) the Ring’s transition from a central to a peripheral place in the Third Age’s narrative arc; and b) …


A Computer-Assisted Approach To Lexical Borrowing In Northeast Caucasian Languages, Bonnie Eleanor Wren-Hardin 2024 University of Kentucky

Theses and Dissertations--Linguistics

The disambiguation of loanwords and cognates can be a challenge, especially in areas where there has been intense language contact over an extended period of time, when the contact is between genetically related languages, and when the number of languages involved is large. Over the past several decades, more and more computational approaches to automatic cognate and borrowing detection have been created in an attempt to ease the load of examining hundreds to thousands of individual lexemes, as well as determine language family relationships with allegedly greater accuracy. While these methods are not perfect and cannot replace the knowledge or …


Guilty Machines: On Ab-Sens In The Age Of AI, Dylan Lackey, Katherine Weinschenk 2023 Virginia Commonwealth University

Critical Humanities

For Lacan, guilt arises in the sublimation of ab-sens (non-sense) into the symbolic comprehension of sen-absexe (sense without sex, sense in the deficiency of sexual relation), or in the maturation of language to sensibility through the effacement of sex. Though, as Slavoj Žižek himself points out in a recent article regarding ChatGPT, the split subject always misapprehends the true reason for guilt’s manifestation, such guilt at best provides a sort of evidence for the inclusion of the subject in the order of language, acting as a necessary, even enjoyable mark of the subject’s coherence (or, more importantly, the subject’s separation …


The Near-Synonymous Classifiers In Mandarin Chinese: Etymology, Modern Usage, And Possible Problems In L2 Classroom, Irina Kavokina 2023 University of Massachusetts Amherst

Masters Theses

Many Chinese classifiers are nearly synonymous: they can be used with the same head nouns without changing the meaning of the sentence. In other words, such classifiers can be used interchangeably or almost interchangeably. This poses a challenge for Chinese language learners, especially those whose native language lacks such a grammatical category. Another complication arises from the ambiguous English translations of many classifiers.

In this paper we investigate the collocation behavior of near-synonymous Chinese classifiers, focusing on their semantic nuances and interchangeability. Analyzing 6 pairs of classifiers — 栋 and 幢, 匹 and 头, 批 and …


Executive Order On The Safe, Secure, And Trustworthy Development And Use Of Artificial Intelligence, Joseph R. Biden 2023 United States Office of the President

Copyright, Fair Use, Scholarly Communication, etc.

Section 1. Purpose. Artificial intelligence (AI) holds extraordinary potential for both promise and peril. Responsible AI use has the potential to help solve urgent challenges while making our world more prosperous, productive, innovative, and secure. At the same time, irresponsible use could exacerbate societal harms such as fraud, discrimination, bias, and disinformation; displace and disempower workers; stifle competition; and pose risks to national security. Harnessing AI for good and realizing its myriad benefits requires mitigating its substantial risks. This endeavor demands a society-wide effort that includes government, the private sector, academia, and civil society.

My Administration places the highest urgency …


Towards Interpretable Machine Reading Comprehension With Mixed Effects Regression And Exploratory Prompt Analysis, Luca Del Signore 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

We investigate the properties of natural language prompts that determine their difficulty in machine reading comprehension tasks. While much work has been done benchmarking language model performance at the task level, there is considerably less literature focused on how individual task items can contribute to interpretable evaluations of natural language understanding. Such work is essential to deepening our understanding of language models and ensuring their responsible use as a key tool in human-machine communication. We perform an in-depth mixed-effects analysis on the behavior of three major generative language models, comparing their performance on a large reading comprehension …


A Computational Analysis Of Volodymyr Zelenskyy's Public Diplomacy Discourse In Times Of Crisis, Amber Brittain-Hale 2023 Pepperdine University

Education Division Scholarship

In this study, we delve into the public diplomacy discourse of Ukrainian President Volodymyr Zelenskyy during the ongoing crisis of the Russo-Ukrainian War. We aim to conduct a computational analysis of Zelenskyy's English, Russian, and Ukrainian speeches, exploring the linguistic patterns and code-switching employed in his discourse. The study period encompasses Russia’s build-up to and full-scale invasion of Ukraine from May 2019 to May 30, 2023. This time frame is crucial as it captures the dynamic development of the crisis and the expansion of Zelenskyy's presidency, providing a unique context for analyzing his public diplomacy efforts. By utilizing Linguistic Inquiry …


Ideology Prediction From Scarce And Biased Supervision: Learn To Disregard The “What” And Focus On The “How”!, Chen Chen, Dylan Walker, Venkatesh Saligrama 2023 CUHK Shenzhen

Business Faculty Articles and Research

We propose a novel supervised learning approach for political ideology prediction (PIP) that is capable of making predictions for out-of-distribution inputs. This problem is motivated by the fact that manual data-labeling is expensive, while self-reported labels are often scarce and exhibit significant selection bias. We propose a novel statistical model that decomposes the document embeddings into a linear superposition of two vectors: a latent neutral context vector independent of ideology, and a latent position vector aligned with ideology. We train an end-to-end model that has intermediate contextual and positional vectors as outputs. At deployment time, our model predicts labels for input documents …


Destined Failure, Chengjun Pan 2023 Rhode Island School of Design

Masters Theses

I attempt to examine the complex structure of human communication, explaining why it is bound to fail. By reproducing experienceable phenomena, I demonstrate how they can expose communication structure and reveal the limitations of our perception and symbolization. I divide the process of communication into six stages: input, detection, symbolization, dictionary, interpretation, and output. In this thesis, I examine the flaws and challenges that arise in the first five stages. I argue that reception acts as a filter and that understanding relies on a symbolic system that is full of redundancies. Therefore, every interpretation is destined to be a deviation.


Neural Network Vs. Rule-Based G2P: A Hybrid Approach To Stress Prediction And Related Vowel Reduction In Bulgarian, Maria Karamihaylova 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

An effective grapheme-to-phoneme (G2P) conversion system is a critical element of speech synthesis. Rule-based systems were an early method for G2P conversion. In recent years, machine learning tools have been shown to outperform rule-based approaches in G2P tasks. We investigate neural network sequence-to-sequence modeling for the prediction of syllable stress and resulting vowel reductions in the Bulgarian language. We then develop a hybrid G2P approach which combines manually written grapheme-to-phoneme mapping rules with neural network-enabled syllable stress predictions by inserting stress markers in the predicted stress position of the transcription produced by the rule-based finite-state transducer. Finally, we apply vowel …


The Sociolinguistics Of Code-Switching In Hong Kong’s Digital Landscape: A Mixed-Methods Exploration Of Cantonese-English Alternation Patterns On WhatsApp, Wilkinson Daniel Wong Gonzales, Yuen Man Tsang 2023 The Chinese University of Hong Kong

Journal of English and Applied Linguistics

This paper examines the prevalence of Cantonese-English code-mixing in Hong Kong through an under-researched digital medium. Prior research on this code-alternation practice has often been limited to exploring either the social or linguistic constraints of code-switching in spoken or written communication. Our study takes a holistic approach to analyzing code-switching in a hybrid medium that exhibits features of both spoken and written discourse. We specifically analyze the code-switching patterns of 24 undergraduates from a Hong Kong university on WhatsApp and examine how both social and linguistic factors potentially constrain these patterns. Utilizing a self-compiled sociolinguistic corpus as well as survey …


Evaluating Neural Networks As Cognitive Models For Learning Quasi-Regularities In Language, Xiaomeng Ma 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Many aspects of language can be categorized as quasi-regular: the relationship between the inputs and outputs is systematic but allows many exceptions. Common domains that contain quasi-regularity include morphological inflection and grapheme-phoneme mapping. How humans process quasi-regularity has been debated for decades. This thesis implemented modern neural network models, transformer models, on two tasks: English past tense inflection and Chinese character naming, to investigate how transformer models perform quasi-regularity tasks. This thesis focuses on investigating to what extent the models' performances can represent human behavior. The results show that the transformers' performance is very similar to human behavior in many …


Topics For He But Not For She: Quantifying And Classifying Gender Bias In The Media, Tyler J. Lanni 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

In this study, we used computational techniques to analyze the language used in news articles to describe female and male politicians. Our corpus included 370 subtexts for male candidates and 374 subtexts for female candidates, gathered through the New York Times API. We conducted two experiments: an LDA topic analysis to explore the data, and a logistic regression to classify the subtexts as either male or female. Our analysis revealed some noteworthy findings that suggest the possibility of developing a gender bias classifier in the future. However, to create a more robust understanding of bias, additional research and data are …

