Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

233 Full-Text Articles 347 Authors 192,439 Downloads 63 Institutions

All Articles in Computational Linguistics

233 full-text articles. Page 4 of 11.

Shifting The Perspectival Landscape: Methods For Encoding, Identifying, And Selecting Perspectives, Carolyn Jane Anderson 2021 University of Massachusetts Amherst

Doctoral Dissertations

This dissertation explores the semantics and pragmatics of perspectival expressions. Perspective, or point-of-view, encompasses an individual’s thoughts, perceptions, and location. Many expressions in natural language have components of their meanings that shift depending on whose perspective they are evaluated against. In this dissertation, I explore two sets of questions relating to perspective sensitivity. The first set of questions relates to how perspective is encoded in the semantics of perspectival expressions. The second set of questions relates to how conversation participants treat perspectival expressions: the speaker’s selection of a perspective and the listener’s identification of the speaker’s perspective. In Part I, …


An Interactive Visual Database For American Sign Language Reveals How Signs Are Organized In The Mind, Zed Sevcikova Sehyr, Ariel Goldberg, Karen Emmorey, Naomi Caselli 2021 Chapman University

Communication Sciences and Disorders Faculty Articles and Research

"We are four researchers who study psycholinguistics, linguistics, neuroscience and deaf education. Our team of deaf and hearing scientists worked with a group of software engineers to create the ASL-LEX database that anyone can use for free. We cataloged information on nearly 3,000 signs and built a visual, searchable and interactive database that allows scientists and linguists to work with ASL in entirely new ways."


Otrouha: A Corpus Of Arabic ETDs And A Framework For Automatic Subject Classification, Eman Abdelrahman, Fatimah Alotaibi, Edward A. Fox, Osman Balci 2021 Virginia Tech, Blacksburg

The Journal of Electronic Theses and Dissertations

Although the Arabic language is spoken by more than 300 million people and is one of the six official languages of the United Nations (UN), there has been less research on Arabic text data than on English in the realm of machine learning, especially in text classification. In the past decade, Arabic data such as news and tweets have begun to receive some attention. Although automatic text classification plays an important role in improving the browsability and accessibility of data, Electronic Theses and Dissertations (ETDs) have not received their fair share of attention, in spite of the huge number …


The ASL-LEX 2.0 Project: A Database Of Lexical And Phonological Properties For 2,723 Signs In American Sign Language, Zed Sevcikova Sehyr, Naomi Caselli, Ariel M. Cohen-Goldberg, Karen Emmorey 2021 Chapman University

Communication Sciences and Disorders Faculty Articles and Research

ASL-LEX is a publicly available, large-scale lexical database for American Sign Language (ASL). We report on the expanded database (ASL-LEX 2.0) that contains 2,723 ASL signs. For each sign, ASL-LEX now includes a more detailed phonological description, phonological density and complexity measures, frequency ratings (from deaf signers), iconicity ratings (from hearing non-signers and deaf signers), transparency (“guessability”) ratings (from non-signers), sign and videoclip durations, lexical class, and more. We document the steps used to create ASL-LEX 2.0, describe the distributional characteristics of sign properties across the lexicon, and examine the relationships among lexical and phonological properties of signs. Correlation …


When Misclassification Is Misgendering: Gender Prediction In The Context Of Trans Identities, Sean Miller 2021 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

As a subdomain of author profiling, gender prediction (sometimes called gender inference) has received a substantial amount of attention, both as a task in itself and for other downstream analyses. Throughout the existing literature, various statistical and machine learning methods have been applied to extract features in order either to characterize and differentiate female and male writing styles or simply to achieve maximum accuracy on gender prediction as a binary classification task. However, researchers often do not disclose how they conceptualize gender, nor do they consider the implications that gender prediction has for non-binary and trans individuals. Along with an overview …


A Computational Study In The Detection Of English–Spanish Code-Switches, Yohamy C. Polanco 2021 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Code-switching is the linguistic phenomenon where a multilingual person alternates between two or more languages in a conversation, whether that be spoken or written. This thesis studies the automatic detection of code-switching occurring specifically between English and Spanish in two corpora.

Twitter and other social media sites have provided an abundance of linguistic data that researchers can use to perform countless experiments. Collecting such data is fairly easy for a study of monolingual text, but collecting code-switched data is complicated because the APIs accept only one language as a parameter. This thesis focuses on …
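
As a rough illustration of what token-level code-switch detection involves, the sketch below tags each token as English or Spanish by lexicon lookup and reports the positions where the language changes. The tiny wordlists and the functions `tag_tokens` and `switch_points` are hypothetical; this is not the thesis's method, and a realistic detector would use large lexicons or a trained classifier and would need to handle words shared by both languages.

```python
# Hypothetical toy lexicons; a real system would use large wordlists or a trained model.
EN_WORDS = {"i", "want", "to", "go", "the", "store", "but", "it", "is", "far"}
ES_WORDS = {"quiero", "ir", "a", "la", "tienda", "pero", "está", "lejos", "muy"}

def tag_tokens(tokens):
    """Label each token 'en', 'es', or 'other' (unknown or in both lexicons)."""
    tags = []
    for tok in tokens:
        word = tok.lower()
        if word in EN_WORDS and word not in ES_WORDS:
            tags.append("en")
        elif word in ES_WORDS and word not in EN_WORDS:
            tags.append("es")
        else:
            tags.append("other")
    return tags

def switch_points(tokens):
    """Return indices where a token's language differs from the previous
    language-tagged token; 'other' tokens are skipped."""
    points = []
    prev = None
    for i, tag in enumerate(tag_tokens(tokens)):
        if tag == "other":
            continue
        if prev is not None and tag != prev:
            points.append(i)
        prev = tag
    return points

print(switch_points("I want to go pero está muy lejos".split()))
```

Here the switch is reported at index 4 ("pero"). Skipping ambiguous tokens rather than treating them as a third language keeps a shared word from being counted as two spurious switches.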


SIGMORPHON 2021 Shared Task On Morphological Reinflection: Generalization Across Languages, Tiago Pimentel, Maria Ryskina, Christopher Straughn 2021 University of Cambridge

Library Faculty Publications

This year’s iteration of the SIGMORPHON Shared Task on morphological reinflection focuses on typological diversity and cross-lingual variation of morphosyntactic features. In terms of the task, we enrich UniMorph with new data for 32 languages from 13 language families, with most of them being under-resourced: Kunwinjku, Classical Syriac, Arabic (Modern Standard, Egyptian, Gulf), Hebrew, Amharic, Aymara, Magahi, Braj, Kurdish (Central, Northern, Southern), Polish, Karelian, Livvi, Ludic, Veps, Võro, Evenki, Xibe, Tuvan, Sakha, Turkish, Indonesian, Kodi, Seneca, Asháninka, Yanesha, Chukchi, Itelmen, Eibela. We evaluate six systems on the new data and conduct an extensive error analysis of the systems’ predictions. Transformer-based …
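
The reinflection task maps a lemma plus a bundle of morphosyntactic tags (UniMorph style, e.g. `V;PST`) to an inflected form. The following is a minimal sketch of a non-neural baseline, assuming the simple strategy of learning suffix-rewrite rules per tag bundle. It is not one of the six evaluated systems, and the toy English triples are illustrative only.

```python
import os
from collections import Counter, defaultdict

def suffix_rule(lemma, form):
    """Strip the longest common prefix; the remainder is a (remove, add) suffix rule."""
    stem = os.path.commonprefix([lemma, form])
    return lemma[len(stem):], form[len(stem):]

def train(triples):
    """Count the suffix rules observed for each tag bundle."""
    rules = defaultdict(Counter)
    for lemma, tags, form in triples:
        rules[tags][suffix_rule(lemma, form)] += 1
    return rules

def inflect(rules, lemma, tags):
    """Apply the matching rule with the longest removed suffix, breaking ties by frequency."""
    candidates = sorted(rules.get(tags, Counter()).items(),
                        key=lambda item: (len(item[0][0]), item[1]),
                        reverse=True)
    for (old, new), _count in candidates:
        if lemma.endswith(old):
            return lemma[:len(lemma) - len(old)] + new
    return lemma  # unseen tag bundle: copy the lemma unchanged

# Toy training data in (lemma, tags, form) format.
data = [("walk", "V;PST", "walked"), ("talk", "V;PST", "talked"), ("try", "V;PST", "tried")]
model = train(data)
print(inflect(model, "jump", "V;PST"))
print(inflect(model, "cry", "V;PST"))
```

Because the ranking prefers longer suffixes, this toy model also overgeneralizes (for example, `play` would become `plaied`), which is one reason shared-task systems learn richer alignments or use neural models.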


What You Do Or What You Say? An Examination Of Analyst Reactions To Prototypical And Non-Prototypical CEOs’ Linguistic And Competitive Behaviors, Courtney Hart 2021 University of Kentucky

Theses and Dissertations--Management

Non-prototypical CEOs are those who possess demographic characteristics that differ from those of a target reference group. In the US, a prototypical CEO is both white and male. While negative responses to non-prototypical leaders based on race and gender have been well documented, we know less about what these leaders do that may influence biased evaluations. In this dissertation I took an impression management view to examine analysts’ evaluative bias (AEB) toward prototypical and non-prototypical CEOs’ hiding linguistic behaviors and competitive aggressiveness. Specifically, I examined hiding linguistic behaviors on quarterly conference calls and two attributes of competitive repertoire. Drawing …


Emergent Typological Effects Of Agent-Based Learning Models In Maximum Entropy Grammar, Coral Hughto 2020 University of Massachusetts Amherst

Doctoral Dissertations

This dissertation shows how a theory of grammatical representations and a theory of learning can be combined to generate gradient typological predictions in phonology, predicting not only which patterns are expected to exist, but also their relative frequencies: patterns which are learned more easily are predicted to be more typologically frequent than those which are more difficult. In Chapter 1 I motivate and describe the specific implementation of this methodology in this dissertation. Maximum Entropy grammar (Goldwater & Johnson 2003) is combined with two agent-based learning models, the iterated and the interactive learning model, each of which mimics a type …
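
In Maximum Entropy grammar, each candidate output receives a harmony score equal to the weighted sum of its constraint violations, and its probability is proportional to exp(-harmony), normalized over the candidate set. The sketch below computes these probabilities for a hypothetical two-candidate tableau. The constraint names, weights, and violation counts are illustrative assumptions, and the iterated and interactive learning models themselves are not shown.

```python
import math

def maxent_probs(weights, violations):
    """Map each candidate to exp(-H) / Z, where H = sum of weight * violation count."""
    harmony = {cand: sum(w * v for w, v in zip(weights, viols))
               for cand, viols in violations.items()}
    z = sum(math.exp(-h) for h in harmony.values())
    return {cand: math.exp(-h) / z for cand, h in harmony.items()}

# Hypothetical tableau for input /pat/ with constraints [NoCoda, Max].
weights = [2.0, 1.0]
violations = {"pat": [1, 0],  # keeps the coda, violating NoCoda
              "pa": [0, 1]}   # deletes /t/, violating Max
for cand, p in maxent_probs(weights, violations).items():
    print(f"{cand}: {p:.3f}")
```

With these weights, "pa" receives about 0.731 and "pat" about 0.269. Unlike a strict ranking, both candidates keep nonzero probability, which is what lets a learner's weights translate into gradient predictions.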


A Lexical Frequency Analysis Of Irish Sign Language, Robert G Smith, Markus Hofmann 2020 Technological University Dublin

Other Resources

Word frequency has a significant impact on language acquisition and fluency. It is often a point of reference for the teaching and assessing of a language and indeed, as a control for psycholinguistic studies. This paper presents the results of the first objective frequency analysis of lexical tokens from the Signs of Ireland corpus. We investigate the frequency of fully lexical, partly lexical and non-lexical signs in Irish Sign Language as they are presented in the corpus. We confirm the accuracy of the lexical gloss frequency data with a supplementary corpus subset that is tagged for grammatical class and additional …
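
A lexical frequency analysis of this kind reduces to counting gloss tokens and grouping them by lexical status. The sketch below assumes a hypothetical gloss convention: the prefixes `PT:` and `DS:` mark partly lexical pointing and depicting signs, and `G:` marks non-lexical gestures. The Signs of Ireland corpus's actual annotation scheme and the paper's category definitions may differ, and the tokens are invented.

```python
from collections import Counter

def lexical_category(gloss):
    """Classify a gloss using a hypothetical prefix convention."""
    if gloss.startswith("G:"):
        return "non-lexical"
    if gloss.startswith(("PT:", "DS:")):
        return "partly lexical"
    return "fully lexical"

def frequency_table(glosses):
    """Return (gloss, count, rate per 1,000 tokens), sorted by descending count."""
    counts = Counter(glosses)
    total = sum(counts.values())
    return [(g, n, 1000 * n / total) for g, n in counts.most_common()]

# Invented example tokens, not corpus data.
tokens = ["PT:PRO1", "HOUSE", "GO", "PT:PRO1", "G:WELL", "GO", "PT:PRO1"]
for gloss, n, rate in frequency_table(tokens):
    print(f"{gloss}\t{n}\t{rate:.1f}")
print(Counter(lexical_category(t) for t in tokens))
```

Normalizing to a per-1,000-token rate makes counts comparable across corpus subsets of different sizes.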


Mitigating Gender Bias In Neural Machine Translation Using Counterfactual Data, Alan Wong 2020 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Recent advances in deep learning have greatly improved the ability of researchers to develop effective machine translation systems. In particular, the application of modern neural architectures, such as the Transformer, has achieved state-of-the-art BLEU scores in many translation tasks. However, it has been found that even state-of-the-art neural machine translation models can suffer from certain implicit biases, such as gender bias (Lu et al., 2019). In response to this issue, researchers have proposed various potential solutions: some have proposed approaches that inject missing gender information into models, while others have attempted modifying the training data itself. We focus on mitigating …
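
In its simplest generic form, counterfactual data augmentation adds a copy of each training sentence with gendered words swapped, so that the model sees both gender variants equally often. The sketch below shows only that word-swapping step on whitespace-tokenized English text. The word list is a small hypothetical sample, and this is not the thesis's procedure. For NMT, the target side of each parallel pair would also need a consistent (often morphologically richer) rewrite. Ambiguous forms such as "her" (which maps to "him" or "his") are deliberately left out because they require part-of-speech information.

```python
# Small hypothetical list of gendered word pairs (lowercase).
PAIRS = [("he", "she"), ("man", "woman"), ("men", "women"),
         ("father", "mother"), ("son", "daughter"), ("boy", "girl")]
SWAP = {a: b for a, b in PAIRS}
SWAP.update({b: a for a, b in PAIRS})

def counterfactual(sentence):
    """Swap gendered words, preserving initial capitalization."""
    out = []
    for tok in sentence.split():
        new = SWAP.get(tok.lower())
        if new is None:
            out.append(tok)
        else:
            out.append(new.capitalize() if tok[0].isupper() else new)
    return " ".join(out)

def augment(corpus):
    """Return the original sentences followed by their counterfactual copies."""
    return corpus + [counterfactual(s) for s in corpus]

print(augment(["The man said he is a doctor"]))
```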


Does The Word "Chien" Bark? Representation Learning In Neural Machine Translation Encoders, Emily Campbell 2020 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

This thesis presents experiments that use representation learning to explore how neural networks learn. Neural networks that take text as input create internal representations of the text during their training. Recent work has found that these representations can be used to perform other downstream linguistic tasks, such as part-of-speech (POS) tagging. This demonstrates that the neural networks are learning linguistic information and storing this information in the representations. We focus on the representations created by neural machine translation (NMT) models and whether they can be used in POS tagging. We train 5 NMT models including an auto-encoder. We extract the …


On Polysemy: A Philosophical, Psycholinguistic, And Computational Study, Jiangtian Li 2020 The University of Western Ontario

Electronic Thesis and Dissertation Repository

Most words in natural languages are polysemous; that is, they have related but different meanings in different contexts. These polysemous meanings (senses) are marked by their structuredness, flexibility, productivity, and regularity. Previous theories have focused on some of these features but not all of them together. Thus, I propose a new theory of polysemy, which has two components. First, word meaning is actively modulated by broad contexts in a continuous fashion. Second, clustering arises from contextual modulations of a word and is then entrenched in our long-term memory to facilitate future production and processing. Hence, polysemous senses are entrenched …


Lulling Waters: A Poetry Reading For Real-Time Music Generation Through Emotion Mapping, Ashley Muniz, Toshihisa Tsuruoka 2020 New York University

Electronic Literature Organization Conference 2020

Through a poetic narrative, “Lulling Waters” tells the story of a whale overcoming the loss of his mother, who died from ingesting plastic, as he attempts to escape from the polluted oceanic world. The live performance of this poem uses a software system called Soundwriter, which was developed with the goal of enriching the oral storytelling experience through music. This video demonstrates how Soundwriter’s real-time hybrid system was able to analyze “Lulling Waters” through its lexical and auditory features. Emotionally salient words were given ratings based on arousal, valence, and dominance, while the emotionally charged prosodic features of the …


Poetry For Seers Or The Peruvian Visual Poetic Tradition In Front Of New Media, Michael Hurtado, Pamela Medina, Enrique García, Michael Prado 2020 Pontificia Universidad Catolica del Peru

Electronic Literature Organization Conference 2020

Since the first decades of the twentieth century, the Peruvian poetic tradition has been characterized by experimental uses of language. Among these experiments, some strained the medium through its link with the plastic arts, as in the poetry of José María Eguren, while others chose to play with the spatiality and visuality of the blank page, as in the work of Carlos Oquendo de Amat. However, it is not until the appearance of César Vallejo’s poetry, specifically a collection like Trilce (1922), that these breakages force us to …


Automatic Learning Of Document Section Structure For Ontology-Based Semantic Search, Deya Banisakher 2020 Florida International University

FIU Electronic Theses and Dissertations

Modeling natural human behavior in understanding written language is crucial for developing true artificial intelligence. For people, words convey certain semantic concepts. Documents, in turn, represent an abstract concept: they are collections of text organized in some logical structure, that is, sentences, paragraphs, sections, and so on. Like words, these document structures are used to convey a logical flow of semantic concepts. Machines, however, view words only as spans of characters and documents as mere collections of free text, missing the underlying meanings of words and the logical structure of documents.

Automatic semantic concept detection is the process by which the …


Identifying Facets Of Reader-Generated Online Reviews Of Children’s Books Based On A Textual Analysis Approach, Yunseon Choi, Soohyung Joo 2020 Valdosta State University

Information Science Faculty Publications

With the increasing popularity of social media, online reviews have become one of the primary information sources for book selection. Prior studies have analyzed online reviews, mostly in the domain of business. However, little research has examined the content of online book reviews of children’s books. Book reviews generated by book readers contain different aspects of information, such as opinions, feedback, or emotional responses, from the perspectives of readers. This study explores what aspects of the books are addressed in readers’ reviews, and then it intends to identify categorical features or facets of online book reviews of children’s books. We …


Automatic Keyphrase Extraction From Russian-Language Scholarly Papers In Computational Linguistics, Yves Wienecke 2020 Portland State University

University Honors Theses

The automatic extraction of keyphrases from scholarly papers is a necessary step for many Natural Language Processing (NLP) tasks, including text retrieval, machine translation, and text summarization. However, due to the different grammatical and semantic intricacies of languages, this is a highly language-dependent task. Many free and open source implementations of state-of-the-art keyphrase extraction techniques exist, but they are not adapted for processing Russian text. Furthermore, the multilingual character of scholarly papers in the field of Russian computational linguistics and NLP introduces additional complexity to keyphrase extraction. This paper describes a free and open source program as a proof of …


Empirical Analysis Of CBOW And Skip Gram NLP Models, Tejas Menon 2020 Portland State University

University Honors Theses

CBOW and Skip Gram are two NLP techniques for producing word embedding models that are accurate and performant. They were introduced in the seminal paper by T. Mikolov et al. and have since seen optimizations such as negative sampling and subsampling. This paper implements a fully optimized version of these models using PyTorch and runs them through a toy sentiment/subject analysis. It is weakly observed that corpus type affects the skew of word embeddings: fictional corpora are better suited for sentiment analysis and non-fictional corpora for subject analysis.
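
The two architectures differ mainly in how training examples are drawn from a context window. Skip Gram predicts each context word from the center word, while CBOW predicts the center word from its (averaged) context. The sketch below generates both kinds of examples from a token list. The window size is an assumed parameter, and the embedding model itself, negative sampling, and subsampling are omitted.

```python
def skipgram_pairs(tokens, window=2):
    """(center, context) pairs: Skip Gram predicts each context word from the center."""
    pairs = []
    for i, center in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs

def cbow_examples(tokens, window=2):
    """(context words, target) examples: CBOW predicts the center from its context."""
    examples = []
    for i, target in enumerate(tokens):
        lo, hi = max(0, i - window), min(len(tokens), i + window + 1)
        context = [tokens[j] for j in range(lo, hi) if j != i]
        examples.append((context, target))
    return examples

sentence = ["the", "cat", "sat", "down"]
print(skipgram_pairs(sentence, window=1))
print(cbow_examples(sentence, window=1))
```

Skip Gram yields one training example per (center, context) pair, so it produces more updates per word, while CBOW yields one example per position.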


Doing Away With Defaults: Motivation For A Gradient Parameter Space, Katherine Howitt 2020 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

In this thesis, I propose a reconceptualization of the traditional syntactic parameter space of the principles and parameters framework (Chomsky, 1981). In lieu of binary parameter settings, parameter values exist on a gradient plane where a learner’s knowledge of their language is encoded in their confidence that a particular parametric target value, and thus grammatical construction of an encountered sentence, is likely to be licensed by their target grammar. First, I discuss other learnability models in the classic parameter space which lack either psychological plausibility, theoretical consistency, or some combination of the two. Then, I argue for the Gradient Parameter …


Digital Commons powered by bepress