Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

441 Full-Text Articles; 665 Authors; 135,653 Downloads; 50 Institutions

All Articles in Computational Linguistics

A Sentiment Analysis Of "Filipinx" On Twitter Using A Multinomial Naïve Bayes Classification Model, Clarisse Taboy 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

On social media, the use of “Filipinx” as a gender-neutral, inclusive term for “Filipino” tends to generate high user engagement, at times without regard for the original context in which the word appears. This project applies computational methods to collect a large dataset in English/Filipino from Twitter containing “Filipinx”, and to train a Naïve Bayes model to classify tweets into three sentiments: positive, neutral, and negative. My methodology takes inspiration from that of four related studies that similarly conducted sentiment analysis on English/Filipino tweets involving various topics, and whose resulting accuracy scores were compared side by side. Conducting sentiment analysis on …
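
As a rough illustration of the kind of classifier this abstract names, the sketch below fits a multinomial Naïve Bayes model with add-one smoothing on a handful of invented example sentences. It is not the author's code or data: the toy tweets, the whitespace tokenizer, and the function names `train_nb` and `predict_nb` are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy data, invented for illustration (not from the thesis).
TOY_TWEETS = [
    ("love this inclusive term", "positive"),
    ("great word love it", "positive"),
    ("hate this term", "negative"),
    ("awful word hate it", "negative"),
    ("the term appears in this article", "neutral"),
    ("the word is in the dictionary", "neutral"),
]


def train_nb(docs, labels, alpha=1.0):
    """Collect the counts needed for multinomial Naive Bayes with add-alpha smoothing."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # assumption: plain whitespace tokenization
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(docs),
    }


def predict_nb(model, text):
    """Return the label with the highest log posterior; unseen words are ignored."""
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    best_label, best_score = None, -math.inf
    for label, n in model["class_counts"].items():
        score = math.log(n / model["n_docs"])  # log prior
        total = sum(model["word_counts"][label].values())
        denom = total + model["alpha"] * len(model["vocab"])
        for t in tokens:
            score += math.log((model["word_counts"][label][t] + model["alpha"]) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


texts, labels = zip(*TOY_TWEETS)
model = train_nb(texts, labels)
print(predict_nb(model, "love it"))
```

A real pipeline would add tweet-specific preprocessing and a held-out test set for the accuracy comparison; those steps are omitted here.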


‘A Category Of Their Own’: Quantitative Methods In The Use Of Pile-Sort Data In Perceptual Dialectology, Zachary Ty Gill 2023 University of Kentucky

Theses and Dissertations--Linguistics

The purpose of this study is to investigate how Mississippi Gulf Coast Creoles perceive language differences in their home area. A pile-sort task was carried out in which respondents were given stacks of cards with local communities written on them and instructed to stack together the regions where people “talk the same.” Once the piles were made, the fieldworker discussed their sortings with the respondents. The stacks were analyzed by means of a hierarchical agglomerative cluster analysis and non-parametric multidimensional scaling with k-means cluster analysis overlays to extract the perceived dialect areas. The groupings reveal that respondent strategies are based …
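
One common way to prepare pile-sort data for agglomerative clustering can be sketched as follows: count how often two communities land in the same pile, turn that share into a distance, and merge the closest groups step by step. The abstract does not say how distances or linkage were defined, so the co-occurrence distance, the average linkage, the placeholder town names, and the helper names `cooccurrence_distance` and `agglomerate` are all assumptions.

```python
from itertools import combinations

# Hypothetical communities and respondent sortings (placeholders, not study data).
ITEMS = ["TownA", "TownB", "TownC", "TownD"]
SORTS = [
    [{"TownA", "TownB"}, {"TownC", "TownD"}],
    [{"TownA", "TownB"}, {"TownC", "TownD"}],
    [{"TownA", "TownB", "TownC"}, {"TownD"}],
]


def cooccurrence_distance(sorts, items):
    """Distance = 1 - share of respondents who placed both items in the same pile."""
    together = {pair: 0 for pair in combinations(sorted(items), 2)}
    for piles in sorts:
        for pile in piles:
            for pair in combinations(sorted(pile), 2):
                together[pair] += 1
    return {pair: 1 - n / len(sorts) for pair, n in together.items()}


def agglomerate(items, dist, n_clusters):
    """Average-linkage agglomerative clustering, stopping at n_clusters groups."""
    clusters = [frozenset([i]) for i in sorted(items)]

    def linkage(a, b):
        # Mean pairwise distance between the members of two clusters.
        total = sum(dist[tuple(sorted((x, y)))] for x in a for y in b)
        return total / (len(a) * len(b))

    while len(clusters) > n_clusters:
        a, b = min(combinations(clusters, 2), key=lambda p: linkage(*p))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


distances = cooccurrence_distance(SORTS, ITEMS)
areas = agglomerate(ITEMS, distances, n_clusters=2)
print(areas)
```

The multidimensional scaling and k-means overlays mentioned in the abstract are not covered by this sketch.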


Technology In The Classroom: The Features Language Teachers Should Consider, Sophie Cuocci, Padideh Fattahi Marnani 2022 University of Central Florida

Journal of English Learner Education

The rapid development of technology and a new generation of highly computer-literate students have made the integration of technology in schools essential. Throughout the last two decades, research has identified multiple factors leading to the successful and unsuccessful integration of technology in the classroom. Educators must consider these factors when deciding which technology tools to use and how to integrate them into their lessons. Simultaneously, the increasing number of English learners in the United States calls for the identification of teaching strategies that will best support their needs. Many language teachers now rely on teaching techniques …


Data-Driven Neuroanatomical Subtypes In Various Stages Of Schizophrenia: Linking Cortical Thickness, Glutamate, And Language Functioning, Liangbing Liang 2022 The University of Western Ontario

Electronic Thesis and Dissertation Repository

The considerable variation in the spatial distribution of cortical thickness changes has been used to parse heterogeneity in schizophrenia. We aimed to recover a ‘cortical impoverishment’ subgroup with widespread cortical thinning. We applied hierarchical cluster analysis to cortical thickness data of three datasets in different stages of psychosis and studied the cognitive, functional, neurochemical, language and symptom profiles of the observed subgroups. Our consensus-based clustering procedure consistently produced a subgroup characterized by significantly lower cortical thickness. This ‘cortical impoverishment’ subgroup was associated with a higher symptom burden in a clinically stable sample and higher glutamate levels with language impairments in …


Restrictive Tier Induction, Seoyoung Kim 2022 University of Massachusetts Amherst

Doctoral Dissertations

This dissertation proposes the Restrictive Tier Learner, which automatically induces only the tiers that are absolutely necessary in capturing phonological long-distance dependencies. The core of my learner is the addition of an extra evaluation step to the existing Inductive Projection Learner (Gouskova and Gallagher 2020), where the necessity and accuracy of the candidate tiers are determined.

An important building block of my learner is a typological observation, namely the dichotomy between trigram-bound and unbounded patterns. The fact that this dichotomy is attested in both consonant interactions and vowel interactions allows for a unified approach to be used. Another important piece …


Linguistic Abstractions In Children’s Very Early Utterances, Qihui Xu 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

How early do children produce multiword utterances? Do children's early utterances reflect abstract syntactic knowledge or are they the result of data-driven learning? We examine this issue through corpus analysis, computational modeling, and adult simulation experiments. Chapter 1 investigates when children start producing multiword utterances; we use corpora to establish the development of multiword utterances and a probabilistic computational model to account for the quantitative change of early multiword utterances. We find that multiword utterances of different lengths appear early in acquisition and increase together, and the length growth pattern can be viewed as a probabilistic and dynamic process.

Chapter …


Predicting Stress In Russian Using Modern Machine-Learning Tools, John Schriner 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

In the Russian language, stress on a word is determined via often complex patterns and rules. In this paper, after examining nearly a century of research in stress rules and methods in Russian, we turn to see if modern machine learning tools can aid in predicting stress. Using A.A. Zaliznyak’s dictionary grammar and over 300,000 word forms, we derived stress codes to aid in predicting which syllable primary stress falls on. We trained an LSTM neural network on the data and conducted eight experiments with added features such as lemma, part of speech, and morphology. While the model performed better …


Towards Explaining Variation In Entrainment, Andreas Weise 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Entrainment refers to the tendency of human speakers to adapt to their interlocutors to become more similar to them. This affects various dimensions and occurs in many contexts, allowing for rich applications in human-computer interaction. However, it is not exhibited by every speaker in every conversation but varies widely across features, speakers, and contexts, hindering broad application. This variation, whose guiding principles are poorly understood even after decades of entrainment research, is the subject of this thesis. We begin with a comprehensive literature review that serves as the foundation of our own work and provides a reference to guide future …


From Sesame Street To Beyond: Multi-Domain Discourse Relation Classification With Pretrained BERT, Isaac R. Raff 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Research efforts in transfer learning have gained massive popularity in recent years. Pretrained language models have demonstrated the most successful results in producing high-quality neural networks capable of quality inference after training across domains via transfer learning. This study expands on the domain transfer introduced in Ferracane et al. (2019), exploring neural methods for transfer learning of discourse parsing between a news source domain and a medical target domain. Ferracane et al. (2019) specifically discuss transfer learning from news articles to PubMed medical journal articles. Experiments in transfer learning in the current work expand to include three domains: Wall Street Journal articles previously annotated with …


Spectral Analysis Of Multiscale Cultural Traits On Twitter, Chandler Squires, Nikhil Kunapuli, Yaneer Bar-Yam, Alfredo Morales 2022 MIT

Northeast Journal of Complex Systems (NEJCS)

Understanding and mapping the emergence and boundaries of cultural areas is a challenge for the social sciences. In this paper, we present a method for analyzing the cultural composition of regions via Twitter hashtags. Cultures can be described as distinct combinations of traits, which we capture via principal component analysis (PCA). We investigate the top 8 PCA components of an area including France, Spain, and Portugal, in terms of the geographic distribution of their hashtag composition. We also discuss relationships between components and the insights those relationships can provide into the structure of a cultural space. Finally, we compare the spatial …


Generic Ab Initio, James A. Heilpern, Earl Kjar Brown, William G. Eggington, Zachary D. Smith 2022 Brigham Young University Law School

Buffalo Law Review

From comic conventions to disbanded dioceses, courts continue to struggle with a unique but puzzling question of trademark law. Federal law protects certain terms that refer to a product or service from a specific producer instead of to a product generally. Terms that refer to products are considered generic and cannot receive protection. Courts have also held that a term that was generic at the time the party adopted the mark cannot receive protection, even if the public later views it as being specific to a particular producer. But many marks were adopted decades or centuries ago. As a result, …


Yay…, [emoji], And #Sarcasm: Exploring How Sarcasm Is Marked In Text-Based CMC, Bronte G. Gordon 2022 Portland State University

University Honors Theses

Sarcasm is a complex phenomenon of indirect speech in which we intend a meaning different from that of the literal words we use. In face-to-face settings (FtF), facial expressions, body language, and prosodic cues can be helpful indicators of sarcasm. It becomes even harder to decipher when these physical cues are removed, as in any written setting. This paper explores what text strategies are used to mark sarcasm in text-based English-language communication online. Through a systematic literature review, the similarities and differences of irony and sarcasm were explored, as well as the issues these parallels and distinctions create in delineating …


Covert Determiners In Appalachian English Narrative Declarative Sentences, William Oliver 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

In this thesis, I explore the syntax and semantics of covert determiners (Ds) in matrix subject determiner phrases (DPs) with definite specific interpretations. To conduct my investigation, I used the Audio-Aligned and Parsed Corpus of Appalachian English (AAPCAppE), a million-word Penn Treebank corpus, and the software CorpusSearch, a Java program that searches Penn Treebank corpora. My research shows that Appalachian English contains a linguistic phenomenon where speakers drop the D, replacing overt Ds with covert Ds, in definite specific DPs. For example, where Standard English speakers say The doctor came by horseback, Appalachian speakers may use a covert D …


Corrective Feedback Timing In Kanji Writing Instruction Apps, Phoenix Mulgrew 2022 Union College - Schenectady, NY

Honors Theses

The focus of this research paper is to determine the correct time to provide corrective feedback to people who are learning how to write Japanese kanji. To do this, we developed a system that is able to recognize Japanese kanji that is handwritten onto an iPad screen and check for errors such as wrong stroke order. Previous research has achieved success in developing similar systems, but this project is unique because the research question involves the timing of corrective feedback. In particular, we are looking at whether immediate or delayed corrective feedback results in better learning.


A Machine Learning Approach To Text-Based Sarcasm Detection, Lara I. Novic 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Sarcasm and indirect language are commonplace for humans to produce and recognize but difficult for machines to detect. While artificial intelligence can accurately analyze sentiment and emotion in speech and text, it may struggle with insincere and sardonic content, although it is possible to train a machine to identify uttered and written sarcasm. This paper aims to detect sarcasm using logistic regression and a support vector machine (SVM) and compare their results to a baseline.

The models are trained on headlines from a Kaggle dataset containing headlines from the satirical news website The Onion and serious news website Huffpost (formerly …
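
The abstract does not say which baseline the two models are compared against. One common choice is a majority-class baseline, which always predicts the most frequent training label. The sketch below illustrates that comparison point only; the labels, the placeholder headlines, and the helper names `majority_baseline` and `accuracy` are assumptions, not details from the thesis.

```python
from collections import Counter


def majority_baseline(train_labels):
    """Return a predictor that always outputs the most frequent training label."""
    label, _ = Counter(train_labels).most_common(1)[0]
    return lambda _text: label


def accuracy(predict, texts, labels):
    """Share of texts whose predicted label matches the gold label."""
    return sum(predict(t) == y for t, y in zip(texts, labels)) / len(labels)


# Hypothetical labels and placeholder headlines, invented for illustration.
train_labels = ["sarcastic", "serious", "serious"]
test_texts = ["headline one", "headline two", "headline three", "headline four"]
test_labels = ["serious", "serious", "sarcastic", "serious"]

baseline = majority_baseline(train_labels)
baseline_accuracy = accuracy(baseline, test_texts, test_labels)
print(baseline_accuracy)
```

Any trained classifier can be passed to `accuracy` in the same way, so its score is directly comparable with the baseline's.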


“I Can See The Forest For The Trees”: Examining Personality Traits With Transformers, Alexander Moore 2022 Clemson University

All Dissertations

Our understanding of Personality and its structure is rooted in linguistic studies operating under the assumptions made by the Lexical Hypothesis: personality characteristics that are important to a group of people will at some point be codified in their language, with the number of encoded representations of a personality characteristic indicating their importance. Qualitative and quantitative efforts in the dimension reduction of our lexicon throughout the mid-20th century have played a vital role in the field’s eventual arrival at the widely accepted Five Factor Model (FFM). However, there are a number of presently unresolved conflicts regarding the breadth and …


Metaphor Detection In Poems In Misurata Arabic Sub-Dialect: An LSTM Model, Azza Abugharsa 2022 Montclair State University

Theses, Dissertations and Culminating Projects

Natural Language Processing (NLP) in Arabic is witnessing an increasing interest in investigating different topics in the field. One of the topics that have drawn attention is the automatic processing of Arabic figurative language. The focus in previous projects is on detecting and interpreting metaphors in comments from social media as well as phrases and/or headlines from news articles. The current project focuses on metaphor detection in poems written in the Misurata Arabic sub-dialect spoken in Misurata, located in the North African region. The dataset is initially annotated by a group of linguists, and their annotation is treated as the …


Toward Suicidal Ideation Detection With Lexical Network Features And Machine Learning, Ulya Bayram, William Lee, Daniel Santel, Ali Minai, Peggy Clark, Tracy Glauser, John Pestian 2022 Çanakkale Onsekiz Mart University

Northeast Journal of Complex Systems (NEJCS)

In this study, we introduce a new network feature for detecting suicidal ideation from clinical texts and conduct various additional experiments to enrich the state of knowledge. We evaluate statistical features with and without stopwords, use lexical networks for feature extraction and classification, and compare the results with standard machine learning methods using a logistic classifier, a neural network, and a deep learning method. We utilize three text collections. The first two contain transcriptions of interviews conducted by experts with suicidal (n=161 patients that experienced severe ideation) and control subjects (n=153). The third collection consists of interviews conducted by experts …


Representing Multiple Dependencies In Prosodic Structures, Kristine M. Yu 2022 University of Massachusetts - Amherst

Proceedings of the Society for Computation in Linguistics

Association of tones to prosodic trees was introduced in Pierrehumbert and Beckman (1988). This included: (i) tonal association to higher-level prosodic nodes such as intonational phrases, and (ii) multiple association of a tone to a higher-level prosodic node in addition to a tone bearing unit such as a syllable. Since then, these concepts have been broadly assumed in intonational phonology without much comment, even though Pierrehumbert and Beckman (1988)'s stipulation that tones associated to higher-level prosodic nodes are peripherally realized does not fit all the empirical data. We show that peripherally-realized tones associated to prosodic nodes can be naturally represented …


Incremental Acquisition Of A Minimalist Grammar Using An SMT-Solver, Sagar Indurkhya 2022 Massachusetts Institute of Technology

Proceedings of the Society for Computation in Linguistics

We introduce a novel procedure that uses the Z3 SMT-solver, an interactive theorem prover, to incrementally infer a Minimalist Grammar (MG) from an input sequence of paired interface conditions, which corresponds to the primary linguistic data (PLD) a child is exposed to. The procedure outputs an MG lexicon, consisting of a set of (word, feature-sequence) pairings, that yields, for each entry in the PLD, a derivation that satisfies the listed interface conditions; the output MG lexicon corresponds to the Knowledge of Language that the child acquires from processing the PLD. We use the acquisition procedure to infer an MG lexicon …

