Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

388 Full-Text Articles 595 Authors 80,632 Downloads 45 Institutions

All Articles in Computational Linguistics

388 full-text articles. Page 1 of 19.

PLPrepare: A Grammar Checker For Challenging Cases, Jacob Hoyos 2021 East Tennessee State University

Electronic Theses and Dissertations

This study investigates one of the Polish language’s most arbitrary cases: the genitive masculine inanimate singular. It collects and ranks several guidelines to help language learners discern its proper usage and also introduces a framework to provide detailed feedback regarding arbitrary cases. The study tests this framework by implementing and evaluating a hybrid grammar checker called PLPrepare. PLPrepare performs similarly to other grammar checkers and is able to detect genitive case usages and provide feedback based on a number of error classifications.


Shifting The Perspectival Landscape: Methods For Encoding, Identifying, And Selecting Perspectives, Carolyn Jane Anderson 2021 University of Massachusetts Amherst

Doctoral Dissertations

This dissertation explores the semantics and pragmatics of perspectival expressions. Perspective, or point-of-view, encompasses an individual’s thoughts, perceptions, and location. Many expressions in natural language have components of their meanings that shift depending on whose perspective they are evaluated against. In this dissertation, I explore two sets of questions relating to perspective sensitivity. The first set of questions relates to how perspective is encoded in the semantics of perspectival expressions. The second set of questions relates to how conversation participants treat perspectival expressions: the speaker’s selection of a perspective and the listener’s identification of the speaker’s ...


Otrouha: A Corpus Of Arabic ETDs And A Framework For Automatic Subject Classification, Eman Abdelrahman, Fatimah Alotaibi, Edward A. Fox, Osman Balci 2021 Virginia Tech, Blacksburg

The Journal of Electronic Theses and Dissertations

Although the Arabic language is spoken by more than 300 million people and is one of the six official languages of the United Nations (UN), there has been less research done on Arabic text data (compared to English) in the realm of machine learning, especially in text classification. In the past decade, Arabic data such as news, tweets, etc. have begun to receive some attention. Although automatic text classification plays an important role in improving the browsability and accessibility of data, Electronic Theses and Dissertations (ETDs) have not received their fair share of attention, in spite of the huge number ...


When Misclassification Is Misgendering: Gender Prediction In The Context Of Trans Identities, Sean Miller 2021 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

As a subdomain of author profiling, gender prediction (sometimes called gender inference) has received a substantial amount of attention, both as a task in itself and for other downstream analyses. Throughout the existing literature, various statistical and machine learning methods have been applied to extract features in order to either characterize and differentiate female and male writing styles, or simply to achieve maximum accuracy on gender prediction as a binary classification task. However, researchers often do not disclose how they conceptualize gender, nor do they consider the implications that gender prediction has for non-binary and trans individuals. Along with an ...


A Computational Study In The Detection Of English–Spanish Code-Switches, Yohamy C. Polanco 2021 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Code-switching is the linguistic phenomenon where a multilingual person alternates between two or more languages in a conversation, whether that be spoken or written. This thesis studies the automatic detection of code-switching occurring specifically between English and Spanish in two corpora.

Twitter and other social media sites have provided an abundance of linguistic data that is available to researchers to perform countless experiments. Collecting the data is fairly easy if a study is on monolingual text, but if a study requires code-switched data, this becomes a complication as APIs only accept one language as a parameter. This thesis focuses on ...


Preface: SCiL 2021 Editors' Note, Allyson Ettinger, Ellie Pavlick, Brandon Prickett 2021 University of Chicago

Proceedings of the Society for Computation in Linguistics

No abstract provided.


A Network Science Approach To Bilingual Code-Switching, Qihui Xu, Magdalena Markowska, Martin Chodorow, Ping Li 2021 Graduate Center, City University of New York

Proceedings of the Society for Computation in Linguistics

Previous research has shown that the structure of the semantic network can influence language production, such that a word with low clustering coefficient (C) is more easily retrieved than a word with high C. In this study, we used a network science approach to examine whether the network structure accounts for why bilinguals code-switch. We established semantic networks for words in each language, then measured the C for each code-switched word and its translated equivalent. The results showed that words where language is switched have lower C than their translated equivalents in the other language, suggesting that the structures of ...
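
The clustering coefficient (C) mentioned in this abstract can be illustrated with a minimal sketch. It uses the standard local definition for an undirected graph: the fraction of a node's neighbour pairs that are themselves linked. The toy network and the name `clustering_coefficient` are hypothetical and are not taken from the paper, whose exact network construction is not described here.

```python
from itertools import combinations


def clustering_coefficient(graph, node):
    """Local clustering coefficient C of `node` in an undirected graph.

    `graph` maps each word to the set of words it is linked to.
    """
    neighbours = graph[node]
    k = len(neighbours)
    if k < 2:
        # With fewer than two neighbours no pair exists; C is taken as 0.
        return 0.0
    # Count the neighbour pairs that are directly linked to each other.
    links = sum(1 for a, b in combinations(neighbours, 2) if b in graph[a])
    return 2 * links / (k * (k - 1))


# Hypothetical toy semantic network (illustration only, not the paper's data).
toy_network = {
    "dog": {"cat", "bone", "leash"},
    "cat": {"dog", "bone"},
    "bone": {"dog", "cat"},
    "leash": {"dog"},
}

for word in toy_network:
    print(word, round(clustering_coefficient(toy_network, word), 3))
```

In this toy network "dog" has three neighbours, of which only one pair ("cat", "bone") is linked, so its C is 1/3, while "cat" has a C of 1.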


A Minimalist Approach To Facilitatory Effects In Stacked Relative Clauses, Aniello De Santo 2021 University of Utah

Proceedings of the Society for Computation in Linguistics

A top-down parser for Minimalist grammars (MGs; Stabler, 2013) can successfully predict a variety of off-line processing preferences, via metrics linking parsing behavior to memory load (Kobele et al., 2013; Gerth, 2015; Graf et al., 2017). The increasing empirical coverage of this model is intriguing, given its close association with modern minimalist syntax. Recently, however, Zhang (2017) has argued that this framework is unable to account for a set of complexity profiles reported for English and Mandarin Chinese stacked relative clauses. Based on these observations, this paper proposes extensions to this model implementing a notion of memory reactivation, in the ...


Drivers Of English Syntactic Change In The Canadian Parliament, Liwen Hou, David Smith 2021 Northeastern University

Proceedings of the Society for Computation in Linguistics

Corpus linguists have long noted the "colloquialization" of many genres of English. While the average decline in many features of formal speech is obvious in aggregate, we are better able to disentangle drivers of change by examining Canadian parliamentary speeches coded for characteristics of individual speakers across more than 100 years, much longer than previous studies of individuals' language change in a common environment. While many language changes proceed by cohort replacement and often originate with female speakers, the Canadian Hansard shows that most speakers employed increasingly colloquial language over their careers and that gender effects are mostly explained by the ...


Capturing Gradience In Long-Distance Phonology Using Probabilistic Tier-Based Strictly Local Grammars, Connor Mayer 2021 University of California, Los Angeles

Proceedings of the Society for Computation in Linguistics

Phonological processes often exhibit gradience, both in response frequencies and in acceptability judgments. This paper presents a variation of tier-based strictly local grammars, probabilistic tier-based strictly local (pTSL) grammars, which calculate the conditional probability that a given input string has some grammatical projection. pTSL grammars are well-suited to modeling gradience, particularly for long-distance processes, and naturally extend categorical tier-based strictly local grammars by probabilizing the projection function. After describing the formal properties of pTSL, I illustrate its application using data from Hungarian and Uyghur. pTSL is able to capture distance-based decay in these languages without an explicit notion of distance ...
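
One way to read "the conditional probability that a given input string has some grammatical projection" is sketched below. Everything in it is an assumption made for illustration: each symbol is given an independent probability of projecting onto the tier, the tier grammar is a set of banned bigrams (strictly 2-local, without edge markers), and the names `ptsl_probability` and `proj_prob` are hypothetical. The brute-force enumeration is exponential in string length and is meant only to make the idea concrete, not to reproduce the paper's formalism.

```python
from itertools import product


def is_grammatical(tier, banned_bigrams):
    """Strictly 2-local check: no adjacent pair on the tier is banned."""
    return all((a, b) not in banned_bigrams for a, b in zip(tier, tier[1:]))


def ptsl_probability(string, proj_prob, banned_bigrams):
    """Probability that `string` has a grammatical tier projection.

    proj_prob[s] is the assumed probability that symbol s projects to
    the tier; symbols missing from the mapping never project.
    """
    total = 0.0
    # Enumerate every way of projecting or skipping each position.
    for choices in product([True, False], repeat=len(string)):
        p = 1.0
        tier = []
        for symbol, projected in zip(string, choices):
            q = proj_prob.get(symbol, 0.0)
            p *= q if projected else 1.0 - q
            if projected:
                tier.append(symbol)
        if p > 0 and is_grammatical(tier, banned_bigrams):
            total += p
    return total


# Hypothetical grammar: "a" and "b" always project, "X" projects half the time,
# and "a" immediately followed by "b" on the tier is banned.
proj = {"a": 1.0, "b": 1.0, "X": 0.5}
banned = {("a", "b")}

print(ptsl_probability("ab", proj, banned))
print(ptsl_probability("aXb", proj, banned))
print(ptsl_probability("aXXb", proj, banned))
```

Under these assumptions the result changes with the number of intervening symbols, even though distance is never represented directly: each additional "X" is another chance for a blocker to appear on the tier.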


Look At That! BERT Can Be Easily Distracted From Paying Attention To Morphosyntax, Rui P. Chaves, Stephanie N. Richter 2021 University at Buffalo

Proceedings of the Society for Computation in Linguistics

Syntactic knowledge involves not only the ability to combine words and phrases, but also the capacity to relate different and yet truth-preserving structural variations (e.g. passivization, inversion, topicalization, extraposition, clefting, etc.), as well as the ability to infer that these syntactic variations all adhere to common morphosyntactic rules, like subject-verb agreement. Although there is some evidence that BERT has rich syntactic knowledge, our adversarial approach suggests that it is not deployed in a robust and linguistically appropriate way. English BERT can be tricked to miss even quite simple syntactic generalizations, when compared with GPT-2, underscoring the need for stronger ...


Emergent Gestural Scores In A Recurrent Neural Network Model Of Vowel Harmony, Caitlin Smith, Charlie O'Hara, Eric Rosen, Paul Smolensky 2021 Johns Hopkins University

Proceedings of the Society for Computation in Linguistics

In this paper, we present the results of neural network modeling of speech production. We introduce GestNet, a sequence-to-sequence, encoder-decoder neural network architecture in which a string of input symbols is translated into sequences of vocal tract articulator movements. We train our models to produce movements of lip and tongue body articulators consistent with a pattern of stepwise vowel height harmony. Though we provide our models with no linguistic structure, they reliably learn this harmony pattern. In addition, by probing these models we find evidence of emergent linguistic structure. Specifically, we examine patterns of encoder-decoder attention (degree of influence of ...


Emerging English Transitives Over The Last Two Centuries, Liwen Hou, David Smith 2021 Northeastern University

Proceedings of the Society for Computation in Linguistics

We analyze an automatically-parsed British Hansard to identify approximately 200 verbs that first appeared in transitive constructions in British English in the 19th and 20th centuries. We use this list of verbs to test two hypotheses about new verb forms. First, we test the hypothesis that rarer verb lemmas are more likely to experience language change compared to more common verb lemmas. As measured by our specific notion of language change, we find that this is true only up to a certain rarity, and extremely rare lemmas are actually less likely to change compared to somewhat rare lemmas. Second, for ...


Epistemic Semantics In Guarded String Models, Eric H. Campbell, Mats Rooth 2021 Cornell University

Proceedings of the Society for Computation in Linguistics

Constructive and computable multi-agent epistemic possible worlds models are interpreted as sets of guarded string models in an epistemic extension of Kleene Algebra with Tests (KAT). The account is framed as a formal language EpiKAT (Epistemic KAT) for defining such models. The language is implemented by translation into the finite state calculus, and alternatively by modeling propositions as lazy lists in Haskell. The syntax-semantics interface for a fragment of English is defined by a categorial grammar.


Effects Of Duration, Locality, And Surprisal In Speech Disfluency Prediction In English Spontaneous Speech, Samvit Dammalapati, Rajakrishnan Rajkumar, Sumeet Agarwal 2021 Indian Institute of Technology Delhi

Proceedings of the Society for Computation in Linguistics

This study examines the role of two influential theories of language processing, Surprisal Theory and Dependency Locality Theory (DLT), in predicting disfluencies (fillers and reparandums) in the Switchboard corpus of English conversational speech. Using Generalized Linear Mixed Models for this task, we incorporate syntactic factors (DLT-inspired costs and syntactic surprisal) in addition to lexical surprisal and duration, thus going beyond the local lexical frequency and predictability used in previous work on modelling word durations in Switchboard speech. Our results indicate that compared to fluent words, words preceding disfluencies tend to have lower lexical surprisal (hence higher activation levels) and lower ...
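
Lexical surprisal is conventionally defined as the negative log probability of a word given its context. The sketch below computes it in bits from unsmoothed bigram counts. The abstract does not say how the study estimated its probabilities, so the bigram model, the function name `bigram_surprisal`, and the toy utterance are all assumptions made for illustration.

```python
import math
from collections import Counter


def bigram_surprisal(tokens):
    """Return (word, surprisal in bits) for every word after the first.

    Probabilities are unsmoothed bigram estimates taken from `tokens`
    itself, so every bigram that is scored has a non-zero count.
    """
    context_counts = Counter(tokens[:-1])
    bigram_counts = Counter(zip(tokens, tokens[1:]))
    scored = []
    for prev, word in zip(tokens, tokens[1:]):
        p = bigram_counts[(prev, word)] / context_counts[prev]
        scored.append((word, -math.log2(p)))
    return scored


# Hypothetical toy utterance containing a filler ("uh").
toy_utterance = "i think i uh think so".split()

for word, bits in bigram_surprisal(toy_utterance):
    print(f"{word}: {bits:.2f} bits")
```

A word that is fully predictable from its predecessor in this tiny sample receives 0 bits, and one that follows its predecessor half the time receives 1 bit.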


Formalizing Inflectional Paradigm Shape With Information Theory, Grace LeFevre, Micha Elsner, Andrea D. Sims 2021 The Ohio State University

Proceedings of the Society for Computation in Linguistics

“Paradigm shape,” our term for the morphological structure formed by implicative relations between inflected forms, has not been formally quantified in a gradient manner. We develop a method to formalize paradigm shape by modeling the joint effect of stem alternations and affixes. Applied to Spanish verbs, our model successfully captures aspects of both allomorphic and distributional classes. These results are replicable and extendable to other languages.


Global Divergence And Local Convergence Of Utterance Semantic Representations In Dialogue, Yang Xu 2021 San Diego State University

Proceedings of the Society for Computation in Linguistics

We use deep contextualized embedding models (BERT & ELMo) and shallow word embedding models (fastText & GloVe) to study the alignment between dialogue interlocutors at the semantic representation level, with the goal of examining the interactive alignment model (IAM) theory. We have observed both divergence and convergence patterns in dialogue: first, the semantic distance between two adjacent utterances increases with their relative positions within the dialogue, i.e., utterances at the later stage are more semantically apart than the earlier ones. Second, semantic distance also increases with the physical distance between utterances, i.e., utterances that are physically closer have more similar ...
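
A semantic distance between utterances can be sketched as the cosine distance between their embedding vectors. The abstract does not name the distance measure used, so cosine distance is an assumption here, and the short vectors below are hypothetical stand-ins for real BERT, ELMo, fastText, or GloVe representations.

```python
import math


def cosine_distance(u, v):
    """1 minus the cosine similarity of two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def adjacent_distances(utterance_vectors):
    """Distance between each utterance and the one that follows it."""
    return [
        cosine_distance(a, b)
        for a, b in zip(utterance_vectors, utterance_vectors[1:])
    ]


# Hypothetical 2-dimensional "embeddings" for three consecutive utterances.
toy_dialogue = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(adjacent_distances(toy_dialogue))
```

Plotting such distances against utterance position, or against the gap between two utterances, is one way the trends described in the abstract could be inspected.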


Inflectional Paradigms As Interacting Systems, Eric R. Rosen 2021 Johns Hopkins University

Proceedings of the Society for Computation in Linguistics

In the framework of Gradient Symbolic Computation (Smolensky and Goldrick, 2016), we present a model that predicts correct forms in complex inflectional paradigms through a single underlying form for a lexeme along with underlying forms for certain morphosyntactic combinations. Output-Output Correspondence constraints (Burzio, 1996; Benua, 1997; Burzio, 1999) capture interdependencies between forms in different paradigm cells. Our model avoids complex sets of rules as well as the need to index lexemes to inflectional classes. Instead, the ways that an exponent can vary across lexemes derive from a lexeme's underlying representation, which can contain partially-activated blends of segments. This approach ...


Learnability Of Indexed Constraint Analyses Of Phonological Opacity, Aleksei Nazarov 2021 Utrecht University

Proceedings of the Society for Computation in Linguistics

This paper explores the learnability of indexed constraint (Pater, 2000) analyses of opacity based on the case study of raising in Canadian English (Chomsky, 1964; Chambers, 1973). Such analyses, while avoiding multiple levels of derivation or representation, require the learner to induce indexed constraints, connect these constraints to particular segments in the lexicon, and rank these constraints. An implementation of Round’s (2017) learner for indexed constraints, which is an extension of Biased Constraint Demotion (Prince and Tesar, 2004), is used here to test whether a simple learner can rise to this challenge and learn a restrictive analysis of the ...


Learning Interactions Of Local And Non-Local Phonotactic Constraints From Positive Input, Aniello De Santo, Alëna Aksënova 2021 University of Utah

Proceedings of the Society for Computation in Linguistics

This paper proposes a grammatical inference algorithm to learn input-sensitive tier-based strictly local languages across multiple tiers from positive data only, when the locality of the tier-constraints and the tier-projection function is set to 2 (MITSL; De Santo and Graf, 2019). We conduct simulations showing that the algorithm succeeds in learning MITSL patterns over a set of artificial languages.

