Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

510 Full-Text Articles; 751 Authors; 146,189 Downloads; 57 Institutions

All Articles in Computational Linguistics

Towards Interpretable Machine Reading Comprehension With Mixed Effects Regression And Exploratory Prompt Analysis, Luca Del Signore 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

We investigate the properties of natural language prompts that determine their difficulty in machine reading comprehension tasks. While much work has been done benchmarking language model performance at the task level, there is considerably less literature focused on how individual task items can contribute to interpretable evaluations of natural language understanding. Such work is essential to deepening our understanding of language models and ensuring their responsible use as a key tool in human-machine communication. We perform an in-depth mixed effects analysis on the behavior of three major generative language models, comparing their performance on a large reading comprehension …


Destined Failure, Chengjun Pan 2023 Rhode Island School of Design

Masters Theses

I attempt to examine the complex structure of human communication, explaining why it is bound to fail. By reproducing experienceable phenomena, I demonstrate how they can expose communication structure and reveal the limitations of our perception and symbolization. I divide the process of communication into six stages: input, detection, symbolization, dictionary, interpretation, and output. In this thesis, I examine the flaws and challenges that arise in the first five stages. I argue that reception acts as a filter and that understanding relies on a symbolic system that is full of redundancies. Therefore, every interpretation is destined to be a deviation.


The Sociolinguistics Of Code-Switching In Hong Kong’s Digital Landscape: A Mixed-Methods Exploration Of Cantonese-English Alternation Patterns On WhatsApp, Wilkinson Daniel Wong Gonzales, Yuen Man Tsang 2023 The Chinese University of Hong Kong

Journal of English and Applied Linguistics

This paper examines the prevalence of Cantonese-English code-mixing in Hong Kong through an under-researched digital medium. Prior research on this code-alternation practice has often been limited to exploring either the social or linguistic constraints of code-switching in spoken or written communication. Our study takes a holistic approach to analyzing code-switching in a hybrid medium that exhibits features of both spoken and written discourse. We specifically analyze the code-switching patterns of 24 undergraduates from a Hong Kong university on WhatsApp and examine how both social and linguistic factors potentially constrain these patterns. Utilizing a self-compiled sociolinguistic corpus as well as survey …


Corpus-Based Investigation Of The Markedness And Frequency Of Japanese Passives In Contemporary Written Japanese, Tatsuya Aoyama 2023 Georgetown University

Proceedings of the Society for Computation in Linguistics

Japanese passives are traditionally considered to have two types: direct and indirect passives. However, more recent studies, such as Ishizuka (2012), suggest the two types can be unified under the same syntactic movement analysis. Utilizing the Balanced Corpus of Contemporary Written Japanese (BCCWJ; Maekawa, 2008; Maekawa et al., 2014), this study aims to investigate how likely different types of passives are to appear in naturally occurring texts, especially in relation to the markedness-based hierarchy called the Noun Phrase Accessibility Hierarchy (NPAH; Keenan and Comrie, 1977), and to investigate whether true indirect passives occur in contemporary written Japanese.


Processing French RCs With Postverbal Subjects In A Minimalist Parser, Daniel Del Valle, Aniello De Santo 2023 University of Utah

Proceedings of the Society for Computation in Linguistics

Computational models with explicit assumptions about the connection between syntactic representations and processing difficulty can help strengthen bridges between theoretical linguistics and psycholinguistics. In this sense, a model based on Stabler’s parser for Minimalist grammars (MGs; Stabler 2013) has been shown to predict a variety of off-line processing preferences, by exploiting complexity metrics tracking how syntactic structure affects memory load (Graf et al. 2017, a.o.). This model provides an interpretable linking theory between fine-grained syntactic structure in the generative tradition and precise sentence processing results, through transparently specified notions of cognitive cost. Here, we build on recent work on the …


An Algebraic Characterization Of Total Input Strictly Local Functions, Dakotah Lambert, Jeffrey Heinz 2023 Université Jean Monnet

Proceedings of the Society for Computation in Linguistics

This paper provides an algebraic characterization of the total input strictly local functions. Simultaneous, noniterative rules of the form A→B/C_D, common in phonology, are definable as functions in this class whenever CAD represents a finite set of strings. The algebraic characterization highlights a fundamental connection between input strictly local functions and the simple class of definite string languages, as well as connections to string functions studied in the computer science literature, the definite functions and local functions. No effective decision procedure for the input strictly local maps was previously available, but one arises …
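
As a rough illustration of what "simultaneous, noniterative" means for a rule with a left and a right context, the sketch below rewrites a single target symbol wherever the surrounding input matches the contexts. The function name `apply_rule` and the example rules are hypothetical and are not taken from the paper; the sketch also assumes a single-symbol target and fixed context strings, which is narrower than the class the abstract describes.

```python
def apply_rule(s, target, replacement, left, right):
    """Rewrite `target` as `replacement` when it occurs between `left` and `right`.

    Contexts are checked against the input string only, never against
    output that has already been rewritten, so the map is noniterative.
    Assumes `target` is a single symbol (an illustrative simplification).
    """
    out = []
    for i, ch in enumerate(s):
        # Left context: the len(left) input symbols immediately before position i.
        left_ok = s[max(0, i - len(left)):i] == left
        # Right context: the len(right) input symbols immediately after position i.
        right_ok = s[i + 1:i + 1 + len(right)] == right
        out.append(replacement if ch == target and left_ok and right_ok else ch)
    return "".join(out)


# Hypothetical rule t -> d / a _ a (intervocalic voicing between two a's).
print(apply_rule("atata", "t", "d", "a", "a"))
# Hypothetical rule a -> b / a _ : every "a" preceded by an input "a" changes.
print(apply_rule("aaa", "a", "b", "a", ""))
```

Because each output symbol depends only on a bounded window of the input, the second call changes both the second and third symbols; applying the same rule iteratively from left to right would instead stop feeding itself after the first rewrite.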


Learning Phonotactics Of Any Span And Distance, Ignas Rudaitis 2023 Vilnius University

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Noise-Tolerant Learning As Selection Among Deterministic Grammatical Hypotheses, Laurel Perkins, Tim Hunter 2023 University of California, Los Angeles

Proceedings of the Society for Computation in Linguistics

Children acquire their language's canonical word order from data that contains a messy mixture of canonical and non-canonical clause types. We model this as noise-tolerant learning of grammars that deterministically produce a single word order. In simulations on English and French, our model successfully separates signal from the noise introduced by non-canonical clause types, in order to identify that both languages are SVO. No such preference for the target word order emerges from a comparison model which operates with a fully-gradient hypothesis space and an explicit numerical regularization bias. This provides an alternative general mechanism for regularization in various learning …


Towards A Learning-Based Account Of Underlying Forms: A Case Study In Turkish, Caleb Belth 2023 University of Michigan - Ann Arbor

Proceedings of the Society for Computation in Linguistics

A traditional concept in phonological theory is that of the underlying form. However, the history of phonology has witnessed a debate about how abstract underlying representations ought to be allowed to be, and a number of arguments have been given that phonology should abandon such representations altogether. In this paper, we consider a learning-based approach to the question. We propose a model that, by default, constructs concrete representations of morphemes. When and only when such concrete representations make it challenging to generalize in the face of the sparse statistical profile of language, our proposed model constructs abstract underlying forms that …


Processing Advantages Of End-Weight, Lei Liu 2023 Universität Leipzig

Proceedings of the Society for Computation in Linguistics

Previous research has established that English end-weight configurations, where sentence components of greater grammatical complexity appear at the ends of sentences, demonstrate processing advantages over alternative word orders. To evaluate these processing advantages, I analyze how a Minimalist Grammar (MG) parser generates syntactic structures for different word orders. The parser's behavior suggests that end-weight configurations require fewer memory resources for parsing than alternative structures. This memory load difference accounts for the end-weight advantage in processing. The results highlight the validity of the MG processing approach as a linking theory connecting syntactic structures to behavioral observations. Additionally, the results have implications …


Morpheme Combinatorics Of Compound Words Through Box Embeddings, Eric R. Rosen 2023 (formerly at) University of Leipzig

Proceedings of the Society for Computation in Linguistics

In this study I probe the combinatoric properties of Japanese morphemes that participate in compounding. By representing morphemes through box embeddings (Vilnis et al., 2018; Patel et al., 2020; Li et al., 2019), a model learns preferences for one morpheme to combine with another in two-member compounds. These learned preferences are represented by the degree to which the box-hyperrectangles for two morphemes overlap in representational space. After learning, these representations are applied to test how well they encode a speaker’s knowledge of the properties of each morpheme that predict the plausibility of novel compounds in which they could occur.
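
To make the idea of overlapping box-hyperrectangles concrete, here is a minimal sketch that measures how much of one axis-aligned box lies inside another. It uses hard-edged boxes and invented coordinates; the function names and the scoring choice are assumptions for illustration, and the cited box-embedding work may use smoothed volumes rather than this exact computation.

```python
def box_volume(lo, hi):
    """Volume of an axis-aligned box given its lower and upper corners."""
    vol = 1.0
    for a, b in zip(lo, hi):
        vol *= max(0.0, b - a)  # an empty side gives zero volume
    return vol


def overlap_volume(box1, box2):
    """Volume of the intersection of two boxes, each a (lo, hi) pair."""
    (lo1, hi1), (lo2, hi2) = box1, box2
    lo = [max(a, b) for a, b in zip(lo1, lo2)]
    hi = [min(a, b) for a, b in zip(hi1, hi2)]
    return box_volume(lo, hi)


def overlap_score(box1, box2):
    """Fraction of box2's volume that lies inside box1 (hypothetical score)."""
    v = box_volume(*box2)
    return overlap_volume(box1, box2) / v if v > 0 else 0.0


# Two made-up 2-dimensional boxes standing in for morpheme representations.
first = ([0.0, 0.0], [2.0, 2.0])
second = ([1.0, 1.0], [3.0, 3.0])
print(overlap_score(first, second))
```

The score is asymmetric by design: swapping the arguments asks how much of the first box lies inside the second, which is one simple way to express a directional preference between two members of a compound.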


Phonological Processes With Intersecting Tier Alphabets, Daniel Gleim, Johannes Schneider 2023 Universität Leipzig

Proceedings of the Society for Computation in Linguistics

Aksënova and Deshmukh (2018) conjecture that if the phonology of a language requires projection to multiple tiers, the tier alphabets of those tiers are either disjoint or stand in a subset/superset relation, but never form a nontrivial intersection. We provide three counterexamples to this claim.


On The Spectra Of Syntactic Structures, Isabella Senturia, Robert Frank 2023 Yale University

Proceedings of the Society for Computation in Linguistics

This paper explores the application of spectral graph theory to the problem of characterizing linguistically significant classes of tree structures. As a case study, we focus on three classes of trees (binary, X-bar, and asymmetric c-command extensional) and show that the spectral properties of different matrix representations of these classes of trees provide insight into the properties that characterize these classes. More generally, our goal is to provide another route to understanding the structure of natural language, one that does not come from extensive definitions and rules taken by extrapolating from the syntactic structure, but instead is extracted directly from …


Modeling Substitution Errors In Spanish Morphology Learning, Libby Barak, Nathalie Fernandez Echeverri, Naomi H. Feldman, Patrick Shafto 2023 Rutgers University, Newark

Proceedings of the Society for Computation in Linguistics

In early stages of language acquisition, children often make inflectional errors on regular verbs, e.g., Spanish-speaking children produce –a (present-tense 3rd person singular) when other inflections are expected. Most previous models of morphology learning have focused on later stages of learning relating to production of irregular verbs. We propose a computational model of Spanish inflection learning to examine the earlier stages of learning and present a novel data set of gold-standard inflectional annotations for Spanish verbs. Our model replicates data from Spanish-learning children, capturing the acquisition order of different inflections and correctly predicting the substitution errors they make. Analyses show …


Parsing "Early English Books Online" For Linguistic Search, Seth Kulick, Neville Ryant, Beatrice Santorini 2023 University of Pennsylvania

Proceedings of the Society for Computation in Linguistics

This work addresses the question of how to evaluate a state-of-the-art parser on Early English Books Online (EEBO), a 1.5-billion-word collection of unannotated text, for utility in linguistic research. Earlier work has trained and evaluated a parser on the 1.7-million-word Penn-Helsinki Parsed Corpus of Early Modern English (PPCEME) and defined a query-based evaluation to score the retrieval of 6 specific sentence types of interest. However, significant differences between EEBO and the manually-annotated PPCEME make it inappropriate to assume that these results will generalize to EEBO. Fortunately, an overlap of source material in PPCEME and EEBO allows us to establish a …


Extracting Binary Features From Speech Production Errors And Perceptual Confusions Using Redundancy-Corrected Transmission, Zhanao Fu, Ewan Dunbar 2023 University of Toronto

Proceedings of the Society for Computation in Linguistics

We develop a mutual information-based feature extraction method and apply it to English speech production and perception error data. The extracted features show different phoneme groupings than conventional phonological features, especially in the place features. We evaluate how well the extracted features can define natural classes to account for English phonological patterns. The features extracted from production errors had performance close to conventional phonological features, while the features extracted from perception errors performed worse. The study shows that featural information can be extracted from underused sources of data such as confusion matrices of production and perception errors, and the results …
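
For readers unfamiliar with mutual information computed from a confusion matrix, the following sketch shows the plain, uncorrected quantity in bits. It does not reproduce the redundancy-corrected transmission measure named in the title; the function name and the toy matrices are assumptions made for illustration.

```python
from math import log2


def transmitted_information(confusions):
    """Mutual information (in bits) between rows and columns of a count matrix.

    `confusions[i][j]` is how often intended item i was produced or
    perceived as item j. This is ordinary mutual information, with no
    redundancy correction applied.
    """
    total = sum(sum(row) for row in confusions)
    row_p = [sum(row) / total for row in confusions]
    col_p = [sum(col) / total for col in zip(*confusions)]
    mi = 0.0
    for i, row in enumerate(confusions):
        for j, count in enumerate(row):
            if count:  # zero cells contribute nothing
                p = count / total
                mi += p * log2(p / (row_p[i] * col_p[j]))
    return mi


# Toy two-item matrices: one with no confusions, one with chance-level responses.
print(transmitted_information([[50, 0], [0, 50]]))
print(transmitted_information([[25, 25], [25, 25]]))
```

A matrix with no confusions transmits the full one bit that distinguishes two equally frequent items, while a uniform matrix transmits none.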


Bridging Production And Comprehension: Toward An Integrated Computational Model Of Error Correction, Shiva Upadhye, Jiaxuan Li, Richard Futrell 2023 University of California, Irvine

Proceedings of the Society for Computation in Linguistics

Error correction in production and comprehension has traditionally been studied separately. In real-time communication, however, correction may depend not only on speaker- or comprehender-internal preferences, but also on the interlocutors’ knowledge of each other’s strategies. We present an integrated computational framework for error correction in both production and comprehension systems. Modeling error correction as Bayesian inference, we propose that both speaker and comprehender’s correction strategies are influenced by their prior expectations about the intended message and their knowledge of a noise monitoring model. Our results indicate that speakers and comprehenders tend to weigh phonological and semantic cues differently, and these …
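
The general shape of error correction as Bayesian inference can be sketched as a posterior over intended messages that combines a prior with a noise model. Everything below (the function name `posterior`, the two-word vocabulary, and the probabilities) is invented for illustration and is not the model or data from the paper.

```python
def posterior(prior, likelihood, observed):
    """Posterior over intended messages given an observed form.

    prior[m] is the prior probability of intending message m;
    likelihood[m][o] is the probability that noise turns m into form o.
    """
    scores = {m: prior[m] * likelihood[m].get(observed, 0.0) for m in prior}
    z = sum(scores.values())
    # Normalize when the observation is possible under at least one message.
    return {m: s / z for m, s in scores.items()} if z > 0 else scores


# Hypothetical numbers: "cat" is expected in context, but "cap" was heard.
prior = {"cat": 0.8, "cap": 0.2}
noise = {
    "cat": {"cat": 0.9, "cap": 0.1},
    "cap": {"cap": 0.9, "cat": 0.1},
}
print(posterior(prior, noise, "cap"))
```

With these made-up values the observed form still wins, but raising the prior on "cat" or the assumed noise rate would shift the posterior toward correcting the input.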


Analogy In Contact: Modeling Maltese Plural Inflection, Sara Court, Andrea D. Sims, Micha Elsner 2023 The Ohio State University

Proceedings of the Society for Computation in Linguistics

Maltese is often described as having a hybrid morphological system resulting from extensive contact between Semitic and Romance language varieties. Such a designation reflects an etymological divide as much as it does a larger tradition in the literature to consider concatenative and non-concatenative morphological patterns as distinct in the language architecture. Using a combination of computational modeling and information theoretic methods, we quantify the extent to which the phonology and etymology of a Maltese singular noun may predict the morphological process (affixal vs. templatic) as well as the specific plural allomorph (affix or template) relating a singular noun to its …


L0-Regularization Induces Subregular Biases In LSTMs, Charles J. Torres, Richard Futrell 2023 University of California, Irvine

Proceedings of the Society for Computation in Linguistics

Ongoing work attempts to identify the formal language patterns in natural language. In phonology, recent work has identified the subregular languages as a good candidate (Heinz, 2018). However, there remain few explanations for the source of this bias. This abstract proposes a means of investigating formal language learnability. We propose using a variant of minimum description length (MDL) as defined on LSTMs with varying priors on LSTM size. We will show its utility on a test case from Heinz and Idsardi (2013) and Rawski et al. (2017).


Unbounded Recursion In Two Dimensions, Where Syntax And Prosody Meet, Edward P. Stabler, Kristine M. Yu 2023 University of California, Los Angeles

Proceedings of the Society for Computation in Linguistics

Both syntax and prosody seem to require structures with unbounded branching, something that is not immediately provided by multiple context free grammars or other equivalently expressive formalisms. That extension is easy, and does not disrupt an appealing model of prosody/syntax interaction. Rather than computing prosodic and syntactic structures independently and then selecting optimally corresponding pairs, prosodic structures can be computed directly from the syntax, eliminating alignment issues and the need for bracket-insertion or other ad hoc devices. To illustrate, a simple model of prosodically-defined Irish pronoun displacement is briefly compared to previous proposals.

