Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

499 Full-Text Articles · 731 Authors · 146,189 Downloads · 55 Institutions

All Articles in Computational Linguistics

Destined Failure, Chengjun Pan 2023 Rhode Island School of Design

Masters Theses

I attempt to examine the complex structure of human communication, explaining why it is bound to fail. By reproducing experienceable phenomena, I demonstrate how they can expose communication structure and reveal the limitations of our perception and symbolization. I divide the process of communication into six stages: input, detection, symbolization, dictionary, interpretation, and output. In this thesis, I examine the flaws and challenges that arise in the first five stages. I argue that reception acts as a filter and that understanding relies on a symbolic system that is full of redundancies. Therefore, every interpretation is destined to be a deviation.


Preface: SCiL 2023 Editors’ Note, Tim Hunter, Brandon Prickett 2023 University of California, Los Angeles

Proceedings of the Society for Computation in Linguistics

This volume contains research presented at the sixth annual meeting of the Society for Computation in Linguistics (SCiL), held in Amherst, Massachusetts, June 15–17, 2023.


Analogy In Contact: Modeling Maltese Plural Inflection, Sara Court, Andrea D. Sims, Micha Elsner 2023 The Ohio State University

Proceedings of the Society for Computation in Linguistics

Maltese is often described as having a hybrid morphological system resulting from extensive contact between Semitic and Romance language varieties. Such a designation reflects an etymological divide as much as it does a larger tradition in the literature to consider concatenative and non-concatenative morphological patterns as distinct in the language architecture. Using a combination of computational modeling and information theoretic methods, we quantify the extent to which the phonology and etymology of a Maltese singular noun may predict the morphological process (affixal vs. templatic) as well as the specific plural allomorph (affix or template) relating a singular noun to its …
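
As a rough illustration of the information-theoretic side of this kind of analysis, the sketch below computes the conditional entropy of the plural process given a property of the singular from co-occurrence counts; the endings, process labels, and counts are invented, not the paper's Maltese data.

    # Toy sketch: conditional entropy H(plural process | singular ending),
    # with invented counts (not the paper's Maltese data).
    import math
    from collections import defaultdict

    # (singular ending, plural process) -> count; hypothetical values
    counts = {
        ("-a", "affixal"): 40, ("-a", "templatic"): 10,
        ("-C", "affixal"): 15, ("-C", "templatic"): 35,
    }

    total = sum(counts.values())
    by_ending = defaultdict(float)
    for (ending, _), n in counts.items():
        by_ending[ending] += n

    h_cond = 0.0
    for (ending, process), n in counts.items():
        p_joint = n / total
        p_given = n / by_ending[ending]
        h_cond -= p_joint * math.log2(p_given)

    print(f"H(process | ending) = {h_cond:.3f} bits")  # lower = more predictable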


Extracting Binary Features From Speech Production Errors And Perceptual Confusions Using Redundancy-Corrected Transmission, Zhanao Fu, Ewan Dunbar 2023 University of Toronto

Proceedings of the Society for Computation in Linguistics

We develop a mutual information-based feature extraction method and apply it to English speech production and perception error data. The extracted features show different phoneme groupings than conventional phonological features, especially in the place features. We evaluate how well the extracted features can define natural classes to account for English phonological patterns. The features extracted from production errors had performance close to conventional phonological features, while the features extracted from perception errors performed worse. The study shows that featural information can be extracted from underused sources of data such as confusion matrices of production and perception errors, and the results …
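
As a generic illustration of extracting information from a confusion matrix (not the authors' redundancy-corrected transmission measure), the sketch below estimates the mutual information between intended and perceived phonemes; the counts are invented.

    # Generic mutual information from a confusion matrix (toy data, not the paper's).
    import numpy as np

    # rows: intended phoneme, columns: perceived phoneme (hypothetical counts)
    confusions = np.array([
        [90, 5, 5],    # /p/
        [4, 88, 8],    # /t/
        [6, 10, 84],   # /k/
    ], dtype=float)

    joint = confusions / confusions.sum()
    p_intended = joint.sum(axis=1, keepdims=True)
    p_perceived = joint.sum(axis=0, keepdims=True)

    with np.errstate(divide="ignore", invalid="ignore"):
        ratio = np.where(joint > 0, joint / (p_intended * p_perceived), 1.0)
    mi = np.sum(joint * np.log2(ratio))
    print(f"I(intended; perceived) = {mi:.3f} bits")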


Language Models Can Learn Exceptions To Syntactic Rules, Cara Su-Yi Leong, Tal Linzen 2023 New York University

Proceedings of the Society for Computation in Linguistics

Artificial neural networks can generalize productively to novel contexts. Can they also learn exceptions to those productive rules? We explore this question using the case of restrictions on English passivization (e.g., the fact that ''The vacation lasted five days'' is grammatical, but ''*Five days was lasted by the vacation'' is not). We collect human acceptability judgments for passive sentences with a range of verbs, and show that the probability distribution defined by GPT-2, a language model, matches the human judgments with high correlation. We also show that the relative acceptability of a verb in the active vs. passive voice is …
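
A minimal sketch of the kind of comparison involved, scoring the active and passive variants with GPT-2 via the Hugging Face transformers library; the sentences come from the abstract, but the per-sentence log-probability setup here is a generic one and not necessarily the authors' exact protocol.

    # Sketch: compare GPT-2 log-probabilities of an active vs. passive sentence.
    # Requires: pip install torch transformers
    import torch
    from transformers import GPT2LMHeadModel, GPT2TokenizerFast

    tokenizer = GPT2TokenizerFast.from_pretrained("gpt2")
    model = GPT2LMHeadModel.from_pretrained("gpt2").eval()

    def sentence_logprob(sentence: str) -> float:
        ids = tokenizer(sentence, return_tensors="pt").input_ids
        with torch.no_grad():
            # loss is the mean negative log-likelihood per predicted token
            loss = model(ids, labels=ids).loss
        return -loss.item() * (ids.shape[1] - 1)  # total log-probability (nats)

    print(sentence_logprob("The vacation lasted five days."))
    print(sentence_logprob("Five days was lasted by the vacation."))  # expected lower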


An MG Parsing View Into The Processing Of Subject And Object Relative Clauses In Basque, Matteo Fiorini, Jillian Chang, Aniello De Santo 2023 University of Utah

Proceedings of the Society for Computation in Linguistics

Stabler's (2013) top-down parser for Minimalist grammars has been used to account for a variety of off-line processing preferences, with measures of memory load sensitive to subtle structural details. This paper expands the model's empirical coverage to ergative languages by looking at the processing asymmetries reported for Basque relative clauses. Our results show that the model predicts a subject over object preference, as identified in the relevant psycholinguistic literature.
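
One memory measure commonly paired with this parser in the literature is tenure, the number of steps a node spends in the parser's memory queue. The sketch below illustrates the metric only; it is not an implementation of the parser, and the step indices are invented rather than taken from an actual Basque derivation.

    # Toy illustration of a tenure-style memory metric (not the MG parser itself).
    # tenure(node) = step at which the node leaves memory - step at which it entered.
    def tenure(events):
        """events: dict node -> (entered_at_step, left_at_step); hypothetical."""
        return {node: out - in_ for node, (in_, out) in events.items()}

    subject_rc = {"head noun": (1, 3), "gap": (2, 4)}   # invented step indices
    object_rc  = {"head noun": (1, 6), "gap": (2, 5)}

    print(max(tenure(subject_rc).values()))  # smaller maximum tenure
    print(max(tenure(object_rc).values()))   # larger maximum tenure -> harder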


Modeling Island Effects With Probabilistic Tier-Based Strictly Local Grammars Over Trees, Charles J. Torres, Kenneth Hanson, Thomas Graf, Connor Mayer 2023 University of California, Irvine

Proceedings of the Society for Computation in Linguistics

We fuse two recent strands of work in subregular linguistics—probabilistic tier projections (Mayer, 2021) and tier-based perspectives on movement (Graf, 2022a)—into a probabilistic model of syntax that makes it easy to add gradience to traditional, categorical analyses from the syntactic literature. As a case study, we test this model on experimental data from Sprouse et al. (2016) for a number of island effects in English. We show that the model correctly replicates the superadditive effects and gradience that have been observed in the psycholinguistic literature.
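
The string-based analogue of the idea is easy to sketch: project the input onto a tier and multiply probabilities for the resulting adjacent pairs. The paper's model works over trees and is richer than this; the tier symbols and probabilities below are invented.

    # Toy probabilistic TSL scorer over strings (the paper's model works over trees).
    from math import prod

    TIER = {"wh", "island", "gap"}           # hypothetical tier symbols
    BIGRAM_PROB = {                          # hypothetical bigram probabilities
        ("wh", "gap"): 0.9,
        ("wh", "island"): 0.2,
        ("island", "gap"): 0.3,
    }

    def tsl_score(tokens, default=0.05):
        tier = [t for t in tokens if t in TIER]   # tier projection
        pairs = zip(tier, tier[1:])
        return prod(BIGRAM_PROB.get(p, default) for p in pairs)

    print(tsl_score(["wh", "verb", "gap"]))             # no island: higher score
    print(tsl_score(["wh", "island", "verb", "gap"]))   # island intervenes: lower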


Morpheme Combinatorics Of Compound Words Through Box Embeddings, Eric R. Rosen 2023 (formerly at) University of Leipzig

Proceedings of the Society for Computation in Linguistics

In this study I probe the combinatoric properties of Japanese morphemes that participate in compounding. By representing morphemes through box embeddings (Vilnis et al., 2018; Patel et al., 2020; Li et al., 2019), a model learns preferences for one morpheme to combine with another in two-member compounds. These learned preferences are represented by the degree to which the box-hyperrectangles for two morphemes overlap in representational space. After learning, these representations are applied to test how well they encode a speaker’s knowledge of the properties of each morpheme that predict the plausibility of novel compounds in which they could occur.
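
The core geometric idea, scoring how well two morphemes combine by how much their boxes overlap, can be sketched with plain hyperrectangles. Real box-embedding models use smoothed boxes and learned parameters; the coordinates below are arbitrary.

    # Sketch of hard box overlap as a compatibility score (toy coordinates).
    import numpy as np

    def box_volume(lo, hi):
        return float(np.prod(np.clip(hi - lo, 0.0, None)))

    def overlap_score(box_a, box_b):
        """Volume of the intersection divided by the volume of box_a (toy, hard boxes)."""
        (lo_a, hi_a), (lo_b, hi_b) = box_a, box_b
        inter_lo = np.maximum(lo_a, lo_b)
        inter_hi = np.minimum(hi_a, hi_b)
        return box_volume(inter_lo, inter_hi) / box_volume(lo_a, hi_a)

    # hypothetical 2-D boxes for two morphemes
    m1 = (np.array([0.0, 0.0]), np.array([0.6, 0.5]))
    m2 = (np.array([0.4, 0.2]), np.array([1.0, 0.9]))
    print(overlap_score(m1, m2))  # greater overlap -> more plausible compound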


Processing Advantages Of End-Weight, Lei Liu 2023 Universität Leipzig

Proceedings of the Society for Computation in Linguistics

Previous research has established that English end-weight configurations, where sentence components of greater grammatical complexity appear at the ends of sentences, demonstrate processing advantages over alternative word orders. To evaluate these processing advantages, I analyze how a Minimalist Grammar (MG) parser generates syntactic structures for different word orders. The parser's behavior suggests that end-weight configurations require fewer memory resources for parsing than alternative structures. This memory load difference accounts for the end-weight advantage in processing. The results highlight the validity of the MG processing approach as a linking theory connecting syntactic structures to behavioral observations. Additionally, the results have implications …


Rethinking Representations: A Log-Bilinear Model Of Phonotactics, Huteng Dai, Connor Mayer, Richard Futrell 2023 Rutgers University

Proceedings of the Society for Computation in Linguistics

Models of phonotactics include subsegmental representations in order to generalize to unattested sequences. These representations can be encoded in at least two ways: as discrete, phonetically-based features, or as continuous, distribution-based representations induced from the statistical patterning of sounds. Because phonological theory typically assumes that representations are discrete, past work has reduced continuous representations to discrete ones, which eliminates potentially relevant information. In this paper we present a model of phonotactics that can use continuous representations directly, and show that this approach yields competitive performance on modeling experimental judgments of English sonority sequencing. The proposed model broadens the space of …
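
A bigram version of a log-bilinear phonotactic model, where the score of a segment given its context is a bilinear function of continuous embeddings, can be sketched as follows; the inventory and embeddings are random stand-ins rather than a trained model.

    # Sketch of a log-bilinear bigram phonotactic model with continuous embeddings.
    import numpy as np

    rng = np.random.default_rng(0)
    inventory = ["p", "t", "k", "a", "i", "s", "#"]          # toy segment inventory
    dim = 4
    emb = {s: rng.normal(size=dim) for s in inventory}       # continuous representations
    W = rng.normal(size=(dim, dim))                          # bilinear interaction matrix

    def logprob(word):
        segs = ["#"] + list(word) + ["#"]
        total = 0.0
        for prev, cur in zip(segs, segs[1:]):
            scores = np.array([emb[prev] @ W @ emb[s] for s in inventory])
            logZ = np.log(np.exp(scores - scores.max()).sum()) + scores.max()
            total += emb[prev] @ W @ emb[cur] - logZ
        return total

    print(logprob("pat"))   # with random weights these numbers are arbitrary;
    print(logprob("ptak"))  # a trained model would penalize dispreferred clusters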


Stochastic Harmonic Grammars Do Not Peak On The Mappings Singled Out By Categorical Harmonic Grammars, Giorgio Magri 2023 MIT, CNRS

Proceedings of the Society for Computation in Linguistics

A candidate surface phonological realization is called a peak of a probabilistic constraint-based phonological grammar provided it achieves the largest probability mass over its candidate set. Obviously, the set of peaks of a maximum entropy grammar is the categorical harmonic grammar corresponding to the same weights. This paper shows that the set of peaks of a stochastic harmonic grammar instead can be different from the categorical harmonic grammar corresponding to any weights. Thus in particular, maximum entropy and stochastic harmonic grammars can peak on different candidates.
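
A small computation makes the definitions concrete: with fixed weights and violation profiles, the categorical HG mapping is the candidate with the best harmony, the MaxEnt peak is the candidate with the largest share of exponentiated harmony, and a stochastic (noisy-weight) HG peak is the modal winner under weight noise. The weights and violations below are invented.

    # Toy comparison of MaxEnt probabilities and noisy-weight (stochastic) HG peaks.
    import numpy as np

    rng = np.random.default_rng(1)
    weights = np.array([3.0, 2.0])                  # hypothetical constraint weights
    violations = np.array([[1, 0],                  # candidate A's violation profile
                           [0, 2]])                 # candidate B's violation profile

    harmony = -violations @ weights                 # HG harmony (higher = better)
    print("categorical HG winner:", harmony.argmax())

    maxent = np.exp(harmony - harmony.max())
    maxent /= maxent.sum()
    print("MaxEnt probabilities:", maxent)          # MaxEnt peak = argmax of these

    wins = np.zeros(len(violations))
    for _ in range(10_000):                         # stochastic HG: sample noisy weights
        noisy = weights + rng.normal(scale=2.0, size=weights.shape)
        wins[(-violations @ noisy).argmax()] += 1
    print("stochastic HG win rates:", wins / wins.sum())
    # The paper's point is that these two peaks need not coincide in general.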


Subject-Verb Agreement With Seq2seq Transformers: Bigger Is Better, But Still Not Best, Michael A. Wilson, Zhenghao Zhou, Robert Frank 2023 Yale University

Proceedings of the Society for Computation in Linguistics

Past work (Linzen et al., 2016; Goldberg, 2019, a.o.) has used the performance of neural network language models on subject-verb agreement to argue that such models possess structure-sensitive grammatical knowledge. We investigate what properties of the model or of the training regimen are implicated in such success in sequence to sequence transformer models that use the T5 architecture (Raffel et al., 2019; Tay et al., 2021). We find that larger models exhibit improved performance, especially in sentences with singular subjects. We also find that larger pre-training datasets are generally associated with higher performance, though models trained with less complex language …
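
One way such a probe can be set up for a T5-style seq2seq model is to score singular versus plural verb completions in the model's span-infilling format, as sketched below; this is an illustrative setup, not necessarily the authors' evaluation protocol.

    # Sketch: score agreement completions with a T5 checkpoint via span infilling.
    # Requires: pip install torch transformers sentencepiece
    import torch
    from transformers import T5ForConditionalGeneration, T5TokenizerFast

    tokenizer = T5TokenizerFast.from_pretrained("t5-small")
    model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

    def completion_loss(sentence: str, completion: str) -> float:
        inputs = tokenizer(sentence, return_tensors="pt")
        labels = tokenizer(completion, return_tensors="pt").input_ids
        with torch.no_grad():
            return model(input_ids=inputs.input_ids,
                         attention_mask=inputs.attention_mask,
                         labels=labels).loss.item()

    context = "The keys to the cabinet <extra_id_0> on the table."
    print(completion_loss(context, "<extra_id_0> are"))  # lower loss expected
    print(completion_loss(context, "<extra_id_0> is"))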


Text Segmentation Similarity Revisited: A Flexible Distance-Based Approach For Multiple Boundary Types, Ryan Ka Yau Lai, Yujie Li, Shujie Zhang 2023 University of California, Santa Barbara

Proceedings of the Society for Computation in Linguistics

Segmentation of texts into discourse and prosodic units is a ubiquitous problem in corpus linguistics and psycholinguistics, yet best practices for its evaluation – whether evaluating consistency between human segmenters or humanlikeness of machine segmenters – remain understudied. Building on segmentation edit distance (Fournier & Inkpen 2012, Fournier 2013), this paper introduces a new measure for evaluating similarity between two segmentations of the same text with multiple, mutually exclusive boundary types, accounting for varying identifiability and confusability between these types. We implement a dynamic programming algorithm for calculation specifically geared towards this type of segmentation problem, apply it to a …
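
To make the setting concrete, the sketch below compares two typed segmentations of the same text represented as boundary positions with mutually exclusive types. The simple position-and-type tally is only a baseline illustration, not the paper's flexible distance-based measure.

    # Toy comparison of two typed boundary sets (illustration only; not the
    # paper's distance-based measure).
    seg_a = {3: "prosodic", 7: "discourse", 12: "prosodic"}   # position -> boundary type
    seg_b = {3: "prosodic", 7: "prosodic", 11: "prosodic"}

    positions = set(seg_a) | set(seg_b)
    exact, type_confusions, misses = 0, 0, 0
    for pos in sorted(positions):
        a, b = seg_a.get(pos), seg_b.get(pos)
        if a is not None and b is not None:
            exact += a == b
            type_confusions += a != b
        else:
            misses += 1   # a near-miss/transposition handler would go here

    print(exact, type_confusions, misses)   # 1 exact match, 1 type confusion, 2 misses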


Unbounded Recursion In Two Dimensions, Where Syntax And Prosody Meet, Edward P. Stabler, Kristine M. Yu 2023 University of California, Los Angeles

Proceedings of the Society for Computation in Linguistics

Both syntax and prosody seem to require structures with unbounded branching, something that is not immediately provided by multiple context free grammars or other equivalently expressive formalisms. That extension is easy, and does not disrupt an appealing model of prosody/syntax interaction. Rather than computing prosodic and syntactic structures independently and then selecting optimally corresponding pairs, prosodic structures can be computed directly from the syntax, eliminating alignment issues and the need for bracket-insertion or other ad hoc devices. To illustrate, a simple model of prosodically-defined Irish pronoun displacement is briefly compared to previous proposals.


Corpus-Based Measures Discriminate Inflection And Derivation Cross-Linguistically, Coleman Haley, Edoardo M. Ponti, Sharon Goldwater 2023 University of Edinburgh

Proceedings of the Society for Computation in Linguistics

Japanese passives are traditionally considered to have two types: direct and indirect passives. However, more recent studies, such as Ishizuka (2012), suggest the two types can be unified under the same syntactic movement analysis. Utilizing the Balanced Corpus of Contemporary Written Japanese (BCCWJ; Maekawa, 2008; Maekawa et al., 2014), this study aims to investigate how likely different types of passives are to appear in naturally occurring texts, especially in relation to the markedness-based hierarchy called the Noun Phrase Accessibility Hierarchy (NPAH; Keenan and Comrie, 1977), and to investigate whether true indirect passives occur in contemporary written Japanese.


Evaluating A Phonotactic Learner For MITSL-(2,2) Languages, Jacob K. Johnson, Aniello De Santo 2023 University of Utah

Proceedings of the Society for Computation in Linguistics

We provide an implementation of De Santo and Aksënova's (2021) grammatical inference learning algorithm for Multiple Input-sensitive Tier-based Strictly Local languages (De Santo and Graf, 2019), following the standard of SigmaPie (Aksënova, 2020), and evaluate it on an array of patterns with varying degrees of (subregular) complexity. MITSL languages are able to capture the interaction of local and non-local constraints while also handling multiple dependencies simultaneously. Their practical learnability thus has strong implications for the viability of grammatical inference/subregular approaches to phonotactic learning broadly. Additionally, the transparency and provable correctness of the learning algorithms developed for such …
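
The flavor of this family of learners can be conveyed with a much simpler TSL-2 sketch over a single, pre-specified tier: project each training string onto the tier, record the attested adjacent pairs, and forbid the rest. The actual MITSL-(2,2) algorithm also infers tiers and handles multiple input-sensitive tiers; the tier and data below are invented.

    # Simplified TSL-2 learning sketch (single given tier; not the MITSL-(2,2) algorithm).
    from itertools import product

    TIER = {"s", "S"}                    # e.g., sibilants for a toy harmony pattern

    def learn_forbidden_bigrams(words):
        attested = set()
        for word in words:
            tier = [seg for seg in word if seg in TIER]   # tier projection
            tier = ["#"] + tier + ["#"]
            attested.update(zip(tier, tier[1:]))
        alphabet = sorted(TIER | {"#"})
        return {bg for bg in product(alphabet, repeat=2) if bg not in attested}

    def accepts(word, forbidden):
        tier = ["#"] + [seg for seg in word if seg in TIER] + ["#"]
        return not any(bg in forbidden for bg in zip(tier, tier[1:]))

    grammar = learn_forbidden_bigrams(["satas", "SataS", "tata"])   # harmonic toy data
    print(accepts("satos", grammar))   # True: agreeing sibilants
    print(accepts("satoS", grammar))   # False: ('s', 'S') never attested on the tier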


Bridging Production And Comprehension: Toward An Integrated Computational Model Of Error Correction, Shiva Upadhye, Jiaxuan Li, Richard Futrell 2023 University of California, Irvine

Proceedings of the Society for Computation in Linguistics

Error correction in production and comprehension has traditionally been studied separately. In real-time communication, however, correction may depend not only on speaker- or comprehender-internal preferences, but also on the interlocutors’ knowledge of each other’s strategies. We present an integrated computational framework for error correction in both production and comprehension systems. Modeling error correction as Bayesian inference, we propose that both the speaker’s and the comprehender’s correction strategies are influenced by their prior expectations about the intended message and their knowledge of a noise monitoring model. Our results indicate that speakers and comprehenders tend to weigh phonological and semantic cues differently, and these …
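
The Bayesian core of such a framework can be sketched as a noisy-channel computation: the corrected message is the candidate that maximizes the prior times the likelihood of the observed form under a noise model. The candidate messages, priors, and noise model below are invented.

    # Noisy-channel sketch of error correction: argmax_m P(m) * P(observed | m).
    # All numbers are hypothetical.
    prior = {                         # comprehender's expectations about the message
        "the cat chased the dog": 0.6,
        "the cap chased the dog": 0.1,
        "the cat chafed the dog": 0.3,
    }

    def noise_likelihood(observed: str, intended: str) -> float:
        """Toy noise model: probability decays with character mismatches."""
        mismatches = sum(a != b for a, b in zip(observed, intended))
        mismatches += abs(len(observed) - len(intended))
        return 0.3 ** mismatches

    observed = "the cap chased the dog"      # speaker's (possibly erroneous) output
    posterior = {m: p * noise_likelihood(observed, m) for m, p in prior.items()}
    total = sum(posterior.values())
    for m, score in sorted(posterior.items(), key=lambda kv: -kv[1]):
        print(f"{score / total:.3f}  {m}")
    # the higher-prior "cat chased" reading outranks the literal observation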


What Affects Priming Strength? Simulating Structural Priming Effect With Pips, Zhenghao Zhou, Robert Frank 2023 Yale University

Proceedings of the Society for Computation in Linguistics

No abstract provided.


A TSL Analysis Of Japanese Case, Kenneth Hanson 2023 Stony Brook University

Proceedings of the Society for Computation in Linguistics

Recent work in subregular syntax has revealed deep parallels among syntactic phenomena, many of which fall under the computational class TSL (Graf 2018, 2022). Vu et al. (2019) argue that case dependencies are yet another member of this class. But their analysis focuses mainly on English, which is famously case-poor. In this paper I present a TSL analysis of Japanese, which features a much wider range of case-marking patterns, adding support to the claim that case dependencies, and by extension syntactic dependencies, are TSL.


An Algebraic Characterization Of Total Input Strictly Local Functions, Dakotah Lambert, Jeffrey Heinz 2023 Université Jean Monnet

Proceedings of the Society for Computation in Linguistics

This paper provides an algebraic characterization of the total input strictly local functions. Simultaneous, noniterative rules of the form A→B/C_D, common in phonology, are definable as functions in this class whenever CAD represents a finite set of strings. The algebraic characterization highlights a fundamental connection between input strictly local functions and the simple class of definite string languages, as well as connections to string functions studied in the computer science literature, the definite functions and local functions. No effective decision procedure for the input strictly local maps was previously available, but one arises …
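
A simultaneous, noniterative rule of that shape is straightforward to state as a string function, which is the sense in which it is input strictly local: each output symbol depends only on a bounded window of the input string. The segments in the sketch below are arbitrary.

    # Sketch: apply a simultaneous, noniterative rule A -> B / C _ D, reading
    # contexts from the *input* only (input strict locality). Toy segments.
    def apply_rule(word, a="n", b="m", c="a", d="p"):
        segs = list(word)
        out = []
        for i, seg in enumerate(segs):
            left = segs[i - 1] if i > 0 else "#"
            right = segs[i + 1] if i + 1 < len(segs) else "#"
            # the decision looks only at the input window (left, seg, right)
            out.append(b if (seg == a and left == c and right == d) else seg)
        return "".join(out)

    print(apply_rule("anpa"))     # 'anpa' -> 'ampa': n -> m between a and p
    print(apply_rule("anpanpa"))  # both eligible n's change simultaneously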

