Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

298 Full-Text Articles 484 Authors 37,428 Downloads 40 Institutions

All Articles in Computational Linguistics

298 full-text articles. Page 1 of 14.

Computational Approaches To The Syntax–Prosody Interface: Using Prosody To Improve Parsing, Hussein M. Ghaly 2020 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

Prosody has strong ties with syntax, since prosody can be used to resolve some syntactic ambiguities. Syntactic ambiguities have been shown to negatively impact automatic syntactic parsing, hence there is reason to believe that prosodic information can help improve parsing. This dissertation considers a number of approaches that aim to computationally examine the relationship between prosody and syntax of natural languages, while also addressing the role of syntactic phrase length, with the ultimate goal of using prosody to improve parsing.

Chapter 2 examines the effect of syntactic phrase length on prosody in double center embedded sentences in French. Data collected ...


Ghost Peppers: Using Ensemble Models To Detect Professor Attractiveness Commentary On RateMyProfessors.com, Angie Waller 2020 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

In June 2018, RateMyProfessors.com (RMP), a popular website for students to leave professor reviews, removed a controversial feature known as the “chili pepper,” which allowed students to rate their professors as “hot” or “not hot.” Though past research has rigorously analyzed the correlation of the chili pepper with higher ratings in other categories, none has measured the effect of the removal of the chili pepper on the text content submitted by students. While it is a positive step that the chili pepper has been removed, text commentary on teacher attractiveness persists and is submitted to the ...


Phonologically-Informed Speech Coding For Automatic Speech Recognition-Based Foreign Language Pronunciation Training, Anthony J. Vicario 2020 The Graduate Center, City University of New York

All Dissertations, Theses, and Capstone Projects

Automatic speech recognition (ASR) and computer-assisted pronunciation training (CAPT) systems used in foreign-language educational contexts are often not developed with the specific task of second-language acquisition in mind. Systems that are built for this task are often excessively targeted to one native language (L1) or a single phonemic contrast and are therefore burdensome to train. Current algorithms have been shown to provide erroneous feedback to learners and show inconsistencies between human and computer perception. These discrepancies have thus far hindered more extensive application of ASR in educational systems.

This thesis reviews the computational models of the human perception of American ...


What Code-Switching Strategies Are Effective In Dialogue Systems?, Emily Ahn, Cecilia Jimenez, Yulia Tsvetkov, Alan Black 2020 University of Washington

Proceedings of the Society for Computation in Linguistics

Since most people in the world today are multilingual, code-switching is ubiquitous in spoken and written interactions. Paving the way for future adaptive, multilingual conversational agents, we incorporate linguistically-motivated strategies of code-switching into a rule-based goal-oriented dialogue system. We collect and release CommonAmigos, a corpus of 587 human-computer text conversations between our dialogue system and human users in mixed Spanish and English. From this new corpus, we analyze the amount of elicited code-switching, preferred patterns of user code-switching, and the impact of user demographics on code-switching. Based on these exploratory findings, we give recommendations for future effective code-switching dialogue systems ...


Preface: SCiL 2020 Editors' Note, Allyson Ettinger, Gaja Jarosz, Max Nelson 2020 University of Chicago

Proceedings of the Society for Computation in Linguistics

No abstract provided.


The Stability Of Segmental Properties Across Genre And Corpus Types In Low-Resource Languages, Uriel Cohen Priva, Shiying Yang, Emily Strand 2020 Brown University

Proceedings of the Society for Computation in Linguistics

Are written corpora useful for phonological research? Word frequency lists for low-resource languages have become ubiquitous in recent years (Scannell, 2007). For many languages there is a direct correspondence between their written forms and their alphabets, but it is not clear whether written corpora can adequately represent language use. We use 15 low-resource languages and compare several information-theoretic properties across three corpus types. We show that despite differences in origin and genre, estimates in one corpus are highly correlated with estimates in other corpora.
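The kind of comparison described can be sketched in two steps: estimate an information-theoretic property from each corpus's word-frequency list, then correlate the per-language estimates across corpus types. The property chosen here (unigram entropy) and all numbers are illustrative assumptions, not the paper's actual measures or data:

```python
import math

def unigram_entropy(freqs):
    """Shannon entropy (bits) estimated from a word-frequency list."""
    total = sum(freqs.values())
    return -sum((c / total) * math.log2(c / total) for c in freqs.values())

def pearson(xs, ys):
    """Pearson correlation between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    vy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (vx * vy)

# Hypothetical per-language entropy estimates from two corpus types;
# a high correlation would suggest the property is stable across genres.
corpus_a = [7.1, 8.3, 6.9, 7.8, 8.0]
corpus_b = [7.0, 8.1, 7.2, 7.7, 8.2]
print(pearson(corpus_a, corpus_b))
```

A correlation near 1 across corpus types is the pattern the abstract reports; in practice one would plug in real frequency lists per language.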


Modeling Behavior In Truth Value Judgment Task Experiments, Brandon Waldon, Judith Degen 2020 Stanford University

Proceedings of the Society for Computation in Linguistics

Truth Value Judgment Task experiments (TVJTs) are a common means of investigating pragmatic competence, particularly with regard to scalar inference. We present a novel quantitative linking function from pragmatic competence to participant behavior on TVJTs, based on a Bayesian probabilistic model of linguistic production. Our model captures a range of observed phenomena on TVJTs, including intermediate responses on a non-binary scale, population- and individual-level variation, participant endorsement of false utterances, and variation in response due to so-called scalar diversity.


Probing RNN Encoder-Decoder Generalization Of Subregular Functions Using Reduplication, Max Nelson, Hossep Dolatian, Jonathan Rawski, Brandon Prickett 2020 University of Massachusetts Amherst

Proceedings of the Society for Computation in Linguistics

This paper examines the generalization abilities of encoder-decoder networks on a class of subregular functions characteristic of natural language reduplication. We find that, for the simulations we run, attention is a necessary and sufficient mechanism for learning generalizable reduplication. We examine attention alignment to connect RNN computation to a class of 2-way transducers.


Where New Words Are Born: Distributional Semantic Analysis Of Neologisms And Their Semantic Neighborhoods, Maria Ryskina, Ella Rabinovich, Taylor Berg-Kirkpatrick, David R. Mortensen, Yulia Tsvetkov 2020 Carnegie Mellon University

Proceedings of the Society for Computation in Linguistics

We perform statistical analysis of the phenomenon of neology, the process by which new words emerge in a language, using large diachronic corpora of English. We investigate the importance of two factors, semantic sparsity and frequency growth rates of semantic neighbors, formalized in the distributional semantics paradigm. We show that both factors are predictive of word emergence although we find more support for the latter hypothesis. Besides presenting a new linguistic application of distributional semantics, this study tackles the linguistic question of the role of language-internal factors (in our case, sparsity) in language change motivated by language-external factors (reflected in ...
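Semantic sparsity of the kind described can be approximated as the mean cosine distance from a word's embedding to its nearest semantic neighbors: the sparser the neighborhood, the larger the distances. The toy 3-dimensional vectors and this exact formalization are illustrative assumptions, not the authors' definitions:

```python
import math

def cosine(u, v):
    """Cosine similarity of two dense vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def neighborhood_sparsity(word, vectors, k=2):
    """Mean cosine *distance* to the k nearest semantic neighbors;
    higher values indicate a sparser semantic neighborhood."""
    sims = sorted(
        (cosine(vectors[word], v) for w, v in vectors.items() if w != word),
        reverse=True,
    )
    return sum(1 - s for s in sims[:k]) / k

# Toy embeddings (invented for illustration).
vecs = {
    "selfie": [0.9, 0.1, 0.0],
    "photo": [0.8, 0.2, 0.1],
    "camera": [0.7, 0.3, 0.2],
    "justice": [0.0, 0.2, 0.9],
}
print(neighborhood_sparsity("selfie", vecs))
```

A real study would use pretrained diachronic embeddings and track how sparsity relates to where neologisms later appear.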


MG Parsing As A Model Of Gradient Acceptability In Syntactic Islands, Aniello De Santo 2020 Stony Brook University

Proceedings of the Society for Computation in Linguistics

It is well-known that the acceptability judgments at the core of current syntactic theories are continuous. However, an open debate is whether the source of such gradience is situated in the grammar itself, or can be derived from extra-grammatical factors. In this paper, we propose the use of a top-down parser for Minimalist grammars (Stabler, 2013; Kobele et al., 2013; Graf et al., 2017), as a formal model of how gradient acceptability can arise from categorical grammars. As a test case, we target the acceptability judgments for island effects collected by Sprouse et al. (2012a).


Interpreting Verbal Irony: Linguistic Strategies And The Connection To The Type Of Semantic Incongruity, Debanjan Ghosh, Elena Musi, Kartikeya Upasani, Smaranda Muresan 2020 McGovern Institute for Brain Research, MIT

Proceedings of the Society for Computation in Linguistics

Human communication often involves the use of verbal irony or sarcasm, where the speakers usually mean the opposite of what they say. To better understand how verbal irony is expressed by the speaker and interpreted by the hearer we conduct a crowdsourcing task: given an utterance expressing verbal irony, users are asked to verbalize their interpretation of the speaker's ironic message. We propose a typology of linguistic strategies for verbal irony interpretation and link it to various theoretical linguistic frameworks. We design computational models to capture these strategies and present empirical studies aimed to answer three questions: (1) what ...


Inflectional Networks: Graph-Theoretic Tools For Inflectional Typology, Andrea D. Sims 2020 The Ohio State University

Proceedings of the Society for Computation in Linguistics

The interpredictability of the inflected forms of lexemes is increasingly important to questions of morphological complexity and typology, but tools to quantify and visualize this aspect of inflectional organization are lacking, inhibiting effective cross-linguistic comparison. In this paper I use metrics from graph theory to describe and compare the organizational structure of inflectional systems. Graph theory offers a well-established toolbox for describing the properties of networks, making it ideal for this purpose. Comparison of nine languages reveals previously unobserved generalizations about the typological space of morphological systems. This is the first paper to apply graph-theoretic tools to the goal of ...


Acquisition Of Inflectional Morphology In Artificial Neural Networks With Prior Knowledge, Katharina Kann 2020 New York University

Proceedings of the Society for Computation in Linguistics

How does knowledge of one language’s morphology influence learning of inflection rules in a second one? In order to investigate this question in artificial neural network models, we perform experiments with a sequence-to-sequence architecture, which we train on different combinations of eight source and three target languages. A detailed analysis of the model outputs suggests the following conclusions: (i) if source and target language are closely related, acquisition of the target language’s inflectional morphology constitutes an easier task for the model; (ii) knowledge of a prefixing (resp. suffixing) language makes acquisition of a suffixing (resp. prefixing) language’s ...


Modeling Conventionalization And Predictability Within MWEs At The Brain Level, Shohini Bhattasali, Murielle Popa-Fabre, Christophe Pallier, John Hale 2020 University of Maryland, College Park

Proceedings of the Society for Computation in Linguistics

While expressions have traditionally been binarized as compositional or noncompositional in linguistic theory, Multiword Expressions (MWEs) demonstrate finer-grained distinctions. Using Association Measures like Pointwise Mutual Information and Dice's Coefficient, MWEs can be characterized as having different degrees of conventionalization and predictability. Our goal is to investigate how these gradiences could reflect cognitive processes. In this study, fMRI recordings of naturalistic narrative comprehension are used to probe to what extent these computational measures, and the cognitive processes they could operationalize, are observable during on-line sentence processing. Our results show that Dice's Coefficient, representing lexical predictability, is a better predictor ...
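The two association measures named above can be computed directly from corpus counts. The counts below are invented for illustration; only the formulas are standard:

```python
import math

def pmi(c_xy, c_x, c_y, n):
    """Pointwise mutual information (bits) of a word pair (x, y),
    given joint count c_xy, marginal counts c_x and c_y, and corpus size n."""
    return math.log2((c_xy / n) / ((c_x / n) * (c_y / n)))

def dice(c_xy, c_x, c_y):
    """Dice's coefficient: 2 * joint count / sum of marginals, in [0, 1]."""
    return 2 * c_xy / (c_x + c_y)

# Hypothetical counts for a conventionalized word pair in a 1M-word corpus.
print(pmi(30, 100, 60, 1_000_000))  # high PMI: pair co-occurs far above chance
print(dice(30, 100, 60))            # 0.375
```

PMI rewards co-occurrence above chance (conventionalization), while Dice normalizes by the marginals and tracks how predictable the words are from one another.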


A Principled Derivation Of Harmonic Grammar, Giorgio Magri 2020 CNRS

Proceedings of the Society for Computation in Linguistics

Phonologists focus on a few processes at a time. This practice is motivated by the intuition that phonological processes factorize into clusters with no interactions across clusters (e.g., obstruent voicing does not interact with vowel harmony). To formalize this intuition, we factorize a full-blown representation into under-specified representations, each encoding only the information needed by the corresponding phonological cluster. And we require a grammar for the original full-blown representations to factorize into grammars that handle the under-specified representations separately, independently of each other. Within a harmony-based implementation of constraint-based phonology, HG is shown to follow axiomatically from this ...


Phonotactic Learning With Neural Language Models, Connor Mayer, Max Nelson 2020 University of California, Los Angeles

Proceedings of the Society for Computation in Linguistics

Computational models of phonotactics share much in common with language models, which assign probabilities to sequences of words. While state-of-the-art language models are implemented using neural networks, phonotactic models have not followed suit. We present several neural models of phonotactics, and show that they perform favorably when compared to existing models. In addition, they provide useful insights into the role of representations in phonotactic learning and generalization. This work provides a promising starting point for future modeling of human phonotactic knowledge.
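As a point of reference, a non-neural baseline of the kind such neural models are typically compared against can be sketched as an add-one-smoothed segment-bigram model that assigns each word a log-probability under the chain rule. The training words and test items below are invented for illustration:

```python
import math
from collections import Counter

class BigramPhonotactics:
    """Add-one-smoothed segment-bigram model: a simple phonotactic
    baseline (illustrative sketch, not the paper's neural models)."""
    BOUND = "#"  # word-boundary symbol

    def __init__(self, words):
        self.counts = Counter()   # bigram counts
        self.context = Counter()  # unigram (context) counts
        self.segments = {self.BOUND}
        for w in words:
            segs = [self.BOUND] + list(w) + [self.BOUND]
            self.segments.update(segs)
            for a, b in zip(segs, segs[1:]):
                self.counts[a, b] += 1
                self.context[a] += 1

    def logprob(self, word):
        """Log2 probability of a word under the smoothed bigram model."""
        segs = [self.BOUND] + list(word) + [self.BOUND]
        v = len(self.segments)
        return sum(
            math.log2((self.counts[a, b] + 1) / (self.context[a] + v))
            for a, b in zip(segs, segs[1:])
        )

model = BigramPhonotactics(["blick", "black", "block", "brick"])
# Onsets attested in training score higher than unattested ones.
print(model.logprob("blick") > model.logprob("bnick"))
```

Neural phonotactic models play the same role, replacing the count table with a learned sequence model over segments.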


An IBSP Description Of Sanskrit /n/-Retroflexion, Ayla Karakaş 2020 Stony Brook University

Proceedings of the Society for Computation in Linguistics

Graf and Mayer (2018) analyze the process of Sanskrit /n/-retroflexion (nati) from a subregular perspective. They show that nati, which might be the most complex phenomenon in segmental phonology, belongs to the class of input-output tier-based strictly local languages (IO-TSL). However, the generative capacity and linguistic relevance of IO-TSL is still largely unclear compared to other recent classes like the interval-based strictly piecewise languages (IBSP: Graf, 2017, 2018). This paper shows that IBSP has a much harder time capturing nati than IO-TSL does, due to two major shortcomings: namely, the requirement of an upper bound on relevant segments, and ...


The Rhetorical Structure Of Modus Tollens: An Exploration In Logic-Mining, Andrew Potter 2020 University of North Alabama

Proceedings of the Society for Computation in Linguistics

A general method for mining discourse for occurrences of the rules of inference would be useful in a variety of natural language processing applications. The method described here has its roots in Rhetorical Structure Theory (RST). An RST analysis of a rule of inference can be used as an exemplar to produce a relational complex in the form of a nested relational proposition. This relational complex can be transformed into a logical expression using the logic of relational propositions. The expression can then be generalized as a logical signature for use in logic-mining discourse for instances of the rule. Generalized ...


Assessing The Ability Of Transformer-Based Neural Models To Represent Structurally Unbounded Dependencies, Jillian K. Da Costa, Rui P. Chaves 2020 University at Buffalo

Proceedings of the Society for Computation in Linguistics

Filler-gap dependencies are among the most challenging syntactic constructions for computational models at large. Recently, Wilcox et al. (2018) and Wilcox et al. (2019b) provide some evidence suggesting that large-scale general-purpose LSTM RNNs have learned such long-distance filler-gap dependencies. In the present work we provide evidence that such models learn filler-gap dependencies only very imperfectly, despite being trained on massive amounts of data. Finally, we compare the LSTM RNN models with more modern state-of-the-art Transformer models, and find that these have poor-to-mixed degrees of success, despite their sheer size and low perplexity.


Neural Network Learning Of The Russian Genitive Of Negation: Optionality And Structure Sensitivity, Natalia Talmina, Tal Linzen 2020 Johns Hopkins University

Proceedings of the Society for Computation in Linguistics

A number of recent studies have investigated the ability of language models (specifically, neural network language models without syntactic supervision) to capture syntactic dependencies. In this paper, we contribute to this line of work and investigate the neural network learning of the Russian genitive of negation. The genitive case can optionally mark direct objects of negated verbs, but it is obligatory in the existential copula construction under negation. We find that the recurrent neural network language model we tested can learn this grammaticality pattern, although it is not clear whether it learns the locality constraint on the genitive objects. Our ...


Digital Commons powered by bepress