Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

250 Full-Text Articles 445 Authors 37,428 Downloads 39 Institutions

All Articles in Computational Linguistics

250 full-text articles. Page 3 of 12.

What Do Neural Networks Actually Learn, When They Learn To Identify Idioms?, Marco Silvio Giuseppe Senaldi, Yuri Bizzoni, Alessandro Lenci 2019 Scuola Normale Superiore of Pisa

Proceedings of the Society for Computation in Linguistics

In this ablation study, we investigated whether the abstractness and ambiguity of idioms are key factors for a neural network classifying idioms vs. literals. For 174 Italian idioms and literals, we collected concreteness and ambiguity judgments and extracted Word2vec and fastText vectors from itWaC. The dataset was split into 5 random training and test sets. We trained a neural network on the entire training sets, after removing the most concrete literals and most abstract idioms, and after removing the most ambiguous idioms. F1 decreased considerably when flattening concreteness. The results were replicated on an English dataset from the COCA corpus.
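The evaluation protocol the abstract describes (5 random train/test splits, F1 scoring over embedding vectors) can be sketched as follows. This is a minimal illustration: the vectors are synthetic stand-ins for the itWaC Word2vec/fastText embeddings, and a toy nearest-centroid classifier stands in for the paper's neural network.

```python
import numpy as np

def f1(y_true, y_pred):
    """F1 score for the positive (idiom) class."""
    tp = np.sum((y_pred == 1) & (y_true == 1))
    fp = np.sum((y_pred == 1) & (y_true == 0))
    fn = np.sum((y_pred == 0) & (y_true == 1))
    if tp == 0:
        return 0.0
    prec, rec = tp / (tp + fp), tp / (tp + fn)
    return 2 * prec * rec / (prec + rec)

rng = np.random.default_rng(0)

# Synthetic stand-ins for the 174 idiom/literal vectors (the real study
# used Word2vec and fastText vectors extracted from itWaC).
X = rng.normal(size=(174, 50))
y = rng.integers(0, 2, size=174)   # 1 = idiom, 0 = literal
X[y == 1] += 0.5                   # give the toy "idiom" class a signal

# 5 random 80/20 train/test splits, as in the abstract; the classifier
# here is a nearest-centroid toy, not the paper's network.
scores = []
for _ in range(5):
    idx = rng.permutation(len(y))
    train, test = idx[:139], idx[139:]
    centroids = np.stack(
        [X[train][y[train] == c].mean(axis=0) for c in (0, 1)])
    dists = np.linalg.norm(X[test][:, None, :] - centroids[None], axis=2)
    scores.append(f1(y[test], dists.argmin(axis=1)))

print(f"mean F1 over 5 splits: {np.mean(scores):.3f}")
```

Ablations of the kind reported (removing the most concrete literals or most ambiguous idioms before training) would simply filter the training indices before fitting.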


Jabberwocky Parsing: Dependency Parsing With Lexical Noise, Jungo Kasai, Robert Frank 2019 University of Washington

Proceedings of the Society for Computation in Linguistics

Parsing models have long benefited from the use of lexical information, and indeed current state-of-the-art neural network models for dependency parsing achieve substantial improvements from distributed representations of lexical information. At the same time, humans can easily parse sentences with unknown or even novel words, as in Lewis Carroll’s poem Jabberwocky. In this paper, we carry out jabberwocky parsing experiments, exploring how robust a state-of-the-art neural network parser is to the absence of lexical information. We find that current parsing models, at least under usual training regimens, are in fact overly dependent on lexical information, and ...


Adpositional Supersenses For Mandarin Chinese, Yilun Zhu, Yang Liu, Siyao Peng, Austin Blodgett, Yushi Zhao, Nathan Schneider 2019 Georgetown University

Proceedings of the Society for Computation in Linguistics

This study adapts the Semantic Network of Adposition and Case Supersenses (SNACS) annotation scheme to Mandarin Chinese and demonstrates that the same supersense categories are appropriate for Chinese adposition semantics. We annotated 20 chapters of The Little Prince with high interannotator agreement. The parallel corpus substantiates the applicability of construal analysis in Chinese and gives insight into how the construals of adpositions differ between the two languages. The corpus can further support automatic disambiguation of adpositions in Chinese, and the common inventory of supersenses between the two languages can potentially serve cross-linguistic tasks such as machine translation.


The Computational Cost Of Generalizations: An Example From Micromorphology, Sedigheh Moradi, Alëna Aksënova, Thomas Graf 2019 Stony Brook University

Proceedings of the Society for Computation in Linguistics

Morphotactics has been argued to be limited to the formal class of tier-based strictly local languages (Aksënova et al., 2016). We claim that the complexity of a pattern largely depends on how it is morphologically analyzed. Using an example from adjectival inflection in Noon (Niger-Congo), we show that the complexity of this pattern falls into two different classes within the subregular hierarchy when viewed from different perspectives. In particular, the traditional segmentation of Noon affixes (Soukka 2000) yields a 3-TSL grammar, while the same pattern is 3-SSTSL under the perspective of micromorphology (Stump 2017). Both grammars ...
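Membership in a tier-based strictly k-local (k-TSL) language, the formal class at issue here, can be checked mechanically: project the tier symbols out of the string, then scan the projection for forbidden k-factors. A minimal sketch; the tier, forbidden factors, and toy forms below are invented for illustration, not Noon data:

```python
def tsl_accepts(string, tier, forbidden, k):
    """Check a string against a k-TSL grammar: keep only symbols on
    the tier, pad with word boundaries, and reject if any forbidden
    k-factor occurs in the projection."""
    projection = ("#" * (k - 1)
                  + "".join(s for s in string if s in tier)
                  + "#" * (k - 1))
    factors = {projection[i:i + k] for i in range(len(projection) - k + 1)}
    return not (factors & forbidden)

# Toy 2-TSL grammar: on the vowel tier {a, i}, forbid 'a' right after
# 'i' (a crude harmony-style restriction, invented for illustration).
tier = {"a", "i"}
forbidden = {"ia"}
print(tsl_accepts("pitiki", tier, forbidden, 2))  # True: no 'ia' on tier
print(tsl_accepts("pitaki", tier, forbidden, 2))  # False: tier is 'iai'
```

A 3-TSL grammar is the same machinery with k=3; the intervening consonants are invisible because they are not on the tier, which is what lets such grammars capture long-distance morphotactic dependencies locally.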


Colorless Green Recurrent Networks Dream Hierarchically, Kristina Gulordava, Piotr Bojanowski, Edouard Grave, Tal Linzen, Marco Baroni 2019 Universitat Pompeu Fabra

Proceedings of the Society for Computation in Linguistics

Recurrent neural networks (RNNs) have achieved impressive results in a variety of linguistic processing tasks, suggesting that they can induce non-trivial properties of language. We investigate here to what extent RNNs learn to track abstract hierarchical syntactic structure. We test whether RNNs trained with a generic language modeling objective in four languages (Italian, English, Hebrew, Russian) can predict long-distance number agreement in various constructions. We include in our evaluation nonsensical sentences where RNNs cannot rely on semantic or lexical cues ("The colorless green ideas I ate with the chair sleep furiously"), and, for Italian, we compare model performance to human ...


Are All Languages Equally Hard To Language-Model?, Ryan Cotterell, Sebastian J. Mielke, Jason Eisner, Brian Roark 2019 Johns Hopkins University

Proceedings of the Society for Computation in Linguistics

How cross-linguistically applicable are NLP models, specifically language models? A fair comparison between languages is tricky: not only do training corpora in different languages have different sizes and topics, some of which may be harder to predict than others, but standard metrics for language modeling depend on the orthography of a language. We argue for a fairer metric based on bits per utterance, computed over utterance-aligned multi-text. We conduct a study on 21 languages, training and testing both n-gram and LSTM language models on “the same” set of utterances in each language (modulo translation), demonstrating that in some languages, especially ...
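Bits per utterance is simply the total negative log2 probability a model assigns to an utterance, so it does not depend on how the orthography segments the string into characters or words. A minimal sketch, assuming a model that exposes per-token probabilities:

```python
import math

def bits_per_utterance(token_probs):
    """Total surprisal of one utterance in bits: -sum(log2 p) over the
    model's per-token probabilities. Because it is a per-utterance
    total rather than per-character or per-word, it is comparable
    across languages when the utterances are aligned translations."""
    return -sum(math.log2(p) for p in token_probs)

# Hypothetical per-token probabilities for one aligned utterance under
# two different languages' models (invented numbers for illustration):
print(bits_per_utterance([0.25, 0.5, 0.125]))    # 2 + 1 + 3 = 6.0 bits
print(bits_per_utterance([0.5, 0.5, 0.5, 0.5]))  # 4 * 1 = 4.0 bits
```

Note that the two utterances may have different token counts (here 3 vs. 4), yet the totals remain directly comparable, which is the point of using utterance-aligned multi-text.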


Place And Position Are Computationally Different, Charlie O'Hara 2019 University of Southern California

Proceedings of the Society for Computation in Linguistics

Pater & Moreton (2012) argue that learning biases against complex patterns lead to underrepresentation of such patterns cross-linguistically. Here, complexity is reduced to featural complexity: the fewer features needed to describe a pattern, the simpler it is. However, computational features do not map cleanly onto the classic sets of phonological features. Intuitively, an inherent featural property of a segment, like place of articulation, is different from a contextually derived property, like its syllable position. The observed typology shows that these two properties cannot be treated identically, but constraints from previous literature make the correct distinction.


Measuring Phonological Distance In A Tonal Language: An Experimental And Computational Study With Cantonese, Youngah Do, Ryan Ka Yau Lai 2019 University of Hong Kong

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Case Assignment In Tsl Syntax: A Case Study, Mai Ha Vu, Nazila Shafiei, Thomas Graf 2019 University of Delaware

Proceedings of the Society for Computation in Linguistics

Recent work suggests that the subregular complexity of syntax might be comparable to that of phonology and morphology. More specifically, whereas phonological and morphological dependencies are tier-based strictly local over strings, syntactic dependencies are tier-based strictly local over derivation trees. However, a broader range of empirical phenomena must be considered in order to solidify this claim. This paper investigates various phenomena related to morphological case, and we argue that they, too, are tier-based strictly local. Not only do our findings provide empirical support for a kind of computational parallelism across language modules, they also offer a new, computationally unified perspective ...


Normalization May Be Ineffective For Phonetic Category Learning, Kasia Hitczenko, Reiko Mazuka, Micha Elsner, Naomi H. Feldman 2019 University of Maryland, College Park

Proceedings of the Society for Computation in Linguistics

Sound categories often overlap in their acoustics, which can make phonetic learning difficult. Several studies have argued that normalizing acoustics relative to context improves category separation (e.g. Dillon et al., 2013). However, recent work shows that normalization is ineffective for learning Japanese vowel length from spontaneous child-directed speech (Hitczenko et al., 2018). We show that this discrepancy arises from differences between spontaneous and controlled lab speech, and that normalization can increase category overlap when there are regularities in the contexts in which different sounds occur, a hallmark of spontaneous speech. Therefore, normalization is unlikely to help in real, naturalistic phonetic learning ...
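The failure mode described here can be illustrated with a toy example: when category membership correlates with context, as it does in spontaneous speech, subtracting each context's mean can erase the category difference entirely. A minimal sketch with invented duration values (this is a simple mean-subtraction stand-in for the normalization schemes discussed, not the authors' exact method):

```python
import statistics

def normalize_by_context(values, contexts):
    """Normalize each measurement relative to the mean of its context
    (a simple stand-in for context-relative normalization)."""
    means = {c: statistics.mean(v for v, ctx in zip(values, contexts)
                                if ctx == c)
             for c in set(contexts)}
    return [v - means[c] for v, c in zip(values, contexts)]

# Invented vowel durations (ms): the short category happens to occur
# only in context 'A' and the long category only in context 'B' --
# exactly the kind of regularity found in spontaneous speech.
durations = [48, 50, 52, 98, 100, 102]
contexts  = ["A", "A", "A", "B", "B", "B"]
normalized = normalize_by_context(durations, contexts)
print(normalized)  # both categories collapse onto the same values
```

After normalization the two categories occupy identical ranges: the context means have absorbed the category difference, so a learner sees maximal overlap where the raw durations were perfectly separable.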


C-Command Dependencies As Tsl String Constraints, Thomas Graf, Nazila Shafiei 2019 Stony Brook University

Proceedings of the Society for Computation in Linguistics

We provide a general framework for analyzing c-command based dependencies in syntax, e.g. binding and NPI licensing, from a subregular perspective. C-command relations are represented as strings computed from Minimalist derivation trees, and syntactic dependencies are shown to be input-output tier-based strictly local over such strings. The complexity of many syntactic phenomena thus is comparable to dependencies in phonology and morphology.


Formal Characterizations Of True And False Sour Grapes, Caitlin Smith, Charlie O'Hara 2019 University of Southern California

Proceedings of the Society for Computation in Linguistics

No abstract provided.


A Logical And Computational Methodology For Exploring Systems Of Phonotactic Constraints, Dakotah Lambert, James Rogers 2019 Earlham College

Proceedings of the Society for Computation in Linguistics

We introduce a methodology built around two components. The first is a logical analysis component based on a hierarchy of classes of subregular constraints, characterized by the kinds of string features a mechanism must be sensitive to in order to determine whether a string satisfies a constraint. The second is a computational component built around a publicly available interactive workbench that, exploiting the equivalence between logical formulae and finite-state automata, implements a theorem prover for these logics and can even algorithmically extract certain classes of constraints. Alternating between these logical and computational analyses can provide useful insight more easily than using either in isolation.


Distributional Effects Of Gender Contrasts Across Categories, Timothee Mickus, Olivier Bonami, Denis Paperno 2019 ATILF/LORIA, Université de Lorraine

Proceedings of the Society for Computation in Linguistics

This paper proposes a methodology for comparing grammatical contrasts across categories with the tools of distributional semantics. After outlining why such a comparison is relevant to current theoretical work on gender and other morphosyntactic features, we present intrinsic and extrinsic predictability as instruments for analyzing semantic contrasts between pairs of words. We then apply our method to a dataset of gender pairs of French nouns and adjectives. We find that, while the distributional effect of gender is overall less predictable for nouns than for adjectives, it is heavily influenced by semantic properties of the adjectives.


Constraint Breeding During On-Line Incremental Learning, Elliott Moreton 2019 University of North Carolina, Chapel Hill

Proceedings of the Society for Computation in Linguistics

An evolutionary algorithm for simultaneously inducing and weighting phonological constraints (the Winnow-MaxEnt-Subtree Breeder) is described, analyzed, and illustrated. Implementing weights as sub-population sizes, reproduction with selection executes a new variant of Winnow (Littlestone 1988), which is shown to converge. A flexible constraint schema, based on the same prosodic and autosegmental trees used in representations, is described, together with algorithms for mutation and recombination (mating). The algorithm is applied to explaining abrupt learning curves, and predicts an empirical connection between abruptness and language-particularity.
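The classic Winnow algorithm, whose multiplicative weight updates the breeder reimplements as sub-population sizes, can be sketched as follows. The Boolean examples below are toy data, not phonological constraint violations:

```python
def winnow_train(examples, n_features, alpha=2.0):
    """Classic Winnow: multiplicative updates on mistakes. Predict
    positive iff the weighted sum of active features reaches the
    threshold; on a false negative, multiply the active weights by
    alpha, and on a false positive, divide them by alpha."""
    w = [1.0] * n_features
    theta = float(n_features)
    for x, label in examples:
        pred = sum(wi for wi, xi in zip(w, x) if xi) >= theta
        if pred != label:
            factor = alpha if label else 1.0 / alpha
            w = [wi * factor if xi else wi for wi, xi in zip(w, x)]
    return w

# Toy target concept: label = value of feature 0 (one relevant
# feature among four irrelevant-or-noisy ones).
data = [((1, 0, 1, 0), True), ((0, 1, 1, 0), False),
        ((1, 1, 0, 0), True), ((0, 0, 0, 1), False)] * 5
weights = winnow_train(data, n_features=4)
print(weights)  # the relevant feature's weight dominates the rest
```

The multiplicative update is what gives Winnow its mistake bound logarithmic in the number of irrelevant features, and it is this update that the breeder's reproduction-with-selection step mirrors when sub-population sizes grow or shrink geometrically.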


Verb Argument Structure Alternations In Word And Sentence Embeddings, Katharina Kann, Alex Warstadt, Adina Williams, Samuel R. Bowman 2019 New York University

Proceedings of the Society for Computation in Linguistics

Verbs occur in different syntactic environments, or frames. We investigate whether artificial neural networks encode grammatical distinctions necessary for inferring the idiosyncratic frame-selectional properties of verbs. We introduce five datasets, collectively called FAVA, containing in aggregate nearly 10k sentences labeled for grammatical acceptability, illustrating different verbal argument structure alternations. We then test whether models can distinguish acceptable English verb-frame combinations from unacceptable ones using a sentence embedding alone. For converging evidence, we further construct LAVA, a corresponding word-level dataset, and investigate whether the same syntactic features can be extracted from word embeddings. Our models perform reliable classifications for some verbal ...


How The Structure Of The Constraint Space Enables Learning, Jane Chandlee, Remi Eyraud, Jeffrey Heinz, Adam Jardine, Jonathan Rawski 2019 Haverford College

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Discourse Relations And Signaling Information: Anchoring Discourse Signals In Rst-Dt, Yang Liu, Amir Zeldes 2019 Georgetown University

Proceedings of the Society for Computation in Linguistics

Research on discourse relations between clauses, such as cause or contrast, has studied how relations are signaled in discourse. Several corpora include discourse relation annotations: the Penn Discourse Treebank (Prasad et al., 2008) annotates a subset of relations marked by explicit connectives (e.g. ‘however’) or understood implicit ones, while the RST-Signalling Corpus (Taboada & Das 2013) annotates the presence of signals exhaustively, but provides no information about the location of signaling devices. We present an annotation effort to anchor discourse signals at all levels, bridging the gap between these two frameworks, and support feature engineering for automatic discourse parsing.


Transient Blend States And Discrete Agreement-Driven Errors In Sentence Production, Matthew Goldrick, Laurel Brehm, Pyeong Whan Cho, Paul Smolensky 2019 Northwestern University

Proceedings of the Society for Computation in Linguistics

Errors in subject-verb agreement are common in everyday language production. This has been studied using a preamble completion task in which a participant hears or reads a preamble containing inflected nouns and forms a complete English sentence (“The key to the cabinets” could be completed as “The key to the cabinets is gold.”). Existing work has focused on errors arising in selecting the correct verb form for production in the presence of a more ‘local’ noun with different number features (“The key to the cabinets are gold”). However, the same paradigm elicits substantial numbers of preamble errors (“The key to ...


Tense And Aspect Semantics For Sentential Amr, Lucia Donatelli, Nathan Schneider, William Croft, Michael Regan 2019 Georgetown University

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Digital Commons powered by bepress