Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

250 Full-Text Articles 445 Authors 37,428 Downloads 39 Institutions

All Articles in Computational Linguistics

250 full-text articles. Page 4 of 12.

Learning Complex Inflectional Paradigms Through Blended Gradient Inputs, Eric R. Rosen 2019 Johns Hopkins University

Proceedings of the Society for Computation in Linguistics

Through Gradient Symbolic Computation (Smolensky and Goldrick, 2016), in which input forms can consist of gradient blends of more than one phonological realization, we propose a way of deriving surface forms in complex inflectional paradigms that dispenses with direct references to inflectional classes and relies solely on relatively simple blends of input expressions.


Learning Exceptionality Indices For French Variable Schwa Deletion, Aleksei Nazarov 2019 University of Toronto

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Rethinking Phonotactic Complexity, Tiago Pimentel, Brian Roark, Ryan Cotterell 2019 Kunumi and Department of Computer Science, Universidade Federal de Minas Gerais

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Processing Non-Concatenative Morphology – A Developmental Computational Model, Tamar Johnson, Inbal Arnon 2019 The University of Edinburgh, The Hebrew University

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Segmentation And Ur Acquisition With Ur Constraints, Max Nelson 2019 University of Massachusetts Amherst

Proceedings of the Society for Computation in Linguistics

This paper presents a model that treats segmentation and underlying representation acquisition as parallel, interacting processes. A probability distribution over mappings from underlying to surface forms is defined using a Maximum Entropy grammar which weights a set of underlying representation constraints (URCs) (Apoussidou, 2007; Pater et al., 2012). URCs are induced from observed surface strings and used to generate candidates. Structural ambiguity arising from the comparison of segmented outputs to unsegmented surface strings is handled with Expectation Maximization (Dempster et al., 1977; Jarosz, 2013). The model successfully learns a simple voicing assimilation rule and segmentation via correspondences between ...
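The core of such a model is the Maximum Entropy grammar itself, which assigns each candidate a probability proportional to the exponential of its (negated) weighted constraint violations. A minimal sketch of that computation, using a made-up final-devoicing example with hypothetical constraint names and weights, not the paper's actual URC system or data:

```python
import math

def maxent_probs(candidates, weights):
    """P(candidate) is proportional to exp(harmony), where harmony is
    the negated sum of weighted constraint violations."""
    harmonies = {cand: -sum(weights[c] * v for c, v in viols.items())
                 for cand, viols in candidates.items()}
    z = sum(math.exp(h) for h in harmonies.values())  # normalizing constant
    return {cand: math.exp(h) / z for cand, h in harmonies.items()}

# Two candidate surface forms for a hypothetical UR /tad/:
candidates = {
    "tat": {"Ident(voice)": 1, "*VoicedCoda": 0},  # devoices the final stop
    "tad": {"Ident(voice)": 0, "*VoicedCoda": 1},  # fully faithful
}
weights = {"Ident(voice)": 1.0, "*VoicedCoda": 3.0}
probs = maxent_probs(candidates, weights)
```

With the markedness constraint weighted higher than faithfulness, the devoiced candidate gets most of the probability mass; a learner adjusts the weights (here fixed by hand) to match observed frequencies.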


On The Difficulty Of A Distributional Semantics Of Spoken Language, Grzegorz Chrupała, Lieke Gelderloos, Ákos Kádár, Afra Alishahi 2019 Tilburg University

Proceedings of the Society for Computation in Linguistics

In the domain of unsupervised learning, most work on speech has focused on discovering low-level constructs such as phoneme inventories or word-like units. In contrast, for written language there is a large body of work on unsupervised induction of semantic representations of words, whole sentences and longer texts. In this study we examine the challenges of adapting these approaches from written to spoken language. We conjecture that unsupervised learning of the semantics of spoken language becomes feasible if we abstract from the surface variability. We simulate this setting with a dataset of utterances spoken by a realistic but uniform ...


Redtyp: A Database Of Reduplication With Computational Models, Hossep Dolatian, Jeffrey Heinz 2019 Stony Brook University

Proceedings of the Society for Computation in Linguistics

Reduplication is a theoretically and typologically well-studied phenomenon, but there is no database of reduplication patterns which include explicit computational models. This paper introduces RedTyp, an SQL database which provides a computational resource that can be used by both theoretical and computational linguists who work on reduplication. It catalogs 138 reduplicative morphemes across 91 languages, which are modeled with 57 distinct finite-state machines. The finite-state machines are 2-way transducers, which provide an explicit, compact, and convenient representation for reduplication patterns, and which arguably capture the linguistic generalizations more directly than the more commonly used 1-way transducers for modeling natural language ...
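The appeal of 2-way transducers is that the read head can rewind over the input, so total reduplication is simply "copy the stem, return to the left edge, copy it again," with no copying mechanism bolted on. A minimal simulation of that head movement (an illustrative sketch, not RedTyp's actual machine encoding):

```python
def total_reduplication(stem):
    """Simulate a 2-way transducer deriving total reduplication."""
    tape = list(stem)
    output = []
    # Pass 1: the head moves left-to-right, echoing each input symbol.
    for symbol in tape:
        output.append(symbol)
    # The head rewinds to the left edge (emitting nothing), then makes a
    # second left-to-right pass to produce the reduplicant copy.
    for symbol in tape:
        output.append(symbol)
    return "".join(output)
```

A 1-way transducer cannot revisit the input, which is why it needs states (exponentially many, in the worst case) to remember what to copy; the 2-way formulation captures the generalization directly.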


Using Sentiment Induction To Understand Variation In Gendered Online Communities, Lucy Li, Julia Mendelsohn 2019 Stanford University

Proceedings of the Society for Computation in Linguistics

We analyze gendered communities defined in three different ways: text, users, and sentiment. Differences across these representations reveal facets of communities' distinctive identities, such as social group, topic, and attitudes. Two communities may have high text similarity but not user similarity or vice versa, and word usage also does not vary according to a clearcut, binary perspective of gender. Community-specific sentiment lexicons demonstrate that sentiment can be a useful indicator of words' social meaning and community values, especially in the context of discussion content and user demographics. Our results show that social platforms such as Reddit are active settings for ...


Learning Exceptionality And Variation With Lexically Scaled Maxent, Coral Hughto, Andrew Lamont, Brandon Prickett, Gaja Jarosz 2019 University of Massachusetts, Amherst

Proceedings of the Society for Computation in Linguistics

A growing body of research in phonology addresses the representation and learning of variable processes and exceptional, lexically conditioned processes. Linzen et al. (2013) present a MaxEnt model with additive lexical scales to account for data exhibiting both variation and exceptionality. In this paper, we implement a learning model for lexically scaled MaxEnt grammars which we show to be successful across a range of data containing patterns of variation and exceptionality. We also explore how the model's parameters and the rate of exceptionality in the data influence its performance and predictions for novel forms.


Learning Phonotactic Restrictions On Multiple Tiers, Kevin McMullin, Alëna Aksënova, Aniello De Santo 2019 University of Ottawa

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Evaluating Domain-General Learning Of Parametric Stress Typology, Gaja Jarosz, Aleksei Nazarov 2019 University of Massachusetts Amherst

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Preface: Scil 2019 Editors’ Note, Gaja Jarosz, Max Nelson, Brendan O'Connor, Joe Pater 2019 University of Massachusetts Amherst

Proceedings of the Society for Computation in Linguistics

No abstract provided.


Unsupervised Learning Of Cross-Lingual Symbol Embeddings Without Parallel Data, Mark Granroth-Wilding, Hannu Toivonen 2019 University of Helsinki

Proceedings of the Society for Computation in Linguistics

We present a new method for unsupervised learning of multilingual symbol (e.g. character) embeddings, without any parallel data or prior knowledge about correspondences between languages. It is able to exploit similarities across languages between the distributions over symbols' contexts of use within their language, even in the absence of any symbols in common to the two languages. In experiments with an artificially corrupted text corpus, we show that the method can retrieve character correspondences obscured by noise. We then present encouraging results of applying the method to real linguistic data, including for low-resourced languages. The learned representations open the ...


Modeling The Acquisition Of Words With Multiple Meanings, Libby Barak, Sammy Floyd, Adele Goldberg 2019 Princeton University

Proceedings of the Society for Computation in Linguistics

Learning vocabulary is essential to successful communication. Complicating this task is the underappreciated fact that most common words are associated with multiple senses (are polysemous) (e.g., baseball cap vs. cap of a bottle), while other words are homonymous, evoking meanings that are unrelated to one another (e.g., baseball bat vs. flying bat). Models of human word learning have thus far failed to represent this level of naturalistic complexity. We extend a feature-based computational model to allow for multiple meanings, while capturing the gradient distinction between polysemy and homonymy by using structured sets of features. Results confirm that the ...


Evaluation Order Effects In Dynamic Continuized Ccg: From Negative Polarity Items To Balanced Punctuation, Michael White 2019 The Ohio State University

Proceedings of the Society for Computation in Linguistics

Combinatory Categorial Grammar's (CCG; Steedman, 2000) flexible treatment of word order and constituency enables it to employ a compact lexicon, an important factor in its successful application to a range of NLP problems. However, its word order flexibility can be problematic for linguistic phenomena where linear order plays a key role. In this paper, we show that the enhanced control over evaluation order afforded by Continuized CCG (Barker & Shan, 2014) makes it possible not only to implement an improved analysis of negative polarity items in Dynamic Continuized CCG (White et al., 2017) but also to develop an accurate treatment ...


Abstract Meaning Representation For Human-Robot Dialogue, Claire N. Bonial, Lucia Donatelli, Jessica Ervin, Clare R. Voss 2019 U.S. Army Research Lab

Proceedings of the Society for Computation in Linguistics

In this research, we begin to tackle the challenge of natural language understanding (NLU) in the context of the development of a robot dialogue system. We explore the adequacy of Abstract Meaning Representation (AMR) as a conduit for NLU. First, we consider the feasibility of using existing AMR parsers for automatically creating meaning representations for robot-directed transcribed speech data. We evaluate the quality of output of two parsers on this data against a manually annotated gold-standard data set. Second, we evaluate the semantic coverage and distinctions made in AMR overall: how well does it capture the meaning and distinctions needed ...


Augmenting Compositional Models For Knowledge Base Completion Using Gradient Representations, Matthias R. Lalisse, Paul Smolensky 2019 Johns Hopkins University

Proceedings of the Society for Computation in Linguistics

Neural models of Knowledge Base data have typically employed compositional representations of graph objects: entity and relation embeddings are systematically combined to evaluate the truth of a candidate Knowledge Base entry. Using a model inspired by Harmonic Grammar, we propose to tokenize triplet embeddings by subjecting them to a process of optimization with respect to learned well-formedness conditions on Knowledge Base triplets. The resulting model, known as Gradient Graphs, leads to sizable improvements when implemented as a companion to compositional models. Also, we show that the "supracompositional" triplet token embeddings it produces have interpretable properties that prove helpful in performing ...


On Evaluating The Generalization Of Lstm Models In Formal Languages, Mirac Suzgun, Yonatan Belinkov, Stuart M. Shieber 2019 Harvard University

Proceedings of the Society for Computation in Linguistics

Recurrent Neural Networks (RNNs) are theoretically Turing-complete and have established themselves as a dominant model for language processing. Yet, there still remains uncertainty regarding their language learning capabilities. In this paper, we empirically evaluate the inductive learning capabilities of Long Short-Term Memory networks, a popular extension of simple RNNs, on simple formal languages, in particular aⁿbⁿ, aⁿbⁿcⁿ, and aⁿbⁿcⁿdⁿ. We investigate the influence of various aspects of learning, such as training data regimes and model capacity, on generalization to unobserved samples. We find ...
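Languages like aⁿbⁿ make clean generalization tests because membership is exactly checkable and training and test sets can be separated by string length: train on small n, then probe larger, unseen n. A sketch of how such splits might be generated (an assumed setup for illustration, not the authors' exact protocol):

```python
def counting_string(symbols, n):
    """The length-(|symbols|*n) member of the family a^n b^n, a^n b^n c^n, etc."""
    return "".join(s * n for s in symbols)

# Train on n = 1..5; hold out longer strings to probe generalization.
train = [counting_string("ab", n) for n in range(1, 6)]
test = [counting_string("ab", n) for n in range(6, 11)]
```

Because these languages are not regular, success on the held-out lengths indicates the network has induced something beyond finite-state pattern matching.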


Empty Categories Help Parse The Overt, Weiwei Sun 2019 Peking University

Proceedings of the Society for Computation in Linguistics

This paper is concerned with whether deep syntactic information can help surface parsing, with a particular focus on empty categories. We consider data-driven dependency parsing with both linear and neural disambiguation models. We find that the information about empty categories is helpful to reduce the approximation error in a structured prediction based parsing model, but increases the search space for inference and accordingly the estimation error. To deal with structure-based overfitting, we propose to integrate disambiguation models with and without empty elements. Experiments on English and Chinese TreeBanks indicate that incorporating empty elements consistently improves surface parsing.


Temporally-Oriented Possession: A Corpus For Tracking Possession Over Time, Dhivya I. Chinnappa, Alexis Palmer, Eduardo Blanco 2019 University of North Texas

Proceedings of the Society for Computation in Linguistics

This abstract presents a new corpus for temporally-oriented possession, that is, tracking concrete objects as they change hands over time. We annotate Wikipedia articles for 90 different well-known artifacts (paintings, diamonds, and archaeological artifacts), producing 799 artifact-possessor relations covering 735 unique possessors. Each possession relation is annotated with features capturing duration of possession, as well as the certainty of the possession according to textual evidence. A possession timeline is then produced for each artifact. This corpus provides a foundation for analysis of temporally-oriented possession, as well as work on automatic production of possession timelines.


Digital Commons powered by bepress