Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

228 Full-Text Articles | 340 Authors | 192,439 Downloads | 63 Institutions

All Articles in Computational Linguistics

228 full-text articles. Page 7 of 10.

Back To The Future: Logic And Machine Learning, Simon Dobnik, John D. Kelleher 2017 CLASP, University of Gothenburg, Sweden

Conference papers

In this paper we argue that since the beginning of natural language processing or computational linguistics there has been a strong connection between logic and machine learning. First of all, there is something logical about language or linguistic about logic. Secondly, we argue that rather than distinguishing between logic and machine learning, a more useful distinction is between top-down approaches and data-driven approaches. Examining some recent approaches in deep learning, we argue that they incorporate both properties and that this is the reason for their very successful adoption to solve several problems within language technology.


Quantitative Criticism Of Literary Relationships, Joseph P. Dexter, Theodore Katz, Nilesh Tripuraneni, Tathagata Dasgupta, Ajay Kannan, James Brofos, Jorge A. Bonilla Lopez, Lea Schroeder 2017 Harvard University

Dartmouth Scholarship

Authors often convey meaning by referring to or imitating prior works of literature, a process that creates complex networks of literary relationships (“intertextuality”) and contributes to cultural evolution. In this paper, we use techniques from stylometry and machine learning to address subjective literary critical questions about Latin literature, a corpus marked by an extraordinary concentration of intertextuality. Our work, which we term “quantitative criticism,” focuses on case studies involving two influential Roman authors, the playwright Seneca and the historian Livy. We find that four plays related to but distinct from Seneca’s main writings are differentiated from the rest of the …


Es-Esa: An Information Retrieval Prototype Using Explicit Semantic Analysis And Elasticsearch, Brian D. Sloan 2017 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Many modern information retrieval systems work by using keyword search to locate documents in an inverted index by matching those documents based on terms in a user’s query. While highly effective for many use-cases, one notable drawback to simple keyword-based searching is that the contextual knowledge surrounding the user’s underlying information need may be lost, particularly if the user’s query terms are ambiguous or have multiple meanings. Research in the field of semantic search aims to make progress towards resolving this. One methodology in particular, explicit semantic analysis, works by modeling a document not only as a set of …


Robot Perception Errors And Human Resolution Strategies In Situated Human-Robot Dialogue, Niels Schütte, Brian Mac Namee, John D. Kelleher 2017 Technological University Dublin

Articles

Errors in visual perception may cause problems in situated dialogues. We investigated this problem through an experiment in which human participants interacted through a natural language dialogue interface with a simulated robot. We introduced errors into the robot’s perception, and observed the resulting problems in the dialogues and their resolutions. We then introduced different methods for the user to request information about the robot’s understanding of the environment. We quantify the impact of perception errors on the dialogues, and investigate resolution attempts by users at a structural level and at the level of referring expressions.


Vanilla Sequence-To-Sequence Neural Nets Cannot Model Reduplication, Brandon Prickett 2017 University of Massachusetts Amherst

OWP Linguistics

This paper presents results from a series of simulations that attempted to teach a vanilla sequence-to-sequence neural network a reduplication process. These attempts did not succeed, suggesting that added machinery is necessary for connectionist models to perform such a task.


Acoustic Classification Of Focus: On The Web And In The Lab, Jonathan Howell, Mats Rooth, Michael Wagner 2017 Montclair State University

Department of Linguistics Faculty Scholarship and Creative Works

We present a new methodological approach which combines both naturally-occurring speech harvested on the web and speech data elicited in the laboratory. This proof-of-concept study examines the phenomenon of focus sensitivity in English, in which the interpretation of particular grammatical constructions (e.g., the comparative) is sensitive to the location of prosodic prominence. Machine learning algorithms (support vector machines and linear discriminant analysis) and human perception experiments are used to cross-validate the web-harvested and lab-elicited speech. Results confirm the theoretical predictions for location of prominence in comparative clauses and the advantages of using both web-harvested and lab-elicited speech. The most robust …


Generating Amharic Present Tense Verbs: A Network Morphology & Datr Account, T. Michael W. Halcomb 2017 University of Kentucky

Theses and Dissertations--Linguistics

In this thesis I attempt to model, that is, computationally reproduce, the natural transmission (i.e. inflectional regularities) of twenty present tense Amharic verbs (i.e. triradicals beginning with consonants) as used by the language’s speakers. I root my approach in the linguistic theory of network morphology (NM) and model it using the DATR evaluator. In Chapter 1, I provide an overview of Amharic and discuss the fidel as an abugida, the verb system’s root-and-pattern morphology, and how radicals of each lexeme interact with prefixes and suffixes. I offer an overview of NM in Chapter 2 and DATR in Chapter 3. In …


Acoustic Classification Of Focus: On The Web And In The Lab, Jonathan Howell, Mats Rooth, Michael Wagner 2016 Montclair State University

Jonathan Howell

We present a new methodological approach which combines both naturally-occurring speech harvested on the web and speech data elicited in the laboratory. This proof-of-concept study examines the phenomenon of focus sensitivity in English, in which the interpretation of particular grammatical constructions (e.g., the comparative) is sensitive to the location of prosodic prominence. Machine learning algorithms (support vector machines and linear discriminant analysis) and human perception experiments are used to cross-validate the web-harvested and lab-elicited speech. Results confirm the theoretical predictions for location of prominence in comparative clauses and the advantages of using both web-harvested and lab-elicited speech. The most robust …


Towards Multipurpose Readability Assessment, Ion Madrazo 2016 Boise State University

Boise State University Theses and Dissertations

Readability refers to the ease with which a reader can understand a text. Automatic readability assessment has been widely studied over the past 50 years. However, most of the studies focus on the development of tools that apply to only a single language, domain, or document type. This means duplicated effort for both developers, who need to integrate multiple tools in their systems, and final users, who have to deal with incompatibilities among the readability scales of different tools. In this manuscript, we present MultiRead, a multipurpose readability assessment tool capable of predicting the reading difficulty of texts of varied …


Towards A Computational Model Of Frame Of Reference Alignment In Swedish Dialogue, Simon Dobnik, Christine Howes, Kim Demaret, John D. Kelleher 2016 simon.dobnik@gu.se

Conference papers

In this paper we examine how people negotiate, interpret and repair the frame of reference (FoR) in online text-based dialogues discussing spatial scenes in Swedish. We describe work in progress in which participants are given different perspectives of the same scene and asked to locate several objects that are only shown on one of their pictures. This task requires participants to coordinate on FoR in order to identify the missing objects. This study has implications for situated dialogue systems.


Extending Hidden Structure Learning: Features, Opacity, And Exceptions, Aleksei I. Nazarov 2016 University of Massachusetts Amherst

Doctoral Dissertations

This dissertation explores new perspectives in phonological hidden structure learning (inferring structure not present in the speech signal that is necessary for phonological analysis; Tesar 1998, Jarosz 2013a, Boersma and Pater 2016), and extends this type of learning towards the domain of phonological features, towards derivations in Stratal OT (Bermúdez-Otero 1999), and towards exceptionality indices in probabilistic OT. Two more specific themes also come out: the possibility of inducing instead of pre-specifying the space of possible hidden structures, and the importance of cues in the data for triggering the use of hidden structure. In chapters 2 and 4, phonological features …


Lexical Variation, Lexical Innovation, And Speaker Motivations: A Historical Psycholinguistic Approach, Jason Timm Dr. 2016 University of New Mexico

Linguistics ETDs

Speakers commonly re-purpose existing forms in the mental lexicon to create novel form-meaning. Contemporary evidence that such innovation processes have occurred historically is attested in varying degrees of polysemy in the mental lexicon. This dissertation considers speaker motivations underlying these innovation processes historically. Strong synchronic relationships between frequency and degree of polysemy, on one hand, and frequency and lexical access, on the other hand, have traditionally been interpreted as evidence for the primacy of economic motivations in processes of lexical innovation. In contrast, the cognitive processes that most commonly facilitate innovation, metaphor and metonymy, have largely been described as processes …


An Evaluation Of Pos Taggers For The Childes Corpus, Rui Huang 2016 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

This project evaluates four mainstream taggers on a representative collection of child-adult dialogues from the Child Language Data Exchange System. The nine children’s files from the Valian corpora and part of the Eve corpora have been manually labeled and rewritten with the LARC tagset. They served as gold standard corpora in the training and testing process. Four taggers (the CLAN MOR tagger, the ACOPOST trigram tagger, the Stanford parser, and Ver. 1.14 of the Brill tagger) have been tested by 10-fold cross validation. By analyzing which of the taggers’ assumptions about category assignment lead to failures, we identify several problematic cases of tagging. By comparing the …


An Examination Of Cross-Domain Authorship Attribution Techniques, Maxwell B. Schwartz 2016 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

In recent years, Twitter has become a popular testing ground for techniques in authorship attribution. This is due both to the ease of building large corpora and to the challenges associated with the character limit imposed by the service and the writing styles that have developed as a result. As both false and genuine claims of hacked Twitter accounts have made international news, there is an increasing need for this type of work. For newer Twitter accounts, however, there is little training data. Thus, this study looks to lay the groundwork for cross-domain authorship attribution: training on one source …


Latent Semantic Indexing In The Discovery Of Cyber-Bullying In Online Text, Jacob L. Bigelow 2016 Ursinus College

Computer Science Summer Fellows

The rise in the use of social media, and particularly the rise of adolescent use, has led to a new means of bullying. Cyber-bullying has proven consequential to youth internet users, causing a need for a response. In order to effectively stop this problem, we need a verified method of detecting cyber-bullying in online text; we aim to find that method. For this project, we look at thirteen thousand labeled posts from Formspring and create a bank of words used in the posts. First, the posts are cleaned up by taking out punctuation, normalizing emoticons, and removing high and low …


Detection Of Cyberbullying In Sms Messaging, Bryan W. Bradley 2016 Ursinus College

Computer Science Summer Fellows

Cyberbullying is a type of bullying that uses technology such as cell phones to harass or malign another person. To detect acts of cyberbullying, we are developing an algorithm that will detect cyberbullying in SMS (text) messages. Over 80,000 text messages have been collected by software installed on cell phones carried by participants in our study. This paper describes the development of the algorithm to detect cyberbullying messages, using the cell phone data collected previously. The algorithm works by first separating the messages into conversations in an automated way. The algorithm then analyzes the conversations and scores the severity and …


Event Parsing In Narrative: Trials And Tribulations Of Archaic English Fairy Tales, Rebecca Lovering 2016 Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

While event extraction and automatic summarization have taken great strides in the realm of news stories, fictional narratives like fairy tales have not been so fortunate. A number of challenges arise from the literary elements present in fairy tales that are not found in more straightforward corpora of natural language, such as archaic expressions and sentence structures. To aid in summarization of fictional texts, I created a class - a template for a digital object, in this case a semantic and story event - that captures elements predicted to help classify events as important for inclusion. I wrote a processor …


Utilizing Linguistic Context To Improve Individual And Cohort Identification In Typed Text, Adam Goodkind 2016 Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

The process of producing written text is complex and constrained by pressures that range from physical to psychological. In a series of three sets of experiments, this thesis demonstrates the effects of linguistic context on the timing patterns of the production of keystrokes. We elucidate the effect of linguistic context at three different levels of granularity: The first set of experiments illustrates how the nontraditional syntax of a single linguistic construct, the multi-word expression, can create significant changes in keystroke production patterns. This set of experiments is followed by a set of experiments that test the hypothesis on the entire …


Nondescript: A Web Tool To Aid Subversion Of Authorship Attribution, Robin Davis 2016 Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

A person’s writing style is uniquely quantifiable and can serve reliably as a biometric. A writer who wishes to remain anonymous can use a number of privacy technologies but can still be identified simply by the words they choose to use — how frequently they use common words like “of,” for instance. Nondescript is a web tool designed first to identify the user’s writing style in terms of word frequency from a given writing sample and document, then to suggest how the author can change their document to lessen its probability of being attributed to them. While Nondescript does not …


Data-Driven Synthesis And Evaluation Of Syntactic Facial Expressions In American Sign Language Animation, Hernisa Kacorri 2016 Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Technology to automatically synthesize linguistically accurate and natural-looking animations of American Sign Language (ASL) would make it easier to add ASL content to websites and media, thereby increasing information accessibility for many people who are deaf and have low English literacy skills. State-of-the-art sign language animation tools focus mostly on the accuracy of manual signs rather than on facial expressions. We are investigating the synthesis of syntactic ASL facial expressions, which are grammatically required and essential to the meaning of sentences. In this thesis, we propose to: (1) explore the methodological aspects of evaluating sign language animations with facial expressions, …

