Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

107 Full-Text Articles 141 Authors 16883 Downloads 37 Institutions

All Articles in Computational Linguistics

Back To The Future: Logic And Machine Learning, Simon Dobnik, John Kelleher 2017 CLASP, University of Gothenburg, Sweden

Conference papers

In this paper we argue that since the beginning of natural language processing or computational linguistics there has been a strong connection between logic and machine learning. First of all, there is something logical about language or linguistic about logic. Secondly, we argue that rather than distinguishing between logic and machine learning, a more useful distinction is between top-down approaches and data-driven approaches. Examining some recent approaches in deep learning, we argue that they incorporate both properties and that this is the reason for their very successful adoption to solve several problems within language technology.


Robot Perception Errors And Human Resolution Strategies In Situated Human-Robot Dialogue, Niels Schuette, Brian Mac Namee, John Kelleher 2017 Dublin Institute of Technology

Articles

Errors in visual perception may cause problems in situated dialogues. We investigated this problem through an experiment in which human participants interacted through a natural language dialogue interface with a simulated robot. We introduced errors into the robot’s perception, and observed the resulting problems in the dialogues and their resolutions. We then introduced different methods for the user to request information about the robot’s understanding of the environment. We quantify the impact of perception errors on the dialogues, and investigate resolution attempts by users at a structural level and at the level of referring expressions.


Generating Amharic Present Tense Verbs: A Network Morphology & DATR Account, T. Michael W. Halcomb 2017 University of Kentucky

Theses and Dissertations--Linguistics

In this thesis I attempt to model, that is, computationally reproduce, the natural transmission (i.e. inflectional regularities) of twenty present tense Amharic verbs (i.e. triradicals beginning with consonants) as used by the language’s speakers. I root my approach in the linguistic theory of network morphology (NM) and model it using the DATR evaluator. In Chapter 1, I provide an overview of Amharic and discuss the fidel as an abugida, the verb system’s root-and-pattern morphology, and how the radicals of each lexeme interact with prefixes and suffixes. I offer an overview of NM in Chapter 2 and DATR ...


Towards A Computational Model Of Frame Of Reference Alignment In Swedish Dialogue, Simon Dobnik, Christine Howes, Kim Demaret, John Kelleher 2016 simon.dobnik@gu.se

Conference papers

In this paper we examine how people negotiate, interpret and repair the frame of reference (FoR) in online text based dialogues discussing spatial scenes in Swedish. We describe work-in-progress in which participants are given different perspectives of the same scene and asked to locate several objects that are only shown on one of their pictures. This task requires participants to coordinate on FoR in order to identify the missing objects. This study has implications for situated dialogue systems.


Lexical Variation, Lexical Innovation, And Speaker Motivations: A Historical Psycholinguistic Approach, Jason Timm Dr. 2016 University of New Mexico

Linguistics ETDs

Speakers commonly re-purpose existing forms in the mental lexicon to create novel form-meaning. Contemporary evidence that such innovation processes have occurred historically is attested in varying degrees of polysemy in the mental lexicon. This dissertation considers speaker motivations underlying these innovation processes historically. Strong synchronic relationships between frequency and degree of polysemy, on one hand, and frequency and lexical access, on the other hand, have traditionally been interpreted as evidence for the primacy of economic motivations in processes of lexical innovation. In contrast, the cognitive processes that most commonly facilitate innovation, metaphor and metonymy, have largely been described as processes ...


An Examination Of Cross-Domain Authorship Attribution Techniques, Maxwell B. Schwartz 2016 The Graduate Center, City University of New York

All Graduate Works by Year: Dissertations, Theses, and Capstone Projects

In recent years, Twitter has become a popular testing ground for techniques in authorship attribution. This is due both to the ease of building large corpora and to the challenges associated with the character limit imposed by the service and the writing styles that have developed as a result. As both false and genuine claims of hacked Twitter accounts have made international news, there is an increasing need for this type of work. For newer Twitter accounts, however, there is little training data. Thus, this study looks to lay the groundwork for cross-domain authorship attribution: training on one source ...


An Evaluation Of POS Taggers For The CHILDES Corpus, Rui Huang 2016 The Graduate Center, City University of New York

All Graduate Works by Year: Dissertations, Theses, and Capstone Projects

This project evaluates four mainstream taggers on a representative collection of child-adult dialogues from the Child Language Data Exchange System. The nine children’s files from the Valian corpora and part of the Eve corpora were manually labeled and rewritten with the LARC tagset. They served as gold-standard corpora in the training and testing process. Four taggers (the CLAN MOR tagger, the ACOPOST trigram tagger, the Stanford parser, and version 1.14 of the Brill tagger) were tested by 10-fold cross-validation. By analyzing which of the taggers’ assumptions about category assignment lead to failures, we identify several problematic cases of tagging ...
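As a rough illustration of the 10-fold cross-validation mentioned in this abstract, the sketch below splits a labeled collection into ten folds and scores predictions against gold tags. It is a generic sketch, not the thesis's evaluation code: the function names `k_fold_splits` and `accuracy`, the fixed random seed, and the round-robin fold assignment are all assumptions made for the example.

```python
import random


def k_fold_splits(items, k=10, seed=0):
    """Yield (train, test) lists for k-fold cross-validation.

    Hypothetical helper: shuffles once, then deals items round-robin
    into k folds so that fold sizes differ by at most one.
    """
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    folds = [shuffled[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # Training data is every fold except the held-out one.
        train = [x for j, fold in enumerate(folds) if j != i for x in fold]
        yield train, test


def accuracy(predicted, gold):
    """Fraction of positions where the predicted tag equals the gold tag."""
    if not gold:
        return 0.0
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold)
```

Each item is held out exactly once, so averaging the per-fold accuracy gives one score per tagger that uses the whole gold-standard collection for testing.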


Latent Semantic Indexing In The Discovery Of Cyber-Bullying In Online Text, Jacob L. Bigelow 2016 Ursinus College

Computer Science Summer Fellows

The rise in the use of social media, and particularly the rise of adolescent use, has led to a new means of bullying. Cyber-bullying has proven consequential to youth internet users, creating a need for a response. To stop this problem effectively, we need a verified method of detecting cyber-bullying in online text; we aim to find that method. For this project we look at thirteen thousand labeled posts from Formspring and create a bank of words used in the posts. First the posts are cleaned up by taking out punctuation, normalizing emoticons, and removing high and low ...


The Language Identification Problem: Formant Analysis And Cross-Linguistic Uniqueness, Lyndon Rey 2016 The University of Western Ontario

Western Papers in Linguistics / Cahiers linguistiques de Western

In the field of computational linguistics, spoken language recognition (through the use of wordlists and morphological markers) is a resource-intensive process: the input must be parsed from the input speech signal, words must be hypothesized, and then word-lists for any likely language must be iterated through. Note that spoken language recognition does not refer to the process of identifying the meaning of the input; rather, it means identifying the language that the speaker is speaking (not necessarily 'parsing' the input). In my research, the question of whether a language can be positively and uniquely identified through small nuances ...


Detection Of Cyberbullying In SMS Messaging, Bryan W. Bradley 2016 Ursinus College

Computer Science Summer Fellows

Cyberbullying is a type of bullying that uses technology such as cell phones to harass or malign another person. To detect acts of cyberbullying, we are developing an algorithm that will detect cyberbullying in SMS (text) messages. Over 80,000 text messages have been collected by software installed on cell phones carried by participants in our study. This paper describes the development of the algorithm to detect cyberbullying messages, using the cell phone data collected previously. The algorithm works by first separating the messages into conversations in an automated way. The algorithm then analyzes the conversations and scores the severity ...


Nondescript: A Web Tool To Aid Subversion Of Authorship Attribution, Robin Davis 2016 Graduate Center, City University of New York

All Graduate Works by Year: Dissertations, Theses, and Capstone Projects

A person’s writing style is uniquely quantifiable and can serve reliably as a biometric. A writer who wishes to remain anonymous can use a number of privacy technologies but can still be identified simply by the words they choose to use — how frequently they use common words like “of,” for instance. Nondescript is a web tool designed first to identify the user’s writing style in terms of word frequency from a given writing sample and document, then to suggest how the author can change their document to lessen its probability of being attributed to them. While Nondescript does ...
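To make the word-frequency idea in this abstract concrete, the sketch below computes how often a handful of common words occur in a text and compares two such profiles. It is only an illustration under stated assumptions and is not Nondescript's implementation: the word list `FUNCTION_WORDS`, the tokenizing regular expression, and the names `function_word_profile` and `profile_distance` are hypothetical choices for the example.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny list of common English words.
FUNCTION_WORDS = ("the", "of", "and", "to", "a", "in")


def function_word_profile(text, words=FUNCTION_WORDS):
    """Return the relative frequency of each listed word in the text."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return {w: 0.0 for w in words}
    counts = Counter(tokens)
    return {w: counts[w] / total for w in words}


def profile_distance(p, q):
    """Sum of absolute differences between two profiles over the same words."""
    return sum(abs(p[w] - q[w]) for w in p)
```

A small distance between the profile of a new document and the profile of an author's known writing suggests a similar style on this one feature; a writer seeking anonymity would want that distance to grow.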


Event Parsing In Narrative: Trials And Tribulations Of Archaic English Fairy Tales, Rebecca Lovering 2016 Graduate Center, City University of New York

All Graduate Works by Year: Dissertations, Theses, and Capstone Projects

While event extraction and automatic summarization have taken great strides in the realm of news stories, fictional narratives like fairy tales have not been so fortunate. A number of challenges arise from the literary elements present in fairy tales that are not found in more straightforward corpora of natural language, such as archaic expressions and sentence structures. To aid in summarization of fictional texts, I created a class - a template for a digital object, in this case a semantic and story event - that captures elements predicted to help classify events as important for inclusion. I wrote a processor to run ...


Data-Driven Synthesis And Evaluation Of Syntactic Facial Expressions In American Sign Language Animation, Hernisa Kacorri 2016 Graduate Center, City University of New York

All Graduate Works by Year: Dissertations, Theses, and Capstone Projects

Technology to automatically synthesize linguistically accurate and natural-looking animations of American Sign Language (ASL) would make it easier to add ASL content to websites and media, thereby increasing information accessibility for many people who are deaf and have low English literacy skills. State-of-the-art sign language animation tools focus mostly on accuracy of manual signs rather than on the facial expressions. We are investigating the synthesis of syntactic ASL facial expressions, which are grammatically required and essential to the meaning of sentences. In this thesis, we propose to: (1) explore the methodological aspects of evaluating sign language animations with facial expressions ...


Utilizing Linguistic Context To Improve Individual And Cohort Identification In Typed Text, Adam Goodkind 2016 Graduate Center, City University of New York

All Graduate Works by Year: Dissertations, Theses, and Capstone Projects

The process of producing written text is complex and constrained by pressures that range from physical to psychological. In a series of three sets of experiments, this thesis demonstrates the effects of linguistic context on the timing patterns of the production of keystrokes. We elucidate the effect of linguistic context at three different levels of granularity: The first set of experiments illustrates how the nontraditional syntax of a single linguistic construct, the multi-word expression, can create significant changes in keystroke production patterns. This set of experiments is followed by a set of experiments that test the hypothesis on the entire ...


A Conventional Orthography For Maghrebi Arabic, Houcemeddine Turki, Emad Adel, Tariq Daouda, Nassim Regragui 2016 Lycée secondaire Sbikha 1979

Imed Adel


Maghrebi Arabic is the set of dialects of the Arabic language spoken in the Maghreb (Tunisia, Algeria, Morocco, Libya and Mauritania). This set of dialects is under-resourced and has neither a standard orthography nor large collections of written text and dictionaries. Actually, there is no strict separation between Modern Standard Arabic, the official language of the government, media and education, and Maghrebi Arabic; the two exist on a continuum dominated by mixed forms. In this paper, we present a conventional orthography for Maghrebi Arabic, following a previous effort on developing a conventional orthography for Dialectal Arabic (or CODA) demonstrated for ...


A Conventional Orthography For Maghrebi Arabic, Houcemeddine Turki, Emad Adel, Tariq Daouda, Nassim Regragui 2016 Lycée secondaire Sbikha 1979

Houcemeddine Turki


Maghrebi Arabic is the set of dialects of the Arabic language spoken in the Maghreb (Tunisia, Algeria, Morocco, Libya and Mauritania). This set of dialects is under-resourced and has neither a standard orthography nor large collections of written text and dictionaries. Actually, there is no strict separation between Modern Standard Arabic, the official language of the government, media and education, and Maghrebi Arabic; the two exist on a continuum dominated by mixed forms. In this paper, we present a conventional orthography for Maghrebi Arabic, following a previous effort on developing a conventional orthography for Dialectal Arabic (or CODA) demonstrated for ...


Argumentation Mining In Parliamentary Discourse, Nona Naderi 2016 University of Toronto

OSSA Conference Archive

In parliamentary discourse, politicians expound their beliefs and goals through argumentation, and, to persuade the audience, they communicate their values by highlighting some aspect of an issue, an action which is commonly known as framing. The choices of frames are typically dependent upon the speaker’s ideology.

In this proposed doctoral work, we will computationally analyze framing strategies and present a model for discovering the latent structure of framing of real-world issues in Canadian parliamentary discourse.


CEST: City Event Summarization Using Twitter, Deepa Mallela 2016 Boise State University

Computer Science Graduate Projects and Theses

Twitter, with 288 million active users, has become the most popular platform for continuous real-time discussions. This leads to huge amounts of information related to the real world, which has attracted researchers from both academia and industry. Event detection on Twitter has gained attention as one of the most popular domains of interest within the research community. Unfortunately, existing event detection methodologies have yet to fully explore Twitter metadata and instead rely solely on identifying events based on prior information or focus on events that belong to specific categories. Given the heavy volume of tweets that discuss events, summarization techniques can ...


Cell Phone Ethnography: Mixed Methods And The Brand Consumer Relationship, Robert Nathaniel Dove 2016 University of Tennessee - Knoxville

Masters Theses

Overall, the goal of this study is to identify and differentiate the various motivations and cultural influences that can be used to explain consumer behavior. In doing so, this study hopes to facilitate the development of new and innovative marketing strategies, providing a new research design for the ethnographer’s toolkit. More importantly, this model can give shape to new constructs and new variables for further empirical testing in the field through quantitative and qualitative methods. By blending the two approaches, using qualitative interpretive anthropological analysis by field study with quantitative sentiment analysis adapted from market researcher Jeffery Breen’s ...


Phonemic Conversion As The Ideal Romanization Scheme For Hebrew: Implications For Hebrew Cataloging, Uzzi Ornan, Rachel Leket-Mor 2016 Hebrew University of Jerusalem and Technion—Israel Institute of Technology

Judaica Librarianship

This paper examines a romanization scheme developed by linguist Uzzi Ornan that has not been considered for implementation in libraries. Phonemic conversion of Hebrew uses neither transliteration nor transcription strategies but reconstructs the theoretical structure of the original Hebrew word based on its phonemes. The article describes this scheme and its benefits, which include full coverage of all historical periods and script modes of Hebrew, and full reversibility, complete with an online interface that enables automatic conversion. The article compares the suggested phonemic conversion scheme with the ALA/LC Romanization of Hebrew and provides a history of previously attempted reversal ...

