Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

233 Full-Text Articles 347 Authors 192,439 Downloads 63 Institutions

All Articles in Computational Linguistics

#Hashtags: A Look At The Evaluative Roles Of Hashtags On Twitter, Leah Rose Schaede 2018 University of Kentucky

Theses and Dissertations--Linguistics

Social media has become a large part of today’s pop culture and of keeping up with what is going on, not only in our social circles but around the world. It has given many a platform to unite their causes, build fandoms, and share their commentary with the world. One tool that helps group posts together or add commentary to a thought is the hashtag. In this paper I explore the evaluative roles of hashtags in social media discourse, specifically on Twitter. I use a sample of randomly selected tweets that I collected and compiled myself from the Twitter API stream. I …


A Markedly Different Approach: Investigating PIE Stops Using Modern Empirical Methods, Phillip Barnett 2018 University of Kentucky

Theses and Dissertations--Linguistics

In this thesis, I investigate a decades-old problem found in the stop system of Proto-Indo-European (PIE). More specifically, I will be investigating the paucity of */b/ in the forms reconstructed for the ancient, hypothetical language. As cross-linguistic evidence and phonological theory alone have fallen short of providing a satisfactory answer, herein will I employ modern empirical methods of linguistic investigation, namely laboratory phonology experiments and computational database analysis. Following Byrd 2015, I advocate for an examination of synchronic phenomena and behavior as a method for investigating diachronic change.

In Chapter 1, I present an overview of the various proposed phonological …


Cloud‐Based Text Analytics Harvesting, Cleaning And Analyzing Corporate Earnings Conference Calls, Michael Chuancai Zhang, Vikram Gazula, Dan Stone, Hong Xie 2017 University of Kentucky

Commonwealth Computational Summit

No abstract provided.


Cloud-Based Text Analytics: Harvesting, Cleaning And Analyzing Corporate Earnings Conference Calls, Michael Chuancai Zhang, Vikram Gazula, Dan Stone, Hong Xie 2017 University of Kentucky

Commonwealth Computational Summit

Does management language cohesion in earnings conference calls matter to the capital market? As part of the research on this question, and taking advantage of modern information technologies, this project:

  • harvested 115,882 earnings conference call transcripts from SeekingAlpha.com
  • parsed and structured 89,988 transcripts using regular expressions in Stata
  • analyzed 179,976 text files using Amazon Elastic Compute Cloud (Amazon EC2), which saved almost 2 years (675 days) of the project time
As this project is related to big data, text analytics, and big computing, it may be a good case to show how we can benefit from modern …


A Sentiment Analysis Of Language & Gender Using Word Embedding Models, Ellyn Rolleston Keith 2017 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Since Robin Lakoff started the conversation around language and gender with her 1975 essay “Language and Woman’s Place,” extensive work has been done on analyzing sociolinguistics associated with gender. While much work has been done on the differences between how men and women use language, there is less research to be found on language about women as opposed to language about men. In this work, I build a word embedding model from a corpus of Wikipedia film summaries and use this model to create lists of words associated with men and words associated with women. I then use sentiment analysis …


Back To The Future: Logic And Machine Learning, Simon Dobnik, John D. Kelleher 2017 CLASP, University of Gothenburg, Sweden

Conference papers

In this paper we argue that since the beginning of natural language processing or computational linguistics there has been a strong connection between logic and machine learning. First of all, there is something logical about language or linguistic about logic. Secondly, we argue that rather than distinguishing between logic and machine learning, a more useful distinction is between top-down approaches and data-driven approaches. Examining some recent approaches in deep learning, we argue that they incorporate both properties and that this is the reason for their very successful adoption to solve several problems within language technology.


Quantitative Criticism Of Literary Relationships, Joseph P. Dexter, Theodore Katz, Nilesh Tripuraneni, Tathagata Dasgupta, Ajay Kannan, James Brofos, Jorge A. Bonilla Lopez, Lea Schroeder 2017 Harvard University

Dartmouth Scholarship

Authors often convey meaning by referring to or imitating prior works of literature, a process that creates complex networks of literary relationships (“intertextuality”) and contributes to cultural evolution. In this paper, we use techniques from stylometry and machine learning to address subjective literary critical questions about Latin literature, a corpus marked by an extraordinary concentration of intertextuality. Our work, which we term “quantitative criticism,” focuses on case studies involving two influential Roman authors, the playwright Seneca and the historian Livy. We find that four plays related to but distinct from Seneca’s main writings are differentiated from the rest of the …


ES-ESA: An Information Retrieval Prototype Using Explicit Semantic Analysis And Elasticsearch, Brian D. Sloan 2017 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Many modern information retrieval systems work by using keyword search to locate documents in an inverted index by matching those documents based on terms in a user’s query. While highly effective for many use-cases, one notable drawback to simple keyword-based searching is that the contextual knowledge surrounding the user’s underlying information need may be lost, particularly if the user’s query terms are ambiguous or have multiple meanings. Research in the field of semantic search aims to make progress towards resolving this. One methodology in particular, explicit semantic analysis, works by modeling a document not only as a set of …


Robot Perception Errors And Human Resolution Strategies In Situated Human-Robot Dialogue, Niels Schütte, Brian Mac Namee, John D. Kelleher 2017 Technological University Dublin

Articles

Errors in visual perception may cause problems in situated dialogues. We investigated this problem through an experiment in which human participants interacted through a natural language dialogue interface with a simulated robot. We introduced errors into the robot’s perception, and observed the resulting problems in the dialogues and their resolutions. We then introduced different methods for the user to request information about the robot’s understanding of the environment. We quantify the impact of perception errors on the dialogues, and investigate resolution attempts by users at a structural level and at the level of referring expressions.


Vanilla Sequence-To-Sequence Neural Nets Cannot Model Reduplication, Brandon Prickett 2017 University of Massachusetts Amherst

OWP Linguistics

This paper presents results from a series of simulations that attempted to teach a vanilla sequence-to-sequence neural network a reduplication process. These attempts did not succeed, suggesting that added machinery is necessary for connectionist models to perform such a task.


Acoustic Classification Of Focus: On The Web And In The Lab, Jonathan Howell, Mats Rooth, Michael Wagner 2017 Montclair State University

Department of Linguistics Faculty Scholarship and Creative Works

We present a new methodological approach which combines both naturally-occurring speech harvested on the web and speech data elicited in the laboratory. This proof-of-concept study examines the phenomenon of focus sensitivity in English, in which the interpretation of particular grammatical constructions (e.g., the comparative) is sensitive to the location of prosodic prominence. Machine learning algorithms (support vector machines and linear discriminant analysis) and human perception experiments are used to cross-validate the web-harvested and lab-elicited speech. Results confirm the theoretical predictions for location of prominence in comparative clauses and the advantages using both web-harvested and lab-elicited speech. The most robust …


Generating Amharic Present Tense Verbs: A Network Morphology & DATR Account, T. Michael W. Halcomb 2017 University of Kentucky

Theses and Dissertations--Linguistics

In this thesis I attempt to model, that is, computationally reproduce, the natural transmission (i.e. inflectional regularities) of twenty present tense Amharic verbs (i.e. triradicals beginning with consonants) as used by the language’s speakers. I root my approach in the linguistic theory of network morphology (NM) and model it using the DATR evaluator. In Chapter 1, I provide an overview of Amharic and discuss the fidel as an abugida, the verb system’s root-and-pattern morphology, and how radicals of each lexeme interact with prefixes and suffixes. I offer an overview of NM in Chapter 2 and DATR in Chapter 3. In …


Acoustic Classification Of Focus: On The Web And In The Lab, Jonathan Howell, Mats Rooth, Michael Wagner 2016 Montclair State University

Jonathan Howell

We present a new methodological approach which combines both naturally-occurring speech harvested on the web and speech data elicited in the laboratory. This proof-of-concept study examines the phenomenon of focus sensitivity in English, in which the interpretation of particular grammatical constructions (e.g., the comparative) is sensitive to the location of prosodic prominence. Machine learning algorithms (support vector machines and linear discriminant analysis) and human perception experiments are used to cross-validate the web-harvested and lab-elicited speech. Results confirm the theoretical predictions for location of prominence in comparative clauses and the advantages using both web-harvested and lab-elicited speech. The most robust …


Towards Multipurpose Readability Assessment, Ion Madrazo 2016 Boise State University

Boise State University Theses and Dissertations

Readability refers to the ease with which a reader can understand a text. Automatic readability assessment has been widely studied over the past 50 years. However, most of the studies focus on the development of tools that apply to only a single language, domain, or document type. This leads to duplicated effort for both developers, who need to integrate multiple tools in their systems, and final users, who have to deal with incompatibilities among the readability scales of different tools. In this manuscript, we present MultiRead, a multipurpose readability assessment tool capable of predicting the reading difficulty of texts of varied …


Towards A Computational Model Of Frame Of Reference Alignment In Swedish Dialogue, Simon Dobnik, Christine Howes, Kim Demaret, John D. Kelleher 2016 simon.dobnik@gu.se

Conference papers

In this paper we examine how people negotiate, interpret and repair the frame of reference (FoR) in online text based dialogues discussing spatial scenes in Swedish. We describe work-in-progress in which participants are given different perspectives of the same scene and asked to locate several objects that are only shown on one of their pictures. This task requires participants to coordinate on FoR in order to identify the missing objects. This study has implications for situated dialogue systems.


Extending Hidden Structure Learning: Features, Opacity, And Exceptions, Aleksei I. Nazarov 2016 University of Massachusetts Amherst

Doctoral Dissertations

This dissertation explores new perspectives in phonological hidden structure learning (inferring structure not present in the speech signal that is necessary for phonological analysis; Tesar 1998, Jarosz 2013a, Boersma and Pater 2016), and extends this type of learning towards the domain of phonological features, towards derivations in Stratal OT (Bermúdez-Otero 1999), and towards exceptionality indices in probabilistic OT. Two more specific themes also come out: the possibility of inducing instead of pre-specifying the space of possible hidden structures, and the importance of cues in the data for triggering the use of hidden structure. In chapters 2 and 4, phonological features …


Lexical Variation, Lexical Innovation, And Speaker Motivations: A Historical Psycholinguistic Approach, Jason Timm Dr. 2016 University of New Mexico

Linguistics ETDs

Speakers commonly re-purpose existing forms in the mental lexicon to create novel form-meaning. Contemporary evidence that such innovation processes have occurred historically is attested in varying degrees of polysemy in the mental lexicon. This dissertation considers speaker motivations underlying these innovation processes historically. Strong synchronic relationships between frequency and degree of polysemy, on one hand, and frequency and lexical access, on the other hand, have traditionally been interpreted as evidence for the primacy of economic motivations in processes of lexical innovation. In contrast, the cognitive processes that most commonly facilitate innovation, metaphor and metonymy, have largely been described as processes …


An Examination Of Cross-Domain Authorship Attribution Techniques, Maxwell B. Schwartz 2016 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

In recent years, Twitter has become a popular testing ground for techniques in authorship attribution. This is due to both the ease of building large corpora as well as the challenges associated with the character limit imposed by the service and the writing styles that have developed as a result. As both false and genuine claims of hacked Twitter accounts have made international news, there is an increasing need for this type of work. For newer Twitter accounts, however, there is little training data. Thus, this study looks to lay the groundwork for cross-domain authorship attribution: training on one source …


An Evaluation Of POS Taggers For The CHILDES Corpus, Rui Huang 2016 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

This project evaluates four mainstream taggers on a representative collection of child-adult dialogues from the Child Language Data Exchange System. The nine children’s files from the Valian corpus and part of the Eve corpus were manually labeled and rewritten with the LARC tagset. They served as gold-standard corpora in the training and testing process. Four taggers (the CLAN MOR tagger, the ACOPOST trigram tagger, the Stanford parser, and version 1.14 of the Brill tagger) were tested by 10-fold cross-validation. By analyzing which of the taggers’ assumptions about category assignment lead to failures, we identify several problematic cases of tagging. By comparing the …


Latent Semantic Indexing In The Discovery Of Cyber-Bullying In Online Text, Jacob L. Bigelow 2016 Ursinus College

Computer Science Summer Fellows

The rise in the use of social media, and particularly the rise of adolescent use, has led to a new means of bullying. Cyber-bullying has proven consequential to young internet users, creating a need for a response. To stop this problem effectively, we need a verified method of detecting cyber-bullying in online text; we aim to find that method. For this project we look at thirteen thousand labeled posts from Formspring and create a bank of words used in the posts. First the posts are cleaned up by taking out punctuation, normalizing emoticons, and removing high and low …

