Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

233 Full-Text Articles 347 Authors 192,439 Downloads 63 Institutions

All Articles in Computational Linguistics

233 full-text articles. Page 6 of 11.

Analyzing Prosody With Legendre Polynomial Coefficients, Rachel Rakov 2019 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

This investigation demonstrates the effectiveness of Legendre polynomial coefficients representing prosodic contours within the context of two different tasks: nativeness classification and sarcasm detection. By making use of accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research focused on analyzing prosody in linguistics as well as modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer prosodic questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about prosodic qualities of sarcastic speech. …


The Perception Of Mandarin Tones In "Bubble" Noise By Native And L2 Listeners, Mengxuan Zhao 2019 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Previous studies have revealed the complexity of Mandarin tones. For example, similarities in the pitch contours of tones 2 and 3 and of tones 3 and 4 cause confusion for listeners. The realization of a tone's contour is highly dependent on its context, especially the preceding pitch. This is known as the coarticulation effect. Researchers have demonstrated the robustness of tone perception by both native and non-native listeners, even with incomplete acoustic information or in noisy environments. However, non-native listeners were observed to behave differently from native listeners in their use of contextual information. For example, the disagreement between the end …


Quantifying Coherence In A Transdiagnostic Sample: A Methodological Investigation Of Computationally-Derived Coherence Using Ambulatory Assessment, Taylor L. Fedechko 2019 Louisiana State University and Agricultural and Mechanical College

LSU Master's Theses

Schizophrenia is a clinical diagnosis assigned to individuals who experience positive (e.g., hallucinations and delusions), negative (e.g., blunted affect), and disorganized (e.g., incoherent speech) symptoms. One particularly disabling symptom is incoherence, a breakdown of coherence, where coherence is defined as the meaning-based relationship between ideas. This symptom can drastically reduce an individual’s quality of life by impairing areas such as social and occupational functioning. Currently, the mechanism behind this symptom is unknown and requires further study. One way to examine incoherence is to understand its level of expression in other clinical populations. With the advent of computationally-derived natural language processing (NLP), coherence can be quantified …


Obfuscating Authorship: Results Of A User Study On Nondescript, A Digital Privacy Tool, Robin Camille Davis 2019 CUNY John Jay College

Publications and Research

For those who write anonymously, particularly for safety reasons, authorship attribution poses a threat. Nondescript, my web app, guides writers in achieving stylometric obfuscation in order to preserve anonymity. The app runs simulations of authorship attribution scenarios by analyzing the user’s linguistic features. In this paper, I will describe the conception of the Nondescript app; discuss related work; and present the results of a user study. Most users in the study were able to anonymize their writing in at least 5 out of 10 authorship attribution scenarios. Users rated the anonymization process an average of 3.6 out of 5 in …


Generative Adversarial Networks And Word Embeddings For Natural Language Generation, Robert D. Schultz Jr 2019 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

We explore using image generation techniques to generate natural language. Generative Adversarial Networks (GANs), normally used for image generation, were used for this task. To avoid using discrete data such as one-hot encoded vectors, with dimensions corresponding to vocabulary size, we instead use word embeddings as training data. The main motivation for this is the fact that a sentence translated into a sequence of word embeddings (a “word matrix”) is an analogue to a matrix of pixel values in an image. These word matrices can then be used to train a generative adversarial model. The output of the model’s generator …


Application Of Boolean Logic To Natural Language Complexity In Political Discourse, Austin Taing 2019 University of Kentucky

Theses and Dissertations--Computer Science

Press releases serve as a major influence on public opinion of a politician, since they are a primary means of communicating with the public and directing discussion. Thus, the public’s ability to digest them is an important factor for politicians to consider. This study employs several well-studied measures of linguistic complexity and proposes a new one to examine whether politicians change their language to become more or less difficult to parse in different situations. This study uses 27,500 press releases from the US Senate from 2004 to 2008 and examines election cycles and natural disasters, namely hurricanes, as situations where politicians’ language …


Non-Manual Articulators In Irish Sign Language Verbs: An Analysis With Data Mining Association Rules, Robert G. Smith, Markus Hofmann 2018 Technological University Dublin

Conference Papers

The Signs of Ireland (SOI) corpus (Leeson et al., 2006) deploys a complex multi-tiered temporal data structure. The process of manually analyzing such data is laborious and cannot eliminate bias, and important patterns can often go completely unnoticed. In addition, because of the complex nature of the grammatical structures contained in the corpus, identifying complex linguistic associations or patterns across tiers is simply too intricate a task for a human to carry out in an acceptable timeframe. This work explores the application of data mining techniques to a set of multi-tiered temporal data from the SOI corpus. Building …


Recursive Neural Networks For Semantic Sentence Representation, Liam S. Geron 2018 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Semantic representation has a rich history rife with both complex linguistic theory and computational models. Though this history stretches back almost 50 years (Salton, 1971), the field has recently undergone an unexpected paradigm shift thanks to the work of Mikolov et al. (2013a, 2013b), which has shown that vector-space semantic models can capture large amounts of semantic information. As yet, these semantic representations are computed at the word level, and finding a semantic representation of a phrase is a much more difficult challenge. Mikolov et al. (2013a, 2013b) showed that their word vectors can be composed arithmetically to …


Advanced Recurrent Network-Based Hybrid Acoustic Models For Low Resource Speech Recognition, Jian Kang, Wei-Qiang Zhang, Wei-Wei Liu, Jia Liu, Michael T. Johnson 2018 Tsinghua University, China

Electrical and Computer Engineering Faculty Publications

Recurrent neural networks (RNNs) have shown an ability to model temporal dependencies. However, the problem of exploding or vanishing gradients has limited their application. In recent years, long short-term memory RNNs (LSTM RNNs) have been proposed to solve this problem and have achieved excellent results. Bidirectional LSTM (BLSTM), which uses both preceding and following context, has shown particularly good performance. However, the computational requirements of BLSTM approaches are quite heavy, even when implemented efficiently with GPU-based high performance computers. In addition, because the output of LSTM units is bounded, there is often still a vanishing gradient issue over multiple layers. …


Perception & Perspective: An Analysis Of Discourse And Situational Factors In Reference Frame Selection, Robert J. Ross, Kavita E. Thomas 2018 Technological University Dublin

Conference papers

To integrate perception into dialogue, it is necessary to bind spatial language descriptions to reference frame use. To this end, we present an analysis of discourse and situational factors that may influence reference frame choice in dialogues. We show that factors including spatial orientation, task, self and other alignment, and dyad have an influence on reference frame use. We further show that a computational model that estimates the reference frame from these features yields better results than both random and greedy reference frame selection strategies.


Intergroup Variability In Personality Recognition, Arundhati Sengupta 2018 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Automatic identification of personality in conversational speech has many applications in natural language processing, such as leader identification in a meeting, adaptive dialogue systems, and dating websites. However, the widespread acceptance of automatic personality recognition through lexical and vocal characteristics is limited by the variability of the error rate of a general-purpose model among speakers from different demographic groups. While other work reports accuracy, we explored the error rates of the automatic personality recognition task using classification models for different genders and native language groups (L1). We also present a statistical experiment showing the influence of gender and L1 on the relation …


Nevertheless, She Persisted: A Linguistic Analysis Of The Speech Of Elizabeth Warren, 2007-2017, Matthew Jennings 2018 East Tennessee State University

Undergraduate Honors Theses

A breakout star among American progressives in the recent past, Elizabeth Warren has quickly gone from a law professor to a leading figure in Democratic politics. This paper analyzes Warren’s speech from before her time as a political figure to the present using the quantitative textual methodology established by Jones (2016) in order to see if Warren’s speech supports Jones’s assertion that masculine speech is the language of power. Ratios of feminine to masculine markers ultimately indicate that despite her increasing political sway, Warren’s speech becomes increasingly feminine instead. However, despite associations of feminine speech with weakness, Warren’s speech scores …


Multimodal Depression Detection: An Investigation Of Features And Fusion Techniques For Automated Systems, Michelle Renee Morales 2018 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Depression is a serious illness that affects a large portion of the world’s population. Given the large effect it has on society, it is evident that depression is a serious health issue. This thesis evaluates, at length, how technology may aid in assessing depression. We present an in-depth investigation of features and fusion techniques for depression detection systems. We also present OpenMM: a novel tool for multimodal feature extraction. Lastly, we present novel techniques for multimodal fusion. The contributions of this work add considerably to our knowledge of depression detection systems and have the potential to improve future systems by …


Speech Perception In “Bubble” Noise: Korean Fricatives And Affricates By Native And Non-Native Korean Listeners, Jiyoung Choi 2018 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

The current study examines acoustic cues used by second language learners of Korean to discriminate between Korean fricatives and affricates in noise and how these cues relate to those used by native Korean listeners. Stimuli consist of naturally-spoken consonant-vowel-consonant-vowel (CVCV) syllables: /sɑdɑ/, /s*ɑdɑ/, /tʃɑdɑ/, /tʃʰɑdɑ/, and /tʃ*ɑdɑ/. In this experiment, the “bubble noise” methodology of Mandel et al. (2016) was used to identify the time-frequency locations of important cues in each utterance, i.e., where audibility of the location is significantly correlated with correct identification of the utterance in noise. Results show that non-native Korean listeners can discriminate between …


Describing Doggo-Speak: Features Of Doggo Meme Language, Jennifer Bivens 2018 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Doggo-speak is a specialized way of writing most commonly associated with captions on Doggo memes, humorous images of dogs shared in online communities. This paper will explore linguistic features of Doggo-speak through analysis of social media posts by Doggo fan pages. It will use the discussed features as inputs to five machine learning classifiers and will show, through this classification task, that the discussed features are sufficient for distinguishing between Doggo-speak and more general English text.


Automatic Analysis Of Musical Lyrics, Joanna Gormley 2018 Merrimack College

Honors Senior Capstone Projects

Is music getting less sophisticated over time? That is the question which this study aims to answer, with the goal of improving upon previous analysis done on the topic. The blog posts which inspired this project lacked accuracy and dimensionality. Realizing that a larger data set of songs would make a significant difference in the precision of our analysis, we set out to design a piece of software constructed with the capability to analyze several thousand songs. Mimicking previous works which analyzed sophistication of music, the software focuses on the lyrics of songs. Three metrics were used in order to …


Role Of Information Technology In Development Of Eritrean Language - ኣበርክቶ ቴክኖሎጂ ሓበሬታ ኣብ ምምዕባል ቋንቋታት ኤርትራ, Filmon Gebreyesus Ph.D 2018 Santa Clara University

Symposium on Eritrean Literature

Information technology affects us every day of our lives, and social media in particular has become the main means of communication in our society. However, access to this current and ever-growing technology has always been limited to English, Arabic, or other languages, because our language has not kept up with current technology.

Though there have been many efforts to develop application programs in Tigrigna and other languages to help us use our language, there are still many gaps that could be filled to achieve the competence of our languages. In light …


Does The Test Work? Evaluating A Web-Based Language Placement Test, Avizia Long, Sun-Young Shin, Kimberly Geeslin, Erik Willis 2018 Texas Tech University

Faculty Publications

In response to the need for examples of test validation from which everyday language programs can benefit, this paper reports on a study that used Bachman’s (2005) assessment use argument (AUA) framework to examine evidence to support claims made about the intended interpretations and uses of scores based on a new web-based Spanish language placement test. The test, which consisted of 100 items distributed across five item types (sound discrimination, grammar, listening comprehension, reading comprehension, and vocabulary), was tested with 2,201 incoming first-year and transfer students at a large, Midwestern public university. Analyses of internal consistency and validity revealed the …


Losing Shahrazad: A Distant Reading Of 1001 Nights, Taysa Mohler 2018 Bard College

Senior Projects Spring 2018

This project is a distant reading analysis of seven 19th- and 20th-century English translations of One Thousand and One Nights, also known as The Arabian Nights. Through the use of computer programming and distant reading, it becomes clear that the Nights' frame tale is the carrier of the internal logic and generative power of the story cycle. Further, the frame tale expresses the Nights' self-representation, which serves to undermine the historical use of the Nights as synecdoche for the Orient. Therefore, the translators who remove the frame story from their versions further the Nights' use as an Orientalist object, …


Exploring The Functional And Geometric Bias Of Spatial Relations Using Neural Language Models, Simon Dobnik, Mehdi Ghanimifard, John D. Kelleher 2018 University of Gothenburg, Sweden

Conference papers

The challenge for computational models of spatial descriptions for situated dialogue systems is the integration of information from different modalities. The semantics of spatial descriptions are grounded in at least two sources of information: (i) a geometric representation of space and (ii) the functional interaction of related objects. We train several neural language models on descriptions of scenes from a dataset of image captions and examine whether the functional or geometric bias of spatial descriptions reported in the literature is reflected in the estimated perplexity of these models. The results of these experiments have implications for the creation of …

