Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

Articles 1 - 5 of 5

Full-Text Articles in Computational Linguistics

Demographic Factors As Domains For Adaptation In Linguistic Preprocessing, Sara Morini Sep 2019

Dissertations, Theses, and Capstone Projects

Classic natural language processing resources such as the Penn Treebank (Marcus et al. 1993) have long been used both as evaluation data for many linguistic tasks and as training data for a variety of off-the-shelf language processing tools. Recent work has highlighted a gender imbalance in the authors of this text data (Garimella et al. 2019) and hypothesized that tools created with such resources will privilege users from particular demographic groups (Hovy and Søgaard 2015). Domain adaptation is typically employed as a strategy in machine learning to adjust models trained and evaluated with data from different genres. However, the present …
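One standard way to treat demographic groups as domains is feature augmentation in the style of "frustratingly easy" domain adaptation (Daumé III, 2007): each feature vector is copied into a shared slot plus a slot for its own domain. The sketch below illustrates that general idea with demographic groups as the domains; it is an illustrative example, not code or methodology from the dissertation.

```python
import numpy as np

def augment(features, domain, domains):
    """Feature augmentation for domain adaptation: copy the feature
    vector into a shared slot and into the slot for the example's
    own domain, leaving other domain slots zero. Domain labels here
    are hypothetical demographic groups, chosen for illustration."""
    d = len(features)
    out = np.zeros(d * (len(domains) + 1))
    out[:d] = features                         # shared copy
    k = domains.index(domain)
    out[d * (k + 1): d * (k + 2)] = features   # domain-specific copy
    return out

x = np.array([1.0, 2.0])
x_aug = augment(x, "group_a", ["group_a", "group_b"])
# x_aug == [1, 2, 1, 2, 0, 0]
```

A linear model trained on such vectors can learn both group-general and group-specific weights, which is one concrete sense in which demographic factors can serve as domains.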


Do It Like A Syntactician: Using Binary Grammaticality Judgements To Train Sentence Encoders And Assess Their Sensitivity To Syntactic Structure, Pablo Gonzalez Martinez Sep 2019

Dissertations, Theses, and Capstone Projects

The binary nature of grammaticality judgments and their use to assess the structure of syntax are a staple of modern linguistics. However, computational models of natural language rarely make use of grammaticality in their training or application. Furthermore, developments in modern neural NLP have produced a myriad of methods that push the baselines in many complex tasks, but those methods are typically not evaluated from a linguistic perspective. In this dissertation I use grammaticality judgements with artificially generated ungrammatical sentences to assess the performance of several neural encoders and propose them as a suitable training target to make models learn …
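One common way to generate ungrammatical counterparts artificially is to perturb an attested sentence, for example by swapping two adjacent words. The sketch below shows that perturbation scheme as one simple possibility; the dissertation's actual generation method may differ.

```python
import random

def make_ungrammatical(tokens, seed=0):
    """Produce a (likely) ungrammatical variant of a sentence by
    swapping one pair of adjacent words. This is one illustrative
    perturbation strategy, not necessarily the author's."""
    rng = random.Random(seed)
    i = rng.randrange(len(tokens) - 1)
    out = list(tokens)
    out[i], out[i + 1] = out[i + 1], out[i]
    return out

sent = ["the", "dog", "chased", "the", "ball"]
corrupted = make_ungrammatical(sent)
# Labelled pairs for binary grammaticality classification:
pairs = [(sent, 1), (corrupted, 0)]  # 1 = grammatical, 0 = ungrammatical
```

Pairs like these give an encoder a binary training target that directly probes sensitivity to word order and syntactic structure.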


Analyzing Prosody With Legendre Polynomial Coefficients, Rachel Rakov May 2019

Dissertations, Theses, and Capstone Projects

This investigation demonstrates the effectiveness of Legendre polynomial coefficients representing prosodic contours within the context of two different tasks: nativeness classification and sarcasm detection. By making use of accurate representations of prosodic contours to answer fundamental linguistic questions, we contribute significantly to the body of research focused on analyzing prosody in linguistics as well as modeling prosody for machine learning tasks. Using Legendre polynomial coefficient representations of prosodic contours, we answer prosodic questions about differences in prosody between native English speakers and non-native English speakers whose first language is Mandarin. We also learn more about prosodic qualities of sarcastic speech. …
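The core representation described above — a pitch contour compressed into a fixed-length vector of Legendre polynomial coefficients — can be sketched in a few lines with NumPy. This is a minimal illustration of the general technique, not the author's code; the function name and parameters are illustrative.

```python
import numpy as np

def legendre_coeffs(f0, order=4):
    """Fit Legendre polynomials to an f0 (pitch) contour and return
    the coefficients as a fixed-length feature vector. The time axis
    is mapped onto [-1, 1], the natural domain of Legendre polynomials."""
    x = np.linspace(-1.0, 1.0, len(f0))
    # legfit returns c_0..c_order such that sum_k c_k * P_k(x)
    # least-squares-approximates the contour.
    return np.polynomial.legendre.legfit(x, f0, order)

# Example: a steadily rising contour is captured by the mean (c_0)
# and slope (c_1) coefficients; higher orders stay near zero.
contour = np.linspace(100.0, 200.0, 50)  # f0 in Hz
coeffs = legendre_coeffs(contour)
```

Because contours of different durations all map to the same number of coefficients, the representation is directly usable as input features for classification tasks such as nativeness or sarcasm detection.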


The Perception Of Mandarin Tones In "Bubble" Noise By Native And L2 Listeners, Mengxuan Zhao May 2019

Dissertations, Theses, and Capstone Projects

Previous studies have revealed the complexity of Mandarin tones. For example, similarities in the pitch contours of tones 2 and 3 and tones 3 and 4 cause confusion for listeners. The realization of a tone's contour is highly dependent on its context, especially the preceding pitch. This is known as the coarticulation effect. Researchers have demonstrated the robustness of tone perception by both native and non-native listeners, even with incomplete acoustic information or in noisy environments. However, non-native listeners were observed to behave differently from native listeners in their use of contextual information. For example, the disagreement between the end …


Generative Adversarial Networks And Word Embeddings For Natural Language Generation, Robert D. Schultz Jr Feb 2019

Dissertations, Theses, and Capstone Projects

We explore using image generation techniques to generate natural language. Generative Adversarial Networks (GANs), normally used for image generation, were used for this task. To avoid using discrete data such as one-hot encoded vectors, with dimensions corresponding to vocabulary size, we instead use word embeddings as training data. The main motivation for this is the fact that a sentence translated into a sequence of word embeddings (a “word matrix”) is an analogue to a matrix of pixel values in an image. These word matrices can then be used to train a generative adversarial model. The output of the model’s generator …
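The sentence-as-image analogy at the heart of this approach can be made concrete: stacking each token's embedding yields a (sequence length × embedding dimension) matrix, structurally parallel to an image's pixel grid. The sketch below uses a toy vocabulary and random vectors as stand-ins for trained embeddings; it illustrates the data representation only, not the dissertation's GAN itself.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy embedding table: 5 words, 8-dimensional vectors. A random
# stand-in for trained embeddings such as word2vec or GloVe.
vocab = ["the", "cat", "sat", "down", "."]
embeddings = {w: rng.normal(size=8) for w in vocab}

def sentence_to_word_matrix(tokens):
    """Stack each token's embedding into a (seq_len, dim) matrix --
    the textual analogue of an image's matrix of pixel values."""
    return np.stack([embeddings[t] for t in tokens])

word_matrix = sentence_to_word_matrix(["the", "cat", "sat", "down", "."])
# word_matrix.shape == (5, 8): one row per token, one column per
# embedding dimension, so a GAN generator can emit such matrices
# the way an image generator emits pixel grids.
```

A generator trained on such matrices outputs continuous rows that must then be mapped back to words, typically by nearest-neighbour lookup in the embedding table.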