Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

Articles 1 - 7 of 7

Full-Text Articles in Computational Linguistics

Label Imputation For Homograph Disambiguation: Theoretical And Practical Approaches, Jennifer M. Seale Sep 2021

Dissertations, Theses, and Capstone Projects

This dissertation presents the first implementation of label imputation for the task of homograph disambiguation using 1) transcribed audio and 2) parallel (translated) corpora. For label imputation from parallel corpora, a hypothesis of interlingual alignment between homograph pronunciations and text word forms is developed and formalized. Both the audio and the parallel-corpus label imputation techniques are tested empirically in experiments that compare homograph disambiguation performance when models are trained on 1) hand-labeled data alone and 2) hand-labeled data augmented with label-imputed data. Regularized multinomial logistic regression and pre-trained ALBERT, BERT, and XLNet language models fine-tuned as token classifiers are developed for homograph …
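As a rough illustration of the first model family named above, the sketch below trains a regularized multinomial (softmax) logistic regression on bag-of-words context features around a homograph. The feature extractor, the `/lɛd/` and `/liːd/` labels, the toy sentences, and the hyperparameters are all hypothetical; this is not the dissertation's feature set or code.

```python
import math
from collections import defaultdict


def context_features(tokens, index, window=2):
    """Bag of lowercase words within +/- `window` of the homograph, plus a bias."""
    feats = {"__bias__": 1.0}
    for j in range(max(0, index - window), min(len(tokens), index + window + 1)):
        if j != index:
            feats["w=" + tokens[j].lower()] = 1.0
    return feats


def softmax(scores):
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(s - top) for label, s in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}


class MultinomialLogReg:
    """Softmax regression trained by SGD, with L2 shrinkage on the active features."""

    def __init__(self, labels, l2=0.01, lr=0.5, epochs=200):
        self.labels = list(labels)
        self.l2, self.lr, self.epochs = l2, lr, epochs
        self.weights = {label: defaultdict(float) for label in self.labels}

    def _scores(self, feats):
        return {
            label: sum(self.weights[label][f] * v for f, v in feats.items())
            for label in self.labels
        }

    def fit(self, X, y):
        for _ in range(self.epochs):
            for feats, gold in zip(X, y):
                probs = softmax(self._scores(feats))
                for label in self.labels:
                    error = (1.0 if label == gold else 0.0) - probs[label]
                    w = self.weights[label]
                    for f, v in feats.items():
                        # Gradient step on the log-likelihood minus an L2 penalty.
                        w[f] += self.lr * (error * v - self.l2 * w[f])
        return self

    def predict(self, feats):
        scores = self._scores(feats)
        return max(scores, key=scores.get)


# Hypothetical training data: (sentence, index of "lead", pronunciation label).
train = [
    ("they used a lead pipe", 3, "/lɛd/"),
    ("she will lead the team", 2, "/liːd/"),
    ("paint with lead is toxic", 2, "/lɛd/"),
    ("please lead the way", 1, "/liːd/"),
]
X = [context_features(s.split(), i) for s, i, _ in train]
y = [label for _, _, label in train]
model = MultinomialLogReg(["/lɛd/", "/liːd/"]).fit(X, y)

print(model.predict(context_features("they lead the group".split(), 1)))  # /liːd/
print(model.predict(context_features("a lead pipe".split(), 1)))  # /lɛd/
```

With only a handful of sentences, the shared context word "the" carries the /liːd/ decision; a real system would need far more labeled (or label-imputed) data.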


Detection And Morphological Analysis Of Novel Russian Loanwords, Yulia Spektor Sep 2021

Dissertations, Theses, and Capstone Projects

This paper investigates recent English loanwords in Russian and explores ways in which computational methods can support further theoretical research. The goal of the study is twofold: to find new, previously unattested loanwords borrowed over the last decade, and to examine how far these new borrowings have been adapted, as measured by the degree to which they conform to the constraints of Russian. First, we train a finite-state pipeline that combines character n-gram language models, which encode the phonotactic and lexical properties of loanwords, with a binary classifier to detect loanwords. The model achieves state-of-the-art performance in evaluation, surpassing …
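A minimal sketch of the character n-gram idea, assuming add-one smoothed bigram models trained on hypothetical native and loanword lists. For simplicity it replaces the thesis's binary classifier with a plain log-likelihood comparison, and it is not a finite-state implementation.

```python
import math
from collections import Counter


def char_ngrams(word, n=2):
    """Character n-grams with '^' start padding and a '$' end marker."""
    padded = "^" * (n - 1) + word + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class CharNgramLM:
    """Add-one smoothed character n-gram model (a toy stand-in for the thesis's LMs)."""

    def __init__(self, words, vocab_size, n=2):
        self.n = n
        self.vocab_size = vocab_size  # shared across models so scores are comparable
        self.ngrams = Counter()
        self.contexts = Counter()
        for word in words:
            for gram in char_ngrams(word, n):
                self.ngrams[gram] += 1
                self.contexts[gram[:-1]] += 1

    def log_prob(self, word):
        return sum(
            math.log((self.ngrams[g] + 1) / (self.contexts[g[:-1]] + self.vocab_size))
            for g in char_ngrams(word, self.n)
        )


def is_loanword(word, native_lm, loan_lm):
    # Log-likelihood ratio rule, NOT the trained binary classifier from the thesis.
    return loan_lm.log_prob(word) > native_lm.log_prob(word)


# Hypothetical toy training lists, for illustration only.
native_words = ["дом", "вода", "рука", "мама"]
loan_words = ["хайп", "лайк", "фейк", "стрим"]
vocab = set("".join(native_words + loan_words)) | {"$"}

native_lm = CharNgramLM(native_words, len(vocab))
loan_lm = CharNgramLM(loan_words, len(vocab))
print(is_loanword("фейл", native_lm, loan_lm))  # True
print(is_loanword("рука", native_lm, loan_lm))  # False
```

The unseen word "фейл" is flagged because its bigrams (^ф, фе, ей) occur in the loanword list but not in the native list.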


From An Art To A Science: Features And Methodology In Computational Authorship Identification, Jonathan I. Manczur Sep 2021

Dissertations, Theses, and Capstone Projects

Nearly thirty years ago, the United States Supreme Court reevaluated the criteria for admitting forensic science and expert testimony, challenging forensic linguistics to establish itself as a reputable science. Much work has been done toward that end in the interim, but much remains to be done before the field satisfies these judicial standards. Computational linguistics has the potential to provide the necessary analytical framework. This paper has two aims. The first concerns two competing theories about which features are needed to identify an unknown author: four features were drawn from the syntactic computational linguistics tradition and four from computational stylometry to …
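For readers unfamiliar with stylometric features, the sketch below shows one common approach: represent each text by the relative frequencies of a few function words, then attribute the unknown text to the most similar candidate by cosine similarity. The word list, the sample texts, and the nearest-neighbor rule are illustrative assumptions, not the four stylometric features examined in the thesis.

```python
import math
import re
from collections import Counter

# Hypothetical feature set: relative frequencies of a few English function words.
FUNCTION_WORDS = ["the", "of", "and", "to", "a", "in", "that", "it", "but", "upon"]


def profile(text):
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter(tokens)
    total = len(tokens) or 1  # avoid division by zero on empty input
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def attribute(unknown, candidates):
    """Return the candidate whose function-word profile is closest to the unknown text."""
    target = profile(unknown)
    return max(candidates, key=lambda name: cosine(target, profile(candidates[name])))


# Hypothetical writing samples for two candidate authors.
samples = {
    "Author A": "upon the hill, but upon the road, but upon it",
    "Author B": "in a town in a house and in a room and to a door",
}
print(attribute("upon the river, but upon the shore", samples))  # Author A
```

Function words are a popular stylometric choice because writers use them largely unconsciously and they are relatively independent of topic.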


The Public Innovations Explorer: A Geo-Spatial & Linked-Data Visualization Platform For Publicly Funded Innovation Research In The United States, Seth Schimmel Jun 2021

Dissertations, Theses, and Capstone Projects

The Public Innovations Explorer (https://sethsch.github.io/innovations-explorer/app/index.html) is a web-based tool, built with Node.js, D3.js, and Leaflet.js, for investigating awards made between 2008 and 2018 by federal agencies and departments participating in the Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) grant-making programs. By geocoding the publicly available grants data from SBIR.gov, the Public Innovations Explorer lets users identify companies performing publicly funded innovative research in each congressional district and obtain dynamic district-level summaries of funding activity by agency and year. Applying spatial clustering techniques to districts' employment levels across major economic sectors provides users …


Predicting Stock Price Movements Using Sentiment And Subjectivity Analyses, Andrew Kirby Jun 2021

Dissertations, Theses, and Capstone Projects

A quick online search turns up many tools that use information from news headlines to predict the trajectory of a given stock. But what if we went further and looked into the text of the article itself to extract this and other information? Here, the goal is to extract the sentence of a news article in which a stock ticker symbol is mentioned, determine sentiment and subjectivity values for that sentence, and finally predict whether the value of that stock will rise within a 24-hour timespan. Bloomberg News …
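The three steps described above can be sketched as follows. The regex sentence splitter, the tiny polarity and subjectivity lexicons, and the rule "predict a rise if average polarity is positive" are hypothetical simplifications; the abstract does not specify the thesis's actual sentiment tools or prediction model.

```python
import re

# Hypothetical mini-lexicons; real systems use much larger sentiment resources.
POLARITY = {"beat": 1.0, "surged": 1.0, "strong": 0.5,
            "missed": -1.0, "fell": -1.0, "weak": -0.5}
SUBJECTIVE = {"strong", "weak", "impressive", "disappointing", "believe"}


def sentences_mentioning(text, ticker):
    # Naive splitter: breaks after . ! or ? plus whitespace (abbreviations will trip it).
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    pattern = re.compile(r"\b" + re.escape(ticker) + r"\b")
    return [s for s in sentences if pattern.search(s)]


def score(sentence):
    """Return (polarity, subjectivity) as lexicon hits averaged over tokens."""
    tokens = re.findall(r"[a-z']+", sentence.lower())
    if not tokens:
        return 0.0, 0.0
    polarity = sum(POLARITY.get(t, 0.0) for t in tokens) / len(tokens)
    subjectivity = sum(t in SUBJECTIVE for t in tokens) / len(tokens)
    return polarity, subjectivity


def features(text, ticker):
    """Average (polarity, subjectivity) over sentences that mention the ticker."""
    hits = [score(s) for s in sentences_mentioning(text, ticker)]
    if not hits:
        return None
    return (sum(p for p, _ in hits) / len(hits), sum(s for _, s in hits) / len(hits))


def predict_up(text, ticker):
    # Toy decision rule on polarity only; a trained classifier would use both features.
    feats = features(text, ticker)
    return None if feats is None else feats[0] > 0


article = "Apple (AAPL) beat estimates on strong iPhone sales. Shares of rivals fell."
print(features(article, "AAPL"))    # (0.1875, 0.125)
print(predict_up(article, "AAPL"))  # True
```

Only the first sentence mentions the ticker, so the negative word "fell" in the second sentence does not affect the prediction.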


When Misclassification Is Misgendering: Gender Prediction In The Context Of Trans Identities, Sean Miller Feb 2021

Dissertations, Theses, and Capstone Projects

As a subdomain of author profiling, gender prediction (sometimes called gender inference) has received substantial attention, both as a task in itself and as a component of other downstream analyses. Throughout the existing literature, various statistical and machine learning methods have been applied to extract features, either to characterize and differentiate female and male writing styles or simply to maximize accuracy on gender prediction as a binary classification task. However, researchers often neither disclose how they conceptualize gender nor consider the implications that gender prediction has for non-binary and trans individuals. Along with an overview …


A Computational Study In The Detection Of English–Spanish Code-Switches, Yohamy C. Polanco Feb 2021

Dissertations, Theses, and Capstone Projects

Code-switching is the linguistic phenomenon in which a multilingual person alternates between two or more languages in a conversation, whether spoken or written. This thesis studies the automatic detection of code-switching between English and Spanish in two corpora.

Twitter and other social media sites have provided researchers with an abundance of linguistic data for countless experiments. Collecting data is fairly easy for a study of monolingual text, but collecting code-switched data is more complicated because the APIs accept only one language as a parameter. This thesis focuses on …
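One simple baseline for token-level code-switch detection is to tag each token with a language from word lists and mark a switch wherever the language of consecutive tagged tokens changes. The sketch below assumes tiny hypothetical English and Spanish lexicons and is not the method used in the thesis; real systems typically rely on character n-gram models or trained sequence taggers to handle unknown and ambiguous words.

```python
import re

# Hypothetical toy lexicons, for illustration only.
EN = {"i", "want", "to", "go", "it", "is", "the", "but", "my"}
ES = {"yo", "quiero", "ir", "la", "playa", "pero", "es", "mi", "tarde"}


def tag_tokens(text):
    """Tag each token 'en', 'es', or 'other' (unknown or in both lexicons)."""
    tags = []
    for tok in re.findall(r"\w+", text.lower()):
        in_en, in_es = tok in EN, tok in ES
        if in_en and not in_es:
            tags.append((tok, "en"))
        elif in_es and not in_en:
            tags.append((tok, "es"))
        else:
            tags.append((tok, "other"))
    return tags


def switch_points(tags):
    """Indices of tokens whose language differs from the previous language-tagged token."""
    points, prev = [], None
    for i, (_, lang) in enumerate(tags):
        if lang == "other":
            continue  # skip unknowns rather than treating them as switches
        if prev is not None and lang != prev:
            points.append(i)
        prev = lang
    return points


tags = tag_tokens("I want to go to la playa pero it is tarde")
print(switch_points(tags))  # [5, 8, 10]
```

Skipping "other" tokens keeps named entities and unknown words from producing spurious switches, at the cost of missing switches that occur on those tokens.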