Open Access. Powered by Scholars. Published by Universities.®
- Discipline
- Arts and Humanities (4)
- Digital Humanities (1)
- Discourse and Text Linguistics (1)
- Feminist, Gender, and Sexuality Studies (1)
- Lesbian, Gay, Bisexual, and Transgender Studies (1)
- Library and Information Science (1)
- Other Linguistics (1)
- Public Affairs, Public Policy and Public Administration (1)
- Russian Linguistics (1)
- Science and Technology Policy (1)
- Science and Technology Studies (1)
- Slavic Languages and Societies (1)
- Spanish Linguistics (1)
- Spanish and Portuguese Language and Literature (1)
- Keyword
- Computational linguistics (2)
- Artificial intelligence (1)
- Authorship identification (1)
- Computational Linguistics (1)
- Computer mediated communication (1)
- Deep learning (1)
- Dictionary-based keyword extraction (1)
- Doc2Vec (1)
- Economic geography (1)
- Ethical AI (1)
- Finite-state transducer (1)
- Forensic linguistics (1)
- Geospatial analysis (1)
- Grants data (1)
- Homograph disambiguation (1)
- Inflectional morphology (1)
- KNN (1)
- Label imputation (1)
- Language model (1)
- Lgbt (1)
- Loanword detection (1)
- Machine learning (1)
- Natural Language Processing (1)
- Natural language processing (1)
- Queer (1)
- Russian loanwords (1)
- Scientometrics (1)
- Small Business Innovation Research (1)
- Token classification (1)
- Transgender (1)
Articles 1 - 7 of 7
Full-Text Articles in Computational Linguistics
Label Imputation For Homograph Disambiguation: Theoretical And Practical Approaches, Jennifer M. Seale
Dissertations, Theses, and Capstone Projects
This dissertation presents the first implementation of label imputation for the task of homograph disambiguation using 1) transcribed audio, and 2) parallel, or translated, corpora. For label imputation from parallel corpora, a hypothesis of interlingual alignment between homograph pronunciations and text word forms is developed and formalized. Both audio and parallel corpora label imputation techniques are tested empirically in experiments that compare homograph disambiguation model performance using: 1) hand-labeled training data, and 2) hand-labeled training data augmented with label-imputed data. Regularized, multinomial logistic regression and pre-trained ALBERT, BERT, and XLNet language models fine-tuned as token classifiers are developed for homograph …
Detection And Morphological Analysis Of Novel Russian Loanwords, Yulia Spektor
Dissertations, Theses, and Capstone Projects
This paper investigates recent English loanwords in Russian and explores ways in which computational methods can help further theoretical research. The goal of the study is two-fold: to find new, previously unattested loanwords borrowed over the last decade and to examine the rate of adaptation of the new borrowings, attested by the degree to which they conform to the constraints of the Russian language. First, we train a finite-state pipeline that combines character n-gram language models, which encode phonotactic and lexical properties of loanwords, with a binary classifier to detect loanwords. The model achieves state-of-the-art performance during evaluation, surpassing …
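To make the idea of character n-gram language models feeding a binary loanword decision concrete, here is a minimal sketch. It is not the finite-state pipeline described in the abstract: it swaps in two add-one-smoothed character bigram models and a simple likelihood comparison. The class name `CharBigramModel`, the function `is_loanword`, and the tiny word lists are all hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def char_bigrams(word):
    """Return the character bigrams of a word padded with ^ (start) and $ (end)."""
    padded = f"^{word}$"
    return [padded[i:i + 2] for i in range(len(padded) - 1)]


class CharBigramModel:
    """Character bigram language model with add-one smoothing (illustrative only)."""

    def __init__(self, words):
        self.bigrams = Counter()   # counts of each two-character sequence
        self.contexts = Counter()  # counts of each first character
        self.alphabet = set()      # every symbol seen in training
        for word in words:
            for bigram in char_bigrams(word):
                self.bigrams[bigram] += 1
                self.contexts[bigram[0]] += 1
                self.alphabet.update(bigram)

    def log_prob(self, word, vocab_size):
        """Smoothed log-probability of the word under this model."""
        return sum(
            math.log((self.bigrams[b] + 1) / (self.contexts[b[0]] + vocab_size))
            for b in char_bigrams(word)
        )


def is_loanword(word, native_model, loan_model):
    """Binary decision: is the word more likely under the loanword model?"""
    vocab_size = len(native_model.alphabet | loan_model.alphabet)
    return loan_model.log_prob(word, vocab_size) > native_model.log_prob(word, vocab_size)


# Hypothetical toy training lists, far too small for real use.
native_words = ["молоко", "город", "хорошо", "берег", "золото"]
loan_words = ["митинг", "маркетинг", "брифинг", "шопинг", "кастинг"]

native_model = CharBigramModel(native_words)
loan_model = CharBigramModel(loan_words)

print(is_loanword("стриминг", native_model, loan_model))
print(is_loanword("молот", native_model, loan_model))
```

With these toy lists the decision is driven almost entirely by the "-инг" ending, which shows how character sequences can carry phonotactic cues. A realistic system would need large corpora, higher-order n-grams, and a trained classifier over the model scores.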
From An Art To A Science: Features And Methodology In Computational Authorship Identification, Jonathan I. Manczur
Dissertations, Theses, and Capstone Projects
Nearly thirty years ago, the United States Supreme Court reevaluated the criteria for accepting forensic science and expert testimony, challenging Forensic Linguistics to assert itself as a reputable science. Much work has been produced in the interim to that end, but much still needs to be accomplished to satisfy the judicial standards. Computational linguistics has the potential to provide that necessary analytical framework. This paper’s intent is two-fold. First, there are two competing theories on the proper features necessary to identify an unknown author. Four features were drawn from the syntactic computational linguistics tradition and four from computational stylometry to …
The Public Innovations Explorer: A Geo-Spatial & Linked-Data Visualization Platform For Publicly Funded Innovation Research In The United States, Seth Schimmel
Dissertations, Theses, and Capstone Projects
The Public Innovations Explorer (https://sethsch.github.io/innovations-explorer/app/index.html) is a web-based tool created using Node.js, D3.js and Leaflet.js that can be used for investigating awards made by Federal agencies and departments participating in the Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) grant-making programs between 2008 and 2018. By geocoding the publicly available grants data from SBIR.gov, the Public Innovations Explorer allows users to identify companies performing publicly-funded innovative research in each congressional district and obtain dynamic district-level summaries of funding activity by agency and year. Applying spatial clustering techniques on districts' employment levels across major economic sectors provides users …
Predicting Stock Price Movements Using Sentiment And Subjectivity Analyses, Andrew Kirby
Dissertations, Theses, and Capstone Projects
In a quick search online, one can find many tools which use information from news headlines to make predictions concerning the trajectory of a given stock. But what if we went further, looking instead into the text of the article, to extract this and other information? Here, the goal is to extract the sentence in which a stock ticker symbol is mentioned from a news article, then determine sentiment and subjectivity values from that sentence, and finally predict whether the value of that stock will go up within a 24-hour timespan. Bloomberg News …
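The first step named in this abstract, pulling out the sentence that mentions a ticker symbol, can be sketched with the Python standard library alone. This is only an illustration under stated assumptions: the function name `sentences_mentioning`, the naive punctuation-based sentence splitter, and the sample article with the made-up ticker `ACME` are hypothetical and not taken from the thesis. Sentiment and subjectivity scoring would follow as a separate step and is not shown.

```python
import re


def sentences_mentioning(article, ticker):
    """Return the sentences of an article that mention the ticker as a whole token."""
    # Naive split: break after ., ! or ? when followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    # Match the ticker only when it is not embedded in a longer alphanumeric token.
    pattern = re.compile(rf"(?<![A-Za-z0-9]){re.escape(ticker)}(?![A-Za-z0-9])")
    return [s for s in sentences if pattern.search(s)]


# Hypothetical article text with a made-up ticker symbol.
article = (
    "Markets were mixed on Monday. Shares of ACME rose after the earnings call. "
    "Analysts said ACMEX is unrelated. Investors remain cautious."
)

print(sentences_mentioning(article, "ACME"))
# → ['Shares of ACME rose after the earnings call.']
```

The lookbehind and lookahead guards keep `ACME` from matching inside `ACMEX`. Real news text would likely need a proper sentence tokenizer, since abbreviations and decimal numbers break the punctuation-based split used here.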
When Misclassification Is Misgendering: Gender Prediction In The Context Of Trans Identities, Sean Miller
Dissertations, Theses, and Capstone Projects
As a subdomain of author profiling, gender prediction (sometimes called gender inference) has received a substantial amount of attention, both as a task in itself and for other downstream analyses. Throughout the existing literature, various statistical and machine learning methods have been applied to extract features in order to either characterize and differentiate female and male writing styles, or simply to achieve maximum accuracy on gender prediction as a binary classification task. However, researchers often do not disclose how they conceptualize gender, nor do they consider the implications that gender prediction has for non-binary and trans individuals. Along with an overview …
A Computational Study In The Detection Of English–Spanish Code-Switches, Yohamy C. Polanco
Dissertations, Theses, and Capstone Projects
Code-switching is the linguistic phenomenon where a multilingual person alternates between two or more languages in a conversation, whether that be spoken or written. This thesis studies the automatic detection of code-switching occurring specifically between English and Spanish in two corpora.
Twitter and other social media sites have provided an abundance of linguistic data that is available to researchers to perform countless experiments. Collecting the data is fairly easy if a study is on monolingual text, but if a study requires code-switched data, this becomes a complication as APIs only accept one language as a parameter. This thesis focuses on …