Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons

Linguistics

City University of New York (CUNY)

Theses/Dissertations

Computational linguistics

Full-Text Articles in Social and Behavioral Sciences

Topics For He But Not For She: Quantifying And Classifying Gender Bias In The Media, Tyler J. Lanni Jun 2023

Dissertations, Theses, and Capstone Projects

In this study, we used computational techniques to analyze the language used in news articles to describe female and male politicians. Our corpus included 370 subtexts for male candidates and 374 subtexts for female candidates, gathered through the New York Times API. We conducted two experiments: an LDA topic analysis to explore the data, and a logistic regression to classify the subtexts as either male or female. Our analysis revealed some noteworthy findings that suggest the possibility of developing a gender bias classifier in the future. However, to create a more robust understanding of bias, additional research and data are …


Evaluating The Role Of Gender In Dementia-Related Language Deficiencies, Kelsey Bourque Sep 2021

Dissertations, Theses, and Capstone Projects

Typically, about 60% of dementia patients are women. Researchers have historically dismissed this imbalance as a consequence of women's longer life expectancy: because age is the primary risk factor for dementia, women's longer lifespan translates into a higher share of the dementia patient population (Mielke, 2018). While the exact cause of dementia is unknown, researchers and clinicians have historically treated male and female populations the same, asserting that there is no significant difference between the two sexes with regard to detecting dementia. The present study aims to address this potential gap in dementia research, …


From An Art To A Science: Features And Methodology In Computational Authorship Identification, Jonathan I. Manczur Sep 2021

Dissertations, Theses, and Capstone Projects

Nearly thirty years ago, the United States Supreme Court reevaluated the criteria for accepting forensic science and expert testimony, challenging forensic linguistics to establish itself as a reputable science. Much work has been produced toward that end in the interim, but much remains to be done to satisfy judicial standards. Computational linguistics has the potential to provide the necessary analytical framework. This paper's intent is twofold. First, there are two competing theories about which features are needed to identify an unknown author. Four features were drawn from the syntactic computational linguistics tradition and four from computational stylometry to …


A Computational Study In The Detection Of English–Spanish Code-Switches, Yohamy C. Polanco Feb 2021

Dissertations, Theses, and Capstone Projects

Code-switching is the linguistic phenomenon where a multilingual person alternates between two or more languages in a conversation, whether that be spoken or written. This thesis studies the automatic detection of code-switching occurring specifically between English and Spanish in two corpora.

Twitter and other social media sites have provided an abundance of linguistic data that researchers can use for countless experiments. Collecting the data is fairly easy if a study focuses on monolingual text, but if a study requires code-switched data, collection becomes complicated because APIs accept only one language as a parameter. This thesis focuses on …


Inferring Research Fields In Administrative Records Using Text Data, Ekaterina Levitskaya Jun 2020

Dissertations, Theses, and Capstone Projects

The UMETRICS database (Universities: Measuring the Effects of Research on Innovation, Competitiveness, and Science) contains rich information on grants from sponsored federal and non-federal research for 32 universities over a 15-year period. It is hosted at IRIS (Institute for Research on Innovation and Science, University of Michigan) and serves as a valuable source of university administrative data; however, it does not contain information on research fields. Categorizing grant data by research field can help measure the results of investment in research and science and provide evidence for data-driven policy-making, yet administrative data often lacks this type of categorization. In …


Intergroup Variability In Personality Recognition, Arundhati Sengupta May 2018

Dissertations, Theses, and Capstone Projects

Automatic identification of personality in conversational speech has many applications in natural language processing, such as leader identification in a meeting, adaptive dialogue systems, and dating websites. However, the widespread acceptance of automatic personality recognition through lexical and vocal characteristics is limited by the variability of a general-purpose model's error rate among speakers from different demographic groups. While other work reports accuracy, we explored the error rates of an automatic personality recognition task using classification models for different genders and native language groups (L1). We also present a statistical experiment showing the influence of gender and L1 on the relation …


Utilizing Linguistic Context To Improve Individual And Cohort Identification In Typed Text, Adam Goodkind Jun 2016

Dissertations, Theses, and Capstone Projects

The process of producing written text is complex and constrained by pressures that range from physical to psychological. In three sets of experiments, this thesis demonstrates the effects of linguistic context on the timing patterns of keystroke production. We elucidate the effect of linguistic context at three different levels of granularity: the first set of experiments illustrates how the nontraditional syntax of a single linguistic construct, the multi-word expression, can create significant changes in keystroke production patterns. This set of experiments is followed by a set of experiments that test the hypothesis on the entire …