Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 10 of 10

Full-Text Articles in Entire DC Network

Exploring The Impact Of Training Datasets On Turkish Stance Detection, Muhammed Said Zengin, Berk Utku Yenisey, Mücahid Kutlu Nov 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Stance detection has garnered considerable attention from researchers due to its broad range of applications, including fact-checking and social computing. While state-of-the-art stance detection models are usually based on supervised machine learning methods, their effectiveness is heavily reliant on the quality of training data. This problem is more prevalent in the stance detection task because the stance of a text is intimately tied to the target under consideration. While numerous datasets exist for stance detection, determining their suitability for a specific target can be challenging. In this work, we focus on Turkish stance detection and explore the impact of training data …


Behind Derogatory Migrants' Terms For Venezuelan Migrants: Xenophobia And Sexism Identification With Twitter Data And NLP, Joseph Martínez, Melissa Miller-Felton, Jose Padilla, Erika Frydenlund Apr 2023

Modeling, Simulation and Visualization Student Capstone Conference

The sudden arrival of many migrants can present new challenges for host communities and create negative attitudes that reflect that tension. In the case of Colombia, with the influx of over 2.5 million Venezuelan migrants, such tensions arose. Our research objective is to investigate how those sentiments arise in social media. We focused on monitoring derogatory terms for Venezuelans, specifically veneco and veneca. Using a dataset of 5.7 million tweets from Colombian users between 2015 and 2021, we determined the proportion of tweets containing those terms. We observed a high prevalence of xenophobic and defamatory language correlated with the …


Solving Turkish Math Word Problems By Sequence-To-Sequence Encoder-Decoder Models, Esin Gedik, Tunga Güngör Mar 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Solving math word problems (MWP) is a challenging task due to the semantic gap between natural language texts and mathematical equations. The main purpose of the task is to take a written math problem as input and produce a proper equation as output for solving that problem. This paper describes a sequence-to-sequence (seq2seq) neural model for automatically solving Turkish MWPs based on their semantic meanings in the text. It comprises a bidirectional encoder to comprehend the semantics of the problem by encoding the input sequence and a decoder with attention to extract the equation by tracking the semantic meanings of …


Know An Emotion By The Company It Keeps: Word Embeddings From Reddit/Coronavirus, Alejandro García-Rudolph, David Sanchez-Pinsach, Dietmar Frey, Eloy Opisso, Katryna Cisek, John Kelleher Jan 2023

Articles

Social media is a crucial communication tool (e.g., with 430 million monthly active users in online forums such as Reddit), which makes it a target for Natural Language Processing (NLP) techniques. One such technique, word embeddings, is based on the quotation, “You shall know a word by the company it keeps,” highlighting the importance of context in NLP. Meanwhile, “Context is everything in Emotion Research.” Therefore, we aimed to train a model (W2V) for generating word associations (also known as embeddings) using a popular Coronavirus Reddit forum, validate them using public evidence and apply them to the discovery of context for specific …


Exploring Gender Bias In Semantic Representations For Occupational Classification In NLP: Techniques And Mitigation Strategies, Joseph Michael O'Carroll Jan 2023

Dissertations

Gender bias in Natural Language Processing (NLP) models is a non-trivial problem that can perpetuate and amplify existing societal biases. This thesis investigates gender bias in occupation classification and explores the effectiveness of different debiasing methods for language models to reduce the impact of bias in the model’s representations. The study employs a data-driven empirical methodology focusing heavily on experimentation and result investigation. The study uses five distinct semantic representations and models with varying levels of complexity to classify the occupation of individuals based on their biographies.


A Structure-Aware Generative Adversarial Network For Bilingual Lexicon Induction, Bocheng Han, Qian Tao, Lusi Li, Zhihao Xiong Jan 2023

Computer Science Faculty Publications

Bilingual lexicon induction (BLI) is the task of inducing word translations with a learned mapping function that aligns monolingual word embedding spaces in two different languages. However, most previous methods treat word embeddings as isolated entities and fail to jointly consider both the intra-space and inter-space topological relations between words. This limitation makes it challenging to align words from embedding spaces with distinct topological structures, especially when the assumption of isomorphism may not hold. To this end, we propose a novel approach called the Structure-Aware Generative Adversarial Network (SA-GAN) model to explicitly capture multiple topological structure information to achieve accurate …


A Structured Narrative Prompt For Prompting Narratives From Large Language Models: Sentiment Assessment Of ChatGPT-Generated Narratives And Real Tweets, Christopher J. Lynch, Erik J. Jensen, Virginia Zamponi, Kevin O'Brien, Erika Frydenlund, Ross Gore Jan 2023

VMASC Publications

Large language models (LLMs) excel in providing natural language responses that sound authoritative, reflect knowledge of the context area, and can present from a range of varied perspectives. Agent-based models and simulations consist of simulated agents that interact within a simulated environment to explore societal, social, and ethical problems, among others. Simulated agents generate large volumes of data, and discerning useful and relevant content is an onerous task. LLMs can help in communicating agents' perspectives on key life events by providing natural language narratives. However, these narratives should be factual, transparent, and reproducible. Therefore, we present a structured narrative prompt …


Data-Driven Strategies For Disease Management In Patients Admitted For Heart Failure, Ankita Agarwal Jan 2023

Browse all Theses and Dissertations

Heart failure is a syndrome that adversely affects a patient’s quality of life. It can be caused by different underlying conditions or abnormalities and involves both cardiovascular and non-cardiovascular comorbidities. Heart failure cannot be cured, but a patient’s quality of life can be improved by effective treatment through medicines and surgery, and by lifestyle management. As effective treatment of heart failure incurs costs for patients and requires resource allocation by hospitals, predicting the length of stay of these patients during each hospitalization becomes important. Heart failure can be classified into two types: left-sided heart failure and right-sided heart failure. …


Comparative Adjudication Of Noisy And Subjective Data Annotation Disagreements For Deep Learning, Scott David Williams Jan 2023

Browse all Theses and Dissertations

Obtaining accurate inferences from deep neural networks is difficult when models are trained on instances with conflicting labels. Algorithmic recognition of online hate speech illustrates this. No human annotator is perfectly reliable, so multiple annotators evaluate and label online posts in a corpus. Labeling scheme limitations, differences in annotators' beliefs, and limits to annotators' honesty and carefulness cause some labels to disagree. Consequently, decisive and accurate inferences become less likely. Some practical applications such as social research can tolerate some indecisiveness. However, an online platform using an indecisive classifier for automated content moderation could create more problems than it solves. …


L3 Ensembles: Lifelong Learning Approach For Ensemble Of Foundational Language Models*, Aidin Shiri, Kaushik Roy, Amit Sheth, Manas Gaur Jan 2023

Publications

Fine-tuning pre-trained foundational language models (FLM) for specific tasks is often impractical, especially for resource-constrained devices. This necessitates the development of a Lifelong Learning (L3) framework that continuously adapts to a stream of Natural Language Processing (NLP) tasks efficiently. We propose an approach that focuses on extracting meaningful representations from unseen data, constructing a structured knowledge base, and improving task performance incrementally. We conducted experiments on various NLP tasks to validate its effectiveness, including benchmarks like GLUE and SuperGLUE. We measured good performance across the accuracy, training efficiency, and knowledge transfer metrics. Initial experimental results show that the proposed L3 …