Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

2023

Natural language processing

Articles 1 - 30 of 42

Full-Text Articles in Physical Sciences and Mathematics

Review Classification Using Natural Language Processing And Deep Learning, Brian Nazareth Dec 2023

Electronic Theses, Projects, and Dissertations

Sentiment analysis is an ongoing area of research in the field of Natural Language Processing (NLP). In this project, I evaluate my methods against an Amazon Reviews Dataset, which contains more than 100 thousand customer reviews. This project classifies the reviews using three methods: a sentiment score computed by matching every positive and negative word in the review text against the Opinion Lexicon dataset, the text’s varying sentiment polarity scores from a Python library called TextBlob, and neural network training. I have created a neural …
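
The lexicon-based score mentioned in this abstract can be illustrated with a minimal sketch. The word sets below are tiny hypothetical stand-ins for the Opinion Lexicon, and the scoring rule (positive matches minus negative matches) is an assumption about how such a score is commonly computed, not the author's exact procedure.

```python
import re

# Hypothetical stand-in word lists; the project itself uses the Opinion Lexicon dataset.
POSITIVE = {"good", "great", "love", "excellent"}
NEGATIVE = {"bad", "poor", "hate", "terrible"}

def lexicon_score(text):
    """Count positive-word matches minus negative-word matches in one review."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(w in POSITIVE for w in words)
    negatives = sum(w in NEGATIVE for w in words)
    return positives - negatives

def classify(text):
    """Map the signed score to a coarse sentiment label."""
    score = lexicon_score(text)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

For instance, `classify("Great product, I love it")` matches two positive words and no negative ones, so the review is labeled positive.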


Enhanced Content-Based Fake News Detection Methods With Context-Labeled News Sources, Duncan Arnfield Dec 2023

Electronic Theses and Dissertations

This work examined the relative effectiveness of multilayer perceptron, random forest, and multinomial naïve Bayes classifiers, trained using bag of words and term frequency-inverse document frequency transformations of documents in the Fake News Corpus and Fake and Real News Dataset. The goal of this work was to help meet the formidable challenges posed by proliferation of fake news to society, including the erosion of public trust, disruption of social harmony, and endangerment of lives. This training included the use of context-categorized fake news in an effort to enhance the tools’ effectiveness. It was found that term frequency-inverse document frequency provided …
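
As a rough illustration of the term-weighting transformation named in this abstract, the sketch below computes term frequency-inverse document frequency (TF-IDF) weights from pre-tokenized documents. It assumes the plain unsmoothed form, tf × log(N / df); the thesis may well rely on a library implementation with different smoothing or normalization, and the function name `tf_idf` is hypothetical.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs: list of token lists. Returns one {term: weight} dict per document.

    Assumed weighting: (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    # Document frequency: the number of documents in which each term appears.
    df = Counter(term for doc in docs for term in set(doc))
    weights = []
    for doc in docs:
        counts = Counter(doc)
        weights.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return weights
```

Under this form, a term that occurs in every document receives a weight of zero, while terms concentrated in a few documents are weighted more heavily. A bag-of-words representation, by contrast, keeps the raw counts.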


Exploring The Impact Of Training Datasets On Turkish Stance Detection, Muhammed Said Zengin, Berk Utku Yenisey, Mücahid Kutlu Nov 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Stance detection has garnered considerable attention from researchers due to its broad range of applications, including fact-checking and social computing. While state-of-the-art stance detection models are usually based on supervised machine learning methods, their effectiveness is heavily reliant on the quality of training data. This problem is more prevalent in the stance detection task because the stance of a text is intimately tied to the target under consideration. While numerous datasets exist for stance detection, determining their suitability for a specific target can be challenging. In this work, we focus on Turkish stance detection and explore the impact of training data …


Towards Robust Long-Form Text Generation Systems, Kalpesh Krishna Nov 2023

Doctoral Dissertations

Text generation is an important emerging AI technology that has seen significant research advances in recent years. Due to its closeness to how humans communicate, mastering text generation technology can unlock several important applications such as intelligent chat-bots, creative writing assistance, or newer applications like task-agnostic few-shot learning. Most recently, the rapid scaling of large language models (LLMs) has resulted in systems like ChatGPT, capable of generating fluent, coherent and human-like text. However, despite their remarkable capabilities, LLMs still suffer from several limitations, particularly when generating long-form text. In particular, (1) long-form generated text is filled with factual inconsistencies to …


Dispensing With Humans In Human-Computer Interaction Research, Courtni L. Byun Nov 2023

Theses and Dissertations

Machine Learning models have become more advanced than could have been supposed even a few years ago, often surpassing human performance on many tasks. Large language models (LLMs) can produce text indistinguishable from human-produced text. This raises the question: how necessary are humans, even for tasks where humans appear indispensable? Qualitative Analysis (QA) is integral to human-computer interaction research, requiring both human-produced data and human analysis of that data to illuminate human opinions about and experiences with technology. We use GPT-3 and ChatGPT to replace human analysis and then to dispense with human-produced text altogether. We find GPT-3 is …


Complex Knowledge Base Question Answering: A Survey, Yunshi Lan, Gaole He, Jinhao Jiang, Jing Jiang, Zhao Wayne Xin, Ji Rong Wen Nov 2023

Research Collection School Of Computing and Information Systems

Knowledge base question answering (KBQA) aims to answer a question over a knowledge base (KB). Early studies mainly focused on answering simple questions over KBs and achieved great success. However, their performance on complex questions is still far from satisfactory. Therefore, in recent years, researchers have proposed a large number of novel methods that look into the challenges of answering complex questions. In this survey, we review recent advances in KBQA with a focus on solving complex questions, which usually contain multiple subjects, express compound relations, or involve numerical operations. In detail, we begin with introducing the complex KBQA task and …


DEI: Exploring Academic Reflections Using Natural Language Processing To Create A Roadmap Of Student Success And Foster Inclusive Engineering Education, Rajvir H. Vyas, Nidhi Raviprasad Oct 2023

College of Engineering Summer Undergraduate Research Program

Every year, the College of Engineering (CENG) students and faculty reach out to admitted students through “Text-a-Thon” programs to answer their questions about being a student at Cal Poly. In order to improve CENG outreach efforts, we analyzed these text conversations to predict the likelihood of an admitted student accepting an offer of admission from Cal Poly. Through our research, we discovered key factors that play a role in a student committing to Cal Poly through data-based insights. Additionally, we successfully used a human-on-the-loop system to help create Machine Learning (ML) models that predict satisfaction of response by way of …


Predictive AI For The S&P 500 Index, Jacqueline Rose Perry Aug 2023

Computer Science Senior Theses

Artificial intelligence has powerful applications in virtually every field, and the financial world is no exception. Utilizing various elements of artificial intelligence, this research aims to predict the future value of the S&P 500 index using numerous models, and in doing so, identify relevant features. More specifically, models that include combinations of historical data, public sentiment, and technical indicators were employed to predict the stock price one day and three days forward. To account for public opinion, the sentiment of tweets and news headlines from the beginning of 2015 through the end of 2019 was calculated using FinBERT, a pre-trained …


N-Shot Benchmarking Of Whisper On Diverse Arabic Speech Recognition, Bashar Talafha, Abdul Waheed, Muhammad Abdul-Mageed Aug 2023

Natural Language Processing Faculty Publications

Whisper, the recently developed multilingual weakly supervised model, is reported to perform well on multiple speech recognition benchmarks in both monolingual and multilingual settings. However, it is not clear how Whisper would fare under diverse conditions even on languages it was evaluated on such as Arabic. In this work, we address this gap by comprehensively evaluating Whisper on several varieties of Arabic speech for the ASR task. Our evaluation covers most publicly available Arabic speech data and is performed under n-shot (zero-, few-, and full) finetuning. We also investigate the robustness of Whisper under completely novel conditions, such as in …


OCR Post-Processing Using Large Language Models, Mahdi Hajiali Aug 2023

UNLV Theses, Dissertations, Professional Papers, and Capstones

Optical Character Recognition (OCR) technology transforms textual visuals into an electronically readable, non-graphical format of the text. This allows the editing and other text manipulation of the content by language technology software such as machine translation, text comprehension, query-answering systems, and search engines. While Optical Character Recognition (OCR) systems continually progress towards greater precision, several complications persist when dealing with low-resolution source images or those with multicolored backgrounds. Consequently, the text derived from OCR necessitates additional refinement to optimize accuracy, beneficial for various subsequent applications. It is recognized that the character accuracy of OCR-generated text may influence certain natural language …


Connecting Linguistic Expressions And Pain Relief Through Transformer Model Construction And Analysis, Sarah M. Chacko May 2023

Computer Science Senior Theses

Chronic pain is a widespread problem that significantly impacts quality of life. Overprescription and abuse of pain medication continue to be a major public health issue and can further burden patients due to a fragmented health care system. Previous research has suggested a possible psychological basis to pain and the potential for safer, non-pharmacological alternatives for pain relief. This project leverages language models to study chronic pain development and relief through psychological treatments, which will be assessed through responses to post-treatment interviews. A transformer-based natural language processing model is employed to identify connections between language expressions and pain on a …


Assessing The Effectiveness Of A Chatbot Workshop As Experiential Teaching And Learning Tool To Engage Undergraduate Students, Kyong Jin Shim, Thomas Menkhoff, Ying Qian Teo, Clement Shi Qi Ong May 2023

Research Collection School Of Computing and Information Systems

In this paper, we empirically examine and assess the effectiveness of a chatbot workshop as an experiential teaching and learning tool to engage undergraduate students enrolled in an elective course “Doing Business with A.I.” in the Lee Kong Chian School of Business (LKCSB) at Singapore Management University. The chatbot workshop provides non-STEM students with an opportunity to acquire basic skills to build a chatbot prototype using the ‘Dialogflow’ program. The workshop and the experiential learning activity are designed to impart conversation and user-centric design know-how and know-why to students. A key didactical aspect which informs the design and flow …


Wearing Masks Implies Refuting Trump?: Towards Target-Specific User Stance Prediction Across Events In COVID-19 And US Election 2020, Hong Zhang, Haewoon Kwak, Wei Gao, Jisun An May 2023

Research Collection School Of Computing and Information Systems

People who share similar opinions towards controversial topics could form an echo chamber and may share similar political views toward other topics as well. The existence of such connections, which we call connected behavior, gives researchers a unique opportunity to predict how one would behave for a future event given their past behaviors. In this work, we propose a framework to conduct connected behavior analysis. Neural stance detection models are trained on Twitter data collected on three seemingly independent topics, i.e., wearing a mask, racial equality, and Trump, to detect people’s stance, which we consider as their online behavior in …


Examining The Effect Of Word Embeddings And Preprocessing Methods On Fake News Detection, Jessica Hauschild May 2023

Department of Statistics: Dissertations, Theses, and Student Work

The words people choose to use hold a lot of power, whether that be in spreading truth or deception. As listeners and readers, we do our best to understand how words are being used. There are many current methods in computer science literature attempting to embed words into numerical information for statistical analyses. Some of these embedding methods, such as Bag of Words, treat words as independent, while others, such as Word2Vec, attempt to gain information about the context of words. It is of interest to compare how well these various methods of translating text into numerical data work specifically …


Enhancing Institutional Assessment And Reporting Through Conversational Technologies: Exploring The Potential Of AI-Powered Tools And Natural Language Processing, James Hutson, Daniel Plate May 2023

Faculty Scholarship

This study explores the potential of conversational technologies, AI-powered tools, and natural language processing (NLP) in enhancing institutional assessment and reporting processes in higher education. The traditional approach to assessment often involves labor-intensive manual analysis of extensive data and documents, which burdens institutions. To address these challenges, AI-powered tools, such as ChatGPT, LangChain, Poe, Claude, and others, along with NLP techniques, are investigated in relation to their ability to improve institutional assessment practices and output. By leveraging these advanced technologies, assessment officers and institutional effectiveness researchers can engage in dynamic conversations with data, transforming spreadsheets and documents from static artifacts …


Behind Derogatory Migrants' Terms For Venezuelan Migrants: Xenophobia And Sexism Identification With Twitter Data And NLP, Joseph Martínez, Melissa Miller-Felton, Jose Padilla, Erika Frydenlund Apr 2023

Modeling, Simulation and Visualization Student Capstone Conference

The sudden arrival of many migrants can present new challenges for host communities and create negative attitudes that reflect that tension. In the case of Colombia, with the influx of over 2.5 million Venezuelan migrants, such tensions arose. Our research objective is to investigate how those sentiments arise in social media. We focused on monitoring derogatory terms for Venezuelans, specifically veneco and veneca. Using a dataset of 5.7 million tweets from Colombian users between 2015 and 2021, we determined the proportion of tweets containing those terms. We observed a high prevalence of xenophobic and defamatory language correlated with the …


Towards NLP-Based Conceptual Modeling Frameworks, David Shuttleworth, Jose Padilla Apr 2023

Modeling, Simulation and Visualization Student Capstone Conference

This paper presents preliminary research using Natural Language Processing (NLP) to support the development of conceptual modeling frameworks. NLP-based frameworks are intended to lower the barrier of entry for non-modelers to develop models and to facilitate communication across disciplines considering simulations in research efforts. NLP drives conceptual modeling in two ways. Firstly, it attempts to automate the generation of conceptual models and simulation specifications, derived from non-modelers’ narratives, while standardizing the conceptual modeling process and outcome. Secondly, as the process is automated, it is simpler to replicate and be followed by modelers and non-modelers. This allows for using a common …


Using NLP To Model U.S. Supreme Court Cases, Katherine Lockard, Robert Slater, Brandon Sucrese Apr 2023

SMU Data Science Review

The advantages of employing text analysis to uncover policy positions, generate legal predictions, and inform or evaluate reform practices are manifold. Given the far-reaching effects of legislation at all levels of society, these insights and their continued improvement are impactful. This research explores the use of natural language processing (NLP) and machine learning to predictively model U.S. Supreme Court case outcomes based on textual case facts. The final model achieved an F1-score of .324 and an AUC of .68. This suggests that the model can distinguish between the two target classes; however, further research is needed before machine learning models …


Content-Based Unsupervised Fake News Detection On Ukraine-Russia War, Yucheol Shin, Yvan Sojdehei, Limin Zheng, Brad Blanchard Apr 2023

SMU Data Science Review

The Ukrainian-Russian war has garnered significant attention worldwide, with fake news obstructing the formation of public opinion and disseminating false information. This scholarly paper explores the use of unsupervised learning methods and the Bidirectional Encoder Representations from Transformers (BERT) to detect fake news in news articles from various sources. BERT topic modeling is applied to cluster news articles by their respective topics, followed by summarization to measure the similarity scores. The hypothesis posits that topics with larger variances are more likely to contain fake news. The proposed method was evaluated using a dataset of approximately 1000 labeled news articles related …


Professor Text: University Fundraising Optimization, Braden Anderson, Connor Dobbs, Hien Lam, John Santerre Apr 2023

SMU Data Science Review

University fundraising campaigns are a unique type of cause-related marketing with its own challenges and opportunities. Campaigns like this typically last an extended period, such as five or more years, and goals exist beyond the dollar amount raised. These supplemental goals, such as awareness among potential future donators or brand reputation within the local community, are important to consider and strategize. There can also be unique limitations, such as requiring advertising specifically on recent large gifts or endowment programs. This research explores how machine learning techniques such as natural language processing can be used to optimize a fundraising campaign strategy, …


Language Modeling Using Image Representations Of Natural Language, Seong Eun Cho Apr 2023

Theses and Dissertations

This thesis presents training of an end-to-end autoencoder model using the transformer, with an encoder that can encode sentences into fixed-length latent vectors and a decoder that can reconstruct the sentences using image representations. Encoding and decoding sentences to and from these image representations are central to the model design. This method allows new sentences to be generated by traversing the Euclidean space, which makes vector arithmetic possible using sentences. Machines excel in dealing with concrete numbers and calculations, but do not possess an innate infrastructure designed to help them understand abstract concepts like natural language. In order for a …


Beyond News Values On Twitter: Predicting Factors That Drive User Engagement In News, Zhiyan Zhong Apr 2023

Dartmouth College Master’s Theses

When deciding on what news stories to cover, traditional journalism determines news values by following several elements of newsworthiness, such as impact, timeliness, and prominence. However, these guidelines do not always seem to correspond with the success of content on social media. As people are increasingly turning to social media for news, our research aims to understand and predict factors that drive user engagement for news on social media. In this study, we analyze news content published on Twitter, and examine a diverse set of characteristics like metrics retrieved from the Twitter API and semantics by natural language processing, including …


Socially Aware Natural Language Processing With Commonsense Reasoning And Fairness In Intelligent Systems, Sirwe Saeedi Apr 2023

Dissertations

Although Artificial Intelligence (AI) promises to deliver ever more user-friendly consumer applications, recent mishaps involving fake information and biased treatment serve as vivid reminders of the pitfalls of AI. AI can harbor latent biases and flaws that can cause harm in diverse and unexpected ways. It is crucial to understand the reasons for, mechanisms behind, and circumstances under which AI can fail. For instance, a lack of commonsense reasoning can lead to biased or unfair decisions made by Machine Learning (ML) systems. For example, if an ML system is trained on data that is biased or unrepresentative of the real …


Conversations With ChatGPT About C Programming: An Ongoing Study, James C. Davis, Yung-Hsiang Lu, George K. Thiruvathukal Mar 2023

Computer Science: Faculty Publications and Other Works

AI (Artificial Intelligence) Generative Models have attracted great attention in recent years. Generative models can be used to create new articles, visual arts, music composition, even computer programs from English specifications. Among all generative models, ChatGPT is becoming one of the most well-known since its public announcement in November 2022. GPT means Generative Pre-trained Transformer. ChatGPT is an online program that can interact with human users in text formats and is able to answer questions in many topics, including computer programming. Many computer programmers, including students and professionals, are considering the use of ChatGPT as an aid. The quality …


Hierarchical Joint Entity Recognition And Relation Extraction Of Contextual Entities In Family History Records, Daniel Segrera Mar 2023

Theses and Dissertations

Entity extraction is an important step in document understanding. Higher accuracy entity extraction on fine-grained entities can be achieved by combining the utility of Named Entity Recognition (NER) and Relation Extraction (RE) models. In this paper, a cascading model is proposed that implements NER and Relation extraction. This model utilizes relations between entities to infer context-dependent fine-grain named entities in text corpora. The RE module runs independent of the NER module, which reduces error accumulation from sequential steps. This process improves on the fine-grained NER F1-score of existing state-of-the-art from .4753 to .8563 on our data, albeit on a strictly …


Investment And Risk Management With Online News And Heterogeneous Networks, Meng Kiat Gary Ang, Ee-Peng Lim Mar 2023

Research Collection School Of Computing and Information Systems

Stock price movements in financial markets are influenced by large volumes of news from diverse sources on the web, e.g., online news outlets, blogs, social media. Extracting useful information from online news for financial tasks, e.g., forecasting stock returns or risks, is, however, challenging due to the low signal-to-noise ratios of such online information. Assessing the relevance of each news article to the price movements of individual stocks is also difficult, even for human experts. In this article, we propose the Guided Global-Local Attention-based Multimodal Heterogeneous Network (GLAM) model, which comprises novel attention-based mechanisms for multimodal sequential and graph encoding, …


Solving Turkish Math Word Problems By Sequence-To-Sequence Encoder-Decoder Models, Esin Gedik, Tunga Güngör Mar 2023

Turkish Journal of Electrical Engineering and Computer Sciences

Solving math word problems (MWP) is a challenging task due to the semantic gap between natural language texts and mathematical equations. The main purpose of the task is to take a written math problem as input and produce a proper equation as output for solving that problem. This paper describes a sequence-to-sequence (seq2seq) neural model for automatically solving Turkish MWPs based on their semantic meanings in the text. It comprises a bidirectional encoder to comprehend the semantics of the problem by encoding the input sequence and a decoder with attention to extract the equation by tracking the semantic meanings of …


A Quantum Approach To Language Modeling, Constantijn Van Der Poel Feb 2023

Dissertations, Theses, and Capstone Projects

This dissertation consists of six chapters. Chapter 1: We introduce language modeling, outline the software used for this thesis, and discuss related work. Chapter 2: We will unpack the transition from classical to quantum probabilities, as well as motivate their use in building a model to understand language-like datasets. Chapter 3: We motivate the Motzkin dataset, the models we will be investigating, as well as the necessary algorithms to do calculations with them. Chapter 4: We investigate our models’ sensitivity to various hyperparameters. Chapter 5: We compare the performance and robustness of the models. Chapter 6: We conclude …


Semantic Orientation Of Crosslingual Sentiments: Employment Of Lexicon And Dictionaries, Arslan Ali Raza, Asad Habib, Jawad Ashraf, Babar Shah, Fernando Moreira Jan 2023

All Works

Sentiment Analysis is a modern discipline at the crossroads of data mining and natural language processing. It is concerned with the computational treatment of public moods shared in the form of text over social networking websites. Social media users express their feelings in conversations through cross-lingual terms, intensifiers, enhancers, reducers, symbols, and Net Lingo. However, the generic Sentiment Analysis (SA) research lacks comprehensive coverage about such abstruseness. In particular, they are inapt in the semantic orientation of Crosslingual based code switching, capitalization and accentuation of opinionative text due to the lack of annotated corpora, computational resources, linguistic processing and inefficient …


The Use Of Artificial Intelligence To Detect Students Sentiments And Emotions In Gross Anatomy Reflections, Krzysztof J. Rechowicz, Carrie A. Elzie Jan 2023

VMASC Publications

Students' reflective writings in gross anatomy provide a rich source of complex emotions experienced by learners. However, qualitative approaches to evaluating student writings are resource-heavy and time-consuming. To overcome this, natural language processing, a nascent field of artificial intelligence that uses computational techniques for the analysis and synthesis of text, was used to compare health professional students' reflections on the importance of various regions of the body to their own lives and those of the anatomical donor dissected. A total of 1365 anonymous writings (677 about a donor, 688 about self) were collected from 132 students. Binary and trinary …