Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 21 of 21

Full-Text Articles in Computer Sciences

Enhanced Content-Based Fake News Detection Methods With Context-Labeled News Sources, Duncan Arnfield Dec 2023

Electronic Theses and Dissertations

This work examined the relative effectiveness of multilayer perceptron, random forest, and multinomial naïve Bayes classifiers, trained using bag of words and term frequency-inverse document frequency transformations of documents in the Fake News Corpus and Fake and Real News Dataset. The goal of this work was to help meet the formidable challenges posed by the proliferation of fake news to society, including the erosion of public trust, disruption of social harmony, and endangerment of lives. This training included the use of context-categorized fake news in an effort to enhance the tools’ effectiveness. It was found that term frequency-inverse document frequency provided …


Review Classification Using Natural Language Processing And Deep Learning, Brian Nazareth Dec 2023

Electronic Theses, Projects, and Dissertations

Sentiment analysis is an ongoing area of research in the field of Natural Language Processing (NLP). In this project, I evaluate my approach on an Amazon Reviews Dataset, which contains more than 100 thousand customer reviews. This project classifies the reviews using three methods: a sentiment score computed by comparing the words of each review against the positive and negative words in the Opinion Lexicon dataset, the text’s varying sentiment polarity scores from a Python library called TextBlob, and neural network training. I have created a neural …
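The first of the three methods, lexicon-based scoring, can be illustrated with a minimal sketch. This is not the author's implementation: the two tiny word sets below are hypothetical stand-ins for the Opinion Lexicon, and the tokenization and the "positive minus negative" score are assumptions made for illustration.

```python
# Hypothetical miniature lexicons standing in for the Opinion Lexicon dataset.
POSITIVE_WORDS = {"good", "great", "love", "excellent"}
NEGATIVE_WORDS = {"bad", "poor", "hate", "terrible"}


def lexicon_score(review: str) -> int:
    """Return (number of positive words) minus (number of negative words)."""
    # Naive whitespace tokenization with surrounding punctuation stripped.
    tokens = [t.strip(".,!?;:\"'()").lower() for t in review.split()]
    positives = sum(1 for t in tokens if t in POSITIVE_WORDS)
    negatives = sum(1 for t in tokens if t in NEGATIVE_WORDS)
    return positives - negatives


def classify(review: str) -> str:
    """Map the lexicon score to a coarse sentiment label."""
    score = lexicon_score(review)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A review such as "Terrible battery and poor screen." contains two negative lexicon words and no positive ones, so this sketch labels it negative.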


Twitter Bot Detection Using NLP And Graph Classification, Warada Jayant Kulkarni Jan 2023

Master's Projects

Social media platforms are one of the primary resources for information, as they are easily accessible, low in cost, and provide a high rate of information spread. Online social media (OSM) have become the main source of news information around the world, but the distributed nature of the web has increased the risk of fake news spreading. Fake news is misleading information that is published as real news. Therefore, identifying fake news and flagging it as such, as well as detecting the sources that generate it, is an ongoing task for researchers and OSM companies. Bots are artificial …


Influence Level Prediction On Social Media Through Multi-Task And Sociolinguistic User Characteristics Modeling, Denys Katerenchuk Sep 2022

Dissertations, Theses, and Capstone Projects

Prediction of a user’s influence level on social networks has attracted a lot of attention as human interactions move online. Influential users have the ability to influence others’ behavior to achieve their own agenda. As a result, predicting users’ level of influence online can help to understand social networks, forecast trends, prevent misinformation, etc. The research on user influence in social networks has attracted much attention across multiple disciplines, from social sciences to mathematics, yet it is still not well understood. One of the difficulties is that the definition of influence is specific to a particular problem or a domain, …


Video Games, Grief, And The Character Link System, Nam Nguyen May 2022

University of New Orleans Theses and Dissertations

Grief can encompass more than just the loss of real-life people. It can be felt with the loss of a pet, changes in daily structure, and even the loss of video game characters. The topic of grief related to video games and video game characters comes at a time when games as a service (GaaS) continue to increase in popularity, even as these games inevitably terminate service. To combat this unique form of grief, the Character LINK System was created as a tool that uses simple natural language processing (NLP) techniques to offer support to the bereaved …


Humanizing Computational Literature Analysis Through Art-Based Visualizations, Alexandria Leto Jan 2022

Electronic Theses and Dissertations

Inequalities in gender representation and characterization in fictional works are issues that have long been discussed by social scientists. This work addresses these inequalities with two interrelated components. First, it contributes a sentiment and word frequency analysis task focused on gender-specific nouns and pronouns in 15,000 fictional works taken from the online library, Project Gutenberg. This analysis allows for both quantifying and offering further insight on the nature of this disparity in gender representation. Then, the outcomes of the analysis are harnessed to explore novel data visualization formats using computational and studio art techniques. Our results call attention to the …


Identifying Optimal Course Structures Using Topic Models, Tehut Tesfaye Biru Jun 2021

Dartmouth College Undergraduate Theses

This research project investigates whether there exists an optimal way to structure topics in educational course content that results in higher levels of engagement among students. It is implemented by fitting topic models to transcripts of educational videos contained in the Khan Academy platform. The fitted models were used to extract topic trajectories across time for each video and subsequently clustered based on whether they have similar “shapes”. The differences in mean engagement metrics per cluster suggest that some course shapes are more palatable to students regardless of subject matter. Additionally, the topic trajectories suggest a constant progression of topics …


A Comparison Of Word Embedding Techniques For Similarity Analysis, Tyler Gerth May 2021

Computer Science and Computer Engineering Undergraduate Honors Theses

There have been a multitude of word embedding techniques developed that allow a computer to process natural language and compare the relationships between different words programmatically. In this paper, similarity analysis, or the testing of words for synonymic relations, is used to compare several of these techniques to see which performs the best. The techniques being compared all utilize the method of creating word vectors, reducing words down into a single vector of numerical values that denote how the word relates to other words that appear around it. In order to get a holistic comparison, multiple analyses were made, with …
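As a rough illustration of how word vectors support similarity analysis, the sketch below compares vectors with cosine similarity. The abstract does not say which similarity measure or embeddings were used, so cosine similarity is an assumption here, and the three-dimensional vectors in `VECTORS` are hypothetical toy values rather than output from any real embedding technique.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length numeric vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0  # Treat a zero vector as similar to nothing.
    return dot / (norm_u * norm_v)


# Hypothetical toy embeddings, chosen only to make the example readable.
VECTORS = {
    "happy": [0.9, 0.1, 0.0],
    "glad": [0.8, 0.2, 0.1],
    "table": [0.0, 0.1, 0.9],
}


def most_similar(word, vectors):
    """Return the other word whose vector is closest to `word` by cosine similarity."""
    return max(
        (w for w in vectors if w != word),
        key=lambda w: cosine_similarity(vectors[word], vectors[w]),
    )
```

With these toy values, `most_similar("happy", VECTORS)` picks "glad" over "table", which is the kind of synonym-style judgment a similarity analysis checks for.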


Using Natural Language Processing To Categorize Fictional Literature In An Unsupervised Manner, Dalton J. Crutchfield Jan 2020

Electronic Theses and Dissertations

When following a plot in a story, categorization is something that humans do without even thinking; whether this is simple classification like “This is science fiction” or more complex trope recognition like recognizing a Chekhov's gun or a rags to riches storyline, humans group stories with other similar stories. Research has been done to categorize basic plots and acknowledge common story tropes on the literary side; however, there is no formula or set way to determine these plots in a story line automatically. This paper explores multiple natural language processing techniques in an attempt to automatically compare and cluster …


Pseudo-Data Generation For Improving Clinical Named Entity Recognition, Jeffrey T. Smith Jan 2020

Theses and Dissertations

One of the primary challenges for clinical Named Entity Recognition (NER) is the availability of annotated training data. Technical and legal hurdles prevent the creation and release of corpora related to electronic health records (EHRs). In this work, we look at the impact of pseudo-data generation on clinical NER using gazetteering and thresholding utilizing a neural network model. We report that gazetteers can result in the inclusion of proper terms with the exclusion of determiners and pronouns in preceding and middle positions. Gazetteers that had higher numbers of terms inclusive to the original dataset had a higher impact. We also …


Music Mood Classification Using Convolutional Neural Networks, Revanth Akella May 2019

Master's Projects

Grouping music into moods is useful as music migrates to online streaming services, since it can help with recommendations. To establish the connection between music and mood we develop an end-to-end, open source approach for mood classification using lyrics. We develop a pipeline for tag extraction, lyric extraction, and establishing classification models for classifying music into moods. We investigate techniques to classify music into moods using lyrics and audio features. Using various natural language processing methods with machine learning and deep learning we perform a comparative study across different classification and mood models. The results imply that features …


Chatbots With Personality Using Deep Learning, Susmit Gaikwad May 2019

Master's Projects

Natural Language Processing (NLP) requires the computational modelling of the complex relationships of the syntax and semantics of a language. While traditional machine learning methods are used to solve NLP problems, they cannot imitate the human ability for language comprehension. With the growth in deep learning, these complexities within NLP are easier to model and can be used to build many computer applications. A particular example of this is a chatbot, where a human user has a conversation with a computer program that generates responses based on the user’s input. In this project, we study the methods used in building chatbots, …


A Transfer Learning Approach For Sentiment Classification., Omar Abdelwahab Dec 2018

Electronic Theses and Dissertations

Developing machine learning systems or Artificial Intelligence agents that learn from different tasks and accumulate that knowledge over time, so that they function successfully on a new task that they have not seen before, is a research area that is still being explored. In this work, we will lay out an algorithm that allows a machine learning system or an AI agent to learn from k different domains and then use some or no data from the new task to perform strongly on that new task. In order …


Cse: U: Mixed-Initiative Personal Assistant Agents, Joshua W. Buck, Saverio Perugini, Tam Nguyen Nov 2018

Saverio Perugini

Specification and implementation of flexible human-computer dialogs is challenging because of the complexity involved in rendering the dialog responsive to a vast number of varied paths through which users might desire to complete the dialog. To address this problem, we developed a toolkit for modeling and implementing task-based, mixed-initiative dialogs based on metaphors from lambda calculus. Our toolkit can automatically operationalize a dialog that involves multiple prompts and/or sub-dialogs, given a high-level dialog specification of it. The use of natural language with the resulting dialogs makes the flexibility in communicating user utterances commensurate with that in dialog completion paths—an aspect …


Chrono: A System For Normalizing Temporal Expressions, Amy L. Olex, Luke G. Maffey, Nicholas Morton, Bridget T. Mcinnes Jan 2018

Computer Science Publications

The Chrono System: Chrono is a hybrid rule-based and machine learning system written in Python and built from the ground up to identify temporal expressions in text and normalize them into the SCATE schema. Input text is preprocessed using Python’s NLTK package, and is run through each of the four primary modules highlighted here. Note that Chrono does not remove stopwords because they add temporal information and context, and Chrono does not tokenize sentences. Output is an Anafora XML file with annotated SCATE entities. After minor parsing logic adjustments, Chrono has emerged as the top-performing system for SemEval 2018 …


Tandem 2.0: Image And Text Data Generation Application, Christopher J. Vitale Feb 2017

Dissertations, Theses, and Capstone Projects

First created as part of the Digital Humanities Praxis course in the spring of 2012 at the CUNY Graduate Center, Tandem explores the generation of datasets comprised of text and image data by leveraging Optical Character Recognition (OCR), Natural Language Processing (NLP) and Computer Vision (CV). This project builds upon that earlier work in a new programming framework. While other developers and digital humanities scholars have created similar tools specifically geared toward NLP (e.g. Voyant-Tools), as well as algorithms for image processing and feature extraction on the CV side, Tandem explores the process of developing a more robust and user-friendly …


Cse: U: Mixed-Initiative Personal Assistant Agents, Joshua W. Buck, Saverio Perugini, Tam Nguyen Jan 2017

Computer Science Faculty Publications

Specification and implementation of flexible human-computer dialogs is challenging because of the complexity involved in rendering the dialog responsive to a vast number of varied paths through which users might desire to complete the dialog. To address this problem, we developed a toolkit for modeling and implementing task-based, mixed-initiative dialogs based on metaphors from lambda calculus. Our toolkit can automatically operationalize a dialog that involves multiple prompts and/or sub-dialogs, given a high-level dialog specification of it. The use of natural language with the resulting dialogs makes the flexibility in communicating user utterances commensurate with that in dialog completion paths—an aspect …


Categorizing Blog Spam, Brandon Bevans Jun 2016

Master's Theses

The internet has matured into the focal point of our era. Its ecosystem is vast, complex, and in many regards unaccounted for. One of the most prevalent aspects of the internet is spam. Similar to the rest of the internet, spam has evolved from simply meaning ‘unwanted emails’ to a blanket term that encompasses any unsolicited or illegitimate content that appears in the wide range of media that exists on the internet.

Many forms of spam permeate the internet, and spam architects continue to develop tools and methods to avoid detection. On the other side, cyber security engineers continue to …


Cest: City Event Summarization Using Twitter, Deepa Mallela May 2016

Computer Science Graduate Projects and Theses

Twitter, with 288 million active users, has become the most popular platform for continuous real-time discussions. This leads to huge amounts of information related to the real world, which has attracted researchers from both academia and industry. Event detection on Twitter has gained attention as one of the most popular domains of interest within the research community. Unfortunately, existing event detection methodologies have yet to fully explore Twitter metadata and instead rely solely on identifying events based on prior information or focus on events that belong to specific categories. Given the heavy volume of tweets that discuss events, summarization techniques can …


Misheard Me Oronyminator: Using Oronyms To Validate The Correctness Of Frequency Dictionaries, Jennifer G. Hughes Jun 2013

Master's Theses

In the field of speech recognition, an algorithm must learn to tell the difference between "a nice rock" and "a gneiss rock". These identical-sounding phrases are called oronyms. Word frequency dictionaries are often used by speech recognition systems to help resolve phonetic sequences with more than one possible orthographic phrase interpretation, by looking up which oronym of the root phonetic sequence contains the most-common words.

Our paper demonstrates a technique used to validate word frequency dictionary values. We chose to use frequency values from the UNISYN dictionary, which tallies each word on a per-occurrence basis, using a proprietary text corpus, …
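The way a frequency dictionary can resolve an oronym, as described in the first paragraph of this abstract, can be sketched as follows. This is an illustrative assumption, not the thesis's method: the counts in `WORD_FREQUENCY` are invented (they are not UNISYN values), and scoring a phrase by the sum of its word frequencies is just one plausible reading of "contains the most-common words".

```python
# Hypothetical per-occurrence counts; real values would come from a frequency dictionary.
WORD_FREQUENCY = {"a": 1000, "nice": 120, "gneiss": 1, "rock": 80}


def phrase_score(phrase, freq):
    """Score a phrase by summing the frequency of each word (unknown words count as 0)."""
    return sum(freq.get(word, 0) for word in phrase.lower().split())


def pick_oronym(candidates, freq):
    """Choose the candidate phrase whose words are most common overall."""
    return max(candidates, key=lambda phrase: phrase_score(phrase, freq))
```

Given the two candidate phrases from the abstract, "a nice rock" and "a gneiss rock", this sketch prefers "a nice rock" because "nice" has a far higher count than "gneiss" in the toy dictionary.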


Identifying Subjective Statements In News Titles Using A Personal Sense Annotation Framework, Polina Panicheva, John Cardiff, Paolo Rosso Apr 2013

John Cardiff

Subjective language contains information about private states. The goal of subjective language identification is to determine that a private state is expressed, without considering its polarity or specific emotion. A component of word meaning, “Personal Sense,” has clear potential in the field of subjective language identification, as it reflects a meaning of words in terms of unique personal experience and carries personal characteristics. In this paper we investigate how Personal Sense can be harnessed for the purpose of identifying subjectivity in news titles. In the process, we develop a new Personal Sense annotation framework for annotating and classifying subjectivity, polarity, …