Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Computer Sciences

Browse all Theses and Dissertations

Natural Language Processing

Articles 1 - 9 of 9

Full-Text Articles in Physical Sciences and Mathematics

Novel Natural Language Processing Models For Medical Terms And Symptoms Detection In Twitter, Farahnaz Golrooy Motlagh Jan 2022

This dissertation develops novel NLP models to disambiguate language use on Twitter about drug use, types of drug consumption, and drug legalization, drawing on ontology-enhanced approaches and data-driven prediction analysis. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side-effect relations; and (c) modeling the prediction of public perception of drug legalization and the sentiment analysis of drug consumption on Twitter. We collected …


Topological Analysis Of Averaged Sentence Embeddings, Wesley J. Holmes Jan 2020

Sentence embeddings are frequently generated by using complex, pretrained models that were trained on a very general corpus of data. This thesis explores a potential alternative method for generating high-quality sentence embeddings for highly specialized corpora in an efficient manner. A framework for visualizing and analyzing sentence embeddings is developed to help assess the quality of sentence embeddings for a highly specialized corpus of documents related to the 2019 coronavirus epidemic. A Topological Data Analysis (TDA) technique is explored as an alternative method for grouping embeddings for document clustering and topic modeling tasks and is compared to a simple clustering …


A Framework To Understand Emoji Meaning: Similarity And Sense Disambiguation Of Emoji Using Emojinet, Sanjaya Wijeratne Jan 2018

Pictographs, commonly referred to as ‘emoji’, have become a popular way to enhance electronic communications. They are an important component of the language used in social media. Since their introduction in the late 1990s, emoji have been widely used to enhance the sentiment, emotion, and sarcasm expressed in social media messages. They are equally popular across many social media sites including Facebook, Instagram, and Twitter. In 2015, Instagram reported that nearly half of the photo comments posted on Instagram contain emoji, and in the same year, Twitter reported that the ‘face with tears of joy’ emoji has been tweeted 6.6 …


Using Natural Language Processing And Machine Learning For Analyzing Clinical Notes In Sickle Cell Disease Patients, Shufa Khizra Jan 2018

Sickle Cell Disease (SCD) is a hereditary disorder of red blood cells that can lead to excruciating pain episodes. SCD causes normal red blood cells to distort and take on a sickle shape. The distorted shape makes the hemoglobin inflexible and causes it to stick to the walls of the vessels, thereby obstructing the free flow of blood and eventually depriving the tissues of oxygen. The lack of oxygen causes serious problems including Acute Chest Syndrome (ACS), stroke, infection, and organ damage, and over a lifetime SCD can harm a person’s spleen, brain, kidneys, eyes, and bones. Sickling of …


Multi-Class Classification Of Textual Data: Detection And Mitigation Of Cheating In Massively Multiplayer Online Role Playing Games, Naga Sai Nikhil Maguluri Jan 2017

The success of any multiplayer game depends on the players’ experience. Cheating and hacking undermine that experience and thus the success of the game. Cheaters who use hacks, bots, or trainers ruin the gaming experience of other players and drive them to leave the game. As the video game industry is a constantly growing multibillion-dollar economy, it is crucial to assure and maintain a state of security. Players reflect their gaming experience in one of the following places: multiplayer chat, game reviews, and social media. This thesis is an exploratory study where our goal is to experiment and propose …


Semantics-Based Summarization Of Entities In Knowledge Graphs, Kalpa Gunaratna Jan 2017

The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the …


What Machines Understand About Personality Words After Reading The News, Eric David Moyer Jan 2014

Vector-based lexical semantics is a powerful technique that still has many undiscovered applications. In this thesis I apply a vector-space lexical-semantic model newly developed by Mikolov et al. and trained on skip-grams to the lexical hypothesis in personality psychology. The method produces interpretable dimensions that are consistent across several sets of descriptive personality words. The dimensions include ones for conflict and for positive and negative evaluation. However, they are more descriptive of word usage semantics than of the characteristics of the thing described and thus do not include a recognizable component of the five-factor model in their first 14 dimensions. They …


Natural Language Document And Event Association Using Stochastic Petri Net Modeling, Michael Thomas Mills Jan 2013

The purpose of this research is to design and implement a new methodology that captures the natural language understanding of events from English natural language text and models it using Stochastic Petri Nets. To establish a baseline of recent natural language processing (NLP) and understanding (NLU) research, two surveys are presented. The first is a general survey of NLP and NLU methodologies for processing multiple documents. It summarizes and presents methodologies in terms of their features, capabilities, and maturity. The second survey focuses on graph-based methods for NL text processing and understanding and analyzes them in terms of their functional descriptions, capabilities …


A Latent Dirichlet Allocation/N-Gram Composite Language Model, Raymond Daniel Kulhanek Jan 2013

I present a composite language model in which an n-gram language model is integrated with the Latent Dirichlet Allocation topic clustering model. I also describe a parallel architecture that allows this model to be trained over large corpora and present experimental results that show how the composite model compares to a standard n-gram model over corpora of varying size.