Open Access. Powered by Scholars. Published by Universities.®

Computer Engineering Commons

Theses/Dissertations

Natural Language Processing

Articles 1 - 21 of 21

Full-Text Articles in Computer Engineering

Developing A Flexible System For A Friendly Robot To Ease Dementia (Fred) Using Cloud Technologies And Software Design Patterns, Robert James Bray Dec 2023

Masters Theses

In this work, we designed two prototypes for a friendly robot to ease dementia (FRED). This affordable social robot is designed to provide company to older adults with cognitive decline, create reminders for important events and tasks, like taking medication, and provide cognitive stimulus through games. This project combines several cloud technologies, including speech-to-text, cloud data storage, and chat generation, to provide high-level interactions with a social robot. Software design patterns were employed in the creation of the software to produce a flexible code base that can sustain platform changes easily, including the framework used for the graphical …


Analysis And Usage Of Natural Language Features In Success Prediction Of Legislative Testimonies, Marine Cossoul Mar 2023

Master's Theses

Committee meetings are a fundamental part of the legislative process in which constituents, lobbyists, and legislators alike can speak on proposed bills at the local and state level. Oftentimes, unspoken “rules” or standards are at play in political processes that can influence the trajectory of a bill, leaving constituents without a political background at an inherent disadvantage when engaging with the legislative process. The work done in this thesis aims to explore the extent to which the language and phraseology of a general public testimony can influence a vote, and examine how this information can be used to promote civic …


Effective Systems For Insider Threat Detection, Muhanned Qasim Jabbar Alslaiman Jan 2023

Browse all Theses and Dissertations

Insider threats to information security have become a burden for organizations. Understanding insider activities improves the identification of insider attacks and limits their threats. This dissertation presents three systems to detect insider threats effectively. The aim is to reduce the false negative rate (FNR), provide better dataset use, and reduce dimensionality and zero-padding effects. The systems developed utilize deep learning techniques and are evaluated using the CERT 4.2 dataset. The dataset is analyzed and reformed so that each row represents a variable-length sample of user activities. Two data representations are implemented to model extracted features …


Improving Relation Extraction From Unstructured Genealogical Texts Using Fine-Tuned Transformers, Carloangello Parrolivelli Jun 2022

Master's Theses

Though exploring one’s family lineage through genealogical family trees can be insightful to developing one’s identity, this knowledge is typically held behind closed doors by private companies or requires expensive technologies, such as DNA testing, to uncover. With the explosion of data on the World Wide Web, many unstructured text documents, both old and new, are being discovered, written, and processed which contain rich genealogical information. Access to this immense amount of data, however, entails a costly process whereby people, typically volunteers, have to read large amounts of text to find relationships between people. This delays having genealogical …


Novel Natural Language Processing Models For Medical Terms And Symptoms Detection In Twitter, Farahnaz Golrooy Motlagh Jan 2022

Browse all Theses and Dissertations

This dissertation develops novel NLP models to disambiguate language on Twitter about drug use, types of drug consumption, and drug legalization, drawing on ontology-enhanced approaches and data-driven prediction analysis. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side effect relations; and (c) modeling the prediction of public perception of drug legalization and the sentiment analysis of drug consumption on Twitter. We collected …


Improving Network Policy Enforcement Using Natural Language Processing And Programmable Networks, Pinyi Shi Jan 2022

Theses and Dissertations--Computer Science

Computer networks are becoming more complex and challenging to operate, manage, and protect. As a result, network policies that define how network operators should manage the network are becoming more complex and nuanced. Unfortunately, network policies are often an undervalued part of network design, leaving network operators to guess at the intent of policies that are written and fill in the gaps where policies don’t exist. Organizations typically designate Policy Committees to write down the network policies in policy documents using high-level natural language. The policy documents describe both the acceptable and unacceptable uses of the network. Network operators …


Ppmexplorer: Using Information Retrieval, Computer Vision And Transfer Learning Methods To Index And Explore Images Of Pompeii, Cindy Roullet Dec 2020

Graduate Theses and Dissertations

In this dissertation, we present and analyze the technology used in the making of PPMExplorer: Search, Find, and Explore Pompeii. PPMExplorer is a software tool made with data extracted from the Pompei: Pitture e Mosaici (PPM) volumes. PPM is a valuable set of volumes containing 20,000 annotated historical images of the archaeological site of Pompeii, Italy, accompanied by extensive captions. We transformed the volumes from paper, to digital, to searchable. PPMExplorer enables archaeology researchers to conduct and check hypotheses on historical findings. We present a theory that such a concept is possible by leveraging computer-generated correlations between artifacts using …


Data Science Methods For Standardization, Safety, And Quality Assurance In Radiation Oncology, Khajamoinuddin Syed Jan 2020

Theses and Dissertations

Radiation oncology is the field of medicine that deals with treating cancer patients through ionizing radiation. The clinical modality or technique used to treat cancer patients in the radiation oncology domain is referred to as radiation therapy. Radiation therapy aims to deliver a precisely measured dose of radiation to a defined tumor volume (target) with as little damage as possible to surrounding healthy tissue (organs-at-risk), resulting in eradication of the tumor, high quality of life, and prolongation of survival. A typical radiotherapy process requires the use of different clinical systems at various stages of the workflow. The data generated in these …


Topological Analysis Of Averaged Sentence Embeddings, Wesley J. Holmes Jan 2020

Browse all Theses and Dissertations

Sentence embeddings are frequently generated by using complex, pretrained models that were trained on a very general corpus of data. This thesis explores a potential alternative method for generating high-quality sentence embeddings for highly specialized corpora in an efficient manner. A framework for visualizing and analyzing sentence embeddings is developed to help assess the quality of sentence embeddings for a highly specialized corpus of documents related to the 2019 coronavirus epidemic. A Topological Data Analysis (TDA) technique is explored as an alternative method for grouping embeddings for document clustering and topic modeling tasks and is compared to a simple clustering …


Feature-Based Transfer Learning In Natural Language Processing, Jianfei Yu Dec 2018

Dissertations and Theses Collection (Open Access)

In the past few decades, the supervised machine learning approach has been one of the most important methodologies in the Natural Language Processing (NLP) community. Although various kinds of supervised learning methods have been proposed to obtain state-of-the-art performance across most NLP tasks, their bottleneck lies in their heavy reliance on large amounts of manually annotated data, which are not always available in the desired target domain/task. To alleviate the data sparsity issue in the target domain/task, an attractive solution is to find sufficient labeled data from a related source domain/task. However, for most NLP applications, due to …


A Framework To Understand Emoji Meaning: Similarity And Sense Disambiguation Of Emoji Using Emojinet, Sanjaya Wijeratne Jan 2018

Browse all Theses and Dissertations

Pictographs, commonly referred to as ‘emoji’, have become a popular way to enhance electronic communications. They are an important component of the language used in social media. With their introduction in the late 1990s, emoji have been widely used to enhance the sentiment, emotion, and sarcasm expressed in social media messages. They are equally popular across many social media sites including Facebook, Instagram, and Twitter. In 2015, Instagram reported that nearly half of the photo comments posted on Instagram contain emoji, and in the same year, Twitter reported that the ‘face with tears of joy’ emoji has been tweeted 6.6 …


Using Natural Language Processing And Machine Learning For Analyzing Clinical Notes In Sickle Cell Disease Patients, Shufa Khizra Jan 2018

Browse all Theses and Dissertations

Sickle Cell Disease (SCD) is a hereditary disorder in red blood cells that can lead to excruciating pain episodes. SCD causes normal red blood cells to distort and take on a sickle shape. The distorted shape makes the hemoglobin inflexible and causes it to stick to the walls of the vessels, thereby obstructing the free flow of blood and eventually depriving the tissues of oxygen. The lack of oxygen causes serious problems including Acute Chest Syndrome (ACS), stroke, infection, and organ damage, and over a lifetime SCD can harm a person’s spleen, brain, kidneys, eyes, and bones. Sickling of …


Genealogy Extraction And Tree Generation From Free Form Text, Timothy Sui-Tim Chu Dec 2017

Master's Theses

Genealogical records play a crucial role in helping people to discover their lineage and to understand where they come from. They provide a way for people to celebrate their heritage and to possibly reconnect with family they had never considered. However, genealogical records are hard to come by for ordinary people since their information is not always well established in known databases. There often is free form text that describes a person’s life, but this must be manually read in order to extract the relevant genealogical information. In addition, multiple texts may have to be read in order to create …


Natural Language Processing Based Generator Of Testing Instruments, Qianqian Wang Sep 2017

Electronic Theses, Projects, and Dissertations

Natural Language Processing (NLP) is the field of study that focuses on the interactions between human language and computers. By “natural language” we mean a language that is used for everyday communication by humans. Unlike programming languages, natural languages are hard to define with precise rules. NLP is developing rapidly and has been widely used in different industries. Technologies based on NLP are becoming increasingly widespread; for example, Siri and Alexa are intelligent personal assistants that use built-in NLP algorithms to communicate with people. “Natural Language Processing Based Generator of Testing Instruments” is a stand-alone program …


Using Natural Language Processing And Machine Learning Techniques To Characterize Configuration Bug Reports: A Study, Wei Wen Jan 2017

Theses and Dissertations--Computer Science

In this study, a tool is developed that achieves two purposes: (1) given bug reports, it distinguishes configuration bug reports from non-configuration bug reports; (2) once a bug report is identified as a configuration bug report, the tool determines which specific configuration option the bug report is associated with.

This study starts with a review of related works that used machine learning tools to solve software bug and bug report related issues. It then discusses the natural language processing and machine learning techniques. Afterwards, the development process of the proposed tool is described in detail, including the motivation, the …


Multi-Class Classification Of Textual Data: Detection And Mitigation Of Cheating In Massively Multiplayer Online Role Playing Games, Naga Sai Nikhil Maguluri Jan 2017

Browse all Theses and Dissertations

The success of any multiplayer game depends on the player’s experience. Cheating/Hacking undermines the player’s experience and thus the success of that game. Cheaters who use hacks, bots, or trainers ruin the gaming experience of other players and drive them to leave the game. As the video game industry is a constantly growing multibillion-dollar economy, it is crucial to assure and maintain a state of security. Players reflect their gaming experience in one of the following places: multiplayer chat, game reviews, and social media. This thesis is an exploratory study where our goal is to experiment and propose …


Semantics-Based Summarization Of Entities In Knowledge Graphs, Kalpa Gunaratna Jan 2017

Browse all Theses and Dissertations

The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the …


An Empirical Study Of Semantic Similarity In Wordnet And Word2vec, Abram Handler Dec 2014

University of New Orleans Theses and Dissertations

This thesis performs an empirical analysis of Word2Vec by comparing its output to WordNet, a well-known, human-curated lexical database. It finds that Word2Vec tends to uncover more of certain types of semantic relations than others -- with Word2Vec returning more hypernyms, synonyms and hyponyms than hyponyms or holonyms. It also shows the probability that neighbors separated by a given cosine distance in Word2Vec are semantically related in WordNet. This result both adds to our understanding of the still-unknown Word2Vec and helps to benchmark new semantic tools built from word vectors.


Tspoons: Tracking Salience Profiles Of Online News Stories, Kimberly Laurel Paterson Jun 2014

Master's Theses

News space is a relatively nebulous term that describes the general discourse concerning events that affect the populace. Past research has focused on qualitatively analyzing news space in an attempt to answer big questions about how the populace relates to the news and how they respond to it. We want to ask: When do stories begin? What stories stand out among the noise? In order to answer the big questions about news space, we need to track the course of individual stories in the news. By analyzing the specific articles that comprise stories, we can synthesize the information gained from …


Natural Language Document And Event Association Using Stochastic Petri Net Modeling, Michael Thomas Mills Jan 2013

Browse all Theses and Dissertations

The purpose of this research is to design and implement a new methodology that captures the natural language understanding of events from English natural language text and models it using Stochastic Petri Nets. To establish a baseline of recent natural language processing (NLP) and understanding (NLU) research, two surveys are presented. One is a general survey of NLP and NLU methodologies for multi-document processing. It summarizes and presents methodologies in terms of their features, capabilities, and maturity. The second survey focuses on graph-based methods for NL text processing and understanding and analyzes them in terms of their functional descriptions, capabilities …


A System For Natural Language Unmarked Clausal Transformations In Text-To-Text Applications, Daniel Miller Jun 2009

Master's Theses

A system is proposed which separates clauses from complex sentences into simpler stand-alone sentences. This is useful as an initial step on raw text, where the resulting processed text may be fed into text-to-text applications such as Automatic Summarization, Question Answering, and Machine Translation, where complex sentences are difficult to process. Grammatical natural language transformations provide a possible method to simplify complex sentences to enhance the results of text-to-text applications. Using shallow parsing, this system improves the performance of existing systems to identify and separate marked and unmarked embedded clauses in complex sentence structure resulting in syntactically simplified source for …