Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Computer Engineering

Natural Language Processing

Articles 1 - 30 of 37

Full-Text Articles in Engineering

Developing A Flexible System For A Friendly Robot To Ease Dementia (FRED) Using Cloud Technologies And Software Design Patterns, Robert James Bray Dec 2023

Masters Theses

In this work, we designed two prototypes for a friendly robot to ease dementia (FRED). This affordable social robot is designed to provide company to older adults with cognitive decline, create reminders for important events and tasks, like taking medication, and provide cognitive stimulus through games. This project combines several cloud technologies, including speech-to-text, cloud data storage, and chat generation, to provide high-level interactions with a social robot. Software design patterns were employed in the creation of the software to produce a flexible code base that can sustain platform changes easily, including the framework used for the graphical …


ChatGPT As Metamorphosis Designer For The Future Of Artificial Intelligence (AI): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian) Mar 2023

Library Philosophy and Practice (e-journal)

Abstract

Purpose: This research paper explores ChatGPT’s potential as an innovative design tool for the future development of artificial intelligence. Specifically, this conceptual investigation analyzes ChatGPT’s capabilities as a tool for designing and developing near-human intelligent systems for future use in the field of Artificial Intelligence (AI). The paper also analyzes the strengths and weaknesses of ChatGPT as a tool and identifies possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …


Analysis And Usage Of Natural Language Features In Success Prediction Of Legislative Testimonies, Marine Cossoul Mar 2023

Master's Theses

Committee meetings are a fundamental part of the legislative process in which constituents, lobbyists, and legislators alike can speak on proposed bills at the local and state level. Oftentimes, unspoken “rules” or standards are at play in political processes that can influence the trajectory of a bill, leaving constituents without a political background at an inherent disadvantage when engaging with the legislative process. The work done in this thesis aims to explore the extent to which the language and phraseology of a general public testimony can influence a vote, and examine how this information can be used to promote civic …


Effective Systems For Insider Threat Detection, Muhanned Qasim Jabbar Alslaiman Jan 2023

Browse all Theses and Dissertations

Insider threats to information security have become a burden for organizations. Understanding insider activities leads to an effective improvement in identifying insider attacks and limits their threats. This dissertation presents three systems to detect insider threats effectively. The aim is to reduce the false negative rate (FNR), provide better dataset use, and reduce dimensionality and zero padding effects. The systems developed utilize deep learning techniques and are evaluated using the CERT 4.2 dataset. The dataset is analyzed and reformed so that each row represents a variable length sample of user activities. Two data representations are implemented to model extracted features …


SoftSkip: Empowering Multi-Modal Dynamic Pruning For Single-Stage Referring Comprehension, Dulanga Weerakoon, Vigneshwaran Subbaraju, Tuan Tran, Archan Misra Oct 2022

Research Collection School Of Computing and Information Systems

Supporting real-time referring expression comprehension (REC) on pervasive devices is an important capability for human-AI collaborative tasks. Model pruning techniques, applied to DNN models, can enable real-time execution even on resource-constrained devices. However, existing pruning strategies are designed principally for uni-modal applications, and suffer a significant loss of accuracy when applied to REC tasks that require fusion of textual and visual inputs. We thus present a multi-modal pruning model, LGMDP, which uses language as a pivot to dynamically and judiciously select the relevant computational blocks that need to be executed. LGMDP also introduces a new SoftSkip mechanism, whereby 'skipped' visual …


Knowledge-Driven Drug-Use Named Entity Recognition With Distant Supervision, Goonmeet Bajaj, Ugur Kursuncu, Manas Gaur, Usha Lokala, Ayaz Hyder, Srinivasan Parthasarathy, Amit Sheth Jun 2022

Publications

While Named Entity Recognition (NER) has been essential in identifying critical elements of unstructured content, generic NER tools remain limited in recognizing entities specific to a domain, such as drug use and public health. For such high-impact areas, accurately capturing relevant entities at a more granular level is critical, as this information influences real-world processes. On the other hand, training NER models for a specific domain without handcrafted features requires an extensive amount of labeled data, which is expensive in human effort and time. In this study, we employ distant supervision utilizing a domain-specific ontology to reduce the need for …


Improving Relation Extraction From Unstructured Genealogical Texts Using Fine-Tuned Transformers, Carloangello Parrolivelli Jun 2022

Master's Theses

Though exploring one’s family lineage through genealogical family trees can be insightful to developing one’s identity, this knowledge is typically held behind closed doors by private companies or requires expensive technologies, such as DNA testing, to uncover. With the ever-booming explosion of data on the world wide web, many unstructured text documents, both old and new, are being discovered, written, and processed which contain rich genealogical information. Access to this immense amount of data, however, entails a costly process whereby people, typically volunteers, have to read large amounts of text to find relationships between people. This delays having genealogical …


Measuring And Comparing Social Bias In Static And Contextual Word Embeddings, Alan Cueva Mora Jan 2022

Dissertations

Word embeddings have been considered one of the biggest breakthroughs of deep learning for natural language processing. They are learned numerical vector representations of words where similar words have similar representations. Contextual word embeddings are the promising second generation of word embeddings, assigning a representation to a word based on its context. This can result in different representations for the same word depending on the context (e.g. river bank and commercial bank). There is evidence of social bias (human-like implicit biases based on gender, race, and other social constructs) in word embeddings. While detecting bias in static (classical or non-contextual) word …


Improving Network Policy Enforcement Using Natural Language Processing And Programmable Networks, Pinyi Shi Jan 2022

Theses and Dissertations--Computer Science

Computer networks are becoming more complex and challenging to operate, manage, and protect. As a result, network policies that define how network operators should manage the network are becoming more complex and nuanced. Unfortunately, network policies are often an undervalued part of network design, leaving network operators to guess at the intent of policies that are written and fill in the gaps where policies don’t exist. Organizations typically designate policy committees to write down the network policies in policy documents using high-level natural language. The policy documents describe both the acceptable and unacceptable uses of the network. Network operators …


Novel Natural Language Processing Models For Medical Terms And Symptoms Detection In Twitter, Farahnaz Golrooy Motlagh Jan 2022

Browse all Theses and Dissertations

This dissertation focuses on disambiguation of language use on Twitter about drug use, consumption types of drugs, drug legalization, ontology-enhanced approaches, and data-driven prediction analysis by developing novel NLP models. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side effect relations; and (c) modeling the prediction of public perception of the drug’s legalization and the sentiment analysis of drug consumption on Twitter. We collected …


Finetuning BERT And XLNet For Sentiment Analysis Of Stock Market Tweets Using Mixout And Dropout Regularization, Shubham Jangir Jan 2021

Dissertations

Sentiment analysis, also known as opinion mining or emotion mining, aims to identify the way in which sentiments are expressed in text and written data. It combines different study areas such as Natural Language Processing (NLP), Data Mining, and Text Mining, and is quickly becoming a key concern for businesses and organizations, especially as online commerce data is being used for analysis. Twitter has also become a popular microblogging and social networking platform where people share information and contribute their opinions, thoughts, and attitudes. Because of the large …


PPMExplorer: Using Information Retrieval, Computer Vision And Transfer Learning Methods To Index And Explore Images Of Pompeii, Cindy Roullet Dec 2020

Graduate Theses and Dissertations

In this dissertation, we present and analyze the technology used in the making of PPMExplorer: Search, Find, and Explore Pompeii. PPMExplorer is a software tool made with data extracted from the Pompei: Pitture e Mosaici (PPM) volumes. PPM is a valuable set of volumes containing 20,000 historical annotated images of the archaeological site of Pompeii, Italy, accompanied by extensive captions. We transformed the volumes from paper, to digital, to searchable. PPMExplorer enables archaeologist researchers to conduct and check hypotheses on historical findings. We present a theory that such a concept is possible by leveraging computer generated correlations between artifacts using …


Understanding Event Structure In Text, Mohammed Aldawsari Oct 2020

FIU Electronic Theses and Dissertations

Stories often appear in textual form; for example, news stories are found in the form of newspaper articles, blogs, broadcast transcripts, and so forth. These contain descriptions of current, past, or future events. Automatically extracting knowledge from these event descriptions is an important natural language processing (NLP) task, and understanding event structure aids in this knowledge extraction. Event structure refers to the fact that events may have relationships or internal structure, for example, being in a co-reference relationship with another event mention, or being composed of subevents.

Understanding event structure has received less attention in NLP than is due. This work …


Evaluating BERT Embeddings For Text Classification In Bio-Medical Domain To Determine Eligibility Of Patients In Clinical Trials, Saurabh Khodake Jan 2020

Dissertations

Clinical trials are studies conducted by researchers in order to assess the impact of a new medicine in terms of its efficacy and, most importantly, its safety for human health. For any advancement in the field of medicine, it is very important that clinical trials are conducted with the right ethics, supported by scientific evidence. Not all people who volunteer or participate in clinical trials are allowed to undergo the trials. Age, comorbidity, and other health issues present in a patient can be major factors in deciding whether the profile is suitable for the trial. Profiles selected for clinical trials …


Data Science Methods For Standardization, Safety, And Quality Assurance In Radiation Oncology, Khajamoinuddin Syed Jan 2020

Theses and Dissertations

Radiation oncology is the field of medicine that deals with treating cancer patients through ionizing radiation. The clinical modality or technique used to treat the cancer patients in the radiation oncology domain is referred to as radiation therapy. Radiation therapy aims to deliver precisely measured dose irradiation to a defined tumor volume (target) with as minimal damage as possible to surrounding healthy tissue (organs-at-risk), resulting in eradication of the tumor, high quality of life, and prolongation of survival. A typical radiotherapy process requires the use of different clinical systems at various stages of the workflow. The data generated in these …


Topological Analysis Of Averaged Sentence Embeddings, Wesley J. Holmes Jan 2020

Browse all Theses and Dissertations

Sentence embeddings are frequently generated by using complex, pretrained models that were trained on a very general corpus of data. This thesis explores a potential alternative method for generating high-quality sentence embeddings for highly specialized corpora in an efficient manner. A framework for visualizing and analyzing sentence embeddings is developed to help assess the quality of sentence embeddings for a highly specialized corpus of documents related to the 2019 coronavirus epidemic. A Topological Data Analysis (TDA) technique is explored as an alternative method for grouping embeddings for document clustering and topic modeling tasks and is compared to a simple clustering …


Aspect And Opinion Aware Abstractive Review Summarization With Reinforced Hard Typed Decoder, Yufei Tian, Jianfei Yu, Jing Jiang Nov 2019

Research Collection School Of Computing and Information Systems

In this paper, we study abstractive review summarization. Observing that review summaries often consist of aspect words, opinion words and context words, we propose a two-stage reinforcement learning approach, which first predicts the output word type from the three types, and then leverages the predicted word type to generate the final word distribution. Experimental results on two Amazon product review datasets demonstrate that our method can consistently outperform several strong baseline approaches based on ROUGE scores.


Finding Truth In Fake News: Reverse Plagiarism And Other Models Of Classification, Matthew Przybyla, David Tran, Amber Whelpley, Daniel W. Engels Jan 2019

SMU Data Science Review

As the digital age creates new ways of spreading news, fake stories are propagated to wider audiences. A majority of people obtain both fake and truthful news without knowing which is which. There is not currently a reliable and efficient method to identify “fake news”. Several ways of detecting fake news have been produced, but the various algorithms have low accuracy of detection and the definition of what makes a news item “fake” remains unclear. In this paper, we propose a new method of detecting fake news through comparison to other news items on the same topic, as …


Distance, Time And Terms In First Story Detection, Fei Wang Jan 2019

Doctoral

First Story Detection (FSD) is an important application of online novelty detection within Natural Language Processing (NLP). Given a stream of documents, or stories, about news events in a chronological order, the goal of FSD is to identify the very first story for each event. While a variety of NLP techniques have been applied to the task, FSD remains challenging because it is still not clear what is the most crucial factor in defining the “story novelty”. Given these challenges, the thesis addressed in this dissertation is that the notion of novelty in FSD is multi-dimensional. To address this, the work presented has adopted a three dimensional analysis of the …


An Evaluation Of Learning Employing Natural Language Processing And Cognitive Load Assessment, Mrunal Tipari Jan 2019

Dissertations

One of the key goals of pedagogy is to assess learning. Various paradigms exist, and one of these is Cognitivism. It essentially sees a human learner as an information processor and the mind as a black box with limited capacity that should be understood and studied. With respect to this, one approach is to employ the construct of cognitive load to assess a learner's experience and in turn design instructions better aligned to the human mind. However, cognitive load assessment is not an easy activity, especially in a traditional classroom setting. This research proposes a novel method for evaluating learning …


A Tree-Based Approach For English-To-Turkish Translation, Özge Bakay, Begüm Avar, Olcay Taner Yildiz Jan 2019

Turkish Journal of Electrical Engineering and Computer Sciences

In this paper, we present our English-to-Turkish translation methodology, which adopts a tree-based approach. Our approach relies on tree analysis and the application of structural modification rules to get the target side (Turkish) trees from source side (English) ones. We also use morphological analysis to get candidate root words and apply tree-based rules to obtain the agglutinated target words. Compared to earlier work on English-to-Turkish translation using phrase-based models, we have been able to obtain higher BLEU scores in our current study. Our syntactic subtree permutation strategy, combined with a word replacement algorithm, provides a 67% relative improvement from …


Feature-Based Transfer Learning In Natural Language Processing, Jianfei Yu Dec 2018

Dissertations and Theses Collection (Open Access)

In the past few decades, supervised machine learning has been one of the most important methodologies in the Natural Language Processing (NLP) community. Although various kinds of supervised learning methods have been proposed to obtain state-of-the-art performance across most NLP tasks, their bottleneck lies in the heavy reliance on large amounts of manually annotated data, which are not always available in the desired target domain/task. To alleviate the data sparsity issue in the target domain/task, an attractive solution is to find sufficient labeled data from a related source domain/task. However, for most NLP applications, due to …


A Framework To Understand Emoji Meaning: Similarity And Sense Disambiguation Of Emoji Using EmojiNet, Sanjaya Wijeratne Jan 2018

Browse all Theses and Dissertations

Pictographs, commonly referred to as ‘emoji’, have become a popular way to enhance electronic communications. They are an important component of the language used in social media. Since their introduction in the late 1990s, emoji have been widely used to enhance the sentiment, emotion, and sarcasm expressed in social media messages. They are equally popular across many social media sites including Facebook, Instagram, and Twitter. In 2015, Instagram reported that nearly half of the photo comments posted on Instagram contain emoji, and in the same year, Twitter reported that the ‘face with tears of joy’ emoji has been tweeted 6.6 …


An Application Of Natural Language Processing For Triangulation Of Cognitive Load Assessments In Third Level Education, Luis Alfredo Contreras Jan 2018

Dissertations

Work has been done to measure Mental Workload (MWL) based on applications mainly related to ergonomics, human factors, and Machine Learning. The influence of Machine Learning reflects an increased use of new technologies in areas conventionally dominated by theoretical approaches. However, MWL research and Natural Language Processing techniques seem to be combined only rarely. The objective of this research is therefore to use Natural Language Processing techniques to contribute to the analysis of the relationship between subjective Mental Workload measures and Relative Frequency Ratios of keywords gathered during pre-tasks and post-tasks of MWL activities …


Using Natural Language Processing And Machine Learning For Analyzing Clinical Notes In Sickle Cell Disease Patients, Shufa Khizra Jan 2018

Browse all Theses and Dissertations

Sickle Cell Disease (SCD) is a hereditary disorder of red blood cells that can lead to excruciating pain episodes. SCD causes normal red blood cells to distort and take on a sickle shape. The distorted shape makes the hemoglobin inflexible and causes it to stick to the walls of the vessels, obstructing the free flow of blood and eventually depriving the tissues of oxygen. The lack of oxygen causes serious problems including Acute Chest Syndrome (ACS), stroke, infection, and organ damage, and over a lifetime SCD can harm a person's spleen, brain, kidneys, eyes, and bones. Sickling of …


Genealogy Extraction And Tree Generation From Free Form Text, Timothy Sui-Tim Chu Dec 2017

Master's Theses

Genealogical records play a crucial role in helping people to discover their lineage and to understand where they come from. They provide a way for people to celebrate their heritage and to possibly reconnect with family they had never considered. However, genealogical records are hard to come by for ordinary people since their information is not always well established in known databases. There often is free form text that describes a person’s life, but this must be manually read in order to extract the relevant genealogical information. In addition, multiple texts may have to be read in order to create …


Natural Language Processing Based Generator Of Testing Instruments, Qianqian Wang Sep 2017

Electronic Theses, Projects, and Dissertations

Natural Language Processing (NLP) is the field of study that focuses on the interactions between human language and computers. By “natural language” we mean a language that is used for everyday communication by humans. Unlike programming languages, natural languages are hard to define with precise rules. NLP is developing rapidly and has been widely used in different industries. Technologies based on NLP are becoming increasingly widespread; for example, Siri and Alexa are intelligent personal assistants that use built-in NLP algorithms to communicate with people. “Natural Language Processing Based Generator of Testing Instruments” is a stand-alone program …


Parsing MetaMap Files In Hadoop, Amy Olex, Alberto Cano, Bridget T. McInnes Jan 2017

Computer Science Publications

The UMLS::Association CUICollector module identifies UMLS Concept Unique Identifier bigrams and their frequencies in a biomedical text corpus. CUICollector was re-implemented in Hadoop MapReduce to improve algorithm speed, flexibility, and scalability. Evaluation of the Hadoop implementation compared to the serial module produced equivalent results and achieved a 28x speedup on a single-node Hadoop system.
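
The bigram counting described in this abstract can be illustrated with a minimal Python sketch in a map/reduce style. This is not the CUICollector or Hadoop implementation; the function names and the sample identifiers are hypothetical, and the input is assumed to be already reduced to one ordered list of concept identifiers per document.

```python
from collections import Counter
from itertools import chain


def map_bigrams(cuis):
    """Map step: emit ((first, second), 1) for each adjacent identifier pair."""
    return [((a, b), 1) for a, b in zip(cuis, cuis[1:])]


def reduce_counts(pairs):
    """Reduce step: sum the emitted counts per bigram key."""
    counts = Counter()
    for key, n in pairs:
        counts[key] += n
    return counts


def count_cui_bigrams(documents):
    """Count bigram frequencies over a corpus of identifier sequences."""
    return reduce_counts(chain.from_iterable(map_bigrams(d) for d in documents))


# Placeholder identifier sequences (hypothetical sample data).
docs = [
    ["C0000001", "C0000002", "C0000001"],
    ["C0000001", "C0000002"],
]
counts = count_cui_bigrams(docs)
```

Because each document is mapped independently and the reduce step only sums counts per key, the same logic can be distributed across workers, which is the property a MapReduce re-implementation relies on.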


The Evaluation Of Ensemble Sentiment Classification Approach On Airline Services Using Twitter, Zechen Wang Jan 2017

Dissertations

In the field of sentiment classification, much research has been done on reviews of topics such as movies, software and books. Little research has been done in the airline service domain. In the airline industry, the use of social media as a customer service tool has become a growing phenomenon. The research conducted by Wan and Gao (2015) proposed an ensemble classification approach for airline service sentiment classification using Twitter data. Accordingly, improving the performance of the ensemble classification approach is the primary objective of this work. This research proposes a new hybrid classification approach that uses the state-of-the-art approach …


Using Natural Language Processing And Machine Learning Techniques To Characterize Configuration Bug Reports: A Study, Wei Wen Jan 2017

Theses and Dissertations--Computer Science

In this study, a tool is developed that achieves two purposes: (1) given bug reports, it distinguishes configuration bug reports from non-configuration bug reports; (2) once a bug report is identified as a configuration bug report, the tool determines which specific configuration option the bug report is associated with.

This study starts with a review of related works that used machine learning tools to solve software bug and bug report related issues. It then discusses the natural language processing and machine learning techniques. Afterwards, the development process of the proposed tool is described in detail, including the motivation, the …