Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Theses/Dissertations

Natural Language Processing

Articles 1 - 30 of 39

Full-Text Articles in Engineering

Language Models For Rare Disease Information Extraction: Empirical Insights And Model Comparisons, Shashank Gupta Jan 2024

Theses and Dissertations--Computer Science

End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying and classifying semantic relationships between entities in text. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare diseases with discontinuous and nested entities. We evaluate Named Entity Recognition (NER) to Relation Extraction (RE) pipelines, sequence-to-sequence models, and generative pre-trained transformer (GPT) models on the RareDis information extraction dataset. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and …


Robust And Uncertainty-Aware Image Classification Using Bayesian Vision Transformer Model, Fazlur Rahman Bin Karim Dec 2023

Theses and Dissertations

Transformer neural networks have emerged as the predominant architecture for a wide range of natural language processing (NLP) applications, such as machine translation, speech recognition, sentiment analysis, and text anomaly detection. This success in NLP has sparked growing interest in applying Transformer models to computer vision tasks. The Vision Transformer (ViT) captures long-range dependencies by using a self-attention mechanism to transform image data into meaningful representations. Recently, ViT models have shown strong performance on image classification problems, thereby …


Developing A Flexible System For A Friendly Robot To Ease Dementia (FRED) Using Cloud Technologies And Software Design Patterns, Robert James Bray Dec 2023

Masters Theses

In this work, we designed two prototypes for a friendly robot to ease dementia (FRED). This affordable social robot is designed to keep older adults with cognitive decline company, create reminders for important events and tasks such as taking medication, and provide cognitive stimulation through games. The project combines several cloud technologies, including speech-to-text, cloud data storage, and chat generation, to provide high-level interactions with a social robot. Software design patterns were employed to produce a flexible code base that can easily sustain platform changes, including the framework used for the graphical …


Quantification Of Various Types Of Biases In Large Language Models, Sudhashree Sayenju Apr 2023

Doctor of Data Science and Analytics Dissertations

Natural Language Processing (NLP) systems are ubiquitous on the internet, from search engines and language translation to more advanced systems such as voice assistants and customer service. Since humans are always on the receiving end of NLP technologies, it is very important to analyze whether the Large Language Models (LLMs) in use are biased and therefore unfair. Most research on NLP bias has focused on societal stereotype biases embedded in LLMs. Our research, however, covers several types of bias present in LLMs, namely model-class-level bias, stereotype bias, and domain bias. Model class …


Extracting A Body Of Knowledge As A First Step Towards Defining A United Software Engineering Curriculum Guideline, Anton Kiselev Apr 2023

Doctoral Dissertations and Master's Theses

The computing field changes rapidly, and software engineering education must therefore be able to adjust quickly to new needs. Industry adopts new technologies as fast as it can, but the critical issue is the need for recent graduates with expertise in and knowledge of new trends and technologies, along with practical experience. The industries that employ graduates of computing degree programs aim to hire those who are familiar with the latest technical trends, tools, and methodologies, and the software engineering curriculum needs to respond quickly to these needs. Still, unfortunately, software …


Analysis And Usage Of Natural Language Features In Success Prediction Of Legislative Testimonies, Marine Cossoul Mar 2023

Master's Theses

Committee meetings are a fundamental part of the legislative process in which constituents, lobbyists, and legislators alike can speak on proposed bills at the local and state level. Unspoken “rules” or standards are often at play in political processes and can influence the trajectory of a bill, leaving constituents without a political background at an inherent disadvantage when engaging with the legislative process. This thesis explores the extent to which the language and phraseology of testimony from members of the general public can influence a vote, and examines how this information can be used to promote civic …


Improving Relation Extraction From Unstructured Genealogical Texts Using Fine-Tuned Transformers, Carloangello Parrolivelli Jun 2022

Master's Theses

Although exploring one’s family lineage through genealogical family trees can help in developing one’s identity, this knowledge is typically held behind closed doors by private companies or requires expensive technologies, such as DNA testing, to uncover. With the explosion of data on the World Wide Web, many unstructured text documents containing rich genealogical information, both old and new, are being discovered, written, and processed. Accessing this immense amount of data, however, entails a costly process in which people, typically volunteers, must read large amounts of text to find relationships between people. This delays having genealogical …


Novel Natural Language Processing Models For Medical Terms And Symptoms Detection In Twitter, Farahnaz Golrooy Motlagh Jan 2022

Browse all Theses and Dissertations

This dissertation develops novel NLP models for disambiguating language use on Twitter about drug use, types of drug consumption, and drug legalization, using ontology-enhanced approaches and data-driven prediction analysis. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side-effect relations; and (c) modeling the prediction of public perception of drug legalization and the sentiment analysis of drug consumption on Twitter. We collected …


Improving Network Policy Enforcement Using Natural Language Processing And Programmable Networks, Pinyi Shi Jan 2022

Theses and Dissertations--Computer Science

Computer networks are becoming more complex and challenging to operate, manage, and protect. As a result, the network policies that define how operators should manage a network are also becoming more complex and nuanced. Unfortunately, network policies are often an undervalued part of network design, leaving network operators to guess at the intent of written policies and to fill in the gaps where no policies exist. Organizations typically designate policy committees to write network policies in policy documents using high-level natural language. The policy documents describe both the acceptable and unacceptable uses of the network. Network operators …


Smart Chatbot For User Authentication, Peter Voege Dec 2021

Electronic Thesis and Dissertation Repository

The field of authentication has considerable room to develop in the age of big data and machine learning. Conventional high-accessibility authentication mechanisms such as passwords and security questions suffer from critical vulnerabilities, creating a need for alternative mechanisms that address these weaknesses.

We sought to create an authentication mechanism that poses dynamic, ever-changing security questions that only the user can answer, while remaining as intuitive and accessible as typical security questions. To do so, we built an authentication chatbot that leverages big data and natural language processing to pose dynamic authentication challenges.

We tested the components of our design …


Using Custom NER Models To Extract DOD Specific Entities From Contracts, Kayla P. Haberstich Dec 2021

Theses and Dissertations

The Air Force Sustainment Center collected 3.7 million contracts onto the Air Force Research Laboratory’s high-power computers. The contracts are stored as PDFs or scanned documents, making them unstructured data. The Data Analytics Resource Team converted the documents to text for further analysis. This thesis aims to extract four DOD-specific entities (NSN, Part Number, CAGE Code, and Supplier Name) from the contracts using custom NER models. This newly extracted information will allow the Air Force to identify which parts are supplied by which vendors. This information along with historical CLIN pricing for …


Evaluation Of Patient Experience Using Natural Language Processing Algorithms, Sofia Veronica Ortega Aug 2021

Open Access Theses & Dissertations

INTRODUCTION: Healthcare organizations are making extensive efforts to improve the patient experience. Enhancing patient/client experience and outcomes is crucial for patient-centered care and can reveal improvement opportunities. Healthcare settings currently rely on surveys (e.g., HCAHPS) and patient feedback to measure patient experience. Studies have found that patient journey mapping can better capture patient experience throughout all stages of the patient's journey and can provide quality and process improvement recommendations at specific hotspots. However, these measurement techniques are time-consuming and resource-intensive. AIM: This research aims to measure the experience of breast cancer patients from social media data using natural language …


Exploring Hidden Networks Yields Important Insights In Disparate Fields Of Study, Laurence Clarfeld Jan 2021

Graduate College Dissertations and Theses

Network science captures a broad range of problems related to things (nodes) and relationships between them (edges). This dissertation explores real-world network problems in disparate domain applications where exploring less obvious "hidden networks" reveals important dynamics of the original network.

The power grid is an explicit network of buses (e.g., generators) connected by branches (e.g., transmission lines). In rare cases, if k branches (a k-set) fail simultaneously, a cascading blackout may ensue; we refer to such k-sets as "defective". We calculate system risk of cascading failure due to defective 2-sets and 3-sets in synthetic test cases of the Polish and …


Model-Based Approach For Product Requirement Representation And Generation In Product Lifecycle Management, Omer Yaman Dec 2020

Dissertations - ALL

The requirement specification is an official documentation activity that collects the information needed to specify a product and its life-cycle activities in terms of functions, features, performance, constraints, production, maintenance, disposal process, etc. It consists mainly of two phases: product requirement generation and representation. Appropriate criteria for product design and further life-cycle activities are determined based on the requirement specification, as well as on the interrelations of product requirements with other life-cycle information such as materials, manufacturing, working environments, finance, and regulations. Determining these criteria is normally error-prone. It is difficult to identify and maintain the completeness …


PPMExplorer: Using Information Retrieval, Computer Vision And Transfer Learning Methods To Index And Explore Images Of Pompeii, Cindy Roullet Dec 2020

Graduate Theses and Dissertations

In this dissertation, we present and analyze the technology used in the making of PPMExplorer: Search, Find, and Explore Pompeii. PPMExplorer is a software tool built with data extracted from the Pompei: Pitture e Mosaici (PPM) volumes. PPM is a valuable set of volumes containing 20,000 annotated historical images of the archaeological site of Pompeii, Italy, accompanied by extensive captions. We transformed the volumes from paper to digital to searchable. PPMExplorer enables archaeologists to formulate and check hypotheses on historical findings. We present a theory that such a concept is possible by leveraging computer-generated correlations between artifacts using …


Ensemble Labeling Towards Scientific Information Extraction (ELSIE), Erin Murphy Nov 2020

College of Computing and Digital Media Dissertations

Extracting scientific facts from unstructured text is difficult due to the ambiguity of the language and the complexity of the scientific named entities and relations to be extracted. This problem is well illustrated by the extraction of polymer names and their properties. Even when the property is a temperature, identifying the polymer name associated with it may require expertise because of acronyms, synonyms, complicated naming conventions, and the fact that new polymer names are being “introduced” to the vernacular as polymer science advances. While there exist domain-specific machine learning toolkits …


Data Science Methods For Standardization, Safety, And Quality Assurance In Radiation Oncology, Khajamoinuddin Syed Jan 2020

Theses and Dissertations

Radiation oncology is the field of medicine that treats cancer patients with ionizing radiation. The clinical modality used to treat cancer patients in radiation oncology is referred to as radiation therapy. Radiation therapy aims to deliver a precisely measured radiation dose to a defined tumor volume (target) with as little damage as possible to surrounding healthy tissue (organs-at-risk), resulting in eradication of the tumor, high quality of life, and prolonged survival. A typical radiotherapy process requires the use of different clinical systems at various stages of the workflow. The data generated in these …


Topological Analysis Of Averaged Sentence Embeddings, Wesley J. Holmes Jan 2020

Browse all Theses and Dissertations

Sentence embeddings are frequently generated by using complex, pretrained models that were trained on a very general corpus of data. This thesis explores a potential alternative method for generating high-quality sentence embeddings for highly specialized corpora in an efficient manner. A framework for visualizing and analyzing sentence embeddings is developed to help assess the quality of sentence embeddings for a highly specialized corpus of documents related to the 2019 coronavirus epidemic. A Topological Data Analysis (TDA) technique is explored as an alternative method for grouping embeddings for document clustering and topic modeling tasks and is compared to a simple clustering …
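Averaging word vectors is one common, lightweight way to form the kind of averaged sentence embedding named in the title. The sketch below is illustrative only: the toy three-dimensional word vectors are hypothetical, the helper names `sentence_embedding` and `cosine` are our own, and neither the thesis's corpus-specific vectors nor its TDA grouping step is shown.

```python
import math

# Hypothetical toy word vectors; a real system would load embeddings
# trained on (or adapted to) the specialized corpus instead.
WORD_VECTORS = {
    "virus": [0.9, 0.1, 0.0],
    "infection": [0.8, 0.2, 0.1],
    "vaccine": [0.7, 0.0, 0.3],
    "market": [0.0, 0.9, 0.2],
    "stocks": [0.1, 0.8, 0.3],
}

def sentence_embedding(sentence, vectors=WORD_VECTORS):
    """Average the vectors of in-vocabulary tokens; return None if none are known."""
    known = [vectors[t] for t in sentence.lower().split() if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity, a typical way to compare sentence embeddings."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

With these toy vectors, "virus infection" lands closer to "vaccine virus" than to "market stocks". Out-of-vocabulary tokens are simply skipped, which is one of the simplifications that make averaging cheap but lossy.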


An Application Of Clustering And Cluster Update Methods To Boiler Sensor Prediction And Case-Based-Reasoning To Boiler Repair, Timothy Edward Rooney Dec 2019

Theses and Dissertations

Driven by demand from consumers and manufacturers alike, Internet of Things (IoT) capabilities are being built into more products. Consumers want more control over and access to their devices, while manufacturers can find data gathered from IoT-capable products invaluable. In this thesis, we use data from a growing fleet of IoT-connected boilers in the residential, light-commercial, and medium-commercial ranges to demonstrate a framework for cluster initialization and updating. We compare two methods of dynamically updating clusters: a sequential method inspired by sequential K-means clustering and a cohesion-based method called DYNC. A predictive artificial neural network system demonstrates the effectiveness of …
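For reference, the classic sequential (online) K-means update cited above as inspiration can be sketched as follows. This is an illustration under our own assumptions, not the thesis's framework or DYNC: each arriving point is assigned to its nearest centroid, and only that centroid moves, by a running-mean step m_i ← m_i + (x − m_i) / n_i. The function names and data are hypothetical.

```python
import math

def nearest(centroids, x):
    """Index of the centroid closest to x in Euclidean distance."""
    return min(range(len(centroids)), key=lambda i: math.dist(centroids[i], x))

def sequential_kmeans(centroids, stream):
    """Classic sequential K-means over a stream of points.

    centroids: initial guesses (list of lists); entries are replaced in place.
    stream: iterable of points arriving one at a time (e.g. sensor readings).
    Returns the updated centroids and the number of points in each cluster.
    """
    counts = [0] * len(centroids)
    for x in stream:
        i = nearest(centroids, x)
        counts[i] += 1
        # Running mean: after n_i points, centroid i is the mean of those points.
        centroids[i] = [m + (xj - m) / counts[i] for m, xj in zip(centroids[i], x)]
    return centroids, counts
```

For example, starting from centroids `[[0, 0], [10, 10]]` and streaming `[1, 1], [9, 9], [1, 3], [11, 9]` yields centroids `[[1.0, 2.0], [10.0, 9.0]]` with two points each. Because the first point assigned to a cluster replaces its initial guess, results depend on the initial centroids and on arrival order.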


A Machine Learning Approach To Predicting Alcohol Consumption In Adolescents From Historical Text Messaging Data, Adrienne Bergh May 2019

Computational and Data Sciences (MS) Theses

Techniques based on artificial neural networks represent the current state of the art in machine learning due to the availability of improved hardware and large data sets. Here we employ doc2vec, an unsupervised neural network, to capture the semantic content of text messages sent by adolescents during high school, and encode this semantic content as numeric vectors. These vectors effectively condense the text message data into highly leverageable inputs to a logistic regression classifier in a matter of hours, compared with the tedious and often lengthy task of manually coding the data. Using our machine learning approach, we are able to train …
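The classifier half of the pipeline described above can be sketched as plain logistic regression trained by batch gradient descent on fixed-length document vectors. This is a hedged illustration: it assumes the doc2vec vectors have already been computed (the two-dimensional vectors and labels in the test are hypothetical), and the learning rate and epoch count are arbitrary choices, not values from the thesis.

```python
import math

def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train_logistic_regression(X, y, lr=0.5, epochs=500):
    """Minimize mean log-loss by batch gradient descent.

    X: list of document vectors (assumed precomputed, e.g. by doc2vec).
    y: list of 0/1 labels.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    n = len(X)
    for _ in range(epochs):
        grad_w = [0.0] * len(w)
        grad_b = 0.0
        for xi, yi in zip(X, y):
            # Gradient of the log-loss w.r.t. the logit is (prediction - label).
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict_proba(w, b, x):
    """Probability that document vector x belongs to the positive class."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)
```

In practice a library implementation with regularization would usually replace this loop; the point here is only how fixed-length vectors feed a linear classifier.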


Framework For Validation Of Different Media Types Using A Model Based On Consensus, Adilahmed Patel Jan 2019

All ETDs from UAB

Increasingly, people of all ages consume news and entertainment through electronic media and social media [1]. As the internet in general and social media in particular are recent phenomena, the laws governing them and the technologies to monitor them are still evolving. There is a general consensus on the ubiquity and power of these media, hence the concern over how to handle them. In this context, validating online content becomes paramount. Fighting fake content is relevant not only to news and current affairs but also to other areas such as technical content, legal content …


Feature-Based Transfer Learning In Natural Language Processing, Jianfei Yu Dec 2018

Dissertations and Theses Collection (Open Access)

Over the past few decades, supervised machine learning has been one of the most important methodologies in the Natural Language Processing (NLP) community. Although various supervised learning methods have been proposed to obtain state-of-the-art performance across most NLP tasks, their bottleneck lies in their heavy reliance on large amounts of manually annotated data, which are not always available in the desired target domain/task. To alleviate the data sparsity issue in the target domain/task, an attractive solution is to find sufficient labeled data from a related source domain/task. However, for most NLP applications, due to …


A Framework To Understand Emoji Meaning: Similarity And Sense Disambiguation Of Emoji Using EmojiNet, Sanjaya Wijeratne Jan 2018

Browse all Theses and Dissertations

Pictographs, commonly referred to as ‘emoji’, have become a popular way to enhance electronic communications. They are an important component of the language used in social media. Since their introduction in the late 1990s, emoji have been widely used to enhance the sentiment, emotion, and sarcasm expressed in social media messages. They are equally popular across many social media sites, including Facebook, Instagram, and Twitter. In 2015, Instagram reported that nearly half of the photo comments posted on Instagram contain emoji, and in the same year, Twitter reported that the ‘face with tears of joy’ emoji has been tweeted 6.6 …


Using Natural Language Processing And Machine Learning For Analyzing Clinical Notes In Sickle Cell Disease Patients, Shufa Khizra Jan 2018

Browse all Theses and Dissertations

Sickle Cell Disease (SCD) is a hereditary disorder of the red blood cells that can lead to excruciating pain episodes. In SCD, normal red blood cells distort into a sickle shape. The distorted cells become inflexible and stick to the walls of the blood vessels, obstructing blood flow and eventually depriving tissues of oxygen. The lack of oxygen causes serious problems including Acute Chest Syndrome (ACS), stroke, infection, and organ damage, and over a lifetime SCD can harm a person’s spleen, brain, kidneys, eyes, and bones. Sickling of …


Genealogy Extraction And Tree Generation From Free Form Text, Timothy Sui-Tim Chu Dec 2017

Master's Theses

Genealogical records play a crucial role in helping people discover their lineage and understand where they come from. They provide a way for people to celebrate their heritage and possibly reconnect with family they had never considered. However, genealogical records are hard for ordinary people to come by, since their information is not always well established in known databases. Free-form text often describes a person’s life, but it must be read manually to extract the relevant genealogical information. In addition, multiple texts may have to be read in order to create …


Natural Language Processing Based Generator Of Testing Instruments, Qianqian Wang Sep 2017

Electronic Theses, Projects, and Dissertations

Natural Language Processing (NLP) is the field of study that focuses on the interactions between human language and computers. By “natural language” we mean a language that humans use for everyday communication. Unlike programming languages, natural languages are hard to define with precise rules. NLP is developing rapidly and has been widely adopted across industries. Technologies based on NLP are becoming increasingly widespread; for example, intelligent personal assistants such as Siri and Alexa use NLP to communicate with people. “Natural Language Processing Based Generator of Testing Instruments” is a stand-alone program …


Bayesian Methods And Machine Learning For Processing Text And Image Data, Yingying Gu Aug 2017

Theses and Dissertations

Classification/clustering is an important class of unstructured data processing problems. Classification (supervised, semi-supervised, and unsupervised) aims to discover clusters and group similar data into categories for information organization and knowledge discovery. My work focuses on using Bayesian methods and machine learning techniques to classify free-text and image data, and addresses how to overcome the limitations of traditional methods. The Bayesian approach accommodates more kinds of variables (numerical or categorical) and estimates probabilities instead of relying on explicit rules, which benefits ambiguous cases. MAP (maximum a posteriori) estimation is used to …
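Because the abstract is truncated before it says how MAP estimation is applied, the sketch below only illustrates the general MAP decision rule for text classification, using a multinomial naive Bayes model with add-one smoothing: choose the class c that maximizes log P(c) + Σ log P(w | c). The training documents, labels, and function names are hypothetical, and this is not presented as the thesis's model.

```python
import math
from collections import Counter, defaultdict

def train(docs):
    """docs: list of (text, label). Returns class priors, per-class word counts, vocabulary."""
    label_counts = Counter(label for _, label in docs)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in docs:
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    total = sum(label_counts.values())
    priors = {c: n / total for c, n in label_counts.items()}
    return priors, word_counts, vocab

def map_class(text, priors, word_counts, vocab):
    """Return argmax_c [log P(c) + sum_w log P(w | c)] with add-one smoothing."""
    best, best_score = None, -math.inf
    for c, prior in priors.items():
        denom = sum(word_counts[c].values()) + len(vocab)
        score = math.log(prior)
        for w in text.lower().split():
            if w in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[c][w] + 1) / denom)
        if score > best_score:
            best, best_score = c, score
    return best
```

Working in log space avoids underflow when many small probabilities are multiplied; the smoothing keeps a single unseen class-word pair from driving a posterior to zero.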


Multi-Class Classification Of Textual Data: Detection And Mitigation Of Cheating In Massively Multiplayer Online Role Playing Games, Naga Sai Nikhil Maguluri Jan 2017

Browse all Theses and Dissertations

The success of any multiplayer game depends on its players’ experience. Cheating/hacking undermines that experience and thus the success of the game. Cheaters who use hacks, bots, or trainers ruin the gaming experience of other players and drive them to leave the game. As the video game industry is a constantly growing multibillion-dollar economy, it is crucial to assure and maintain a state of security. Players describe their gaming experience in multiplayer chat, game reviews, and social media. This thesis is an exploratory study where our goal is to experiment and propose …


Semantics-Based Summarization Of Entities In Knowledge Graphs, Kalpa Gunaratna Jan 2017

Browse all Theses and Dissertations

The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the …


A Framework For Social Network Sentiment Analysis Using Big Data Analytics, Bharat Sri Harsha Karpurapu Jan 2017

All ETDs from UAB

The primary research of this thesis focused on the development of a Big Data framework for performing sentiment analysis on social networking sites. Over the last decade, social media has become a popular way to share thoughts and feelings, with a user base of over two billion users. Social networking sites such as Twitter, Facebook, and Instagram are increasingly becoming huge repositories of thoughts and opinions on a wide variety of topics. Several public and private organizations, such as governments and companies, are attempting to exploit the expressed preferences, opinions, and attitudes regarding politics, commercial products and other matters …