Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Articles 1 - 18 of 18

Full-Text Articles in Engineering

Language Models For Rare Disease Information Extraction: Empirical Insights And Model Comparisons, Shashank Gupta Jan 2024

Theses and Dissertations--Computer Science

End-to-end relation extraction (E2ERE) is a crucial task in natural language processing (NLP) that involves identifying and classifying semantic relationships between entities in text. This thesis compares three paradigms for E2ERE in biomedicine, focusing on rare diseases with discontinuous and nested entities. We evaluate Named Entity Recognition (NER) to Relation Extraction (RE) pipelines, sequence-to-sequence models, and generative pre-trained transformer (GPT) models using the RareDis information extraction dataset. Our findings indicate that pipeline models are the most effective, followed closely by sequence-to-sequence models. GPT models, despite having eight times as many parameters, perform worse than sequence-to-sequence models and …


Robust And Uncertainty-Aware Image Classification Using Bayesian Vision Transformer Model, Fazlur Rahman Bin Karim Dec 2023

Theses and Dissertations

Transformer neural networks have emerged as the predominant architecture for a wide range of Natural Language Processing (NLP) applications such as machine translation, speech recognition, sentiment analysis, and text anomaly detection. This achievement in the NLP field has sparked a growing interest in applying Transformer models to computer vision tasks. The Vision Transformer (ViT) model efficiently captures long-range dependencies by employing a self-attention mechanism to transform image data into meaningful representations. Recently, ViT models have exhibited impressive performance in solving image classification problems, thereby …


Quantification Of Various Types Of Biases In Large Language Models, Sudhashree Sayenju Apr 2023

Doctor of Data Science and Analytics Dissertations

Natural Language Processing (NLP) systems are found everywhere on the internet, from search engines and language translation to more advanced systems like voice assistants and customer service. Since humans are always on the receiving end of NLP technologies, it is very important to analyze whether or not the Large Language Models (LLMs) in use have bias and are therefore unfair. The majority of the research in NLP bias has focused on societal stereotype biases embedded in LLMs. However, our research focuses on all types of biases, namely model class level bias, stereotype bias, and domain bias present in LLMs. Model class …


Effective Systems For Insider Threat Detection, Muhanned Qasim Jabbar Alslaiman Jan 2023

Browse all Theses and Dissertations

Insider threats to information security have become a burden for organizations. Understanding insider activities leads to an effective improvement in identifying insider attacks and limits their threats. This dissertation presents three systems to detect insider threats effectively. The aim is to reduce the false negative rate (FNR), provide better dataset use, and reduce dimensionality and zero padding effects. The systems developed utilize deep learning techniques and are evaluated using the CERT 4.2 dataset. The dataset is analyzed and reformed so that each row represents a variable length sample of user activities. Two data representations are implemented to model extracted features …


Novel Natural Language Processing Models For Medical Terms And Symptoms Detection In Twitter, Farahnaz Golrooy Motlagh Jan 2022

Browse all Theses and Dissertations

This dissertation focuses on disambiguation of language use on Twitter about drug use, consumption types of drugs, drug legalization, ontology-enhanced approaches, and data-driven prediction analysis by developing novel NLP models. Three technical aims comprise this work: (a) leveraging pattern recognition techniques to improve the quality and quantity of crawled Twitter posts related to drug abuse; (b) using an expert-curated, domain-specific DsOn ontology model that improves knowledge extraction in the form of drug-to-symptom and drug-to-side effect relations; and (c) modeling the prediction of public perception of the drug’s legalization and the sentiment analysis of drug consumption on Twitter. We collected …


Exploring Hidden Networks Yields Important Insights In Disparate Fields Of Study, Laurence Clarfeld Jan 2021

Graduate College Dissertations and Theses

Network science captures a broad range of problems related to things (nodes) and relationships between them (edges). This dissertation explores real-world network problems in disparate domain applications where exploring less obvious "hidden networks" reveals important dynamics of the original network.

The power grid is an explicit network of buses (e.g., generators) connected by branches (e.g., transmission lines). In rare cases, if k branches (a k-set) fail simultaneously, a cascading blackout may ensue; we refer to such k-sets as "defective". We calculate system risk of cascading failure due to defective 2-sets and 3-sets in synthetic test cases of the Polish and …
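
As a rough illustration of the k-set idea described above, the following Python sketch enumerates every 2-set and 3-set of branches in a tiny network. The branch names are hypothetical, and the sketch only lists candidate k-sets; deciding which of them are "defective" would require a cascading-failure simulation that the abstract does not describe and that is not attempted here.

```python
from itertools import combinations


def enumerate_k_sets(branches, k):
    """Return every k-set: an unordered group of k distinct branches."""
    return list(combinations(branches, k))


# Hypothetical branch identifiers, for illustration only.
branches = ["b1", "b2", "b3", "b4"]

two_sets = enumerate_k_sets(branches, 2)
three_sets = enumerate_k_sets(branches, 3)

print(len(two_sets), len(three_sets))  # 6 4
```

The number of k-sets grows combinatorially with the number of branches, so exhaustive enumeration like this is only practical for small k or small networks.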


PPMExplorer: Using Information Retrieval, Computer Vision And Transfer Learning Methods To Index And Explore Images Of Pompeii, Cindy Roullet Dec 2020

Graduate Theses and Dissertations

In this dissertation, we present and analyze the technology used in the making of PPMExplorer: Search, Find, and Explore Pompeii. PPMExplorer is a software tool made with data extracted from the Pompei: Pitture e Mosaici (PPM) volumes. PPM is a valuable set of volumes containing 20,000 historical annotated images of the archaeological site of Pompeii, Italy, accompanied by extensive captions. We transformed the volumes from paper, to digital, to searchable. PPMExplorer enables archaeologist researchers to conduct and check hypotheses on historical findings. We present a theory that such a concept is possible by leveraging computer generated correlations between artifacts using …


Ensemble Labeling Towards Scientific Information Extraction (ELSIE), Erin Murphy Nov 2020

College of Computing and Digital Media Dissertations

Extracting scientific facts from unstructured text is difficult due to the ambiguity of the language and the complexity of the scientific named entities and relations to be extracted. This problem is well illustrated through the extraction of polymer names and their properties. Even in cases where the property is a temperature, identifying the polymer name associated with the temperature may require expertise due to the use of acronyms, synonyms, and complicated naming conventions, and the fact that new polymer names are being “introduced” to the vernacular as polymer science advances. While there exist domain-specific machine learning toolkits …


Topological Analysis Of Averaged Sentence Embeddings, Wesley J. Holmes Jan 2020

Browse all Theses and Dissertations

Sentence embeddings are frequently generated by using complex, pretrained models that were trained on a very general corpus of data. This thesis explores a potential alternative method for generating high-quality sentence embeddings for highly specialized corpora in an efficient manner. A framework for visualizing and analyzing sentence embeddings is developed to help assess the quality of sentence embeddings for a highly specialized corpus of documents related to the 2019 coronavirus epidemic. A Topological Data Analysis (TDA) technique is explored as an alternative method for grouping embeddings for document clustering and topic modeling tasks and is compared to a simple clustering …
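
The thesis title refers to averaged sentence embeddings. One common reading of that phrase, assumed here rather than confirmed by the abstract, is that a sentence vector is the element-wise mean of the vectors of its words. The sketch below shows that averaging step in plain Python; the function name, the toy two-dimensional vectors, and the choice to skip out-of-vocabulary tokens are all illustrative assumptions.

```python
def average_sentence_embedding(tokens, word_vectors, dim):
    """Average the vectors of the known tokens in a sentence.

    Tokens missing from word_vectors are skipped; if none are known,
    a zero vector of length dim is returned.
    """
    vecs = [word_vectors[t] for t in tokens if t in word_vectors]
    if not vecs:
        return [0.0] * dim
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]


# Toy 2-dimensional word vectors, for illustration only.
word_vectors = {
    "virus": [1.0, 0.0],
    "spread": [0.0, 1.0],
}

embedding = average_sentence_embedding(
    ["virus", "spread", "unknown"], word_vectors, dim=2
)
print(embedding)  # [0.5, 0.5]
```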


Feature-Based Transfer Learning In Natural Language Processing, Jianfei Yu Dec 2018

Dissertations and Theses Collection (Open Access)

In the past few decades, supervised machine learning has been one of the most important methodologies in the Natural Language Processing (NLP) community. Although various kinds of supervised learning methods have been proposed to obtain state-of-the-art performance across most NLP tasks, their bottleneck lies in the heavy reliance on large amounts of manually annotated data, which is not always available in the desired target domain/task. To alleviate the data sparsity issue in the target domain/task, an attractive solution is to find sufficient labeled data from a related source domain/task. However, for most NLP applications, due to …


A Framework To Understand Emoji Meaning: Similarity And Sense Disambiguation Of Emoji Using EmojiNet, Sanjaya Wijeratne Jan 2018

Browse all Theses and Dissertations

Pictographs, commonly referred to as ‘emoji’, have become a popular way to enhance electronic communications. They are an important component of the language used in social media. With their introduction in the late 1990s, emoji have been widely used to enhance the sentiment, emotion, and sarcasm expressed in social media messages. They are equally popular across many social media sites including Facebook, Instagram, and Twitter. In 2015, Instagram reported that nearly half of the photo comments posted on Instagram contain emoji, and in the same year, Twitter reported that the ‘face with tears of joy’ emoji has been tweeted 6.6 …


Using Natural Language Processing And Machine Learning For Analyzing Clinical Notes In Sickle Cell Disease Patients, Shufa Khizra Jan 2018

Browse all Theses and Dissertations

Sickle Cell Disease (SCD) is a hereditary disorder in red blood cells that can lead to excruciating pain episodes. SCD causes normal red blood cells to distort into a sickle shape. The distorted shape makes the hemoglobin inflexible and stick to the walls of the vessels, thereby obstructing the free flow of blood and eventually depriving the tissues of oxygen. The lack of oxygen causes serious problems including Acute Chest Syndrome (ACS), stroke, infection, and organ damage, and over a lifetime SCD can harm a person’s spleen, brain, kidneys, eyes, and bones. Sickling of …


Bayesian Methods And Machine Learning For Processing Text And Image Data, Yingying Gu Aug 2017

Theses and Dissertations

Classification/clustering is an important class of unstructured data processing problems. Classification (supervised, semi-supervised and unsupervised) aims to discover the clusters and group similar data into categories for information organization and knowledge discovery. My work focuses on using Bayesian methods and machine learning techniques to classify free-text and image data, and addresses how to overcome the limitations of traditional methods. The Bayesian approach provides a way to use more variations (numerical or categorical) and to estimate probabilities instead of explicit rules, which helps in ambiguous cases. The MAP (maximum a posteriori) estimation is used to …


Multi-Class Classification Of Textual Data: Detection And Mitigation Of Cheating In Massively Multiplayer Online Role Playing Games, Naga Sai Nikhil Maguluri Jan 2017

Browse all Theses and Dissertations

The success of any multiplayer game depends on the players’ experience. Cheating/hacking undermines that experience and thus the success of the game. Cheaters who use hacks, bots, or trainers ruin the gaming experience of other players and drive them to leave the game. As the video game industry is a constantly growing multibillion-dollar economy, it is crucial to assure and maintain a state of security. Players reflect their gaming experience in one of the following places: multiplayer chat, game reviews, and social media. This thesis is an exploratory study where our goal is to experiment and propose …


Semantics-Based Summarization Of Entities In Knowledge Graphs, Kalpa Gunaratna Jan 2017

Browse all Theses and Dissertations

The processing of structured and semi-structured content on the Web has been gaining attention with the rapid progress in the Linking Open Data project and the development of commercial knowledge graphs. Knowledge graphs capture domain-specific or encyclopedic knowledge in the form of a data layer and add rich and explicit semantics on top of the data layer to infer additional knowledge. The data layer of a knowledge graph represents entities and their descriptions. The semantic layer on top of the data layer is called the schema (ontology), where relationships of the entity descriptions, their classes, and the hierarchy of the …


An Empirical Study Of Semantic Similarity In WordNet And Word2Vec, Abram Handler Dec 2014

University of New Orleans Theses and Dissertations

This thesis performs an empirical analysis of Word2Vec by comparing its output to WordNet, a well-known, human-curated lexical database. It finds that Word2Vec tends to uncover more of certain types of semantic relations than others, with Word2Vec returning more hypernyms, synonyms and hyponyms than hyponyms or holonyms. It also shows the probability that neighbors separated by a given cosine distance in Word2Vec are semantically related in WordNet. This result both adds to our understanding of the still-unknown Word2Vec and helps to benchmark new semantic tools built from word vectors.
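
For readers unfamiliar with the cosine distance mentioned above, here is a minimal Python sketch using the common definition of one minus cosine similarity. The thesis may use a different convention, and the toy vectors below are placeholders rather than real Word2Vec output.

```python
import math


def cosine_distance(u, v):
    """Return 1 - cos(u, v) for two equal-length, non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for zero vectors")
    return 1.0 - dot / (norm_u * norm_v)


# Toy vectors, for illustration only.
same_direction = cosine_distance([1.0, 0.0], [2.0, 0.0])
orthogonal = cosine_distance([1.0, 0.0], [0.0, 1.0])
print(same_direction, orthogonal)  # 0.0 1.0
```

Under this definition, vectors pointing the same way have distance 0 regardless of length, and orthogonal vectors have distance 1.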


Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad Dec 2013

Theses and Dissertations

One of the critical causes of medical errors is drug-drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage is designed for binary classification of drug pairs as interacting or non-interacting, and the second stage for further classification of interacting pairs into one of four interaction types: advise, effect, …
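
The control flow of a two-stage classifier like the one described above can be sketched in a few lines of Python. Everything below is a stand-in: the keyword rules replace the thesis's trained machine learning models, the example sentences are invented, and only the two interaction types named before the abstract is cut off ("advise" and "effect") appear as stage-two labels.

```python
def two_stage_classify(example, stage1, stage2):
    """Stage 1 filters out non-interacting pairs; stage 2 types the rest."""
    if not stage1(example):
        return "non-interacting"
    return stage2(example)


def toy_stage1(example):
    # Hypothetical keyword rule standing in for a binary classifier.
    text = example["sentence"].lower()
    return any(cue in text for cue in ("should", "increases", "decreases"))


def toy_stage2(example):
    # Hypothetical keyword rule standing in for a multi-class classifier.
    return "advise" if "should" in example["sentence"].lower() else "effect"


# Invented example sentences, for illustration only.
examples = [
    {"sentence": "DrugA should not be taken with DrugB."},
    {"sentence": "DrugA increases the effect of DrugB."},
    {"sentence": "DrugA and DrugB were both measured."},
]

labels = [two_stage_classify(e, toy_stage1, toy_stage2) for e in examples]
print(labels)  # ['advise', 'effect', 'non-interacting']
```

Splitting the decision this way means the second stage only sees pairs the first stage accepted, which is one way to keep the large non-interacting class from dominating the multi-class step.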


Natural Language Document And Event Association Using Stochastic Petri Net Modeling, Michael Thomas Mills Jan 2013

Browse all Theses and Dissertations

The purpose of this research is to design and implement a new methodology that captures the natural language understanding of events from English natural language text and model it using Stochastic Petri Nets. To establish a baseline of recent natural language processing (NLP) and understanding (NLU) research, two surveys are presented. One is a general survey in NLP and NLU methodologies for processing multi-documents. It summarizes and presents methodologies in terms of their features, capabilities, and maturity. The second survey focuses on graph-based methods for NL text processing and understanding and analyzes them in terms of their functional descriptions, capabilities …