Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 20 of 20

Full-Text Articles in Physical Sciences and Mathematics

Robust And Uncertainty-Aware Image Classification Using Bayesian Vision Transformer Model, Fazlur Rahman Bin Karim Dec 2023

Theses and Dissertations

Transformer neural networks have emerged as the predominant architecture for a wide range of Natural Language Processing (NLP) applications, such as machine translation, speech recognition, sentiment analysis, and text anomaly detection. This success in NLP has sparked growing interest in applying Transformer models to computer vision tasks. The Vision Transformer (ViT) model efficiently captures long-range dependencies by employing a self-attention mechanism to transform image data into meaningful representations. Recently, ViT models have exhibited impressive performance in solving image classification problems, thereby …


Extracting Patterns Of Semantic Roles From Accident Narratives, Soundarya Jayakumar May 2023

Theses and Dissertations

Accident databases are filled with rich information about accidents. Analyzing these datasets can reveal useful information that can be used to prevent similar accidents in the future. Policy makers and safety management organizations can design appropriate preventive measures based on such analysis. Besides structured data, crash reports include natural language narratives, which contain valuable accident-related information not otherwise present in the structured data. Using natural language processing (NLP) techniques, one can analyze these narratives and mine hidden patterns of accidents from them. The thesis focuses on developing an algorithm to extract common patterns of semantic …


Emotion Classification And Intensity Prediction On Tweets, Sharath Chander Pugazhenthi May 2023

Theses and Dissertations

The task of finding the emotion associated with text that individuals post on a social media platform has become crucial, as it influences the current state of mind of a particular individual in real life. It also helps one to understand social behavior at a given point in time. Microblogging platforms like Twitter serve as a powerful tool for expressing one’s thoughts. Several works have addressed classifying the emotion associated with such text. The thesis comprises a system that first classifies the tweet into one of four emotions (anger, joy, sadness, and fear) with good …


Learning Analytics Through Machine Learning And Natural Language Processing, Bokai Yang Apr 2023

Theses and Dissertations

The increase in computing power and the ability to log students’ data with the help of computer-assisted learning systems have led to an increased interest in developing and applying computer science techniques for analyzing learning data. To understand and investigate how learning-generated data can be used to improve student success, data mining techniques have been applied to several educational tasks. This dissertation investigates three important tasks in various domains of educational data mining: learners’ behavior analysis, essay structure analysis and feedback provision, and learners’ dropout prediction. The first project applied latent semantic analysis and machine learning approaches to investigate …


A Study On Developing Novel Methods For Relation Extraction, Darshini Mahendran Jan 2022

Theses and Dissertations

Relation Extraction (RE) is a Natural Language Processing (NLP) task that detects and classifies the relations between two entities. Relation extraction in the biomedical and scientific literature domain is challenging, as text can contain multiple pairs of entities in the same instance. During the course of this research, we developed an RE framework (RelEx), which consists of five main RE paradigms: rule-based, machine learning-based, Convolutional Neural Network (CNN)-based, Bidirectional Encoder Representations from Transformers (BERT)-based, and Graph Convolutional Networks (GCNs)-based approaches. RelEx's rule-based approach uses co-location information of the entities to determine whether a relation exists between a selected entity …


Predicting Occurrence Of The Term Sarcopenia With Semi-Supervised Machine Learning, Kevin Flasch Dec 2021

Theses and Dissertations

Sarcopenia is a medical condition that involves loss of muscle mass. It has been difficult to define and was only recently assigned an official medical code, so many medical records lack a coded diagnosis even though the clinical note text may discuss the condition or its symptoms. This thesis investigates the application of machine learning and natural language processing to clinical note text to see how well the term ‘sarcopenia’ can be predicted in notes from records concerning the condition.

A variety of machine learning models combined with different features and text processing are tested against training data that mentions …


Improving Space Efficiency Of Deep Neural Networks, Aliakbar Panahi Jan 2021

Theses and Dissertations

Language models employ a very large number of trainable parameters. Despite being highly overparameterized, these networks often achieve good out-of-sample test performance on the original task and are easily fine-tuned to related tasks. Recent observations involving, for example, the intrinsic dimension of the objective landscape and the lottery ticket hypothesis indicate that training often actively involves only a small fraction of the parameter space. Thus, the question remains of how large a parameter space needs to be in the first place: the evidence from recent work on model compression, parameter sharing, factorized representations, and knowledge distillation increasingly shows that models can be …


Use Of Text Data In Identifying And Prioritizing Potential Drug Repositioning Candidates, Majid Rastegar-Mojarad May 2019

Theses and Dissertations

New drug development costs between 500 million and 2 billion dollars and takes 10-15 years, with a success rate of less than 10%. Drug repurposing (defined as discovering new indications for existing drugs) could play a significant role in drug development, especially considering the declining success rates of developing novel drugs. In the period 2007-2009, drug repurposing led to the launching of 30-40% of new drugs. Typically, new indications for existing medications are identified by accident. However, new technologies and a large number of available resources enable the development of systematic approaches to identify and validate drug-repurposing candidates with significantly …


An Instruction Embedding Model For Binary Code Analysis, Kimberly Michelle Redmond Apr 2019

Theses and Dissertations

Binary code analysis is important for understanding programs without access to the original source code, which is common with proprietary software. Analyzing binaries can be challenging given their high variability: due to growth in tech manufacturers, source code is now frequently compiled for multiple instruction set architectures (ISAs); however, there is no formal dictionary that translates between their assembly languages. The difficulty of analysis is further compounded by different compiler optimizations and obfuscated malware signatures. Such minutiae mean that some vulnerabilities may only be detectable on a fine-grained level. Recent strides in machine learning, particularly in Natural Language …


Curtus: An Nlp Tool To Map Job Skills To Academic Courses, Daniel Rockwell Jan 2019

Theses and Dissertations

Many businesses are burdened with the need to train students for the job instead of finding them prepared for it. Few business leaders feel that colleges prepare students for future jobs from day one. It can be a challenge for colleges to determine whether their curricula meet industry needs. Mapping industry needs to academic courses can be advantageous to both parties: it allows colleges to align with and satisfy those needs, and it allows the industry to hire better-prepared graduates. In an attempt to address this, a system prototype that uses …


Indirect Relatedness, Evaluation, And Visualization For Literature Based Discovery, Sam Henry Jan 2019

Theses and Dissertations

The exponential growth of scientific literature is creating an increased need for systems to process and assimilate knowledge contained within text. Literature Based Discovery (LBD) is a well-established field that seeks to synthesize new knowledge from existing literature, but it has remained primarily in the theoretical realm rather than in real-world application. This lack of real-world adoption is due in part to the difficulty of LBD, but also to several solvable problems present in LBD today. Of these problems, the ones in most critical need of improvement are: (1) the over-generation of knowledge by LBD systems, (2) a …


Assessing The Quality Of Software Development Tutorials Available On The Web, Manziba A. Nishi Jan 2019

Theses and Dissertations

Both expert and novice software developers frequently access software development resources available on the Web in order to look up or learn new APIs, tools, and techniques. Software quality is affected negatively when developers fail to find high-quality information relevant to their problem. While there is a substantial amount of freely available resources that can be accessed online, some of these resources contain information that suffers from error proneness, copyright infringement, security concerns, and incompatible versions. Use of such toxic information can have a strong negative effect on developers’ efficacy. This dissertation focuses specifically on software tutorials, aiming to automatically …


AI Planning Assistant For Scheduling Daily Activities, Priyanka Ahuja May 2018

Theses and Dissertations

Artificial conversational agents are software agents that can interact with humans in the way humans do. Siri, Cortana, and Alexa are examples of intelligent agents that can help us with almost all basic tasks. These agents are smart enough to do the basic tasks, but less so when it comes to complex tasks, such as analyzing traffic data, reviewing scheduling conflicts, rescheduling meetings while resolving conflicts, and offering suggestions based upon data analyses (e.g., traffic patterns, weather, etc.). The actual potential of dialogue-based task agents remains untapped. The reason is that agents lack the ability to …


Unsupervised Biomedical Named Entity Recognition, Omid Ghiasvand Aug 2017

Theses and Dissertations

Named entity recognition (NER) from text is an important task for several applications, including in the biomedical domain. Supervised machine learning-based systems have been the most successful on the NER task; however, they require correct annotations in large quantities for training. Annotating text manually is very labor-intensive and also requires domain expertise. The purpose of this research is to reduce human annotation effort and to decrease the cost of annotation for building NER systems in the biomedical domain. The method developed in this work is based on leveraging the availability of resources like UMLS (Unified Medical Language System), that contain …


Bayesian Methods And Machine Learning For Processing Text And Image Data, Yingying Gu Aug 2017

Theses and Dissertations

Classification/clustering is an important class of unstructured data processing problems. Classification (supervised, semi-supervised, and unsupervised) aims to discover clusters and group similar data into categories for information organization and knowledge discovery. My work focuses on using Bayesian methods and machine learning techniques to classify free-text and image data, and addresses how to overcome the limitations of traditional methods. The Bayesian approach makes it possible to use more variables (numerical or categorical) and to estimate probabilities instead of relying on explicit rules, which is beneficial in ambiguous cases. The MAP (maximum a posteriori) estimation is used to …


Three Essays On Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing And Text Mining, Euisung Jung Aug 2015

Theses and Dissertations

Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of clinical trial protocols is important because it specifies the necessary conditions that participants have to …


Three Essays On Opinion Mining Of Social Media Texts, Shuyuan Deng Dec 2014

Theses and Dissertations

This dissertation research is a collection of three essays on opinion mining of social media texts. I explore different theoretical and methodological perspectives in this inquiry. The first essay focuses on improving lexicon-based sentiment classification. I propose a method to automatically generate a sentiment lexicon that incorporates knowledge from both the language domain and the content domain. This method learns word associations from a large unannotated corpus. These associations are used to identify new sentiment words. Using a Twitter data set containing 743,069 tweets related to the stock market, I show that the sentiment lexicons generated using the proposed method …


Adverse Drug Event Detection, Causality Inference, Patient Communication And Translational Research, Balaji Polepalli Ramesh May 2014

Theses and Dissertations

Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of stays in hospital. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce the cost in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and the …


Disease Name Extraction From Clinical Text Using Conditional Random Fields, Omid Ghiasvand May 2014

Theses and Dissertations

The aim of the research in this thesis was to extract disease and disorder names from clinical texts. We utilized Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences. We used other tools, such as MetaMap and the Stanford CoreNLP tool, to extract some crucial features. MetaMap was used to identify names of diseases/disorders that are already in the UMLS Metathesaurus. Other important features, such as lemmatized versions of words and POS tags, were extracted using the Stanford CoreNLP tool. Some more features were extracted directly from the UMLS Metathesaurus, …


Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad Dec 2013

Theses and Dissertations

One of the critical causes of medical errors is drug-drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage is designed for binary classification of drug pairs as interacting or non-interacting, and the second stage for further classification of interacting pairs into one of four interaction types: advise, effect, …