Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 20 of 20
Full-Text Articles in Physical Sciences and Mathematics
Robust And Uncertainty-Aware Image Classification Using Bayesian Vision Transformer Model, Fazlur Rahman Bin Karim
Theses and Dissertations
Transformer neural networks have emerged as the predominant architecture for a wide range of Natural Language Processing (NLP) applications, such as machine translation, speech recognition, sentiment analysis, and text anomaly detection. This success in NLP has sparked growing interest in applying Transformer models to computer vision tasks. The Vision Transformer (ViT) model captures long-range dependencies by using a self-attention mechanism to transform image data into meaningful representations. Recently, ViT models have exhibited strong performance on image classification problems, thereby …
Extracting Patterns Of Semantic Roles From Accident Narratives, Soundarya Jayakumar
Theses and Dissertations
Accident databases are filled with rich information about accidents. Analyzing these datasets can reveal useful information that can be used to prevent similar accidents in the future. Policy makers and safety management organizations can design appropriate preventive measures based on this analysis. Besides structured data, crash reports include natural language narratives that contain valuable accident-related information not present in the structured data. Using natural language processing (NLP) techniques, one can analyze these narratives and mine hidden patterns of accidents from them. The thesis focuses on developing an algorithm to extract common patterns of semantic …
Emotion Classification And Intensity Prediction On Tweets, Sharath Chander Pugazhenthi
Theses and Dissertations
The task of finding the emotion associated with text posted by individuals on a social media platform has become crucial, as it influences the current state of mind of a particular individual in real life. It also helps one to understand social behavior at a given point in time. Microblogging platforms like Twitter serve as a powerful tool for expressing one’s thoughts. Several works have addressed classifying the emotion associated with such text. The thesis comprises a system that first classifies a tweet into one of four emotions (anger, joy, sadness, and fear) with good …
Learning Analytics Through Machine Learning And Natural Language Processing, Bokai Yang
Theses and Dissertations
The increase in computing power and the ability to log students’ data with the help of computer-assisted learning systems have led to an increased interest in developing and applying computer science techniques for analyzing learning data. To understand and investigate how learning-generated data can be used to improve student success, data mining techniques have been applied to several educational tasks. This dissertation investigates three important tasks in various domains of educational data mining: learners’ behavior analysis, essay structure analysis and feedback provision, and learners’ dropout prediction. The first project applied latent semantic analysis and machine learning approaches to investigate …
A Study On Developing Novel Methods For Relation Extraction, Darshini Mahendran
Theses and Dissertations
Relation Extraction (RE) is a task of Natural Language Processing (NLP) to detect and classify the relations between two entities. Relation extraction in the biomedical and scientific literature domain is challenging as text can contain multiple pairs of entities in the same instance. During the course of this research, we developed an RE framework (RelEx), which consists of five main RE paradigms: rule-based, machine learning-based, Convolutional Neural Network (CNN)-based, Bidirectional Encoder Representations from Transformers (BERT)-based, and Graph Convolutional Networks (GCNs)-based approaches. RelEx's rule-based approach uses co-location information of the entities to determine whether a relation exists between a selected entity …
Predicting Occurrence Of The Term Sarcopenia With Semi-Supervised Machine Learning, Kevin Flasch
Theses and Dissertations
Sarcopenia is a medical condition that involves loss of muscle mass. It has been difficult to define and was only recently assigned an official medical code, leading to many medical records lacking a coded diagnosis although the clinical note text may discuss it or symptoms of it. This thesis investigates the application of machine learning and natural language processing to analyze clinical note text to see how well the term ‘sarcopenia’ can be predicted in clinical note text from records concerning the condition.
A variety of machine learning models combined with different features and text processing are tested against training data that mentions …
Improving Space Efficiency Of Deep Neural Networks, Aliakbar Panahi
Theses and Dissertations
Language models employ a very large number of trainable parameters. Despite being highly overparameterized, these networks often achieve good out-of-sample test performance on the original task and easily fine-tune to related tasks. Recent observations involving, for example, the intrinsic dimension of the objective landscape and the lottery ticket hypothesis indicate that training often actively involves only a small fraction of the parameter space. Thus, the question remains how large a parameter space needs to be in the first place. Evidence from recent work on model compression, parameter sharing, factorized representations, and knowledge distillation increasingly shows that models can be …
Use Of Text Data In Identifying And Prioritizing Potential Drug Repositioning Candidates, Majid Rastegar-Mojarad
Theses and Dissertations
New drug development costs between 500 million and 2 billion dollars and takes 10-15 years, with a success rate of less than 10%. Drug repurposing (defined as discovering new indications for existing drugs) could play a significant role in drug development, especially considering the declining success rates of developing novel drugs. In the period 2007-2009, drug repurposing led to the launching of 30-40% of new drugs. Typically, new indications for existing medications are identified by accident. However, new technologies and a large number of available resources enable the development of systematic approaches to identify and validate drug-repurposing candidates with significantly …
An Instruction Embedding Model For Binary Code Analysis, Kimberly Michelle Redmond
Theses and Dissertations
Binary code analysis is important for understanding programs without access to the original source code, which is common with proprietary software. Analyzing binaries can be challenging given their high variability: due to growth in tech manufacturers, source code is now frequently compiled for multiple instruction set architectures (ISAs); however, there is no formal dictionary that translates between their assembly languages. The difficulty of analysis is further compounded by different compiler optimizations and obfuscated malware signatures. Such minutiae mean that some vulnerabilities may only be detectable on a fine-grained level. Recent strides in machine learning, particularly in Natural Language …
Curtus: An NLP Tool To Map Job Skills To Academic Courses, Daniel Rockwell
Theses and Dissertations
Many businesses are burdened with the need to train students for the job instead of finding them prepared for it. Few business leaders feel that colleges prepare students for future jobs from day one. It can be a challenge for colleges to determine whether their curricula meet industry needs. Mapping industry needs to academic courses can benefit both parties: colleges can align their courses with those needs, and industry can hire better-prepared graduates. In an attempt to address this, a system prototype that uses …
Indirect Relatedness, Evaluation, And Visualization For Literature Based Discovery, Sam Henry
Theses and Dissertations
The exponential growth of scientific literature is creating an increased need for systems to process and assimilate knowledge contained within text. Literature Based Discovery (LBD) is a well established field that seeks to synthesize new knowledge from existing literature, but it has remained primarily in the theoretical realm rather than in real-world application. This lack of real-world adoption is due in part to the difficulty of LBD, but also due to several solvable problems present in LBD today. Of these problems, the ones in most critical need of improvement are: (1) the over-generation of knowledge by LBD systems, (2) a …
Assessing The Quality Of Software Development Tutorials Available On The Web, Manziba A. Nishi
Theses and Dissertations
Both expert and novice software developers frequently access software development resources available on the Web in order to look up or learn new APIs, tools, and techniques. Software quality is affected negatively when developers fail to find high-quality information relevant to their problem. While a substantial number of freely available resources can be accessed online, some of them contain information that suffers from error proneness, copyright infringement, security concerns, and incompatible versions. Use of such toxic information can have a strong negative effect on developers’ efficacy. This dissertation focuses specifically on software tutorials, aiming to automatically …
AI Planning Assistant For Scheduling Daily Activities, Priyanka Ahuja
Theses and Dissertations
Artificial conversational agents are software agents that can interact with humans in the way humans do. Siri, Cortana, and Alexa are examples of intelligent agents that can help us with almost all the basic tasks. These agents are smart enough to do the basic tasks, but less so when it comes to complex tasks, such as analyzing traffic data, reviewing scheduling conflicts, rescheduling meetings while resolving conflicts, and offering suggestions based upon data analyses (e.g., traffic patterns, weather, etc.). The actual potential of dialogue-based task agents remains untapped. The reason is that agents lack the ability to …
Unsupervised Biomedical Named Entity Recognition, Omid Ghiasvand
Theses and Dissertations
Named entity recognition (NER) from text is an important task for several applications, including in the biomedical domain. Supervised machine learning-based systems have been the most successful on the NER task; however, they require large quantities of correct annotations for training. Annotating text manually is very labor-intensive and also requires domain expertise. The purpose of this research is to reduce human annotation effort and decrease the cost of annotation for building NER systems in the biomedical domain. The method developed in this work is based on leveraging the availability of resources like UMLS (Unified Medical Language System) that contain …
Bayesian Methods And Machine Learning For Processing Text And Image Data, Yingying Gu
Theses and Dissertations
Classification/clustering is an important class of unstructured data processing problems. Classification (supervised, semi-supervised, and unsupervised) aims to discover clusters and group similar data into categories for information organization and knowledge discovery. My work focuses on using Bayesian methods and machine learning techniques to classify free-text and image data, and addresses how to overcome the limitations of traditional methods. The Bayesian approach allows the use of more variations (numerical or categorical) and estimates probabilities instead of applying explicit rules, which is beneficial in ambiguous cases. MAP (maximum a posteriori) estimation is used to …
Three Essays On Enhancing Clinical Trial Subject Recruitment Using Natural Language Processing And Text Mining, Euisung Jung
Theses and Dissertations
Patient recruitment and enrollment are critical factors for a successful clinical trial; however, recruitment tends to be the most common problem in most clinical trials. The success of a clinical trial depends on efficiently recruiting suitable patients to conduct the trial. Every clinical trial has a protocol, which describes what will be done in the study and how it will be conducted. The protocol also ensures the safety of the trial subjects and the integrity of the data collected. The eligibility criteria section of clinical trial protocols is important because it specifies the necessary conditions that participants have to …
Three Essays On Opinion Mining Of Social Media Texts, Shuyuan Deng
Theses and Dissertations
This dissertation research is a collection of three essays on opinion mining of social media texts. I explore different theoretical and methodological perspectives in this inquiry. The first essay focuses on improving lexicon-based sentiment classification. I propose a method to automatically generate a sentiment lexicon that incorporates knowledge from both the language domain and the content domain. This method learns word associations from a large unannotated corpus. These associations are used to identify new sentiment words. Using a Twitter data set containing 743,069 tweets related to the stock market, I show that the sentiment lexicons generated using the proposed method …
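As a rough illustration of lexicon-based sentiment classification in general (not the author's method, which the truncated abstract does not fully describe), the sketch below labels a text by the sign of its summed per-word polarity scores. The toy lexicon entries, the `classify_sentiment` name, and the zero threshold are all assumptions made for illustration.

```python
import re

# Hypothetical toy lexicon: word -> polarity score (illustrative values only).
TOY_LEXICON = {
    "bullish": 1.0, "gain": 1.0, "rally": 1.0,
    "bearish": -1.0, "loss": -1.0, "crash": -1.0,
}


def classify_sentiment(text, lexicon=TOY_LEXICON):
    """Label text by the sign of its summed lexicon score."""
    # Lowercase and split into simple word tokens.
    tokens = re.findall(r"[a-z']+", text.lower())
    # Words missing from the lexicon contribute nothing to the score.
    score = sum(lexicon.get(tok, 0.0) for tok in tokens)
    if score > 0:
        return "positive"
    if score < 0:
        return "negative"
    return "neutral"
```

A generated lexicon of the kind the abstract mentions would take the place of `TOY_LEXICON` here; the scoring rule itself stays the same.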
Adverse Drug Event Detection, Causality Inference, Patient Communication And Translational Research, Balaji Polepalli Ramesh
Theses and Dissertations
Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of stays in hospital. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce the cost in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and the …
Disease Name Extraction From Clinical Text Using Conditional Random Fields, Omid Ghiasvand
Theses and Dissertations
The aim of the research in this thesis was to extract disease and disorder names from clinical texts. We used Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences. We used other tools, such as MetaMap and the Stanford CoreNLP tool, to extract some crucial features. The MetaMap tool was used to identify names of diseases/disorders that are already in the UMLS Metathesaurus. Other important features, such as lemmatized versions of words and POS tags, were extracted using the Stanford CoreNLP tool. Some more features were extracted directly from the UMLS Metathesaurus, …
Extraction And Classification Of Drug-Drug Interaction From Biomedical Text Using A Two-Stage Classifier, Majid Rastegar-Mojarad
Theses and Dissertations
One of the critical causes of medical errors is drug-drug interaction (DDI), which occurs when one drug increases or decreases the effect of another drug. We propose a machine learning system to extract and classify drug-drug interactions from the biomedical literature, using the annotated corpus from the DDIExtraction-2013 shared task challenge. Our approach applies a two-stage classifier to handle the highly unbalanced class distribution in the corpus. The first stage is designed for binary classification of drug pairs as interacting or non-interacting, and the second stage for further classification of interacting pairs into one of four interaction types: advise, effect, …
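The two-stage routing described in this abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' system: the function names, the `"none"` label for non-interacting pairs, and the keyword rules standing in for trained classifiers are all hypothetical, and only the two interaction types visible in the abstract are used.

```python
from typing import Callable, List, Sequence

NO_INTERACTION = "none"  # hypothetical label for non-interacting pairs


def two_stage_predict(
    pairs: Sequence[str],
    is_interacting: Callable[[str], bool],
    interaction_type: Callable[[str], str],
) -> List[str]:
    """Run stage 2 only on the pairs that stage 1 marks as interacting."""
    labels = []
    for pair in pairs:
        if is_interacting(pair):
            # Stage 2: assign a fine-grained type to an interacting pair.
            labels.append(interaction_type(pair))
        else:
            # Stage 1 filtered this pair out; stage 2 never sees it.
            labels.append(NO_INTERACTION)
    return labels


# Stand-in keyword rules in place of trained classifiers (assumptions).
def toy_stage1(sentence: str) -> bool:
    return "should not" in sentence or "increases" in sentence


def toy_stage2(sentence: str) -> str:
    return "advise" if "should not" in sentence else "effect"
```

Because the first stage filters out the non-interacting majority, the second-stage model only has to separate the interaction types, which is one common way to cope with a heavily unbalanced label distribution.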