Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 9 of 9

Full-Text Articles in Physical Sciences and Mathematics

An Empirical Study Of Semantic Similarity In Wordnet And Word2vec, Abram Handler Dec 2014

University of New Orleans Theses and Dissertations

This thesis performs an empirical analysis of Word2Vec by comparing its output to WordNet, a well-known, human-curated lexical database. It finds that Word2Vec tends to uncover more of certain types of semantic relations than others: it returns more hypernyms, synonyms, and hyponyms than meronyms or holonyms. It also shows the probability that neighbors separated by a given cosine distance in Word2Vec are semantically related in WordNet. This result both adds to our understanding of the still poorly understood Word2Vec and helps to benchmark new semantic tools built from word vectors.
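
As a rough illustration of the cosine distance mentioned in this abstract, the sketch below ranks a word's neighbors by cosine distance. It is not the thesis's code: the function names and the two-dimensional toy vectors are hypothetical stand-ins for real Word2Vec embeddings, which typically have hundreds of dimensions.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Cosine distance, defined here as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)


def nearest_neighbors(word, vectors, k=2):
    """Return the k words closest to `word`, as (word, distance) pairs."""
    target = vectors[word]
    scored = [
        (other, cosine_distance(target, vec))
        for other, vec in vectors.items()
        if other != word
    ]
    scored.sort(key=lambda pair: pair[1])
    return scored[:k]


# Hypothetical toy embeddings, invented for illustration only.
toy_vectors = {
    "dog": [1.0, 0.0],
    "puppy": [0.9, 0.1],
    "car": [0.0, 1.0],
}

neighbors = nearest_neighbors("dog", toy_vectors, k=2)
```

With real embeddings, each neighbor pair could then be looked up in WordNet to check whether the two words stand in a relation such as synonymy or hypernymy.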


Three Essays On Opinion Mining Of Social Media Texts, Shuyuan Deng Dec 2014

Theses and Dissertations

This dissertation research is a collection of three essays on opinion mining of social media texts. I explore different theoretical and methodological perspectives in this inquiry. The first essay focuses on improving lexicon-based sentiment classification. I propose a method to automatically generate a sentiment lexicon that incorporates knowledge from both the language domain and the content domain. This method learns word associations from a large unannotated corpus. These associations are used to identify new sentiment words. Using a Twitter data set containing 743,069 tweets related to the stock market, I show that the sentiment lexicons generated using the proposed method …


Alternative Approaches To Correction Of Malapropisms In Aiml Based Conversational Agents, Walter A. Brock Nov 2014

CCE Theses and Dissertations

The use of Conversational Agents (CAs) utilizing Artificial Intelligence Markup Language (AIML) has been studied in a number of disciplines. Previous research has shown a great deal of promise. It has also documented significant limitations in the abilities of these CAs. Many of these limitations are related specifically to the method employed by AIML to resolve ambiguities in the meaning and context of words. While methods exist to detect and correct common errors in spelling and grammar of sentences and queries submitted by a user, one class of input error that is particularly difficult to detect and correct is the …


Identification Of Informativeness In Text Using Natural Language Stylometry, Rushdi Shams Aug 2014

Electronic Thesis and Dissertation Repository

In this age of information overload, one experiences a rapidly growing over-abundance of written text. To assist with handling this bounty, this plethora of texts is now widely used to develop and optimize statistical natural language processing (NLP) systems. Surprisingly, the use of more fragments of text to train these statistical NLP systems may not necessarily lead to improved performance. We hypothesize that those fragments that help the most with training are those that contain the desired information. Therefore, determining informativeness in text has become a central issue in our view of NLP. Recent developments in this field have spawned …


Predicting Music Genre Preferences Based On Online Comments, Andrew J. Sinclair Jun 2014

Master's Theses

Communication Accommodation Theory (CAT) states that individuals adapt to each other’s communicative behaviors. This adaptation is called “convergence.” In this work we explore the convergence of writing styles of users of the online music distribution platform SoundCloud.com. In order to evaluate our system we created a corpus of over 38,000 comments retrieved from SoundCloud in April 2014. The corpus represents comments from 8 distinct musical genres: Classical, Electronic, Hip Hop, Jazz, Country, Metal, Folk, and World. Our corpus contains short comments, frequent misspellings, little sentence structure, hashtags, emoticons, and URLs. We adapt techniques used by researchers analyzing other …


Adverse Drug Event Detection, Causality Inference, Patient Communication And Translational Research, Balaji Polepalli Ramesh May 2014

Theses and Dissertations

Adverse drug events (ADEs) are injuries resulting from a medical intervention related to a drug. ADEs are responsible for nearly 20% of all the adverse events that occur in hospitalized patients. ADEs have been shown to increase the cost of health care and the length of hospital stays. Therefore, detecting and preventing ADEs for pharmacovigilance is an important task that can improve the quality of health care and reduce costs in a hospital setting. In this dissertation, we focus on the development of ADEtector, a system that identifies ADEs and medication information from electronic medical records and the …


Disease Name Extraction From Clinical Text Using Conditional Random Fields, Omid Ghiasvand May 2014

Theses and Dissertations

The aim of this thesis was to extract disease and disorder names from clinical texts. We used Conditional Random Fields (CRF) as the main method to label diseases and disorders in clinical sentences. We also used other tools, such as MetaMap and the Stanford CoreNLP tool, to extract crucial features. MetaMap was used to identify names of diseases and disorders that are already in the UMLS Metathesaurus. Other important features, such as lemmatized versions of words and POS tags, were extracted using the Stanford CoreNLP tool. Further features were extracted directly from the UMLS Metathesaurus, …
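
To make the feature description in this abstract concrete, the sketch below builds one feature dictionary per token, the usual input shape for a linear-chain CRF tagger. It is an assumption-laden illustration and not the thesis's feature set: the function names, the feature keys, the example sentence, and the `in_lexicon` flags (standing in for a MetaMap/UMLS match) are all hypothetical, and the lemmas and POS tags are written by hand rather than produced by Stanford CoreNLP.

```python
def token_features(tokens, lemmas, pos_tags, in_lexicon, i):
    """Build a feature dict for token i from precomputed annotations."""
    feats = {
        "word.lower": tokens[i].lower(),
        "lemma": lemmas[i],
        "pos": pos_tags[i],
        # Hypothetical flag: True if a lexicon lookup matched this token.
        "in_lexicon": in_lexicon[i],
        "is_title": tokens[i].istitle(),
    }
    # Context features from the previous token, or a sentence-start marker.
    if i > 0:
        feats["prev.lemma"] = lemmas[i - 1]
        feats["prev.pos"] = pos_tags[i - 1]
    else:
        feats["BOS"] = True
    # Context features from the next token, or a sentence-end marker.
    if i < len(tokens) - 1:
        feats["next.lemma"] = lemmas[i + 1]
        feats["next.pos"] = pos_tags[i + 1]
    else:
        feats["EOS"] = True
    return feats


def sentence_features(tokens, lemmas, pos_tags, in_lexicon):
    """Feature dicts for every token in one sentence."""
    return [
        token_features(tokens, lemmas, pos_tags, in_lexicon, i)
        for i in range(len(tokens))
    ]


# Hand-annotated example sentence, invented for illustration only.
tokens = ["Patient", "denies", "chest", "pain"]
lemmas = ["patient", "deny", "chest", "pain"]
pos_tags = ["NN", "VBZ", "NN", "NN"]
in_lexicon = [False, False, True, True]
# Target labels in BIO format, the sequence a CRF would learn to predict.
labels = ["O", "O", "B-DISEASE", "I-DISEASE"]

features = sentence_features(tokens, lemmas, pos_tags, in_lexicon)
```

Pairs of `features` and `labels` like these, collected over many sentences, are what a CRF toolkit would train on.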


Time Will Tell: Temporal Reasoning In Clinical Narratives And Beyond, Weiyi Sun Jan 2014

Legacy Theses & Dissertations (2009 - 2024)

Temporal reasoning in natural language refers to the extraction and understanding of time-related information conveyed in free text. A clinical narrative temporal reasoning component can enable a spectrum of medical natural language processing (NLP) applications that directly improve the efficiency, accessibility, and accountability of patient care documentation. This dissertation contributes to three subtasks of temporal reasoning: temporal annotation, temporal expression extraction, and temporal relation inference. The temporal annotation work described in the dissertation produced one of the first publicly available clinical narratives. We published one of the first sets of temporal …


What Machines Understand About Personality Words After Reading The News, Eric David Moyer Jan 2014

Browse all Theses and Dissertations

Vector-based lexical semantics is a powerful technique that still has many undiscovered applications. In this thesis I apply a vector-space lexical-semantic model, newly developed by Mikolov et al. and trained on skip-grams, to the lexical hypothesis in personality psychology. The method produces interpretable dimensions that are consistent across several sets of descriptive personality words. The dimensions include ones for conflict and for positive and negative evaluation. However, they are more descriptive of word usage semantics than of the characteristics of the thing described, and thus do not include a recognizable component of the five-factor model in their first 14 dimensions. They …