Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

Articles 1 - 6 of 6

Full-Text Articles in Computational Linguistics

English Wordnet Taxonomic Random Walk Pseudo-Corpora, Filip Klubicka, Alfredo Maldonado, Abhijit Mahalunkar, John D. Kelleher May 2020

Conference papers

This is a resource description paper that describes the creation and properties of a set of pseudo-corpora generated artificially from a random walk over the English WordNet taxonomy. Our WordNet taxonomic random walk implementation allows the exploration of different random walk hyperparameters and the generation of a variety of different pseudo-corpora. We find that different combinations of the walk’s hyperparameters result in varying statistical properties of the generated pseudo-corpora. We have published a total of 81 pseudo-corpora that we have used in our previous research, but have not exhausted all possible combinations of hyperparameters, which is why we have also …


Size Matters: The Impact Of Training Size In Taxonomically-Enriched Word Embeddings, Alfredo Maldonado, Filip Klubicka, John D. Kelleher Oct 2019

Articles

Word embeddings trained on natural corpora (e.g., newspaper collections, Wikipedia or the Web) excel in capturing thematic similarity (“topical relatedness”) on word pairs such as ‘coffee’ and ‘cup’ or ‘bus’ and ‘road’. However, they are less successful on pairs showing taxonomic similarity, like ‘cup’ and ‘mug’ (near synonyms) or ‘bus’ and ‘train’ (types of public transport). Moreover, purely taxonomy-based embeddings (e.g., those trained on a random walk over WordNet’s structure) outperform natural-corpus embeddings in taxonomic similarity but underperform them in thematic similarity. Previous work suggests that performance gains in both types of similarity can be achieved by enriching natural-corpus embeddings with …


Synthetic, Yet Natural: Properties Of Wordnet Random Walk Corpora And The Impact Of Rare Words On Embedding Performance, Filip Klubicka, Alfredo Maldonado, Abhijit Mahalunkar, John D. Kelleher Jul 2019

Conference papers

Creating word embeddings that reflect semantic relationships encoded in lexical knowledge resources is an open challenge. One approach is to use a random walk over a knowledge graph to generate a pseudo-corpus and use this corpus to train embeddings. However, the effect of the shape of the knowledge graph on the generated pseudo-corpora, and on the resulting word embeddings, has not been studied. To explore this, we use English WordNet, constrained to the taxonomic (tree-like) portion of the graph, as a case study. We investigate the properties of the generated pseudo-corpora, and their impact on the resulting embeddings. We find …
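The random-walk idea mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy, the function names, and the `stop_prob` and `max_len` parameters are all assumptions made for illustration, and a real run would walk the WordNet hypernym/hyponym graph rather than a hand-written dictionary.

```python
import random

# Hypothetical toy taxonomy (child -> parent); NOT taken from WordNet.
HYPERNYM = {
    "mug": "cup",
    "cup": "container",
    "bus": "vehicle",
    "train": "vehicle",
    "container": "entity",
    "vehicle": "entity",
}


def neighbours(node):
    """Return the taxonomic neighbours of a node: its parent, then its children."""
    parent = HYPERNYM.get(node)
    children = sorted(c for c, p in HYPERNYM.items() if p == node)
    return ([parent] if parent else []) + children


def random_walk_sentence(start, rng, stop_prob=0.15, max_len=10):
    """Walk the taxonomy from `start`; the visited nodes form one pseudo-sentence.

    `stop_prob` and `max_len` are assumed hyperparameters for this sketch.
    """
    sentence = [start]
    while len(sentence) < max_len and rng.random() > stop_prob:
        options = neighbours(sentence[-1])
        if not options:
            break
        sentence.append(rng.choice(options))
    return sentence


def pseudo_corpus(n_sentences, seed=0):
    """Generate a pseudo-corpus as a list of pseudo-sentences (lists of tokens)."""
    rng = random.Random(seed)
    nodes = sorted(set(HYPERNYM) | set(HYPERNYM.values()))
    return [random_walk_sentence(rng.choice(nodes), rng) for _ in range(n_sentences)]


if __name__ == "__main__":
    for sentence in pseudo_corpus(5, seed=1):
        print(" ".join(sentence))
```

Each pseudo-sentence is a list of tokens, so the resulting corpus could be passed to any embedding trainer that accepts tokenised sentences.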


Metadata And Linked Data In Word Sense Disambiguation, Matthew Corsmeier Jan 2015

Library Philosophy and Practice (e-journal)

Word Sense Disambiguation (WSD) can be assisted by taking advantage of the metadata embedded in the various ontologies, lexica, databases, etc. that exist in the Semantic Web. Automated processes that exploit the links already present in the Semantic Web can strengthen parsing of word senses by using user-contributed and semantically linked data. These processes are only possible because of a commitment to interoperability and the creation of shared standards. This paper will review some of the most heavily used Linguistic Linked Open Data (LLOD) tools and models which show the most promise for using metadata to alleviate problems caused by polysemous …


An Empirical Study Of Semantic Similarity In Wordnet And Word2vec, Abram Handler Dec 2014

University of New Orleans Theses and Dissertations

This thesis performs an empirical analysis of Word2Vec by comparing its output to WordNet, a well-known, human-curated lexical database. It finds that Word2Vec tends to uncover more of certain types of semantic relations than others, with Word2Vec returning more hypernyms, synonyms and hyponyms than hyponyms or holonyms. It also shows the probability that neighbors separated by a given cosine distance in Word2Vec are semantically related in WordNet. This result both adds to our understanding of the still-unknown Word2Vec and helps to benchmark new semantic tools built from word vectors.
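For readers unfamiliar with the measure, cosine distance between two word vectors can be sketched as follows. The abstract does not state the exact definition the thesis uses; this sketch assumes the common convention of one minus cosine similarity, and the function names are illustrative.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def cosine_distance(u, v):
    """Assumed convention: distance = 1 - cosine similarity."""
    return 1.0 - cosine_similarity(u, v)
```

Under this convention, vectors pointing in the same direction have distance 0 and orthogonal vectors have distance 1.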


An Operator-Based Account Of Semantic Processing, Deryle W. Lonsdale, C. Anton Rytting Jan 2006

Faculty Publications

This paper explores issues of psychological plausibility in modeling natural language understanding within Soar, a symbolic cognitive model. It focuses on constructing syntactic and semantic representations in simulated real time, with particular emphasis on word sense disambiguation (WSD). We discuss (i) what level of WSD should be modeled and (ii) how to use resources such as WordNet to inform these models. A preliminary model of coarse-grained WSD is included to show how syntactic, semantic, and other knowledge sources interact in Soar. Finally, we explore issues of interleaving, learning, and integrating other WSD approaches with Soar's native model of learning.