Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 1 of 1
Full-Text Articles in Engineering
Size Matters: The Impact Of Training Size In Taxonomically-Enriched Word Embeddings, Alfredo Maldonado, Filip Klubicka, John D. Kelleher
Size Matters: The Impact Of Training Size In Taxonomically-Enriched Word Embeddings, Alfredo Maldonado, Filip Klubicka, John D. Kelleher
Articles
Word embeddings trained on natural corpora (e.g., newspaper collections, Wikipedia or the Web) excel in capturing thematic similarity (“topical relatedness”) on word pairs such as ‘coffee’ and ‘cup’ or ’bus’ and ‘road’. However, they are less successful on pairs showing taxonomic similarity, like ‘cup’ and ‘mug’ (near synonyms) or ‘bus’ and ‘train’ (types of public transport). Moreover, purely taxonomy-based embeddings (e.g. those trained on a random-walk of WordNet’s structure) outperform natural-corpus embeddings in taxonomic similarity but underperform them in thematic similarity. Previous work suggests that performance gains in both types of similarity can be achieved by enriching natural-corpus embeddings with …