Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Computer Sciences

Browse all Theses and Dissertations

Theses/Dissertations

2020

Natural Language Processing

Full-Text Articles in Engineering

Topological Analysis Of Averaged Sentence Embeddings, Wesley J. Holmes Jan 2020

Sentence embeddings are often generated with complex, pretrained models trained on a very general corpus of data. This thesis explores a potential alternative method for efficiently generating high-quality sentence embeddings for highly specialized corpora. A framework for visualizing and analyzing sentence embeddings is developed to help assess their quality on a highly specialized corpus of documents related to the 2019 coronavirus epidemic. A Topological Data Analysis (TDA) technique is explored as an alternative method for grouping embeddings for document clustering and topic modeling tasks and is compared to a simple clustering …