Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

University of Windsor

Electronic Theses and Dissertations

2019

Data mining

Articles 1 - 2 of 2

Full-Text Articles in Entire DC Network

Learning Embeddings For Academic Papers, Yi Zhang Sep 2019

Electronic Theses and Dissertations

Academic papers contain both text and citation links. Representing such data is crucial for many downstream tasks, such as classification, disambiguation, duplicate detection, recommendation, and influence prediction. The success of the Skip-gram with Negative Sampling model (hereafter SGNS) has inspired many algorithms that learn embeddings for words, documents, and networks. However, there is limited research on learning representations of linked documents such as academic papers. This dissertation first studies the norm convergence issue in SGNS and proposes L2 regularization to fix the problem. Our experiments show that our method improves SGNS and its variants on different types …
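
The abstract above is truncated and does not give the dissertation's exact formulation, so the following is only a minimal sketch of what an SGNS objective with an added L2 penalty could look like for a single (word, context) pair. The function names, the penalty weight `lam`, the learning rate `lr`, and the choice to penalize both word and context vectors are illustrative assumptions, not details taken from the dissertation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def sgns_l2_loss(w, c_pos, c_neg, lam):
    """SGNS loss for one word vector w (d,), one observed context c_pos (d,),
    and k sampled negative contexts c_neg (k, d), plus an L2 penalty.
    The penalty form (lam * squared norms) is an assumption for illustration."""
    pos = -np.log(sigmoid(w @ c_pos))              # pull the observed pair together
    neg = -np.sum(np.log(sigmoid(-(c_neg @ w))))   # push sampled negatives apart
    penalty = lam * (w @ w + c_pos @ c_pos + np.sum(c_neg * c_neg))
    return pos + neg + penalty

def sgns_l2_step(w, c_pos, c_neg, lam, lr):
    """One plain gradient-descent step on the loss above; returns updated copies."""
    g_pos = sigmoid(w @ c_pos) - 1.0   # derivative of the positive term w.r.t. w . c_pos
    g_neg = sigmoid(c_neg @ w)         # derivatives of the negative terms, shape (k,)
    grad_w = g_pos * c_pos + g_neg @ c_neg + 2.0 * lam * w
    grad_c_pos = g_pos * w + 2.0 * lam * c_pos
    grad_c_neg = np.outer(g_neg, w) + 2.0 * lam * c_neg
    return w - lr * grad_w, c_pos - lr * grad_c_pos, c_neg - lr * grad_c_neg
```

Without the penalty term, nothing in the plain SGNS objective bounds the vector norms directly; the `2.0 * lam * ...` terms in the gradients shrink each vector toward the origin on every update, which is the usual effect of L2 regularization.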


Improving Document Representation Using Retrofitting, Zeeshan Mansoor Jan 2019

Electronic Theses and Dissertations

Data-driven learning of document vectors that capture the linkage between documents is of immense importance in natural language processing (NLP). These document vectors can, in turn, be used for tasks like information retrieval, document classification, and clustering. Documents are inherently linked to one another: web pages through hyperlinks and academic papers through citations. Methods like PV-DM or PV-DBOW try to capture the semantic representation of a document using only its text. These methods ignore the network information altogether while learning the representation. Similarly, methods developed for network representation learning, like node2vec or DeepWalk, capture the …
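
The abstract is cut off before it describes the thesis's method, so the sketch below only illustrates the generic retrofitting idea named in the title: start from text-only document vectors and iteratively pull each one toward the vectors of the documents it is linked to. The function name `retrofit`, the weights `alpha` and `1 / degree`, and the toy data are assumptions for illustration and may differ from what the thesis actually does.

```python
import numpy as np

def retrofit(text_vecs, neighbors, alpha=1.0, iterations=10):
    """Generic retrofitting sketch (hypothetical helper, not the thesis's code).

    text_vecs: (n, d) array of text-only document vectors.
    neighbors: dict mapping a document index to a list of linked document indices.
    Each update replaces a vector with a weighted average of its original
    text vector (weight alpha) and its neighbors' current vectors
    (weight 1/degree each).
    """
    q_hat = np.asarray(text_vecs, dtype=float)
    q = q_hat.copy()
    for _ in range(iterations):
        for i, nbrs in neighbors.items():
            if not nbrs:
                continue  # isolated documents keep their text-only vector
            beta = 1.0 / len(nbrs)
            q[i] = (alpha * q_hat[i] + beta * q[nbrs].sum(axis=0)) / (
                alpha + beta * len(nbrs)
            )
    return q

# Toy example: documents 0 and 1 cite each other, document 2 has no links.
vecs = np.array([[1.0, 0.0], [0.0, 1.0], [5.0, 5.0]])
links = {0: [1], 1: [0], 2: []}
fitted = retrofit(vecs, links)
```

In this toy run the two linked documents move closer together while the unlinked one is left unchanged, which is the behavior a combined text-plus-network representation is meant to capture.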