Semantic Similarity Detection In Natural Language Documents, Lianyu Zhao
All Theses
Data leak prevention (DLP) solutions monitor and control data flow. Current techniques find data that matches user-defined syntactic patterns. Unfortunately, large classes of DLP-relevant data are defined by information semantics rather than by data syntax. Syntax refers to data format, whereas semantics refers to data meaning. The class of social security numbers can be adequately expressed using data syntax, whereas a new industrial process can only be adequately described using information semantics. In this paper, we propose methods for extracting and identifying document semantics using training sets of limited size (tens of documents). The first method is based on …
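To make the syntax/semantics distinction concrete, here is a minimal sketch of the kind of syntactic rule the abstract attributes to current DLP tools. The pattern and the function name `find_ssn_like` are hypothetical illustrations, not taken from the thesis. The pattern is simplified and checks only the common NNN-NN-NNNN layout, ignoring real SSN validity rules. It reliably flags SSN-shaped strings but has nothing to match against in a description of an industrial process, which is the gap the proposed semantic methods target.

```python
import re

# Hypothetical, simplified syntactic DLP rule: matches the NNN-NN-NNNN layout only.
# It checks format, not meaning, and does not apply real SSN validity rules.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> list[str]:
    """Return every substring whose format looks like an SSN."""
    return SSN_PATTERN.findall(text)


if __name__ == "__main__":
    # The SSN-shaped string is flagged; the 7-digit phone number is not.
    print(find_ssn_like("Employee record: 123-45-6789. Phone: 555-0100."))
    # A sentence describing a process has no fixed format, so the rule finds nothing.
    print(find_ssn_like("We anneal the alloy at a lower temperature before rolling."))
```

Running it prints `['123-45-6789']` and then `[]`: the syntactic rule succeeds only when the sensitive content has a predictable shape.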