Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Computer Sciences

Clemson University

2012

HMM

Full-Text Articles in Entire DC Network

Semantic Similarity Detection In Natural Language Documents, Lianyu Zhao, Dec 2012

All Theses

Data leak prevention (DLP) solutions monitor and control data flow. Current techniques find data that matches user-defined syntactic patterns. Unfortunately, large classes of DLP-relevant data are defined by information semantics rather than data syntax. Syntax refers to the format of data, whereas semantics refers to its meaning. The class of social security numbers can be adequately expressed using data syntax, whereas a new industrial process can only be adequately described using information semantics. In this paper, we propose methods for extracting and identifying document semantics using training sets of limited size (tens of documents). The first method is based on …
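The distinction between syntax and semantics drawn in the abstract can be illustrated with a minimal sketch. It is not taken from the thesis: the pattern, the function name `find_ssn_like`, and both sample strings are hypothetical. It assumes social security numbers are written in the common hyphenated `NNN-NN-NNNN` form. A syntactic rule of this kind finds the formatted number but has nothing to match in a prose description of a process, however sensitive that description may be.

```python
import re
from typing import List

# Assumed format for illustration only: three digits, two digits, four digits,
# separated by hyphens. Real DLP rules may accept other spellings.
SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")


def find_ssn_like(text: str) -> List[str]:
    """Return every substring of text that matches the syntactic SSN pattern."""
    return SSN_PATTERN.findall(text)


# Hypothetical sample inputs.
sample_with_ssn = "Employee record: 123-45-6789 on file."
sample_process = "Heat the alloy slowly, then quench it in oil to harden the surface."

print(find_ssn_like(sample_with_ssn))
print(find_ssn_like(sample_process))
```

The second sample contains no fixed format for a pattern to anchor on, which is the gap the abstract says semantic methods are meant to address.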