Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Western Kentucky University

Masters Theses & Specialist Projects

Physical Sciences and Mathematics

2011

Document mark up language

Efficient Schema Extraction From A Collection Of Xml Documents, Vijayeandra Parthepan May 2011

The eXtensible Markup Language (XML) has become the standard format for data exchange on the Internet, providing interoperability between different business applications. Such wide use results in large volumes of heterogeneous XML data, i.e., XML documents conforming to different schemas. Although schemas are important in many business applications, they are often missing from XML documents. In this thesis, we present a suite of algorithms that are effective in extracting schema information from a large collection of XML documents. We propose using the cost of NFA simulation to compute the Minimum Description Length (MDL) to rank the inferred schema. We also studied …
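The abstract does not give the thesis's actual cost function, so the following is only a minimal sketch of the general idea of ranking candidate schemas by a description-length score obtained from NFA simulation. Everything in it is an assumption made for illustration: the dictionary encoding of the NFA, the function names (simulate_cost, model_cost, mdl_score), the way bits are charged per step, and the two toy candidate automata.

```python
import math

# Hypothetical NFA encoding (an assumption, not the thesis's data structure):
#   nfa[state][symbol] -> set of successor states

def simulate_cost(nfa, start, accepting, sequence):
    """Bits to encode `sequence` given the NFA, or None if it is rejected.

    At each step we charge log2(number of choices open to the automaton),
    where "stop" counts as a choice when an active state is accepting.
    """
    current = {start}
    bits = 0.0
    for symbol in list(sequence) + [None]:  # None marks end of sequence
        choices = {s for q in current for s in nfa.get(q, {})}
        if current & accepting:
            choices.add(None)
        if symbol not in choices:
            return None  # the automaton cannot produce this sequence
        bits += math.log2(len(choices))
        if symbol is not None:
            current = {t for q in current
                       for t in nfa.get(q, {}).get(symbol, ())}
    return bits

def model_cost(nfa):
    """Assumed model cost: each transition stores two states and one symbol."""
    states = set(nfa) | {t for edges in nfa.values()
                         for targets in edges.values() for t in targets}
    symbols = {s for edges in nfa.values() for s in edges}
    n_transitions = sum(len(targets) for edges in nfa.values()
                        for targets in edges.values())
    per_transition = (2 * math.log2(max(len(states), 2))
                      + math.log2(max(len(symbols), 2)))
    return n_transitions * per_transition

def mdl_score(nfa, start, accepting, sequences):
    """Model cost plus data cost; infinite if any sequence is rejected."""
    total = model_cost(nfa)
    for seq in sequences:
        cost = simulate_cost(nfa, start, accepting, seq)
        if cost is None:
            return math.inf
        total += cost
    return total

# Toy child sequences of a hypothetical <book> element.
sequences = [["title", "author"], ["title", "author", "author"]]

# Candidate A: title author+   (specific)
tight = {0: {"title": {1}}, 1: {"author": {2}}, 2: {"author": {2}}}
# Candidate B: (title | author)*   (general)
loose = {0: {"title": {0}, "author": {0}}}

score_tight = mdl_score(tight, 0, {2}, sequences)
score_loose = mdl_score(loose, 0, {0}, sequences)
best = "tight" if score_tight < score_loose else "loose"
```

In this sketch the lower score wins: an overly general automaton is cheap to describe but pays more bits per input sequence, while an overly specific one pays more for its own description. How the thesis balances these two terms is not stated in the truncated abstract.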