Open Access. Powered by Scholars. Published by Universities.®

Other Computer Sciences Commons

Arts and Humanities

Computer Science: Faculty Publications and Other Works

Full-Text Articles in Other Computer Sciences

Separating Markup From Text, Ronald I. Greenberg, George K. Thiruvathukal Nov 2017

Computer Science: Faculty Publications and Other Works

As more online versions of Humanities texts are created, it is becoming commonplace to embed elaborate formatting, for example through the use of HTML, but this can interfere with computerized analyses of the original text. While it may seem straightforward at first to simply strip the markup from the text, in practice it is not. Many digital texts add material, such as line numbers, that appears to be legitimate content according to the markup syntax. Even apart from this issue, existing tools for stripping markup produce inconsistent results. Apart from adopting and enforcing strict conventions for adding markup to …
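
The line-number problem described in the abstract can be illustrated with a minimal sketch. It is not the authors' tool or method; it uses only Python's standard `html.parser` module, and the `TextExtractor` class, the `strip_markup` function, and the sample HTML (including its `ln` class name) are hypothetical names chosen for this illustration.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collects character data and discards every tag (illustrative only)."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        # Called for each run of text between tags.
        self.chunks.append(data)


def strip_markup(html):
    """Return the text content of an HTML string with all tags removed."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return "".join(parser.chunks)


# Hypothetical snippet: a verse line preceded by an editorial line number.
sample = '<p><span class="ln">12</span>Shall I compare thee</p>'
stripped = strip_markup(sample)
print(stripped)  # prints "12Shall I compare thee"
```

Because the line number is ordinary character data as far as the markup syntax is concerned, a naive tag stripper keeps it and fuses it with the first word of the original text. Telling such additions apart from genuine content requires knowledge of the conventions a particular text uses, which the markup syntax alone does not supply.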