Full-Text Articles in Other Computer Sciences
Separating Markup From Text, Ronald I. Greenberg, George K. Thiruvathukal
Computer Science: Faculty Publications and Other Works
As more online versions of Humanities texts are created, it is becoming commonplace to embed elaborate formatting, for example through HTML. This markup can interfere with computerized analyses of the original text. Although stripping markup from text may at first seem straightforward, in practice it is not. Many digital texts add material, such as line numbers, that appears to be legitimate content according to the markup syntax, and even apart from this issue, existing tools for stripping markup produce inconsistent results. Apart from adopting and enforcing strict conventions for adding markup to …