Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 1 of 1
Full-Text Articles in Physical Sciences and Mathematics
A Trainable, Single-Pass Algorithm For Column Segmentation, Son Sylwester, Sharad C. Seth
A Trainable, Single-Pass Algorithm For Column Segmentation, Son Sylwester, Sharad C. Seth
CSE Conference and Workshop Papers
Column Segmentation logically precedes OCR in the document analysis process. The trainable algorithm described here, XYCUT, relies on horizontal and vertical binary profiles to produce an XY- tree representing the column structure of a page of a technical document in a single pass through the bit image. Training against ground truth adjusts a single, resolution independent, parameter using only local information and guided by an edit distance function. The algorithm correctly segments the page image for a (fairly) wide range of parameter values, although small, local and repairable errors may be made, an effect measured by a repair cost function.