Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

University of Nebraska - Lincoln

Full-Text Articles in Physical Sciences and Mathematics

A Trainable, Single-Pass Algorithm For Column Segmentation, Son Sylwester, Sharad C. Seth Jan 1995

CSE Conference and Workshop Papers

Column segmentation logically precedes OCR in the document analysis process. The trainable algorithm described here, XYCUT, relies on horizontal and vertical binary profiles to produce an XY-tree representing the column structure of a page of a technical document in a single pass through the bit image. Training against ground truth adjusts a single, resolution-independent parameter using only local information, guided by an edit distance function. The algorithm correctly segments the page image for a fairly wide range of parameter values, although it may make small, local, repairable errors, an effect measured by a repair cost function.
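
To make the terms "binary profile" and "XY-tree" concrete, here is a minimal Python sketch of a textbook recursive XY cut. It is not the authors' single-pass XYCUT: it rescans each region on every recursion, and it has no training step. All names (`Node`, `binary_profile`, `runs_of_ink`, `xy_cut`, `leaves`) are hypothetical, and the pixel threshold `min_gap` is an assumed stand-in, not the resolution-independent parameter from the paper, which the abstract does not specify.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

Image = List[List[int]]            # 1 = black pixel, 0 = white pixel
Box = Tuple[int, int, int, int]    # (top, left, bottom, right), half-open


@dataclass
class Node:
    """One node of the XY-tree: a region and, if split, how it was cut."""
    box: Box
    cut: Optional[str] = None      # "v" = split into columns, "h" = into rows
    children: List["Node"] = field(default_factory=list)


def binary_profile(img: Image, box: Box, axis: str) -> List[int]:
    """Profile entry is 1 if the row (axis "h") or column (axis "v") has any ink."""
    top, left, bottom, right = box
    if axis == "h":
        return [int(any(img[r][left:right])) for r in range(top, bottom)]
    return [int(any(img[r][c] for r in range(top, bottom)))
            for c in range(left, right)]


def runs_of_ink(profile: List[int], min_gap: int) -> List[Tuple[int, int]]:
    """Half-open ink runs; white gaps shorter than min_gap do not split a run."""
    runs, start, last_ink, gap = [], None, None, 0
    for i, v in enumerate(profile):
        if v:
            if start is None:
                start = i
            elif gap >= min_gap:           # wide gap: close the previous run
                runs.append((start, last_ink + 1))
                start = i
            last_ink, gap = i, 0
        else:
            gap += 1
    if start is not None:
        runs.append((start, last_ink + 1))
    return runs


def xy_cut(img: Image, box: Box, min_gap: int, axis: str = "v") -> Node:
    """Split along `axis` if possible, otherwise along the other axis."""
    node = Node(box)
    top, left, bottom, right = box
    for try_axis in (axis, "h" if axis == "v" else "v"):
        runs = runs_of_ink(binary_profile(img, box, try_axis), min_gap)
        if len(runs) > 1:
            node.cut = try_axis
            other = "h" if try_axis == "v" else "v"
            for s, e in runs:
                child = ((top + s, left, top + e, right) if try_axis == "h"
                         else (top, left + s, bottom, left + e))
                node.children.append(xy_cut(img, child, min_gap, other))
            break
    return node


def leaves(node: Node) -> List[Box]:
    """Collect leaf regions in left-to-right, top-to-bottom tree order."""
    if not node.children:
        return [node.box]
    return [b for child in node.children for b in leaves(child)]


# Toy page: a left column with two blocks and a right column with one block.
page = [
    [1, 1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1, 1],
    [1, 1, 1, 0, 0, 0, 1, 1, 1],
]
tree = xy_cut(page, (0, 0, 6, 9), min_gap=2)
print(tree.cut, leaves(tree))
```

On the toy page the root is cut vertically into two columns, and the left column is then cut horizontally into two blocks. Raising `min_gap` above the width of the gutter would merge the columns, which illustrates why a segmentation result can be stable over a range of threshold values and fail outside it.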