Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Full-Text Articles in Physical Sciences and Mathematics
Canvas: A Fast And Accurate Geometric Sentence Alignment System Using Lexical Cues Within Complex Misalignment Settings, Hussein M. Ghaly
Dissertations, Theses, and Capstone Projects
In this paper, we present Canvas, a new sentence alignment system implemented in Python that takes a geometric approach to sentence alignment based on lexical cues. Canvas is designed mainly to handle parallel texts exhibiting complex misalignment patterns, namely English-Arabic pairs from United Nations documents. The system relies heavily on pre-indexing the words/tokens in the source and target texts, and it creates correspondences between the token indexes. From this point onward, the alignment problem is reduced to the geometric problem of finding the path that runs through the True Correspondence Points (TCPs). The likelihood of a point being …
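
To make the pre-indexing step concrete, here is a minimal sketch of how token positions in the two texts might be indexed and paired into candidate correspondence points. It is an illustration only and not the Canvas implementation: the function names `index_tokens` and `candidate_points`, the toy `lexicon`, and the transliterated target tokens are all assumptions introduced here. How Canvas scores these points and selects the TCPs is not covered, since the abstract is truncated before that description.

```python
from collections import defaultdict


def index_tokens(tokens):
    """Map each token to the list of positions where it occurs (hypothetical helper)."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def candidate_points(src_tokens, trg_tokens, lexicon):
    """Return sorted (source_position, target_position) pairs for token pairs
    linked by the lexicon. Each pair is a point in the source-by-target plane."""
    src_index = index_tokens(src_tokens)
    trg_index = index_tokens(trg_tokens)
    points = []
    for src_word, translations in lexicon.items():
        for trg_word in translations:
            for i in src_index.get(src_word, []):
                for j in trg_index.get(trg_word, []):
                    points.append((i, j))
    return sorted(points)


# Toy data (assumed): an English sentence and a transliterated Arabic rendering.
src = "the council adopted the resolution".split()
trg = "itamada almajlis alqarar".split()
lexicon = {
    "council": ["almajlis"],
    "adopted": ["itamada"],
    "resolution": ["alqarar"],
}

points = candidate_points(src, trg, lexicon)
print(points)
```

Each resulting pair can be read as a point whose x-coordinate is a source token position and whose y-coordinate is a target token position. A geometric aligner would then look for a path through the subset of points it judges to be true correspondences.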