Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons

Full-Text Articles in Social and Behavioral Sciences

Canvas: A Fast And Accurate Geometric Sentence Alignment System Using Lexical Cues Within Complex Misalignment Settings, Hussein M. Ghaly, Oct 2014

Dissertations, Theses, and Capstone Projects

In this paper, we present Canvas, a new sentence alignment system: a Python implementation of a geometric approach to sentence alignment based on lexical cues. Canvas is designed mainly to handle parallel texts that exhibit complex misalignment patterns, specifically English-Arabic pairs of United Nations documents. The system relies heavily on pre-indexing the words (tokens) in the source and target texts and creates correspondences between the token indexes. From that point on, the alignment problem reduces to the geometric problem of finding the path that runs through the True Correspondence Points (TCPs). The likelihood of a point being …
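The abstract only outlines the pipeline, and the excerpt breaks off before explaining how the likelihood of a point being a TCP is computed. The following is a minimal sketch under our own assumptions, not Canvas's actual implementation. Tokens are split with a crude regex. Lexical cues are identical tokens (such as numbers) plus a small hypothetical English-Arabic `lexicon`. In place of whatever scoring Canvas uses, the path is approximated by the longest chain of correspondence points that strictly increases on both axes. The function names `index_tokens`, `candidate_points`, `monotone_chain` and `align` are hypothetical.

```python
import re
from bisect import bisect_left
from collections import defaultdict

TOKEN = re.compile(r"\w+")  # crude tokenizer (assumption); \w also matches Arabic letters and digits


def index_tokens(sentences):
    """Pre-index tokens: token -> global positions, plus the sentence id of every position."""
    positions = defaultdict(list)
    sent_of = []
    for s_idx, sentence in enumerate(sentences):
        for token in TOKEN.findall(sentence.lower()):
            positions[token].append(len(sent_of))
            sent_of.append(s_idx)
    return positions, sent_of


def candidate_points(src_pos, tgt_pos, lexicon):
    """Correspondence points (x, y) between source and target token indexes.

    Cues (assumption): the identical token, e.g. a number, or any translation
    listed in a small hypothetical bilingual lexicon.
    """
    points = []
    for token, xs in src_pos.items():
        for cue in set(lexicon.get(token, ())) | {token}:
            for y in tgt_pos.get(cue, ()):
                points.extend((x, y) for x in xs)
    return points


def monotone_chain(points):
    """Longest chain with strictly increasing x and y (a stand-in for the TCP path),
    found with the O(n log n) patience-sorting longest-increasing-subsequence method."""
    # Sort by x ascending and y descending so two points sharing an x never both enter the chain.
    pts = sorted(set(points), key=lambda p: (p[0], -p[1]))
    tails = []      # tails[k]: smallest y that ends an increasing chain of length k + 1
    tail_idx = []   # index into pts of the point holding tails[k]
    parent = [-1] * len(pts)
    for i, (_, y) in enumerate(pts):
        k = bisect_left(tails, y)
        if k == len(tails):
            tails.append(y)
            tail_idx.append(i)
        else:
            tails[k] = y
            tail_idx[k] = i
        parent[i] = tail_idx[k - 1] if k > 0 else -1
    # Walk back from the tail of the longest chain.
    chain = []
    i = tail_idx[-1] if tail_idx else -1
    while i != -1:
        chain.append(pts[i])
        i = parent[i]
    return chain[::-1]


def align(src_sents, tgt_sents, lexicon):
    """Return sorted (source sentence, target sentence) links along the chain."""
    src_pos, src_sent = index_tokens(src_sents)
    tgt_pos, tgt_sent = index_tokens(tgt_sents)
    chain = monotone_chain(candidate_points(src_pos, tgt_pos, lexicon))
    # Project each point back onto the sentences that contain its two tokens.
    return sorted({(src_sent[x], tgt_sent[y]) for x, y in chain})


if __name__ == "__main__":
    en = ["The Council met on 12 June 2014.", "It adopted resolution 2150."]
    ar = ["اجتمع المجلس في 12 يونيه 2014 واتخذ القرار 2150."]
    lexicon = {"council": {"المجلس"}, "resolution": {"القرار"}}  # toy lexicon (hypothetical)
    print(align(en, ar, lexicon))  # [(0, 0), (1, 0)]: a 2-to-1 merge
```

The chain must increase on both axes. As a result, a cue that appears out of order, such as a number repeated in an unrelated sentence, is dropped instead of forcing a crossing alignment. A real system would presumably also weight each point by how likely it is to be a TCP, rather than treating all cues as equal.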