Open Access. Powered by Scholars. Published by Universities.®

Jewish Studies Commons

Social and Behavioral Sciences

City University of New York (CUNY)

Theses/Dissertations

Orthographic normalization

Full-Text Articles in Jewish Studies

Techniques For Automatic Normalization Of Orthographically Variant Yiddish Texts, Yakov Peretz Blum, Feb 2015

Dissertations, Theses, and Capstone Projects

Yiddish is characterized by a multitude of orthographic systems. A number of approaches to automatic normalization of variant orthography have been explored for the processing of historical texts of languages whose orthography has since been standardized. However, these approaches have not yet been applied to Yiddish.

Four techniques for automatic normalization were compared, using a manually normalized set of 16 Yiddish documents as a training and test corpus: a hand-crafted set of transformation rules, an off-the-shelf spell checker, edit distance minimization with manually set weights, and edit distance minimization with weights learned from a training set.
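As an illustration of the third technique, edit distance minimization with manually set weights can be sketched as follows. This is a minimal sketch and not the dissertation's implementation: the function names, the toy Latin-alphabet lexicon, and the substitution weight are all hypothetical, chosen only to show how a cheap substitution between two characters steers a variant spelling toward a particular standard form.

```python
def weighted_edit_distance(source, target, sub_cost=None, ins_cost=1.0, del_cost=1.0):
    """Levenshtein distance with per-pair substitution costs.

    sub_cost maps (source_char, target_char) to a cost; pairs that are
    not listed cost 1.0. All weights here are illustrative assumptions.
    """
    sub_cost = sub_cost or {}
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # First column: delete every source character.
    for i in range(1, rows):
        dist[i][0] = dist[i - 1][0] + del_cost
    # First row: insert every target character.
    for j in range(1, cols):
        dist[0][j] = dist[0][j - 1] + ins_cost
    for i in range(1, rows):
        for j in range(1, cols):
            a, b = source[i - 1], target[j - 1]
            substitution = 0.0 if a == b else sub_cost.get((a, b), 1.0)
            dist[i][j] = min(
                dist[i - 1][j] + del_cost,          # delete a
                dist[i][j - 1] + ins_cost,          # insert b
                dist[i - 1][j - 1] + substitution,  # match or substitute
            )
    return dist[-1][-1]


def normalize(token, lexicon, sub_cost=None):
    """Return the lexicon entry closest to token (first entry wins ties)."""
    return min(lexicon, key=lambda entry: weighted_edit_distance(token, entry, sub_cost))


# Hypothetical toy data: a lowered cost for a -> o makes "cot" the
# preferred normalization of "cat" over the equally distant "kat".
weights = {("a", "o"): 0.2}
print(normalize("cat", ["kat", "cot"], weights))
```

In the learned-weights variant, a table like `weights` would be estimated from aligned pairs of variant and normalized spellings in the training set rather than set by hand.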

Performance was evaluated by …