Computational Linguistics | Open Access Articles

Phonologically Informed Edit Distance Algorithms For Word Alignment With Low-Resource Languages, Richard T. McCoy, Robert Frank

Robert Frank

We present three methods for weighting edit distance algorithms based on linguistic information. These methods base their penalties on (i) phonological features, (ii) distributional character embeddings, or (iii) differences between cognate words. We also introduce a novel method for evaluating edit distance through the task of low-resource word alignment by using edit-distance neighbors in a high-resource pivot language to inform alignments from the low-resource language. At this task, the cognate-based scheme outperforms our other methods and the Levenshtein edit distance baseline, showing that NLP applications can benefit from information about cross-linguistic phonological patterns.
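As a rough illustration of what weighting an edit distance means, the sketch below implements the standard dynamic-programming edit distance with a pluggable substitution cost. It is not the authors' implementation: the function names, the `TOY_SIMILAR` table, and the 0.5 penalty are all hypothetical stand-ins for the feature-based, embedding-based, or cognate-based penalties the abstract describes.

```python
from typing import Callable

def weighted_edit_distance(
    source: str,
    target: str,
    sub_cost: Callable[[str, str], float],
    indel_cost: float = 1.0,
) -> float:
    """Edit distance where substitutions are priced by sub_cost."""
    rows, cols = len(source) + 1, len(target) + 1
    dist = [[0.0] * cols for _ in range(rows)]
    # Turning a prefix into the empty string costs one deletion per symbol.
    for i in range(1, rows):
        dist[i][0] = i * indel_cost
    # Building a prefix from the empty string costs one insertion per symbol.
    for j in range(1, cols):
        dist[0][j] = j * indel_cost
    for i in range(1, rows):
        for j in range(1, cols):
            dist[i][j] = min(
                dist[i - 1][j] + indel_cost,  # delete source[i-1]
                dist[i][j - 1] + indel_cost,  # insert target[j-1]
                dist[i - 1][j - 1] + sub_cost(source[i - 1], target[j - 1]),
            )
    return dist[-1][-1]

def levenshtein_cost(a: str, b: str) -> float:
    """Uniform baseline: every substitution costs 1."""
    return 0.0 if a == b else 1.0

# Hypothetical toy table: pairs that differ only in voicing are "close".
TOY_SIMILAR = {frozenset("pb"), frozenset("td"), frozenset("kg")}

def toy_phonological_cost(a: str, b: str) -> float:
    """Assumed penalty scheme for illustration only."""
    if a == b:
        return 0.0
    return 0.5 if frozenset((a, b)) in TOY_SIMILAR else 1.0

baseline = weighted_edit_distance("pat", "bad", levenshtein_cost)
weighted = weighted_edit_distance("pat", "bad", toy_phonological_cost)
print(baseline, weighted)
```

Under the toy table, "pat" and "bad" come out closer than under the uniform baseline, because both substitutions swap sounds that the table treats as similar. Swapping in a different `sub_cost` function is the only change needed to try another weighting scheme.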

Full-Text Articles in Computational Linguistics

Phonologically Informed Edit Distance Algorithms For Word Alignment With Low-Resource Languages, Richard T. McCoy, Robert Frank

Robert Frank

Jabberwocky Parsing: Dependency Parsing With Lexical Noise, Jungo Kasai, Robert Frank

Robert Frank

Acoustic Classification Of Focus: On The Web And In The Lab, Jonathan Howell, Mats Rooth, Michael Wagner

Jonathan Howell

General Analysis Of An Online Language Corpus, Kerwin A. Livingstone

Kerwin A. Livingstone

Linguistics As Structure In Computer Animation: Toward A More Effective Synthesis Of Brow Motion In American Sign Language, Rosalee Wolfe, Peter Cook, John C. McDonald, Jerry Schnepp

Jerry C Schnepp

Towards News Verification: Deception Detection Methods For News Discourse, Victoria Rubin, Niall Conroy, Yimin Chen

Victoria Rubin

Predicting Survey Responses: How And Why Semantics Shape Survey Statistics On Organizational Behaviour, Ketil Arnulf, Kai R. Larsen, Øyvind Martinsen, Chih How Bong

Kai R.T. Larsen

Alternative Translation Approach – Part I: "Labor Division", Ludvig Glavati

Ludvig Glavati

CECL: A New Baseline And A Non-Compositional Approach For The SICK Benchmark, Yves Bestgen

Yves Bestgen

Quantifying The Development Of Phraseological Competence In L2 English Writing: An Automated Approach, Yves Bestgen, Sylviane Granger

Yves Bestgen

Relation Between Harappan And Brahmi Scripts, Subhajit Kumar Ganguly

Subhajit Kumar Ganguly

Maximizing Classification Accuracy In Native Language Identification, Scott Jarvis, Yves Bestgen, Steve Pepper

Yves Bestgen

Evaluation Automatique De Textes Et Cohésion Lexicale, Yves Bestgen

Yves Bestgen

What's In A Letter?, Aaron J. Schein

Aaron J Schein

Using Textual Features To Predict Popular Content On Digg, Paul H. Miller

Paul H Miller

The Low Entropy Conjecture: The Challenges Of Modern Irish Nominal Declension, Robert Malouf, Farrell Ackerman

Robert Malouf

Computational Style Processing, Foaad Khosmood

Foaad Khosmood

Prosodylab-Aligner: A Tool For Forced Alignment Of Laboratory Speech, Kyle Gorman, Jonathan Howell, Michael Wagner

Jonathan Howell

Distribution Of Complexities In The Vai Script, Andrij Rovenchak, Ján Mačutek

Charles L. Riley

Automated Diagnostic Writing Tests: Why? How?, Elena Cotos, Nick Pendar

Elena Cotos

Automatic Identification Of Discourse Moves In Scientific Article Introductions, Elena Cotos, Nick Pendar

Elena Cotos

The Variable Elision Of Unstressed Vowels In European Portuguese: A Case Study, David James Silva

David Silva