Computational Linguistics | Open Access Articles

Statistical Machine Translation Of Japanese, Erik A. Chapla Mar 2007

Theses and Dissertations

The purpose of this research was to find ways to improve the performance of a statistical machine translation system that translates text from Japanese to English. Methods included altering the training and test data by adding a priori linguistic knowledge, altering sentence structures, and looking for better ways to statistically alter the way words align between the two languages. In addition, methods for properly segmenting words in Japanese text through statistical methods were examined. Finally, experiments were conducted on Japanese speech to produce the best text transcription of the speech. The best statistical machine translation methods implemented resulted in improvements …

Frequency Based Incremental Attribute Selection For GRE, John D. Kelleher Jan 2007

Conference papers

The DIT system uses an incremental greedy search to generate descriptions, similar to the incremental algorithm described in (Dale and Reiter, 1995). The selection of the next attribute to be tested for inclusion in the description is ordered by the absolute frequency of each attribute in the training corpus. Attributes are selected in descending order of frequency (i.e. the attribute that occurred most frequently in the training corpus is selected first). Where two or more attributes have the same frequency of occurrence, the first attribute found with that frequency is selected. The type attribute is always included in the description. …
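
The ordering and selection steps described in this abstract can be sketched roughly as follows. This is an illustrative sketch, not the DIT implementation: the function names, the dictionary representation of entities, and the inclusion test (an attribute is kept only if it rules out at least one remaining distractor, as in the Dale and Reiter incremental algorithm) are assumptions, and the example entities are hypothetical.

```python
from collections import Counter


def rank_attributes(training_descriptions):
    """Order attributes by absolute frequency in the corpus, most frequent first.

    Counter keeps first-seen order and sorted() is stable, so attributes with
    equal frequency stay in the order in which they were first found.
    """
    counts = Counter()
    for description in training_descriptions:
        counts.update(description)
    return sorted(counts, key=lambda attr: -counts[attr])


def describe(target, distractors, ranked, always=("type",)):
    """Greedy incremental selection over the frequency-ranked attributes.

    Assumption: an attribute is included only if it excludes at least one
    distractor that the description so far has not excluded.
    """
    # The type attribute is always included in the description.
    description = {a: target[a] for a in always if a in target}
    remaining = [d for d in distractors
                 if all(d.get(a) == v for a, v in description.items())]
    for attr in ranked:
        if not remaining:
            break  # the target is already uniquely identified
        if attr in description or attr not in target:
            continue
        survivors = [d for d in remaining if d.get(attr) == target[attr]]
        if len(survivors) < len(remaining):
            description[attr] = target[attr]
            remaining = survivors
    return description


# Hypothetical toy corpus: each training description is a list of attribute names.
corpus = [["type", "colour"], ["type", "colour", "size"],
          ["type", "size"], ["type", "colour"]]
ranked = rank_attributes(corpus)  # ['type', 'colour', 'size']

target = {"type": "chair", "colour": "red", "size": "large"}
distractors = [{"type": "chair", "colour": "blue", "size": "large"},
               {"type": "table", "colour": "red", "size": "large"}]
print(describe(target, distractors, ranked))  # {'type': 'chair', 'colour': 'red'}
```

Because colour is more frequent than size in the toy corpus, it is tested first and is enough to exclude the remaining distractor, so size is never added.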

A Classifier To Evaluate Language Specificity In Medical Documents, Trudi Miller '08, Gondy A. Leroy, Samir Chatterjee, Jie Fan, Brian Thoms '09 Jan 2007

CGU Faculty Publications and Research

Consumer health information written by health care professionals is often inaccessible to the consumers it is written for. Traditional readability formulas examine syntactic features like sentence length and number of syllables, ignoring the target audience's grasp of the words themselves. The use of specialized vocabulary disrupts the understanding of patients with low reading skills, causing a decrease in comprehension. A naive Bayes classifier for three levels of increasing medical terminology specificity (consumer/patient, novice health learner, medical professional) was created with a lexicon generated from a representative medical corpus. Ninety-six percent accuracy in classification was attained. The classifier was then applied …
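
To illustrate the kind of classifier this abstract mentions, here is a minimal naive Bayes sketch over word counts. It is not the authors' system: the class name, the add-one smoothing, the choice to skip words outside the lexicon, and the three one-document training examples are all assumptions made for illustration.

```python
import math
from collections import Counter, defaultdict


class NaiveBayes:
    """Multinomial naive Bayes with add-one smoothing (an assumed variant)."""

    def __init__(self):
        self.class_docs = Counter()              # documents seen per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()                       # the lexicon built from training

    def train(self, documents):
        for words, label in documents:
            self.class_docs[label] += 1
            self.word_counts[label].update(words)
            self.vocab.update(words)

    def classify(self, words):
        total_docs = sum(self.class_docs.values())
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            score = math.log(n_docs / total_docs)  # log prior
            total_words = sum(self.word_counts[label].values())
            for word in words:
                if word not in self.vocab:
                    continue  # assumption: words outside the lexicon are ignored
                count = self.word_counts[label][word]
                score += math.log((count + 1) / (total_words + len(self.vocab)))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


# Hypothetical training data, one tiny document per specificity level.
classifier = NaiveBayes()
classifier.train([
    (["heart", "attack"], "consumer/patient"),
    (["cardiac", "arrest"], "novice health learner"),
    (["myocardial", "infarction"], "medical professional"),
])
print(classifier.classify(["myocardial", "infarction"]))  # medical professional
```

Working in log space avoids numerical underflow when a document contains many words.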

Active Learning For Part-Of-Speech Tagging: Accelerating Corpus Annotation, Deryle W. Lonsdale, Eric K. Ringger, Peter J. McClanahan, Robbie A. Haertel, George Busby, Marc A. Carmen, James Carroll, Kevin Seppi Jan 2007

Faculty Publications

In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (QBU) and Query by Committee (QBC) and report on experiments with several baselines and …
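
The two selection strategies named in this abstract can be sketched in a few lines. This is a rough illustration under stated assumptions, not the paper's procedure: the scoring functions (mean per-token entropy for QBU, mean per-token vote entropy for QBC), the function names, and the input formats are all hypothetical, and every sentence is assumed to contain at least one token.

```python
import math
from collections import Counter


def entropy(distribution):
    """Shannon entropy of one token's tag probability distribution."""
    return -sum(p * math.log(p) for p in distribution if p > 0)


def query_by_uncertainty(pool, k):
    """Pick the k sentences whose tags the current model is least sure about.

    pool maps a sentence id to a list of per-token tag distributions
    (assumed to come from the current tagger).
    """
    def score(item):
        _, token_dists = item
        return sum(entropy(d) for d in token_dists) / len(token_dists)

    ranked = sorted(pool.items(), key=score, reverse=True)
    return [sentence_id for sentence_id, _ in ranked[:k]]


def vote_entropy(tags):
    """Entropy of the committee's votes for a single token."""
    n = len(tags)
    return -sum((c / n) * math.log(c / n) for c in Counter(tags).values())


def query_by_committee(pool_votes, k):
    """Pick the k sentences on which the committee members disagree most.

    pool_votes maps a sentence id to one predicted tag sequence per member.
    """
    def score(item):
        _, sequences = item
        per_token = list(zip(*sequences))  # votes grouped by token position
        return sum(vote_entropy(votes) for votes in per_token) / len(per_token)

    ranked = sorted(pool_votes.items(), key=score, reverse=True)
    return [sentence_id for sentence_id, _ in ranked[:k]]
```

In either case the selected sentences would be sent for manual tagging, the tagger retrained on the enlarged labeled subset, and the selection repeated until the annotation budget runs out.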

Proceedings Of The 4th ACL-SIGSEM Workshop On Prepositions At ACL-2007, Fintan Costello, John D. Kelleher, Martin Volk Jan 2007

Conference papers

This volume contains the papers presented at the Fourth ACL-SIGSEM Workshop on Prepositions. This workshop is endorsed by the ACL Special Interest Group on Semantics (ACL-SIGSEM), and is hosted in conjunction with ACL 2007, taking place on 28th June, 2007 in Prague, the Czech Republic.
