Full-Text Articles in Computational Linguistics
Statistical Machine Translation of Japanese, Erik A. Chapla
Theses and Dissertations
The purpose of this research was to find ways to improve the performance of a statistical machine translation system that translates text from Japanese to English. Methods included altering the training and test data by adding prior linguistic knowledge, altering sentence structures, and looking for better statistical ways to change how words align between the two languages. In addition, statistical methods for properly segmenting words in Japanese text were examined. Finally, experiments were conducted on Japanese speech to produce the best text transcription of the speech. The best statistical machine translation methods implemented resulted in improvements …
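The abstract does not say which statistical segmentation method was examined. As a purely illustrative sketch, one common approach scores candidate segmentations with a unigram word model and finds the most probable one by dynamic programming (a Viterbi-style search). The `segment` function, the toy word counts, and the penalty for unseen single characters are all hypothetical and are not taken from the thesis.

```python
import math

def segment(text: str, unigram_counts: dict[str, int], max_word_len: int = 4) -> list[str]:
    """Return the most probable segmentation of `text` under a unigram word model."""
    total = sum(unigram_counts.values())

    def log_prob(word: str) -> float:
        count = unigram_counts.get(word, 0)
        if count > 0:
            return math.log(count / total)
        if len(word) == 1:
            # Hypothetical penalty: unseen single characters get a small probability
            # so that every input has at least one valid segmentation.
            return math.log(1.0 / (10 * total))
        return float("-inf")  # unseen multi-character strings are not treated as words

    n = len(text)
    best = [float("-inf")] * (n + 1)  # best[i]: best log-probability of text[:i]
    back = [0] * (n + 1)              # back[i]: start index of the last word in text[:i]
    best[0] = 0.0
    for end in range(1, n + 1):
        for start in range(max(0, end - max_word_len), end):
            score = best[start] + log_prob(text[start:end])
            if score > best[end]:
                best[end], back[end] = score, start

    # Follow the back-pointers from the end of the string to recover the words.
    words, end = [], n
    while end > 0:
        words.append(text[back[end]:end])
        end = back[end]
    return words[::-1]

# Toy counts, invented for illustration.
counts = {"東京": 5, "都": 3, "東": 1, "京都": 4, "に": 10, "住む": 2}
print(segment("東京都に住む", counts))  # ['東京', '都', 'に', '住む']
```

Here 東京|都 wins over 東|京都 because the product of its word probabilities is higher under these toy counts; a real system would estimate the counts from a segmented corpus.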
Frequency-Based Incremental Attribute Selection for GRE, John D. Kelleher
Conference papers
The DIT system uses an incremental greedy search to generate descriptions, similar to the incremental algorithm described by Dale and Reiter (1995). The candidate attributes to be tested for inclusion in the description are ordered by the absolute frequency of each attribute in the training corpus. Attributes are selected in descending order of frequency (i.e., the attribute that occurred most frequently in the training corpus is selected first). Where two or more attributes have the same frequency of occurrence, the first attribute found with that frequency is selected. The type attribute is always included in the description. …
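The selection procedure described above can be sketched in Python as follows. This is a minimal sketch, not the DIT system's code: the data layout (entities as attribute-value dicts, training descriptions as lists of attribute names), the function names, and the choice to let the always-included `type` attribute rule out distractors of a different type before the loop starts are all assumptions. Ties follow the stated rule because `Counter` keeps first-seen key order and `sorted` is stable.

```python
from collections import Counter

def attribute_order(training_descriptions: list[list[str]]) -> list[str]:
    """Rank attributes by absolute frequency in the training corpus, most frequent first.

    Counter keeps keys in first-seen order and sorted() is stable, so attributes
    with equal frequency stay in the order they were first found.
    """
    freq = Counter()
    for description in training_descriptions:
        freq.update(description)
    return sorted(freq, key=lambda attr: -freq[attr])

def describe(target: dict[str, str], distractors: list[dict[str, str]],
             order: list[str]) -> dict[str, str]:
    """Greedy incremental selection in the style of Dale and Reiter (1995)."""
    # The type attribute is always included. Assumption: it immediately rules out
    # distractors of a different type.
    description = {"type": target["type"]}
    remaining = [d for d in distractors if d.get("type") == target["type"]]
    for attr in order:
        if not remaining:
            break  # every distractor has been ruled out
        if attr == "type" or attr not in target:
            continue
        value = target[attr]
        # Include the attribute only if it rules out at least one remaining distractor.
        if any(d.get(attr) != value for d in remaining):
            description[attr] = value
            remaining = [d for d in remaining if d.get(attr) == value]
    return description

# Hypothetical training corpus and scene, invented for illustration.
training = [["type", "colour"], ["type", "colour", "size"],
            ["type", "size"], ["type", "colour", "orientation"]]
order = attribute_order(training)  # ['type', 'colour', 'size', 'orientation']
target = {"type": "chair", "colour": "red", "size": "large"}
distractors = [{"type": "chair", "colour": "blue", "size": "large"},
               {"type": "chair", "colour": "red", "size": "small"},
               {"type": "table", "colour": "red", "size": "large"}]
print(describe(target, distractors, order))
# {'type': 'chair', 'colour': 'red', 'size': 'large'}
```

Because the search never backtracks, an attribute that is added early is kept even if a later attribute would have sufficed on its own; this is the usual trade-off of the incremental approach.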
A Classifier to Evaluate Language Specificity in Medical Documents, Trudi Miller '08, Gondy A. Leroy, Samir Chatterjee, Jie Fan, Brian Thoms '09
CGU Faculty Publications and Research
Consumer health information written by health care professionals is often inaccessible to the consumers it is written for. Traditional readability formulas examine syntactic features like sentence length and number of syllables, ignoring the target audience's grasp of the words themselves. The use of specialized vocabulary disrupts the understanding of patients with low reading skills, causing a decrease in comprehension. A naive Bayes classifier for three levels of increasing medical terminology specificity (consumer/patient, novice health learner, medical professional) was created with a lexicon generated from a representative medical corpus. Ninety-six percent accuracy in classification was attained. The classifier was then applied …
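The abstract does not specify the features or smoothing the classifier used. A minimal sketch of a multinomial naive Bayes classifier over lexicon words with add-one (Laplace) smoothing, assuming each document is a list of tokens labelled with one of the three specificity levels, might look like this. The `NaiveBayes` class and the toy training data are hypothetical, and the training vocabulary merely stands in for the lexicon generated from a medical corpus.

```python
import math
from collections import Counter

class NaiveBayes:
    """Multinomial naive Bayes with add-alpha smoothing (a sketch, not the paper's code)."""

    def __init__(self, alpha: float = 1.0):
        self.alpha = alpha

    def fit(self, docs: list[list[str]], labels: list[str]) -> "NaiveBayes":
        self.classes = sorted(set(labels))
        # The training vocabulary stands in for the lexicon built from a medical corpus.
        self.vocab = {w for doc in docs for w in doc}
        self.log_prior, self.counts, self.totals = {}, {}, {}
        for c in self.classes:
            class_docs = [d for d, y in zip(docs, labels) if y == c]
            self.log_prior[c] = math.log(len(class_docs) / len(docs))
            self.counts[c] = Counter(w for d in class_docs for w in d)
            self.totals[c] = sum(self.counts[c].values())
        return self

    def log_score(self, doc: list[str], c: str) -> float:
        v = len(self.vocab)
        score = self.log_prior[c]
        for w in doc:
            if w in self.vocab:  # words outside the lexicon are ignored
                score += math.log((self.counts[c][w] + self.alpha)
                                  / (self.totals[c] + self.alpha * v))
        return score

    def predict(self, doc: list[str]) -> str:
        return max(self.classes, key=lambda c: self.log_score(doc, c))

# Toy training data for the three specificity levels, invented for illustration.
docs = [["heart", "attack", "doctor"], ["blood", "pressure", "medicine"],
        ["cardiac", "arrest", "physician"], ["hypertension", "medication", "dosage"],
        ["myocardial", "infarction", "troponin"], ["antihypertensive", "titration", "systolic"]]
labels = ["consumer", "consumer", "novice", "novice", "professional", "professional"]
model = NaiveBayes().fit(docs, labels)
print(model.predict(["myocardial", "infarction"]))  # professional
```

Working in log space avoids numerical underflow when a document contains many lexicon words.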
Active Learning for Part-of-Speech Tagging: Accelerating Corpus Annotation, Deryle W. Lonsdale, Eric K. Ringger, Peter J. McClanahan, Robbie A. Haertel, George Busby, Marc A. Carmen, James Carroll, Kevin Seppi
Faculty Publications
In the construction of a part-of-speech annotated corpus, we are constrained by a fixed budget. A fully annotated corpus is required, but we can afford to label only a subset. We train a Maximum Entropy Markov Model tagger from a labeled subset and automatically tag the remainder. This paper addresses the question of where to focus our manual tagging efforts in order to deliver an annotation of the highest quality. In this context, we find that active learning is always helpful. We focus on Query by Uncertainty (QBU) and Query by Committee (QBC) and report on experiments with several baselines and …
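The abstract does not give the exact scoring functions used for QBU and QBC. As an illustrative sketch only, two common formulations are the mean per-token entropy of one tagger's tag distributions (QBU) and the mean per-token vote entropy across committee members (QBC); the highest-scoring sentences are then sent to human annotators. The input formats and function names are assumptions, and a real MEMM tagger would have to supply the per-token distributions or committee tag sequences.

```python
import math
from collections import Counter

def entropy(dist: dict[str, float]) -> float:
    """Shannon entropy (in nats) of a tag distribution."""
    return -sum(p * math.log(p) for p in dist.values() if p > 0)

def qbu_score(token_dists: list[dict[str, float]]) -> float:
    """Query by Uncertainty: mean per-token entropy of one tagger's tag distributions."""
    return sum(entropy(d) for d in token_dists) / len(token_dists)

def qbc_score(committee_tags: list[list[str]]) -> float:
    """Query by Committee: mean per-token vote entropy.

    committee_tags[m][i] is the tag that committee member m assigns to token i.
    """
    n_members = len(committee_tags)
    n_tokens = len(committee_tags[0])
    total = 0.0
    for i in range(n_tokens):
        votes = Counter(member[i] for member in committee_tags)
        total += entropy({tag: n / n_members for tag, n in votes.items()})
    return total / n_tokens

def select_batch(scores: dict[str, float], k: int) -> list[str]:
    """Return the ids of the k most informative sentences to send for manual tagging."""
    return sorted(scores, key=scores.get, reverse=True)[:k]

# Toy scores, invented for illustration.
qbu = {"s1": qbu_score([{"NN": 0.9, "VB": 0.1}, {"DT": 1.0}]),
       "s2": qbu_score([{"NN": 0.5, "VB": 0.5}, {"DT": 1.0}])}
print(select_batch(qbu, 1))  # ['s2']
qbc = {"s3": qbc_score([["DT", "NN"], ["DT", "NN"], ["DT", "NN"]]),
       "s4": qbc_score([["DT", "NN"], ["DT", "VB"], ["DT", "JJ"]])}
print(select_batch(qbc, 1))  # ['s4']
```

Averaging over tokens keeps long sentences from being favoured simply because they contain more tokens; summing instead would weight selection toward sentences that cost more to annotate.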
Proceedings of the 4th ACL-SIGSEM Workshop on Prepositions at ACL-2007, Fintan Costello, John D. Kelleher, Martin Volk
Conference papers
This volume contains the papers presented at the Fourth ACL-SIGSEM Workshop on Prepositions. The workshop is endorsed by the ACL Special Interest Group on Semantics (ACL-SIGSEM) and is hosted in conjunction with ACL 2007, taking place on 28 June 2007 in Prague, Czech Republic.