Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons


2013


Articles 1 - 5 of 5

Full-Text Articles in Computational Linguistics

Misheard Me Oronyminator: Using Oronyms To Validate The Correctness Of Frequency Dictionaries, Jennifer G. Hughes Jun 2013


Master's Theses

In the field of speech recognition, an algorithm must learn to tell the difference between "a nice rock" and "a gneiss rock". These identical-sounding phrases are called oronyms. Word frequency dictionaries are often used by speech recognition systems to help resolve phonetic sequences with more than one possible orthographic phrase interpretation, by looking up which oronym of the root phonetic sequence contains the most-common words.

Our paper demonstrates a technique used to validate word frequency dictionary values. We chose to use frequency values from the UNISYN dictionary, which tallies each word on a per-occurrence basis, using a proprietary text corpus, …
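The frequency-dictionary lookup described in this abstract can be illustrated with a minimal sketch. It is not the thesis's implementation: the counts in `WORD_FREQ` are invented placeholders (not UNISYN values), the function names are hypothetical, and scoring a phrase by the sum of log word frequencies is only one assumed way of deciding which oronym "contains the most-common words".

```python
from math import log

# Hypothetical per-word counts for illustration; NOT values from UNISYN.
WORD_FREQ = {"a": 1_000_000, "nice": 50_000, "gneiss": 40, "rock": 20_000}

def phrase_score(phrase, freq=WORD_FREQ, floor=1):
    """Sum of log word frequencies; unseen words fall back to the floor count."""
    return sum(log(freq.get(word, floor)) for word in phrase.lower().split())

def pick_oronym(candidates, freq=WORD_FREQ):
    """Return the candidate phrase whose words are most common overall (assumed scoring)."""
    return max(candidates, key=lambda phrase: phrase_score(phrase, freq))

best = pick_oronym(["a nice rock", "a gneiss rock"])
```

With these placeholder counts, `best` is the phrase containing "nice", because "gneiss" is given a far lower count. A dictionary with wrong frequency values would flip such choices, which is why validating the values matters.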


Csc Senior Project: Nlpstats, Michael Mease Mar 2013


Computer Science and Software Engineering

Natural Language Processing has recently increased in popularity. The field of authorship analysis, specifically, uses various characteristics of text quantified by markers. NLPStats serves as a tool designed to streamline marker extraction based on user needs. A flexible query system allows for custom marker requests, adjustment of result formatting, and preprocessing options. Furthermore, an efficiently designed structure ensures that users retrieve information quickly. As a whole, NLPStats enables anyone, regardless of NLP experience, to extract important information about the text of a document.


Relation Between Harappan And Brahmi Scripts, Subhajit Kumar Ganguly Jan 2013


Subhajit Kumar Ganguly

Some 45 signs out of the total number of Harappan signs found make up almost 100 percent of the inscriptions, in some form or other, as said earlier. Out of these 45 signs, around 40 are readily distinguishable. These form an almost exclusive and unique set. The primary signs are seen to have many variants, as in Brahmi. Many of these provide us with quite a vivid picture of their evolution, depending upon the factors of time, place and usefulness. Even minor adjustments in such signs, depending upon these factors, are noteworthy. Many of the signs in this list …


Maximizing Classification Accuracy In Native Language Identification, Scott Jarvis, Yves Bestgen, Steve Pepper Jan 2013


Yves Bestgen

This paper reports our contribution to the 2013 NLI Shared Task. The purpose of the task was to train a machine-learning system to identify the native-language affiliations of 1,100 texts written in English by nonnative speakers as part of a high-stakes test of general academic English proficiency. We trained our system on the new TOEFL11 corpus, which includes 11,000 essays written by nonnative speakers from 11 native-language backgrounds. Our final system used an SVM classifier with over 400,000 unique features consisting of lexical and POS n-grams occurring in at least two texts in the training set. Our system …
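The feature-selection rule mentioned in this abstract (keep n-grams that occur in at least two training texts) can be sketched as follows. This is an illustrative sketch only, not the authors' pipeline: it covers lexical n-grams but not POS n-grams, it stops before the SVM step, and the function names, whitespace tokenization, and toy sentences are all assumptions.

```python
from collections import Counter

def word_ngrams(tokens, n_max=2):
    """Yield all word n-grams of length 1..n_max as tuples."""
    for n in range(1, n_max + 1):
        for i in range(len(tokens) - n + 1):
            yield tuple(tokens[i:i + n])

def select_features(texts, n_max=2, min_texts=2):
    """Keep n-grams that occur in at least min_texts different texts."""
    doc_freq = Counter()
    for text in texts:
        # A set counts each n-gram once per text (document frequency).
        doc_freq.update(set(word_ngrams(text.lower().split(), n_max)))
    return {gram for gram, count in doc_freq.items() if count >= min_texts}

# Invented toy sentences, not TOEFL11 data.
toy_texts = ["I am agree with this", "I am agree with you", "She agrees with this"]
features = select_features(toy_texts)
```

Here the bigram `("am", "agree")` survives because it appears in two texts, while `("she",)` is dropped because it appears in only one. The surviving n-grams would then serve as the feature vocabulary for a classifier.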


Aplicabilidad De La Tipología De Funciones Retóricas De Las Citas Al Género De La Memoria De Máster En Un Contexto Transcultural De Enseñanza Universitaria, David Sánchez-Jiménez Jan 2013


Publications and Research

The aim of this paper is to compare the rhetorical functions of the citations in fourteen master's theses written by seven Spanish and seven Philippine authors. A typology of nine categories was used to identify the cultural rhetorical differences in the use of citation by contrasting this element in the Philippine and Spanish cultures. The methodology used is textual analysis of the linguistic context of these citations and their subsequent classification within these nine categories. The results show that there are quantitative and qualitative differences between the cultural conventions of citations …