Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 5 of 5

Full-Text Articles in Physical Sciences and Mathematics

Systematic Comparison Of Cross-Lingual Projection Techniques For Low-Density Nlp Under Strict Resource Constraints, Joshua Waxman Oct 2014

Dissertations, Theses, and Capstone Projects

The field of low-density NLP is often approached from an engineering perspective, and evaluations are typically haphazard: they consider different architectures, different languages, and different available resources, without a systematic comparison. The resulting architectures are then tested on the unique corpus and language for which each approach was designed. This makes it difficult to evaluate which approach is truly the "best," or which approaches are best for a given language.

In this dissertation, several state-of-the-art architectures and approaches to part-of-speech tagging for low-density languages are reimplemented; all of these techniques exploit a relationship between a high-density (HD) …


Canvas: A Fast And Accurate Geometric Sentence Alignment System Using Lexical Cues Within Complex Misalignment Settings, Hussein M. Ghaly Oct 2014

Dissertations, Theses, and Capstone Projects

In this paper, we present a new sentence alignment system (Canvas), a Python implementation of a geometric approach to sentence alignment based on lexical cues. The Canvas system is designed mainly to handle parallel texts exhibiting complex misalignment patterns, namely English-Arabic pairs from United Nations documents. The system relies heavily on pre-indexing words/tokens in the source and target texts, and it creates correspondences between the token indexes. From this point onward, the alignment problem is reduced to a geometric problem of finding the path that runs through the True Correspondence Points (TCPs). The likelihood of a point being …
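
The indexing step described above can be illustrated with a minimal sketch. This is not the Canvas code: the function names, the toy tokens, and the `lexicon` dictionary are all hypothetical, and the sketch stops at generating candidate points; it does not attempt to select True Correspondence Points or trace the alignment path.

```python
from collections import defaultdict


def index_tokens(tokens):
    """Map each token to the list of positions where it occurs."""
    positions = defaultdict(list)
    for i, tok in enumerate(tokens):
        positions[tok].append(i)
    return positions


def candidate_points(source_tokens, target_tokens, lexicon):
    """Return (source_index, target_index) pairs linked by the lexicon.

    `lexicon` is a hypothetical bilingual dictionary mapping a source
    token to its possible target tokens (an assumption of this sketch).
    """
    target_index = index_tokens(target_tokens)
    points = []
    for i, src in enumerate(source_tokens):
        for translation in lexicon.get(src, ()):
            # Every occurrence of the translation is a candidate point.
            for j in target_index.get(translation, ()):
                points.append((i, j))
    return points


# Toy data with placeholder target tokens, for illustration only.
source = ["the", "council", "adopted", "the", "resolution"]
target = ["T_council", "T_adopted", "T_resolution"]
lexicon = {
    "council": ["T_council"],
    "adopted": ["T_adopted"],
    "resolution": ["T_resolution"],
}
points = candidate_points(source, target, lexicon)
```

Plotted with source positions on one axis and target positions on the other, such points form the plane in which a geometric aligner would look for a path.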


Temporal Information Extraction And Knowledge Base Population, Taylor Cassidy Jun 2014

Dissertations, Theses, and Capstone Projects

Temporal Information Extraction (TIE) from text plays an important role in many Natural Language Processing and Database applications. Many features of the world are time-dependent, and rich temporal knowledge is required for a more complete and precise understanding of the world. In this thesis we address aspects of two core tasks in TIE. First, we provide a new corpus of labeled temporal relations between events and temporal expressions, dense enough to facilitate a change in research directions from relation classification to identification, and present a system designed to address corresponding new challenges. Second, we implement a novel approach for the …


Echolocation: Using Word-Burst Analysis To Rescore Keyword Search Candidates In Low-Resource Languages, Justin Richards Jun 2014

Dissertations, Theses, and Capstone Projects

State-of-the-art technologies for speech recognition are very accurate for heavily studied languages like English. They perform poorly, though, for languages for which the recorded archives of speech data available to researchers are relatively scant. In the context of these low-resource languages, the task of keyword search within recorded speech is formidable. We demonstrate a method that generates more accurate keyword search results on low-resource languages by studying a pattern not exploited by the speech recognizer. The word-burst, or burstiness, pattern is the tendency for word utterances to appear together in bursts as conversational topics fluctuate. We give evidence …
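
To make the word-burst idea concrete, here is a rough sketch of one way burstiness could be used to rescore keyword hits. It is not the method of the dissertation, whose details are not given in this excerpt: the function name, the additive boost, and the `window` and `boost` values are all assumptions chosen for illustration.

```python
def rescore_with_bursts(hits, window=30.0, boost=0.1):
    """Raise the score of hits that have same-keyword neighbors nearby in time.

    hits: list of (keyword, time_in_seconds, score) tuples.
    window and boost are hypothetical tuning parameters.
    """
    rescored = []
    for k, (keyword, time, score) in enumerate(hits):
        # Count other hits for the same keyword within the time window.
        neighbors = sum(
            1
            for m, (other_kw, other_time, _) in enumerate(hits)
            if m != k and other_kw == keyword and abs(other_time - time) <= window
        )
        # Additive boost per neighbor, capped at 1.0 (an assumption).
        rescored.append((keyword, time, min(1.0, score + boost * neighbors)))
    return rescored


# Toy candidates: two hits close together, one isolated.
hits = [("market", 10.0, 0.4), ("market", 25.0, 0.5), ("market", 300.0, 0.5)]
rescored = rescore_with_bursts(hits)
```

In this toy input the two hits 15 seconds apart reinforce each other, while the isolated hit at 300 seconds keeps its original score.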


Automated Classification Of Argument Stance In Student Essays: A Linguistically Motivated Approach With An Application For Supporting Argument Summarization, Adam Robert Faulkner Jun 2014

Dissertations, Theses, and Capstone Projects

This study describes a set of document- and sentence-level classification models designed to automate two tasks: determining the argument stance (for or against) of a student argumentative essay, and identifying any arguments in the essay that provide reasons in support of that stance. A suggested application of these models is presented: the automated extraction of a single-sentence summary of an argumentative essay. This summary sentence indicates the overall argument stance of the essay from which it was extracted and provides a representative argument in support of that stance.

A novel set …