Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 16 of 16
Full-Text Articles in Computational Linguistics
Expanding The Corpus Of Vocalized Hebrew Text: Compiling An Unvocalized Text Corpus And Building An Online Interface For Vocalization Annotation, Rachel Shanblatt Bloch
Dissertations, Theses, and Capstone Projects
Written modern Hebrew presents a unique challenge for training computational models for language processing because modern Hebrew text often lacks vocalization. The lack of available vocalized Hebrew data can lead to ambiguity in training these models and generally hinders work on natural language processing problems. The goal of this project is to contribute to the collection of vocalized Hebrew text by collecting and preprocessing a large corpus of unvocalized Hebrew text and building an online annotation tool. The annotation tool allows people to upload unvocalized Hebrew text, to annotate by adding Hebrew vocalization, and to download comma-separated values files of …
Destined Failure, Chengjun Pan
Masters Theses
I attempt to examine the complex structure of human communication, explaining why it is bound to fail. By reproducing experienceable phenomena, I demonstrate how they can expose communication structure and reveal the limitations of our perception and symbolization. I divide the process of communication into six stages: input, detection, symbolization, dictionary, interpretation, and output. In this thesis, I examine the flaws and challenges that arise in the first five stages. I argue that reception acts as a filter and that understanding relies on a symbolic system that is full of redundancies. Therefore, every interpretation is destined to be a deviation.
Covert Determiners In Appalachian English Narrative Declarative Sentences, William Oliver
Dissertations, Theses, and Capstone Projects
In this thesis, I explore the syntax and semantics of covert determiners (Ds) in matrix subject determiner phrases (DPs) with definite specific interpretations. To conduct my investigation, I used the Audio-Aligned and Parsed Corpus of Appalachian English (AAPCAppE), a million-word Penn Treebank corpus, and the software CorpusSearch, a Java program that searches Penn Treebank corpora. My research shows that Appalachian English contains a linguistic phenomenon where speakers drop the D, replacing overt Ds with covert Ds, in definite specific DPs. For example, where Standard English speakers say The doctor came by horseback, Appalachian speakers may use a covert D …
Detection And Morphological Analysis Of Novel Russian Loanwords, Yulia Spektor
Dissertations, Theses, and Capstone Projects
This paper investigates recent English loanwords in Russian and explores ways in which computational methods can help further theoretical research. The goal of the study is two-fold: to find new, previously unattested loanwords borrowed over the last decade and to examine the rate of adaptation of the new borrowings, attested by the degree to which they conform to the constraints of the Russian language. First, we train a finite-state pipeline that combines character n-gram language models, which encode phonotactic and lexical properties of loanwords, with a binary classifier to detect loanwords. The model achieves state-of-the-art performance results during evaluation, surpassing …
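As a rough illustration of how character n-gram language models can encode phonotactic cues for loanword detection, the sketch below trains two add-one-smoothed character bigram models and labels a word by whichever model assigns it the higher log-probability. This is a simplified stand-in, not the thesis's finite-state pipeline or its binary classifier; the word lists, function names, and the likelihood-comparison decision rule are all assumptions made for this example.

```python
import math
from collections import Counter

# Toy training lists chosen for illustration only (not the thesis's data).
NATIVE = ["город", "молоко", "дорога", "голова"]
LOANS = ["маркетинг", "брифинг", "шопинг", "лизинг"]

def train_char_bigrams(words):
    """Count character bigrams, using ^ and $ as word-boundary markers."""
    bigrams, contexts = Counter(), Counter()
    for w in words:
        padded = "^" + w + "$"
        for a, b in zip(padded, padded[1:]):
            bigrams[(a, b)] += 1
            contexts[a] += 1
    return bigrams, contexts

def log_prob(word, model, vocab_size):
    """Add-one smoothed log-probability of a word under a bigram model."""
    bigrams, contexts = model
    padded = "^" + word + "$"
    return sum(
        math.log((bigrams[(a, b)] + 1) / (contexts[a] + vocab_size))
        for a, b in zip(padded, padded[1:])
    )

# Shared symbol inventory: every training character plus the two markers.
VOCAB_SIZE = len(set("".join(NATIVE + LOANS)) | {"^", "$"})
NATIVE_MODEL = train_char_bigrams(NATIVE)
LOAN_MODEL = train_char_bigrams(LOANS)

def looks_like_loanword(word):
    """Hypothetical decision rule: compare the two models' log-probabilities."""
    return log_prob(word, LOAN_MODEL, VOCAB_SIZE) > log_prob(word, NATIVE_MODEL, VOCAB_SIZE)

is_loan = looks_like_loanword("кастинг")
```

In a fuller system the two log-probabilities would more plausibly serve as features for a trained classifier rather than being compared directly.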
The Public Innovations Explorer: A Geo-Spatial & Linked-Data Visualization Platform For Publicly Funded Innovation Research In The United States, Seth Schimmel
Dissertations, Theses, and Capstone Projects
The Public Innovations Explorer (https://sethsch.github.io/innovations-explorer/app/index.html) is a web-based tool created using Node.js, D3.js and Leaflet.js that can be used for investigating awards made by Federal agencies and departments participating in the Small Business Innovation Research (SBIR) and Small Business Technology Transfer (STTR) grant-making programs between 2008 and 2018. By geocoding the publicly available grants data from SBIR.gov, the Public Innovations Explorer allows users to identify companies performing publicly-funded innovative research in each congressional district and obtain dynamic district-level summaries of funding activity by agency and year. Applying spatial clustering techniques on districts' employment levels across major economic sectors provides users …
Plprepare: A Grammar Checker For Challenging Cases, Jacob Hoyos
Electronic Theses and Dissertations
This study investigates one of the Polish language’s most arbitrary cases: the genitive masculine inanimate singular. It collects and ranks several guidelines to help language learners discern its proper usage and also introduces a framework to provide detailed feedback regarding arbitrary cases. The study tests this framework by implementing and evaluating a hybrid grammar checker called PLPrepare. PLPrepare performs similarly to other grammar checkers and is able to detect genitive case usages and provide feedback based on a number of error classifications.
When Misclassification Is Misgendering: Gender Prediction In The Context Of Trans Identities, Sean Miller
Dissertations, Theses, and Capstone Projects
As a subdomain of author profiling, gender prediction (sometimes called gender inference) has received a substantial amount of attention—both as a task in itself, and for other downstream analyses. Throughout the existing literature various statistical and machine learning methods have been applied to extract features in order to either characterize and differentiate female and male writing styles, or simply to achieve maximum accuracy on gender prediction as a binary classification task. However, researchers often do not disclose how they conceptualize gender nor do they consider the implications that gender prediction has for non-binary and trans individuals. Along with an overview …
A Computational Study In The Detection Of English–Spanish Code-Switches, Yohamy C. Polanco
Dissertations, Theses, and Capstone Projects
Code-switching is the linguistic phenomenon where a multilingual person alternates between two or more languages in a conversation, whether that be spoken or written. This thesis studies the automatic detection of code-switching occurring specifically between English and Spanish in two corpora.
Twitter and other social media sites have provided an abundance of linguistic data that is available to researchers to perform countless experiments. Collecting the data is fairly easy if a study is on monolingual text, but if a study requires code-switched data, this becomes a complication as APIs only accept one language as a parameter. This thesis focuses on …
On Polysemy: A Philosophical, Psycholinguistic, And Computational Study, Jiangtian Li
Electronic Thesis and Dissertation Repository
Most words in natural languages are polysemous, that is, they have related but different meanings in different contexts. These polysemous meanings (senses) are marked by their structuredness, flexibility, productivity, and regularity. Previous theories have focused on some of these features but not all of them together. Thus, I propose a new theory of polysemy, which has two components. First, word meaning is actively modulated by broad contexts in a continuous fashion. Second, clustering arises from contextual modulations of a word and is then entrenched in our long-term memory to facilitate future production and processing. Hence, polysemous senses are entrenched …
Automatic Keyphrase Extraction From Russian-Language Scholarly Papers In Computational Linguistics, Yves Wienecke
University Honors Theses
The automatic extraction of keyphrases from scholarly papers is a necessary step for many Natural Language Processing (NLP) tasks, including text retrieval, machine translation, and text summarization. However, due to the different grammatical and semantic intricacies of languages, this is a highly language-dependent task. Many free and open source implementations of state-of-the-art keyphrase extraction techniques exist, but they are not adapted for processing Russian text. Furthermore, the multi-linguistic character of scholarly papers in the field of Russian computational linguistics and NLP introduces additional complexity to keyphrase extraction. This paper describes a free and open source program as a proof of …
Losing Shahrazad: A Distant Reading Of 1001 Nights, Taysa Mohler
Senior Projects Spring 2018
This project is a distant reading analysis of seven 19th- and 20th-century English translations of One Thousand and One Nights or The Arabian Nights. Through the use of computer programming and distant reading, it becomes clear that the Nights' frame tale is the carrier of the internal logic and generative power of the story cycle. Further, the frame tale expresses the Nights' self-representation, which serves to undermine the historical use of the Nights as synecdoche for the Orient. Therefore, the translators that remove the frame story from their versions further the Nights' use as an Orientalist object, …
Generating Amharic Present Tense Verbs: A Network Morphology & Datr Account, T. Michael W. Halcomb
Theses and Dissertations--Linguistics
In this thesis I attempt to model, that is, computationally reproduce, the natural transmission (i.e. inflectional regularities) of twenty present tense Amharic verbs (i.e. triradicals beginning with consonants) as used by the language’s speakers. I root my approach in the linguistic theory of network morphology (NM) and model it using the DATR evaluator. In Chapter 1, I provide an overview of Amharic and discuss the fidel as an abugida, the verb system’s root-and-pattern morphology, and how radicals of each lexeme interact with prefixes and suffixes. I offer an overview of NM in Chapter 2 and DATR in Chapter 3. In …
Misheard Me Oronyminator: Using Oronyms To Validate The Correctness Of Frequency Dictionaries, Jennifer G. Hughes
Master's Theses
In the field of speech recognition, an algorithm must learn to tell the difference between "a nice rock" and "a gneiss rock". These identical-sounding phrases are called oronyms. Word frequency dictionaries are often used by speech recognition systems to help resolve phonetic sequences with more than one possible orthographic phrase interpretation, by looking up which oronym of the root phonetic sequence contains the most-common words.
Our paper demonstrates a technique used to validate word frequency dictionary values. We chose to use frequency values from the UNISYN dictionary, which tallies each word on a per-occurrence basis, using a proprietary text corpus, …
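The frequency-dictionary lookup described in this abstract can be sketched roughly as follows: given several candidate spellings of the same phonetic sequence, pick the one whose words are most common. The counts below are invented for illustration and are not taken from UNISYN; the function names, the log-count scoring, and the floor value for unseen words are likewise assumptions rather than the thesis's method.

```python
import math

# Hypothetical per-word counts; NOT values from UNISYN or any real corpus.
TOY_COUNTS = {"a": 1000, "nice": 120, "gneiss": 1, "rock": 80}

def phrase_score(phrase, counts, floor=0.5):
    """Sum of log word counts; words missing from the dictionary get a small floor."""
    return sum(math.log(counts.get(w, floor)) for w in phrase.lower().split())

def pick_oronym(candidates, counts):
    """Return the candidate phrase whose words are most frequent overall."""
    return max(candidates, key=lambda p: phrase_score(p, counts))

best = pick_oronym(["a nice rock", "a gneiss rock"], TOY_COUNTS)
```

Summing log counts is one simple way to combine per-word frequencies; a dictionary with unreliable counts would skew exactly this kind of choice, which is why validating the values matters.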
Statistical Machine Translation Of Japanese, Erik A. Chapla
Theses and Dissertations
The purpose of this research was to find ways to improve the performance of a statistical machine translation system that translates text from Japanese to English. Methods included altering the training and test data by adding a priori linguistic knowledge, altering sentence structures, and looking for better ways to statistically alter the way words align between the two languages. In addition, methods for properly segmenting words in Japanese text through statistical methods were examined. Finally, experiments were conducted on Japanese speech to produce the best text transcription of the speech. The best statistical machine translation methods implemented resulted in improvements …
Multilingual Phoneme Models For Rapid Speech Processing System Development, Eric G. Hansen
Theses and Dissertations
Current speech recognition systems tend to be developed only for commercially viable languages. The resources needed for a typical speech recognition system include hundreds of hours of transcribed speech for acoustic models and 10 to 100 million words of text for language models; both of these requirements can be costly in time and money. The goal of this research is to facilitate rapid development of speech systems to new languages by using multilingual phoneme models to alleviate requirements for large amounts of transcribed speech. The Global Phone database, which contains transcribed speech from 15 languages, is used as source data …
Speech Recognition Using The Mellin Transform, Jesse R. Hornback
Theses and Dissertations
The purpose of this research was to improve performance in speech recognition. Specifically, a new approach was investigated by applying an integral transform known as the Mellin transform (MT) on the output of an auditory model to improve the recognition rate of phonemes through the scale-invariance property of the Mellin transform. Scale-invariance means that as a time-domain signal is subjected to dilations, the distribution of the signal in the MT domain remains unaffected. An auditory model was used to transform speech waveforms into images representing how the brain "sees" a sound. The MT was applied and features were extracted. The …
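The scale-invariance property mentioned in this abstract can be written out as a short worked identity. This is the standard textbook form of the Mellin transform, not a derivation taken from the thesis; the symbols $f$, $g$, $a$, $s$, $c$, and $\omega$ are chosen here for illustration.

```latex
% Mellin transform of a signal f(t), t > 0
\mathcal{M}[f](s) = \int_0^\infty f(t)\, t^{s-1}\, dt

% Dilated signal g(t) = f(at) with a > 0; substitute u = at
\mathcal{M}[g](s) = \int_0^\infty f(at)\, t^{s-1}\, dt = a^{-s}\, \mathcal{M}[f](s)

% Writing s = c + i\omega, the factor a^{-s} has constant magnitude a^{-c}
\left| \mathcal{M}[g](c + i\omega) \right| = a^{-c} \left| \mathcal{M}[f](c + i\omega) \right|
```

Under these assumptions a dilation only rescales the magnitude by the constant $a^{-c}$ and adds a phase term, so the shape of the magnitude over $\omega$ is unchanged.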