Open Access. Powered by Scholars. Published by Universities.®

Linguistics Commons

5,963 Full-Text Articles 5,483 Authors 2,582,275 Downloads 204 Institutions

All Articles in Linguistics

5,963 full-text articles. Page 1 of 172.

Lightning Talk: The Language Archive: Migrating To An Easier, Sustainable Open-Source Solution, Jeroen Geerts 2019 Max Planck Institute Nijmegen

Digital Initiatives Symposium

The Language Archive at the Max Planck Institute Nijmegen (https://archive.mpi.nl) is an extensive online repository of language resources. The archive was developed using in-house solutions, including metadata creation tools, depositing tools, and an archive browser. Development had been ongoing for more than 15 years, but the software was difficult and expensive to maintain. Additionally, some of these tools were fairly complex to use and did not meet current user needs. Therefore, the decision was made to migrate to a more sustainable open-source solution that is easier to use, maintain, and build upon.

This presentation will provide insight into choosing a new ...


Multicompetence, Multiple Intelligences And First-Year Composition Students, Patricia Rice'-Daniels 2019 California State University - San Bernardino

Electronic Theses, Projects, and Dissertations

The purpose of this study is to examine, understand, and ultimately share insight into certain cognitive differences, similarities, intelligence patterns, and preferences between competent monolingual (English) and multicompetent bilingual/multilingual first-year composition (FYC) college students. This project attempts to address the following questions: Do monolingual and bilingual/multilingual FYC students show different strengths and weaknesses in their cognitive abilities? Are there differences or similarities in learning preferences and literacy between monolingual and bilingual/multilingual FYC students?

Primarily, two cognitive concepts were used in this examination to provide perspectives and quantitative data in response to the above questions ...


L2 Learners And The Intelligibility Of The Bostonian And Californian Accents, Russell Paul Kapryn 2019 California State University - San Bernardino

Electronic Theses, Projects, and Dissertations

This thesis investigates issues of intelligibility through the lens of prosody when Bostonian and Los Angeles-based accents are heard in casual conversation by native and non-native speakers. Over the spring and summer of 2017, six native speakers and 11 non-native speakers of English were interviewed after listening to two 2.5-minute audio clips of speakers with these accents. Respondents were asked, for example, what they found difficult or easy and whether they could summarize the recordings for me. Findings indicate that while the native speakers often had difficulty with vocabulary due to ...


Generative Linguistics And Neural Networks At 60: Foundation, Friction, And Fusion, Joe Pater 2019 Selected Works

Joe Pater

The birthdate of both generative linguistics and neural networks can be taken as 1957, the year of the publication of foundational work by both Noam Chomsky and Frank Rosenblatt. This paper traces the development of these two approaches to cognitive science, from their largely autonomous development in their first thirty years, through their collision in the 1980s around the past tense debate (Rumelhart and McClelland 1986, Pinker and Prince 1988), to their integration in much subsequent work up to the present. Although this integration has produced a considerable body of results, the continued general gulf between these two lines ...


Corpus Of Usage Examples: What Is It Good For?, Timofey Arkhangelskiy 2019 Universität Hamburg, Alexander von Humboldt Foundation

Proceedings of the Workshop on Computational Methods for Endangered Languages

Lexicography and corpus studies of grammar have a long history of fruitful interaction. For the most part, however, this has been a one-way relationship. Lexicographers have extensively used corpora to identify previously undetected word senses or find natural usage examples; using lexicographic materials when conducting data-driven investigations of grammar, on the other hand, is hardly commonplace. In this paper, I present a Beserman Udmurt corpus made out of "artificial" dictionary examples. I argue that, although such a corpus cannot be used for certain kinds of corpus-based research, it is nevertheless a very useful tool for writing a reference grammar ...


Developing Without Developers: Choosing Labor-Saving Tools For Language Documentation Apps, Luke D. Gessler 2019 Georgetown University

Proceedings of the Workshop on Computational Methods for Endangered Languages

Application software has the potential to greatly reduce the amount of human labor needed in common language documentation tasks. But despite great advances in the maturity of tools available for apps, language documentation apps have not attained their full potential, and language documentation projects are forgoing apps in favor of less specialized tools like paper and spreadsheets. We argue that this is due to the scarcity of software development labor in language documentation, and that a careful choice of software development tools could make up for this labor shortage by increasing developer productivity. We demonstrate the benefits of strategic tool ...


Applying Support Vector Machines To POS Tagging Of The Ainu Language, Karol Nowakowski, Michal Ptaszynski, Fumito Masui, Yoshio Momouchi 2019 Kitami Institute of Technology

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


OCR Evaluation Tools For The 21st Century, Eddie A. Santos 2019 National Research Council Canada, University of Alberta

Proceedings of the Workshop on Computational Methods for Endangered Languages

We introduce ocreval, a port of the ISRI OCR Evaluation Tools, now with Unicode support. We describe how we upgraded the ISRI OCR Evaluation Tools to support modern text processing tasks. ocreval supports producing character-level and word-level accuracy reports, supporting all characters representable in the UTF-8 character encoding scheme. In addition, we have implemented the Unicode default word boundary specification in order to support word-level accuracy reports for a broad range of writing systems. We argue that character-level and word-level accuracy reports produce confusion matrices that are useful for tasks beyond OCR evaluation—including tasks supporting the study and computational ...


Building A Common Voice Corpus For Laiholh (Hakha Chin), Kelly Berkson, Samson Lotven, Peng Hlei Thang, Thomas Thawngza, Zai Sung, James C. Wamsley, Francis Tyers, Kenneth Van Bik, Sandra Kübler, Donald Williamson, Matthew Anderson 2019 Indiana University

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Bootstrapping A Neural Morphological Analyzer For St. Lawrence Island Yupik From A Finite-State Transducer, Lane Schwartz, Emily Chen, Benjamin Hunt, Sylvia LR Schreiner 2019 University of Illinois at Urbana-Champaign

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Future Directions In Technological Support For Language Documentation, Daan van Esch, Ben Foley, Nay San 2019 Google

Proceedings of the Workshop on Computational Methods for Endangered Languages

To reduce the annotation burden placed on linguistic fieldworkers, freeing up time for deeper linguistic analysis and descriptive work, the language documentation community has been working with machine learning researchers to investigate what assistive role technology can play, with promising early results. This paper describes a number of potential follow-up technical projects that we believe would be worthwhile and straightforward to do. We provide examples of the annotation tasks for computer scientists; descriptions of the technological challenges involved and the estimated level of complexity; and pointers to relevant literature. We hope providing a clear overview of what the needs are ...


Handling Cross-Cutting Properties In Automatic Inference Of Lexical Classes: A Case Study Of Chintang, Olga Zamaraeva, Kristen Howell, Emily M. Bender 2019 University of Washington

Proceedings of the Workshop on Computational Methods for Endangered Languages

In the context of the ongoing AGGREGATION project, which is concerned with inferring grammars from interlinear glossed text (IGT), we explore integrating morphological patterns extracted from IGT data with inferred syntactic properties to create implemented linguistic grammars. We present a case study of Chintang, emphasizing evaluation of the accuracy of these predictions by using them to generate a grammar and parse running text. Our coverage of the corpus is low because the lexicon produced by our system includes only intransitive verbs, transitive verbs, and nouns, but it outperforms an expert-built, oracle grammar of similar ...


An Online Platform For Community-Based Language Description And Documentation, Rebecca Everson, Wolf Honoré, Scott Grimm 2019 Independent

Proceedings of the Workshop on Computational Methods for Endangered Languages

We present two pieces of interlocking technology in development to facilitate community-based, collaborative language description and documentation: (i) a mobile app where speakers submit text, voice recordings and/or videos, and (ii) a community language portal that organizes submitted data and provides question/answer boards whereby community members can evaluate/supplement submissions.


A Software-Driven Workflow For The Reuse Of Language Documentation Data In Typological Studies, Stephan Druskat, Kilu von Prince 2019 Humboldt-Universität zu Berlin

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Towards A General-Purpose Linguistic Annotation Backend, Graham Neubig, Patrick Littell, Chian-Yu Chen, Jean Lee, Zirui Li, Yu-Hsiang Lin, Yuyan Zhang 2019 Carnegie Mellon University

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Digital Dictionary Development For Torwali, A Less-Studied Language: Process And Challenges, Inam Ullah 2019 Torwali Research Forum

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Bootstrapping A Neural Morphological Generator From Morphological Analyzer Output For Inuktitut, Jeffrey Micher 2019 US Army Research Laboratory

Proceedings of the Workshop on Computational Methods for Endangered Languages

No abstract provided.


Finding Sami Cognates With A Character-Based NMT Approach, Mika Hämäläinen, Jack Rueter 2019 University of Helsinki

Proceedings of the Workshop on Computational Methods for Endangered Languages

We approach the problem of expanding the set of cognate relations with a sequence-to-sequence NMT model. The language pair of interest, Skolt Sami and North Sami, has too little parallel data to train an NMT model directly. We solve this problem, on the one hand, by training the model on North Sami cognates with other Uralic languages and, on the other, by generating additional synthetic training data with an SMT model. The cognates found using our method are made publicly available in the Online Dictionary of Uralic Languages.


Seeing More Than Whitespace — Tokenisation And Disambiguation In A North Sámi Grammar Checker, Linda Wiechetek, Kevin B. Unhammer, Sjur N. Moshagen 2019 UiT The Arctic University of Norway

Proceedings of the Workshop on Computational Methods for Endangered Languages

Communities of lesser-resourced languages like North Sámi benefit from language tools such as spell checkers and grammar checkers to improve literacy. Accurate error feedback depends on well-tokenised input, but traditional tokenisation as shallow preprocessing is inadequate for the challenges of real-world language usage. We present an alternative in which tokenisation remains ambiguous until linguistic context information is available. This lets us accurately detect sentence boundaries, multiwords, and compound errors. We describe a North Sámi grammar checker with such a tokenisation system and show the results of its evaluation.


Improving Low-Resource Morphological Learning With Intermediate Forms From Finite State Transducers, Sarah Moeller, Ghazaleh Kazeminejad, Andrew Cowell, Mans Hulden 2019 University of Colorado

Proceedings of the Workshop on Computational Methods for Endangered Languages

Neural encoder-decoder models are usually applied to morphology learning as an end-to-end process, without considering the underlying phonological representations that linguists posit as abstract forms before morphophonological rules are applied. Finite State Transducers for morphology, on the other hand, are developed to contain these underlying forms as an intermediate representation. This paper shows that training a bidirectional two-step encoder-decoder model of Arapaho verbs to learn two separate mappings, one between tags and abstract morphemes and one between abstract morphemes and surface allomorphs, improves results when training data is limited to 10,000 to 30,000 examples of inflected word forms.


Digital Commons powered by bepress