Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

Articles 1 - 4 of 4

Full-Text Articles in Computational Linguistics

Brazilian Portuguese-Russian (Braporus) Corpus: Automatic Transcription And Acoustic Quality Of Elderly Speech During Covid-19 Pandemic, Irina A. Sekerina, Anna Smirnova Henriques, Aleksandra Skorobogatova, Natalia Tyulina, Tatiana V. Kachkovskaia, Svetlana Ruseishvili, Sandra Madureira Jan 2023

Publications and Research

This article presents the Brazilian Portuguese-Russian (BraPoRus) corpus, whose goal is to collect, analyze, and preserve for posterity the spoken heritage Russian still used today in Brazil by approximately 1,500 elderly bilingual heritage Russian–Brazilian Portuguese speakers. Their unique 100-year-old variety of moribund Russian is disappearing because it has not been passed to their descendants born in Brazil. During the COVID-19 pandemic, we remotely collected 170 h of speech samples in heritage Russian from 26 participants (mean age = 75.7 years) in naturalistic settings using Zoom or a phone call. To estimate the quality of collected data, we focus on two methodological …


A Lexical Frequency Analysis Of Irish Sign Language, Robert G Smith, Markus Hofmann Sep 2020

Other Resources

Word frequency has a significant impact on language acquisition and fluency. It is often a point of reference for teaching and assessing a language and, indeed, serves as a control in psycholinguistic studies. This paper presents the results of the first objective frequency analysis of lexical tokens from the Signs of Ireland corpus. We investigate the frequency of fully lexical, partly lexical and non-lexical signs in Irish Sign Language as they are presented in the corpus. We confirm the accuracy of the lexical gloss frequency data with a supplementary corpus subset that is tagged for grammatical class and additional …


Synthetic, Yet Natural: Properties Of Wordnet Random Walk Corpora And The Impact Of Rare Words On Embedding Performance, Filip Klubicka, Alfredo Maldonado, Abhijit Mahalunkar, John D. Kelleher Jul 2019

Conference papers

Creating word embeddings that reflect semantic relationships encoded in lexical knowledge resources is an open challenge. One approach is to use a random walk over a knowledge graph to generate a pseudo-corpus and use this corpus to train embeddings. However, the effect of the shape of the knowledge graph on the generated pseudo-corpora, and on the resulting word embeddings, has not been studied. To explore this, we use English WordNet, constrained to the taxonomic (tree-like) portion of the graph, as a case study. We investigate the properties of the generated pseudo-corpora, and their impact on the resulting embeddings. We find …
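
The random-walk idea described in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy taxonomy, the function names, and the uniform choice of the next node are all assumptions made for illustration, and no real WordNet data is used.

```python
import random

# Hypothetical toy taxonomy (hypernym -> hyponyms); a stand-in for the
# tree-like portion of a lexical knowledge graph, not real WordNet data.
TAXONOMY = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree", "flower"],
}


def build_neighbors(taxonomy):
    """Turn parent -> children edges into an undirected adjacency map."""
    neighbors = {}
    for parent, children in taxonomy.items():
        for child in children:
            neighbors.setdefault(parent, []).append(child)
            neighbors.setdefault(child, []).append(parent)
    return neighbors


def random_walk(neighbors, start, length, rng):
    """Walk `length` nodes, moving to a uniformly chosen neighbor each step."""
    walk = [start]
    while len(walk) < length:
        walk.append(rng.choice(neighbors[walk[-1]]))
    return walk


def generate_pseudo_corpus(taxonomy, num_sentences, sentence_length, seed=0):
    """Each walk becomes one pseudo-sentence of node labels."""
    rng = random.Random(seed)  # seeded so repeated runs give the same corpus
    neighbors = build_neighbors(taxonomy)
    nodes = sorted(neighbors)
    return [
        random_walk(neighbors, rng.choice(nodes), sentence_length, rng)
        for _ in range(num_sentences)
    ]


corpus = generate_pseudo_corpus(TAXONOMY, num_sentences=5, sentence_length=6)
for sentence in corpus:
    print(" ".join(sentence))
```

The resulting list of token lists could then be passed to any embedding trainer. In this toy setup, leaf nodes are reached only through their single parent, which hints at why the shape of the graph may influence how often rare words appear in the pseudo-corpus.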


General Analysis Of An Online Language Corpus, Kerwin A. Livingstone May 2015

Kerwin A. Livingstone

Corpus-based research is rapidly gaining ground in the field of Applied Linguistics. More interesting is the evidence of many online language corpora which can be easily accessed with just a click of the mouse. A quick navigation of the Web will produce different kinds of corpora in a vast number of language areas. Given the need to find new and exciting ways to improve the language learning and teaching process, corpus linguistics has potential for generating significant learner experiences. With this in mind, this paper deals with the general analysis of an online language corpus. The specific corpus …