Open Access. Powered by Scholars. Published by Universities.®

Social and Behavioral Sciences Commons

Corpus

Articles 1 - 30 of 32

Full-Text Articles in Social and Behavioral Sciences

Brazilian Portuguese-Russian (Braporus) Corpus: Automatic Transcription And Acoustic Quality Of Elderly Speech During Covid-19 Pandemic, Irina A. Sekerina, Anna Smirnova Henriques, Aleksandra Skorobogatova, Natalia Tyulina, Tatiana V. Kachkovskaia, Svetlana Ruseishvili, Sandra Madureira Jan 2023

Publications and Research

This article presents the Brazilian Portuguese-Russian (BraPoRus) corpus, whose goal is to collect, analyze, and preserve for posterity the spoken heritage Russian still used today in Brazil by approximately 1,500 elderly bilingual heritage Russian–Brazilian Portuguese speakers. Their unique 100-year-old variety of moribund Russian is disappearing because it has not been passed to their descendants born in Brazil. During the COVID-19 pandemic, we remotely collected 170 h of speech samples in heritage Russian from 26 participants (mean age = 75.7 years) in naturalistic settings using Zoom or a phone call. To estimate the quality of collected data, we focus on two methodological …


Employing A Parallel Corpus-Based Approach In Teaching Semantic Prosody And Collocational Behavior To Arabic Efl Learners, Alhassan Abdullah J Alzahrani Aug 2021

Linguistics & TESOL Dissertations

This dissertation is intended to investigate if, and to what extent, a web-interface parallel corpus known as Reverso Context can assist Arabic EFL learners in addressing two aspects of word knowledge: semantic prosody and collocational behavior. A convergent mixed method design is adopted in this study in which one group of undergraduate L1 Arabic students are asked to do a pretest that is followed by a pedagogical intervention over the course of three 3-hour sessions and then a posttest is administered again with the same group of students. The posttest is followed by a one-on-one interview with the students and …


The Importance Of Linguistic Models In The Development Of Language Bases, Guli Ibragimovna Toirova Jan 2021

Scientific reports of Bukhara State University

Relevance. In Uzbek linguistics, a number of studies have been carried out on automatic translation, the development of the linguistic foundations of the author's corpus, the processing of lexicographic texts and linguistic-statistical analysis. However, no monograph has studied the processing of Uzbek as a language of the Internet: spelling, automatic processing and translation programs, search programs for various characters, text generation, the linguistic basis of the text corpus and national corpus, and the technology of its software. The article discusses such problems as: the transformation of language into the language of the Internet, computer technology, mathematical linguistics, …


Corpus Linguistics Is A Priority Area Of Modern Applied Linguistics, Shakhnoza Kakhramonovna Gulyamova Aug 2020

Scientific reports of Bukhara State University

This article describes corpus linguistics, an independent branch of computational linguistics and the main and most promising direction of modern applied linguistics. It summarizes the scientific views of a number of scholars in light of the essence, goals, and objectives of corpus linguistics and the results achieved in world linguistics. It notes that creating a national corpus of the Uzbek language is one of the urgent tasks facing science and comments on this task.


Synthetic, Yet Natural: Properties Of Wordnet Random Walk Corpora And The Impact Of Rare Words On Embedding Performance, Filip Klubicka, Alfredo Maldonado, Abhijit Mahalunkar, John D. Kelleher Jul 2019

Conference papers

Creating word embeddings that reflect semantic relationships encoded in lexical knowledge resources is an open challenge. One approach is to use a random walk over a knowledge graph to generate a pseudo-corpus and use this corpus to train embeddings. However, the effect of the shape of the knowledge graph on the generated pseudo-corpora, and on the resulting word embeddings, has not been studied. To explore this, we use English WordNet, constrained to the taxonomic (tree-like) portion of the graph, as a case study. We investigate the properties of the generated pseudo-corpora, and their impact on the resulting embeddings. We find …
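The random-walk idea described in this abstract can be illustrated with a minimal sketch. It assumes a hypothetical toy taxonomy (`TOY_TAXONOMY`) in place of English WordNet and samples neighbours uniformly; the paper's actual walk procedure and parameters are not given in this excerpt, so the names and settings below are illustrative only.

```python
import random

# Hypothetical toy taxonomy standing in for the tree-like portion of WordNet.
TOY_TAXONOMY = {
    "entity": ["animal", "plant"],
    "animal": ["dog", "cat"],
    "plant": ["tree"],
    "dog": [],
    "cat": [],
    "tree": [],
}


def build_undirected(taxonomy):
    """Turn parent -> children edges into an undirected adjacency map."""
    graph = {node: set() for node in taxonomy}
    for parent, children in taxonomy.items():
        for child in children:
            graph[parent].add(child)
            graph[child].add(parent)
    # Sorted lists keep sampling reproducible for a fixed seed.
    return {node: sorted(neighbours) for node, neighbours in graph.items()}


def random_walk_corpus(graph, n_sentences, walk_length, seed=0):
    """Generate pseudo-sentences, each one a uniform random walk over the graph."""
    rng = random.Random(seed)
    nodes = sorted(graph)
    corpus = []
    for _ in range(n_sentences):
        current = rng.choice(nodes)
        sentence = [current]
        for _ in range(walk_length - 1):
            neighbours = graph[current]
            if not neighbours:  # isolated node: stop the walk early
                break
            current = rng.choice(neighbours)
            sentence.append(current)
        corpus.append(sentence)
    return corpus


if __name__ == "__main__":
    graph = build_undirected(TOY_TAXONOMY)
    for sentence in random_walk_corpus(graph, n_sentences=3, walk_length=5):
        print(" ".join(sentence))
```

Each generated token list could then be passed to an embedding trainer as if it were a sentence of natural text.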


The Results Of The Corpus Analysis As A Quantitative Representation Of Linguocultural Concepts, Nozimjon Ataboyev Phd Student Jun 2019

Philology Matters

The article deals with the relationship between corpus linguistics, a newly emerged and fast-developing research methodology, and the sphere of linguoculturology. Language corpora are regarded as a source of examples representing cultural notions in the process of creating conceptual frames. In particular, the corpus analysis is based on modern English corpora such as “COCA” and “BNC”. Examples related to the use of the notion of “family” were analyzed by means of several concordance searches on the corpus platform. In order to make the final conclusions, the gathered results have …


Modeling Melodic Dictation, David John Baker Jun 2019

LSU Doctoral Dissertations

Melodic dictation is a cognitively demanding process that requires students to hear a melody, then without any access to an external reference, transcribe the melody within a limited time frame. Despite its ubiquity in curricula within School of Music settings, exactly how an individual learns a melody is not well understood. This dissertation aims to fill the gap in the literature between aural skills practitioners and music psychologists in order to reach conclusions that can be applied systematically in pedagogical contexts. In order to do this, I synthesize literature from music theory, music psychology, and music education in order to …


Designing A Russian Idiom-Annotated Corpus, Katsiaryna Aharodnik, Anna Feldman, Jing Peng Jan 2019

Department of Linguistics Faculty Scholarship and Creative Works

This paper describes the development of an idiom-annotated corpus of Russian. The corpus is compiled from freely available resources online and contains texts of different genres. The idiom extraction, annotation procedure, and a pilot experiment using the new corpus are outlined in the paper. Considering the scarcity of publicly available Russian annotated corpora, the corpus is a much-needed resource that can be utilized for literary and linguistic studies, pedagogy as well as for various Natural Language Processing tasks.


Analysis Of Some Problematic Situations In Uzbek, F. I. Khakimova, A. Sh. Esanov Sep 2018

Central Asian Problems of Modern Science and Education

In this article, the authors discuss issues related to the creation of a language corpus of Uzbek. Matters such as the collection of language units in the corpus, the various national corpora, and the difficulties of creating a corpus of the Uzbek language are thoroughly analyzed, and a number of proposals are put forward.


Russian Sentence Corpus: Benchmark Measures Of Eye Movements In Reading In Russian, Anna K. Laurinavichyute, Irina A. Sekerina, Svetlana Alexeeva, Kristina Bagdasaryan, Reinhold Kliegl Jun 2018

Publications and Research

This article introduces a new corpus of eye movements in silent reading—the Russian Sentence Corpus (RSC). Russian uses the Cyrillic script, which has not yet been investigated in cross-linguistic eye movement research. As in every language studied so far, we confirmed the expected effects of low-level parameters, such as word length, frequency, and predictability, on the eye movements of skilled Russian readers. These findings allow us to add Slavic languages using Cyrillic script (exemplified by Russian) to the growing number of languages with different orthographies, ranging from the Roman-based European languages to logographic Asian ones, whose basic eye movement benchmarks …


Perempuan Atau Wanita? Perbandingan Berbasis Korpus Tentang Leksikon Berbias Gender, Susi Yuliawati Apr 2018

Paradigma: Jurnal Kajian Budaya

Amidst the debates over the most appropriate Indonesian term for ‘woman’, the present research examines the use of the gendered terms perempuan and wanita. The aim of this research is to reveal which term is preferred and how the terms are used to talk about women. Using a corpus-based approach, this study compared the frequency and patterns of usage of perempuan and wanita in two corpora, namely IndonesianWac and ind_mixed_2013. The research used a mixed-method design in which quantitative analysis was used to identify word frequency and to measure significant collocation, while qualitative analysis was used to determine …


Pun Strategies Across Joke Schemata: A Corpus-Based Study, Robert Nishan Crapo Apr 2018

Theses and Dissertations

In the linguistic study of humor, research has largely been centered around the formulation of models and theories or the dissecting and categorization of jokes. Because of the often difficult-to-categorize aspects of verbal jokes, much time has been spent trying to create taxonomies for humor types and mechanisms. Linguists such as Raskin and Attardo have sought to categorize all verbal humor according to various functional elements (Attardo & Raskin, 1991). Such elements include, but are not limited to, the logical mechanism that drives the humor in the joke or the situation where the joke takes place. These categorizations are helpful …


Comparing The Awl And Avl In Textbooks From An Intensive English Program, Michelle Morgan Hernandez Jul 2017

Theses and Dissertations

Academic vocabulary is an important determiner of academic success for both native and non-native speakers of English (Corson, 1997; Gardner, 2013; Hsueh-chao & Nation, 2000). In an attempt to address this need, Coxhead (2000) developed the Academic Word List (AWL)—a list of words common across a range of academic disciplines; however, Gardner & Davies (2014) identified potential limitations in the AWL and have more recently produced their own list of core academic vocabulary—the Academic Vocabulary List (AVL). This study compares the occurrences of the AWL and AVL word families in an intensive English program (IEP) corpus of 50 texts to …


Applying Corpus-Assisted Critical Discourse Analysis To An Unrestricted Corpus: A Case Study In Indonesian And Malay Newspapers, Sara Luanne White Jul 2017

Theses and Dissertations

In 2008, Baker et al. proposed a nine-step method that combines quantitative corpus linguistics with qualitative critical discourse analysis. To date this cycle has only been used to analyze a single language with a restricted corpus. Can this method, originally designed for this narrow focus, be applied cross-culturally to an unrestricted corpus? There are two over-arching goals for this paper, one linguistic and one methodological. The first goal is to learn about language ideologies in Indonesian and Malay newspapers; the second goal is to evaluate the efficacy of a mixed-methods corpus-driven approach to discourse analysis using the methods proposed by …


Style And Flow: A Commentary On Duinker & Martin, Jonah Katz Jan 2017

Faculty & Staff Scholarship

Duinker and Martin’s excellent study presents a wealth of new data, findings, and analyses. It represents a welcome focus on the details of musical aspects of hip-hop, as well as an effort to combine those details with more global aspects of recordings in order to clarify what our notions of hip-hop ‘style’ or ‘sound’ are based on. The examination of instrumental backgrounds and production parameters is particularly novel. I would suggest, however, that the study could have benefitted from the use of details pertaining to flow, particularly in the examination of trends over time and stylistic sub-groupings. I show that …


Nominalized Adverbs In Spanish: The Intriguing Case Of Detrás Mío And Its Cohorts, David Eddington Jan 2017

Faculty Publications

Instances of adverbs modified by adjectives (e.g. detrás mío, delante tuyo) were extracted from the Corpus del Español. The corpus analysis reveals that these constructions are attested in all 21 Spanish-speaking countries to varying degrees, but are most frequent in Argentina and Uruguay. Adjectives following the adverbs in question are predominantly masculine; however, in Peninsular varieties feminine forms are quite common. Although alrededor and lado are both adverbs as well as masculine nouns, they are occasionally followed by feminine adjectives (e.g. al lado suya), which is arguably due to the use of the feminine in other constructions such as encima …


The Reflection And Reification Of Racialized Language In Popular Media, Kelly E. Wright Jan 2017

Theses and Dissertations--Linguistics

This work highlights specific lexical items that have become racialized in specific contextual applications and tests how these words are cognitively processed. This work presents the results of a visual world (Huettig et al 2011) eye-tracking study designed to determine the perception and application of racialized (Coates 2011) adjectives. To objectively select the racialized adjectives used, I developed a corpus comprised of popular media sources, designed specifically to suit my research question. I collected publications from digital media sources such as Sports Illustrated, USA Today, and Fortune by scraping articles featuring specific search terms from their websites. This experiment seeks …


A Corpus-Based Comparison Of The Academic Word List And The Academic Vocabulary List, Jacob Andrew Newman Jul 2016

Theses and Dissertations

Research has identified the importance of academic vocabulary (e.g., Corson, 1997; Gardner, 2013; Hsueh-chao & Nation, 2000). In turn, many researchers have focused on identifying the most frequent and salient words present in academic texts across registers and presenting these words in lists, such as The Academic Word List (AWL) (Coxhead, 2000). Gardner and Davies (2014), recognizing the limitations of the AWL, have developed a new list known as The Academic Vocabulary List (AVL). This present study examines the appearance of the 570 AWL word families and the top 570 AVL word families in the Academic Textbook Corpus (ATC) – …


Lexical Trends In Young Adult Literature: A Corpus-Based Approach, Kyra Mckinzie Nelson Mar 2016

Theses and Dissertations

Young Adult (YA) literature is widely read and published, yet few linguistic studies have researched it. With an increasing push to include YA texts in the classroom, it becomes necessary to thoroughly research the linguistic nature of the register. A 1-million-word corpus of YA fiction and non-fiction texts was created. Children's and adult fiction corpora were taken from a subset of the Corpus of Contemporary American English (COCA) database. The study noted differences in use of modals and pronouns among children's, YA, and adult registers. Previous research has suggested that children's literature focuses more on spatial relations, while adult literature …


A Corpus-Based Analysis Of Russian Word Order Patterns, Stephanie Kay Billings Dec 2015

Theses and Dissertations

Some scholars say that Russian syntax has free word order. However, other researchers claim that the basic word order of Russian is Subject, Verb, Object (SVO). Some researchers also assert that the use of different word orders may be influenced by various factors, including positions of discourse topic and focus, and register (spoken, fiction, academic, non-academic). In addition, corpora have been shown to be useful tools in gathering empirical linguistic data, and modern advances in computing have made corpora freely available and their use widespread. The Russian National Corpus is a large corpus of Russian that is widely used and …


Syriac Rhetorical Particles: Variable Second-Position Clitic Placement, Patrick Brendon Pearson Dec 2015

Theses and Dissertations

Investigation of second-position clitic phenomena has steadily increased since Wackernagel’s (1892) observations. Researchers have applied contemporary clitic typology to various Semitic languages, though Syriac has received little attention. This thesis identifies a group of Syriac rhetorical particles and describes their categorization as clitics, versus words or affixes. It establishes each of the Syriac particles as second-position clitics and provides evidence of this conclusion from a state-of-the-art digitized corpus of Syriac literature. Extending previous Syriac analyses, this thesis describes the nature of attachment of these second-position clitics as enclisis to either the first word or the first constituent/phrase of their domain. …


General Analysis Of An Online Language Corpus, Kerwin A. Livingstone May 2015

Kerwin A. Livingstone

Corpus-based research is rapidly gaining ground in the field of Applied Linguistics. More interesting is the evidence of many online language corpora which can be easily accessed, with just the click of the mouse. A quick navigation of the Web will produce different kinds of corpora in a vast number of language areas. Given the need to find new and exciting ways to improve the language learning and teaching process, corpus linguistics does have potential for generating significant learner experiences. Taking into consideration the above-mentioned, this paper deals with the general analysis of an online language corpus. The specific corpus …


Conditional Sentences In Egyptian Colloquial And Modern Standard Arabic: A Corpus Study, Randell S. Bentley Mar 2015

Theses and Dissertations

This thesis examines the difference between conditional phrases in Egyptian Colloquial (EC) and Modern Standard Arabic (MSA). It focuses on two different conditional particles, 'iḏa and law. Verb tenses featured after the conditional particle determine the difference between EC and MSA usage. Grammars for EC and MSA provide a prescriptive approach for a comparison with empirical data from Arabic corpora. The study uses data from the ArabiCorpus along with a corpus of Egyptian Colloquial that was compiled specifically for this study. The results of this study demonstrate that each particle ('iḏa and law) and register (EC and …


Management Of Indigenous Knowledge (Ifa And Egungun) In Osun State, Nigeria, Tunde Idris Yusuf, Kayode Joseph Olusegun Jan 2015

Library Philosophy and Practice (e-journal)

This study discusses the management of indigenous knowledge (ifa and egungun) in Osun State, Nigeria. The literature is replete with indigenous knowledge and its cultural expression and heritage. To carry out the study successfully, a survey research method was adopted, using interviews as the main instrument and personal observation to complement them for data gathering; the interview questions were analyzed and interpreted. The study concluded that knowledge has to be documented and managed for future reference, to aid decision making and education, and for archival purposes. Appropriate recommendations were put forward for solving the present situation. Some …


Pro-Drop And Word-Order Variation In Brazilian Portuguese: A Corpus Study, Stewart Daniel Smith Jul 2013

Theses and Dissertations

The present study examines certain syntactic properties of the Brazilian variety of Portuguese (BP): 1) BP is a pro-drop language with instances of both null subjects and covert objects, and 2) BP exhibits several possible word orders. To determine the frequency of pro-drop and word-order variations, the CDP (The Portuguese Corpus) was used to provide samples of transitive, main clauses, which were then categorized based on whether or not they had null subjects and covert objects. The clauses were also categorized according to word order. In addition to providing samples, the corpus allowed for the comparison of four different registers …


Iotacism And The Pattern Of Vowel Leveling In Roman To Byzantine Era Manuscripts: Perspectives From The Thomas Gignac Corpus, Craig Meister Jan 2012

Student Works

After centuries of debate surrounding the change of the Greek simple vowels and diphthongs ι, υ, η, οι, and ει into the phoneme /i/, the process known as iotacism (sometimes referred to as itacism) remains an anomaly of philological analysis, and the phonetic reality of this vowel shift and leveling from the phonemes /i/, /oi/, /e:/, /y/, and /ei/ to /i/ has yet to be analyzed successfully within various systems of linguistic modeling. In order to fill this important gap within the history of the Greek language, this research seeks to use the Roman and Byzantine …


Analysis Of Four-Word Lexical Bundles In Published Research Articles Written By Turkish Scholars, Betul Bal Nov 2010

Applied Linguistics and English as a Second Language Theses

This study investigated the use of lexical bundles in research articles written in English by Turkish scholars. For the purpose of the study, a corpus of published research articles produced by Turkish scholars in six different academic disciplines was collected. The four-word lexical bundles that appeared at least twenty times in this one million word corpus were identified and further analyzed both structurally and functionally based on the previous taxonomies developed by Biber, Johansson, Leech, Conrad and Finegan (1999) and Biber, Conrad and Cortes (2004). The results of this study revealed that the lexical bundles found have structural correlates as …
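The frequency criterion mentioned in this abstract (four-word sequences occurring at least twenty times) can be sketched as follows. This is a minimal illustration, not the study's procedure: the tokenizer, the function name `four_word_bundles`, and the sample sentence are assumptions, and published bundle studies often add further criteria, such as occurrence across several texts, that are not modelled here.

```python
import re
from collections import Counter


def four_word_bundles(texts, min_count=20):
    """Count four-word sequences across texts and keep the frequent ones."""
    counts = Counter()
    for text in texts:
        # Simple lowercase word tokenizer (an assumption for this sketch).
        tokens = re.findall(r"[a-z']+", text.lower())
        # Slide a four-token window; bundles never span two texts.
        for i in range(len(tokens) - 3):
            counts[tuple(tokens[i:i + 4])] += 1
    return {
        " ".join(bundle): count
        for bundle, count in counts.items()
        if count >= min_count
    }


if __name__ == "__main__":
    # Hypothetical sample: the same sentence repeated to cross the threshold.
    sample = ["On the other hand, the results of the study show."] * 20
    for bundle, count in sorted(four_word_bundles(sample).items()):
        print(count, bundle)
```

Lowering or raising `min_count` changes how many candidate bundles survive, which is why the cut-off is usually reported alongside corpus size.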


Teaching Grammar And What Students Errors In The Use Of The English Auxiliary "Be" Can Tell Us, Arshad Abd Samad, Hawanum Hussein Jan 2010

Arshad Abd Samad

In teaching grammar, teachers often are faced with the dilemma of either emphasising the formal properties of the language or its meaning aspect. One of the more popular language teaching approaches of the last three decades has been the communicative approach. This approach has had a significant impact on the teaching of grammar as its objective of communicative competence has led to a diminished role for grammar teaching. However, of late, numerous voices have advocated a more prominent role for grammar in achieving this objective. The question of whether to emphasise form or meaning remains central. Several theorists have …


Emotional Speech Corpus Construction, Annotation And Distribution, Brian Vaughan, Charlie Cullen, Spyros Kousidis, John Mcauley May 2008

Conference papers

This paper details a process of creating an emotional speech corpus by collecting natural emotional speech assets, analysing and tagging them (for certain acoustic and linguistic features) and annotating them within an on-line database. The definition of specific metadata for use with an emotional speech corpus is crucial, in that poorly (or inaccurately) annotated assets are of little use in analysis. This problem is compounded by the lack of standardisation for speech corpora, particularly in relation to emotion content. The ISLE Metadata Initiative (IMDI) is the only cohesive attempt at corpus metadata standardisation performed thus far. Although not a comprehensive …


Mayanwiki: An Online, Consensus-Based Linguistic Corpus Of The Mayan Hieroglyphs, Robbie A. Haertel Dec 2007

Theses and Dissertations

The writing system used by the ancient Maya civilization has intrigued researchers and aficionados for centuries. Now that it has mostly been deciphered, the emphasis in the field of Mayan epigraphy has shifted to a study of the system of phonological, morphological, and grammatical rules that once governed the language that the hieroglyphs encode. One of the most important resources for linguistic study of this type is a comprehensive, electronic corpus of texts to investigate phraseology, frequency information, and collocations. Because Mayan linguistic epigraphy is in the early stages, a publicly available, editable corpus would be an invaluable resource in …