Open Access. Powered by Scholars. Published by Universities.®
Articles 1 - 30 of 231
Full-Text Articles in Computational Linguistics
Uncovering The Mimicry Of Online Review Breadth And Depth And Its Subsequent Effect On Consumer Responses, Andrea Pelaez Martinez
Dissertations, Theses, and Capstone Projects
Word-of-mouth (WOM) in marketing occurs when consumers discuss a company's product or service or any consumption experience with their friends, family, and others with whom they have any relationship. With the advent of social media, this phenomenon has expanded rapidly into virtual environments where consumer conversation is enabled through chats, forums, social media posts, and online reviews. In response to this rapid growth of online WOM, academics and practitioners have focused their interest on this phenomenon and its implications on consumers, firms, and society. So far, the evidence of the critical role that online WOM plays in helping consumers make …
Skyler's Lunch, Noah Sherman, Autumn Boone, Hilaria Cruz
LING 590/Internet Language
Our class was studying the use of emojis across different platforms and wanted to explore how stories using emojis could impact young readers. Here, we try to translate the story of Skyler into emoji, providing translations along the way. We replace words completely with emoji, represent phrases with a few emoji, and use additional emoji to make sense of the content, including punctuation. In this book, we explore the character of Skyler, who is a picky eater. But they learn to eat the nutritious food that is good for them. In the end, they even get a reward!
Retórica Intercultural En El Discurso Académico Universitario: Las Funciones Retóricas De La Citación En Los Trabajos De Fin De Máster Escritos En Español Y En Inglés Por Hablantes Nativos Y No Nativos, David Sanchez-Jimenez
Publications and Research
This research derives from an interest in the cultural differences in citation practices in the academic genre of the master's thesis among native Spanish writers (Ee), non-native Filipino writers of Spanish (Fe), native Filipino writers of English (Fi), and American writers of English. A total of thirty-two (32) master's theses, eight (8) for each group, were analyzed. A quantitative and qualitative methodology was used to study this phenomenon, based on computerized textual analysis of the rhetorical functions of citations arranged in a typological classification that modified the outline proposed by Petrić (2007). The results obtained from …
How Do We Learn What We Cannot Say?, Daniel Yakubov
Dissertations, Theses, and Capstone Projects
The contributions of this thesis are two-fold. First, this thesis presents UDTube, easily usable software developed to perform morphological analysis in a multi-task fashion. This work shows the strong performance of UDTube versus the current state-of-the-art, UDPipe, across eight languages, primarily in the annotation of morphological features. The second contribution of this thesis is an exploration into the study of defectivity. UDTube is used to annotate a large amount of data in Greek and Russian, which is ultimately used to investigate the plausibility of Indirect Negative Evidence (INE), a popular approach to the acquisition of morphological defectivity. The reported …
Consonant (De)Gradation In Ingrian?, Andrea M. Harrison
Dissertations, Theses, and Capstone Projects
This paper will present a dual method toward data enrichment for low-resource languages. Using Yoyodyne -- a Fairseq-inspired neural library for small-vocabulary sequence-to-sequence generation -- a morphological generation task was tested across labeled data encompassing multiple stages of enrichment for the low-resource language Ingrian. Due to limitations in the available data for Ingrian, weighted finite-state transducers (WFSTs) were used to generate an expanded vocabulary via HFST's toolkit for Uralic languages, and GiellaLT, a source for FST-driven lexica for low-resource languages. Further stages of experimentation used labeled data from related, higher-resource languages (Finnish, Estonian) to encourage cross-lingual transfer in the interest …
The Ring Cycle: Journeying Through The Language Of Tolkien’s Third Age With Corpus Linguistics, Michael Livesey
Journal of Tolkien Research
This article explores the journey taken by the One Ring across J.R.R. Tolkien’s Third Age writings. It employs a digital humanities approach to analyse linguistic patterns in Tolkien’s use of the word ring, across The Hobbit and The Lord of the Rings. Specifically, the article employs corpus linguistic methods to track shifts in the quantities and qualities of the Ring’s appearance across these texts. It uses techniques of keyness and collocation analysis to trace transformations in these quantities/qualities, including: a) the Ring’s transition from a central to a peripheral place in the Third Age’s narrative arc; and b) …
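The keyness analysis mentioned above can be illustrated with a minimal sketch. The abstract does not say which keyness statistic the article uses; the log-likelihood (G2) measure below is an assumption, chosen because it is widely used in corpus linguistics. The function name and the counts in the usage note are hypothetical.

```python
import math


def log_likelihood_keyness(freq_target, size_target, freq_ref, size_ref):
    """Log-likelihood (G2) keyness of one word across two corpora.

    freq_*: occurrences of the word in each corpus.
    size_*: total number of tokens in each corpus.
    Assumption: the article may use a different keyness statistic.
    """
    total_freq = freq_target + freq_ref
    total_size = size_target + size_ref
    # Expected frequencies if the word were equally common in both corpora.
    expected_target = size_target * total_freq / total_size
    expected_ref = size_ref * total_freq / total_size

    g2 = 0.0
    for observed, expected in (
        (freq_target, expected_target),
        (freq_ref, expected_ref),
    ):
        # A zero observed count contributes nothing to the sum.
        if observed > 0:
            g2 += observed * math.log(observed / expected)
    return 2 * g2
```

With hypothetical counts, `log_likelihood_keyness(50, 1000, 10, 1000)` gives a positive score, while a word with the same relative frequency in both corpora scores zero. Larger scores indicate a stronger difference in relative frequency between the two texts.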
A Computer-Assisted Approach To Lexical Borrowing In Northeast Caucasian Languages, Bonnie Eleanor Wren-Hardin
Theses and Dissertations--Linguistics
The disambiguation of loanwords and cognates can be a challenge, especially in areas where there has been intense language contact over an extended period of time, when the contact is between genetically related languages, and when the number of languages involved is large. Over the past several decades, more and more computational approaches to automatic cognate and borrowing detection have been created in an attempt to ease the load of examining hundreds to thousands of individual lexemes, as well as to determine language family relationships with allegedly greater accuracy. While these methods are not perfect and cannot replace the knowledge or …
Guilty Machines: On Ab-Sens In The Age Of AI, Dylan Lackey, Katherine Weinschenk
Critical Humanities
For Lacan, guilt arises in the sublimation of ab-sens (non-sense) into the symbolic comprehension of sen-absexe (sense without sex, sense in the deficiency of sexual relation), or in the maturation of language to sensibility through the effacement of sex. Though, as Slavoj Žižek himself points out in a recent article regarding ChatGPT, the split subject always misapprehends the true reason for guilt’s manifestation, such guilt at best provides a sort of evidence for the inclusion of the subject in the order of language, acting as a necessary, even enjoyable mark of the subject’s coherence (or, more importantly, the subject’s separation …
The Near-Synonymous Classifiers In Mandarin Chinese: Etymology, Modern Usage, And Possible Problems In L2 Classroom, Irina Kavokina
Masters Theses
Many Chinese classifiers are nearly synonymous: they can be used with the same head nouns without changing the meaning of the sentence. In other words, such classifiers can be used interchangeably or almost interchangeably. This poses a challenge for Chinese language learners, especially those who lack such a grammatical category in their own native language. Another complication arises from the ambiguous English translations of many classifiers.
In this paper we investigate the collocation behavior of near-synonymous Chinese classifiers, focusing on their semantic nuances and interchangeability. Analyzing 6 pairs of classifiers — 栋 and 幢, 匹 and 头, 批 and …
Executive Order On The Safe, Secure, And Trustworthy Development And Use Of Artificial Intelligence, Joseph R. Biden
Copyright, Fair Use, Scholarly Communication, etc.
Section 1. Purpose. Artificial intelligence (AI) holds extraordinary potential for both promise and peril. Responsible AI use has the potential to help solve urgent challenges while making our world more prosperous, productive, innovative, and secure. At the same time, irresponsible use could exacerbate societal harms such as fraud, discrimination, bias, and disinformation; displace and disempower workers; stifle competition; and pose risks to national security. Harnessing AI for good and realizing its myriad benefits requires mitigating its substantial risks. This endeavor demands a society-wide effort that includes government, the private sector, academia, and civil society.
My Administration places the highest urgency …
Towards Interpretable Machine Reading Comprehension With Mixed Effects Regression And Exploratory Prompt Analysis, Luca Del Signore
Dissertations, Theses, and Capstone Projects
We investigate the properties of natural language prompts that determine their difficulty in machine reading comprehension tasks. While much work has been done benchmarking language model performance at the task level, there is considerably less literature focused on how individual task items can contribute to interpretable evaluations of natural language understanding. Such work is essential to deepening our understanding of language models and ensuring their responsible use as a key tool in human machine communication. We perform an in depth mixed effects analysis on the behavior of three major generative language models, comparing their performance on a large reading comprehension …
A Computational Analysis Of Volodymyr Zelenskyy's Public Diplomacy Discourse In Times Of Crisis, Amber Brittain-Hale
Education Division Scholarship
In this study, we delve into the public diplomacy discourse of Ukrainian President Volodymyr Zelenskyy during the ongoing crisis of the Russo-Ukrainian War. We aim to conduct a computational analysis of Zelenskyy's English, Russian, and Ukrainian speeches, exploring the linguistic patterns and code-switching employed in his discourse. The study period encompasses Russia’s build-up to and full-scale invasion of Ukraine from May 2019 to May 30, 2023. This time frame is crucial as it captures the dynamic development of the crisis and the expansion of Zelenskyy's presidency, providing a unique context for analyzing his public diplomacy efforts. By utilizing Linguistic Inquiry …
Ideology Prediction From Scarce And Biased Supervision: Learn To Disregard The “What” And Focus On The “How”!, Chen Chen, Dylan Walker, Venkatesh Saligrama
Business Faculty Articles and Research
We propose a novel supervised learning approach for political ideology prediction (PIP) that is capable of predicting out-of-distribution inputs. This problem is motivated by the fact that manual data-labeling is expensive, while self-reported labels are often scarce and exhibit significant selection bias. We propose a novel statistical model that decomposes the document embeddings into a linear superposition of two vectors; a latent neutral context vector independent of ideology, and a latent position vector aligned with ideology. We train an end-to-end model that has intermediate contextual and positional vectors as outputs. At deployment time, our model predicts labels for input documents …
Destined Failure, Chengjun Pan
Masters Theses
I attempt to examine the complex structure of human communication, explaining why it is bound to fail. By reproducing experienceable phenomena, I demonstrate how they can expose communication structure and reveal the limitations of our perception and symbolization. I divide the process of communication into six stages: input, detection, symbolization, dictionary, interpretation, and output. In this thesis, I examine the flaws and challenges that arise in the first five stages. I argue that reception acts as a filter and that understanding relies on a symbolic system that is full of redundancies. Therefore, every interpretation is destined to be a deviation.
The Sociolinguistics Of Code-Switching In Hong Kong’s Digital Landscape: A Mixed-Methods Exploration Of Cantonese-English Alternation Patterns On WhatsApp, Wilkinson Daniel Wong Gonzales, Yuen Man Tsang
Journal of English and Applied Linguistics
This paper examines the prevalence of Cantonese-English code-mixing in Hong Kong through an under-researched digital medium. Prior research on this code-alternation practice has often been limited to exploring either the social or linguistic constraints of code-switching in spoken or written communication. Our study takes a holistic approach to analyzing code-switching in a hybrid medium that exhibits features of both spoken and written discourse. We specifically analyze the code-switching patterns of 24 undergraduates from a Hong Kong university on WhatsApp and examine how both social and linguistic factors potentially constrain these patterns. Utilizing a self-compiled sociolinguistic corpus as well as survey …
Evaluating Neural Networks As Cognitive Models For Learning Quasi-Regularities In Language, Xiaomeng Ma
Dissertations, Theses, and Capstone Projects
Many aspects of language can be categorized as quasi-regular: the relationship between the inputs and outputs is systematic but allows many exceptions. Common domains that contain quasi-regularity include morphological inflection and grapheme-phoneme mapping. How humans process quasi-regularity has been debated for decades. This thesis implemented modern neural network models, transformer models, on two tasks: English past tense inflection and Chinese character naming, to investigate how transformer models perform quasi-regularity tasks. This thesis focuses on investigating to what extent the models' performances can represent human behavior. The results show that the transformers' performance is very similar to human behavior in many …
Topics For He But Not For She: Quantifying And Classifying Gender Bias In The Media, Tyler J. Lanni
Dissertations, Theses, and Capstone Projects
In this study, we used computational techniques to analyze the language used in news articles to describe female and male politicians. Our corpus included 370 subtexts for male candidates and 374 subtexts for female candidates, gathered through the New York Times API. We conducted two experiments: an LDA topic analysis to explore the data, and a logistic regression to classify the subtexts as either male or female. Our analysis revealed some noteworthy findings that suggest the possibility of developing a gender bias classifier in the future. However, to create a more robust understanding of bias, additional research and data are …
Neural Network Vs. Rule-Based G2P: A Hybrid Approach To Stress Prediction And Related Vowel Reduction In Bulgarian, Maria Karamihaylova
Dissertations, Theses, and Capstone Projects
An effective grapheme-to-phoneme (G2P) conversion system is a critical element of speech synthesis. Rule-based systems were an early method for G2P conversion. In recent years, machine learning tools have been shown to outperform rule-based approaches in G2P tasks. We investigate neural network sequence-to-sequence modeling for the prediction of syllable stress and resulting vowel reductions in the Bulgarian language. We then develop a hybrid G2P approach which combines manually written grapheme-to-phoneme mapping rules with neural network-enabled syllable stress predictions by inserting stress markers in the predicted stress position of the transcription produced by the rule-based finite-state transducer. Finally, we apply vowel …
AI Approaches To Understand Human Deceptions, Perceptions, And Perspectives In Social Media, Chih-Yuan Li
Dissertations
Social media platforms have created a virtual space for sharing user-generated information and for connecting and interacting among users. However, there are research and societal challenges: 1) users generate and share disinformation; 2) it is difficult to understand citizens' perceptions or opinions expressed on a wide variety of topics; and 3) information overload and echo chambers arise when there is no overall understanding of the different perspectives taken by different people or groups.
This dissertation addresses these three research challenges with advanced AI and Machine Learning approaches. To address the fake news, as deceptions on the facts, this dissertation presents Machine …
Predicting High-Cap Tech Stock Polarity: A Combined Approach Using Support Vector Machines And Bidirectional Encoders From Transformers, Ian L. Grisham
Electronic Theses and Dissertations
The abundance, accessibility, and scale of data have engendered an era where machine learning can quickly and accurately solve complex problems, identify complicated patterns, and uncover intricate trends. One research area where many have applied these techniques is the stock market. Yet, financial domains are influenced by many factors and are notoriously difficult to predict due to their volatile and multivariate behavior. However, the literature indicates that public sentiment data may exhibit significant predictive qualities and improve a model’s ability to predict intricate trends. In this study, momentum SVM classification accuracy was compared between datasets that did and did not …
Improving Sign Recognition With Phonology, Lee Kezar, Jesse Thomason, Zed Sevcikova Sehyr
Communication Sciences and Disorders Faculty Articles and Research
We use insights from research on American Sign Language (ASL) phonology to train models for isolated sign language recognition (ISLR), a step towards automatic sign language understanding. Our key insight is to explicitly recognize the role of phonology in sign production to achieve more accurate ISLR than existing work which does not consider sign language phonology. We train ISLR models that take in pose estimations of a signer producing a single sign to predict not only the sign but additionally its phonological characteristics, such as the handshape. These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition …
Content-Based Unsupervised Fake News Detection On Ukraine-Russia War, Yucheol Shin, Yvan Sojdehei, Limin Zheng, Brad Blanchard
SMU Data Science Review
The Ukrainian-Russian war has garnered significant attention worldwide, with fake news obstructing the formation of public opinion and disseminating false information. This scholarly paper explores the use of unsupervised learning methods and the Bidirectional Encoder Representations from Transformers (BERT) to detect fake news in news articles from various sources. BERT topic modeling is applied to cluster news articles by their respective topics, followed by summarization to measure the similarity scores. The hypothesis posits that topics with larger variances are more likely to contain fake news. The proposed method was evaluated using a dataset of approximately 1000 labeled news articles related …
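The hypothesis stated above (topics with larger variance in similarity scores are more likely to contain fake news) can be sketched as a ranking step. This is a minimal illustration only: the abstract is truncated and does not specify how similarity scores are computed or which variance estimator is used, so the population variance, the function name, and the sample scores are all assumptions.

```python
from statistics import pvariance


def rank_topics_by_variance(scores_by_topic):
    """Order topic labels from highest to lowest variance of similarity scores.

    scores_by_topic: dict mapping a topic label to a list of similarity
    scores for the articles in that topic (hypothetical input format).
    """
    variances = {
        topic: pvariance(scores) for topic, scores in scores_by_topic.items()
    }
    return sorted(variances, key=variances.get, reverse=True)


# Hypothetical scores: topic_b is far less consistent than topic_a.
example_scores = {
    "topic_a": [0.90, 0.88, 0.91],
    "topic_b": [0.20, 0.90, 0.50],
}
ranking = rank_topics_by_variance(example_scores)
```

Under the stated hypothesis, the topics at the front of `ranking` would be the first candidates for closer inspection.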
Single-Case Pilot Study For Longitudinal Analysis Of Referential Failures And Sentiment In Schizophrenic Speech From Client-Centered Psychotherapy Recordings, Travis A. Musich
Dissertations
Though computational linguistic analyses have revealed the presence of distinctly characteristic language features in schizophrenic disordered speech, the relative stability of these language features in longitudinal samples is still unknown. This longitudinal pilot study analyzed schizophrenic disordered speech data from the archival therapy audio recordings of one patient spanning 23 years. End-to-end Neural Coreference Resolution software was used to analyze transcribed speech data from three therapy sessions to identify ambiguous pronouns, referred to as referential failures, which were reviewed and confirmed by multiple raters. Speech samples were analyzed using Google Cloud Natural Language API software for sentiment variables (i.e., score, …
ChatGPT As Metamorphosis Designer For The Future Of Artificial Intelligence (AI): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian)
Library Philosophy and Practice (e-journal)
Abstract
Purpose: The purpose of this research paper is to explore ChatGPT’s potential as an innovative design tool for the future development of artificial intelligence. Specifically, this conceptual investigation aims to analyze ChatGPT’s capabilities as a tool for designing and developing systems with near-human intelligence for future use in the field of Artificial Intelligence (AI). The paper also analyzes the strengths and weaknesses of ChatGPT as a tool and identifies possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …
A Sentiment Analysis Of "Filipinx" On Twitter Using A Multinomial Naïve Bayes Classification Model, Clarisse Taboy
Dissertations, Theses, and Capstone Projects
On social media, the use of “Filipinx” as a gender neutral, inclusive term for “Filipino” tends to generate high user engagement, at times without regard for the original context in which the word appears. This project applies computational methods to collect a large dataset in English/Filipino from Twitter containing “Filipinx”, and to train a Naïve Bayes model to classify tweets into three sentiments: positive, neutral, and negative. My methodology takes inspiration from that of four related studies that similarly conducted sentiment analysis on English/Filipino tweets involving various topics, and whose resulting accuracy scores were compared side-by-side. Conducting sentiment analysis on …
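A multinomial Naïve Bayes classifier of the kind named above can be sketched in a few lines. This is not the project's implementation: the add-one (Laplace) smoothing, the function names, and the toy training examples are assumptions made for illustration, and the real project works on a large English/Filipino Twitter dataset.

```python
import math
from collections import Counter, defaultdict


def train_multinomial_nb(docs, alpha=1.0):
    """Fit a multinomial Naive Bayes model.

    docs: list of (tokens, label) pairs.
    alpha: additive smoothing constant (assumed; 1.0 is Laplace smoothing).
    """
    label_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for tokens, label in docs:
        label_counts[label] += 1
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "label_counts": label_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
    }


def predict(model, tokens):
    """Return the label with the highest log posterior for the tokens."""
    total_docs = sum(model["label_counts"].values())
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, n_docs in model["label_counts"].items():
        counts = model["word_counts"][label]
        total_words = sum(counts.values())
        score = math.log(n_docs / total_docs)  # log prior
        for tok in tokens:
            if tok in model["vocab"]:  # ignore words never seen in training
                score += math.log(
                    (counts[tok] + alpha) / (total_words + alpha * vocab_size)
                )
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data, not drawn from the project's dataset.
TOY_TWEETS = [
    (["love", "this", "term"], "positive"),
    (["hate", "this", "term"], "negative"),
    (["what", "does", "it", "mean"], "neutral"),
    (["love", "inclusive", "language"], "positive"),
    (["hate", "forced", "language"], "negative"),
]
model = train_multinomial_nb(TOY_TWEETS)
```

Working in log space avoids numeric underflow when many word probabilities are multiplied, which matters once the inputs are longer than these toy examples.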
Simulating The Machine Translation Of Low-Resource Languages By Designing A Translator Between English And An Artificially Constructed Language, Michaela Snyder
Mahurin Honors College Capstone Experience/Thesis Projects
Natural language processing (NLP), or the use of computers to analyze natural language, is a field that relies heavily on syntax. It would seem intuitive that computers would thrive in this area due to their strict syntax requirements, but the syntax of natural languages leaves them unable to properly parse and generate sentences that seem normal to the average speaker. A subfield of NLP, machine translation, works mainly to computerize translation between different languages. Unfortunately, such translation is not without its weaknesses; language documentation is not created equal, and many low-resource languages—languages with relatively few kinds of documentation, most often …
‘A Category Of Their Own’: Quantitative Methods In The Use Of Pile-Sort Data In Perceptual Dialectology, Zachary Ty Gill
Theses and Dissertations--Linguistics
The purpose of this study is to investigate how Mississippi Gulf Coast Creoles perceive language differences in their home area. A pile-sort task was carried out in which respondents were given stacks of cards with local communities written on them and instructed to stack together the regions where people “talk the same.” Once the piles were made, the fieldworker discussed their sortings with the respondents. The stacks were analyzed by means of a hierarchical agglomerative cluster analysis and non-parametric multidimensional scaling with k-means cluster analysis overlays to extract the perceived dialect areas. The groupings reveal that respondent strategies are based …
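The step from pile-sort data to agglomerative clusters can be sketched as follows. The abstract does not state the distance measure or the linkage criterion, so both are assumptions here: distance is one minus the share of respondents who placed two communities in the same pile, and clusters are merged by average linkage. The community labels and sorts are hypothetical.

```python
from itertools import combinations


def pile_sort_distances(sorts, items):
    """Distance between two items: 1 - share of respondents who piled them together.

    sorts: one entry per respondent, each a list of piles (sets of items).
    """
    together = {pair: 0 for pair in combinations(sorted(items), 2)}
    for piles in sorts:
        for pile in piles:
            for pair in combinations(sorted(pile), 2):
                together[pair] += 1
    n_respondents = len(sorts)
    return {pair: 1 - count / n_respondents for pair, count in together.items()}


def average_linkage(distances, items, k):
    """Merge the two closest clusters repeatedly until k clusters remain."""
    clusters = [frozenset([item]) for item in sorted(items)]

    def cluster_distance(a, b):
        # Average linkage: mean distance over all cross-cluster item pairs.
        pairs = [tuple(sorted((x, y))) for x in a for y in b]
        return sum(distances[p] for p in pairs) / len(pairs)

    while len(clusters) > k:
        a, b = min(combinations(clusters, 2), key=lambda ab: cluster_distance(*ab))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return set(clusters)


# Hypothetical sorts from three respondents over four communities.
communities = ["A", "B", "C", "D"]
sorts = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
    [{"A", "B"}, {"C", "D"}],
]
distances = pile_sort_distances(sorts, communities)
areas = average_linkage(distances, communities, k=2)
```

In this toy case the two recovered areas are {A, B} and {C, D}. A real analysis would more likely inspect the full merge hierarchy (for example as a dendrogram) rather than fix k in advance.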
Automatic Transcription Of Northern Prinmi Oral Art: Approaches And Challenges To Automatic Speech Recognition For Language Documentation, Connor Bechler
Theses and Dissertations--Linguistics
One significant issue facing language documentation efforts is the transcription bottleneck: each documented recording must be transcribed and annotated, and these tasks are extremely labor intensive (Ćavar et al., 2016). Researchers have sought to accelerate these tasks with partial automation via forced alignment, natural language processing, and automatic speech recognition (ASR) (Neubig et al., 2020). Neural network—especially transformer-based—approaches have enabled large advances in ASR over the last decade. Models like XLSR-53 promise improved performance on under-resourced languages by leveraging massive data sets from many different languages (Conneau et al., 2020). This project extends these efforts to a novel context, applying …
Evaluation Of Different Machine Learning, Deep Learning And Text Processing Techniques For Hate Speech Detection, Nabil Shawkat
MSU Graduate Theses
Social media has become a domain that involves a lot of hate speech. Some users feel entitled to engage in abusive conversations by sending abusive messages, tweets, or photos to other users. It is critical to detect hate speech and prevent innocent users from becoming victims. In this study, I explore the effectiveness and performance of various machine learning methods employing text processing techniques to create a robust system for hate speech identification. I assess the performance of Naïve Bayes, Support Vector Machines, Decision Trees, Random Forests, Logistic Regression, and K Nearest Neighbors using three distinct datasets sourced from social …
Brazilian Portuguese-Russian (BraPoRus) Corpus: Automatic Transcription And Acoustic Quality Of Elderly Speech During COVID-19 Pandemic, Irina A. Sekerina, Anna Smirnova Henriques, Aleksandra Skorobogatova, Natalia Tyulina, Tatiana V. Kachkovskaia, Svetlana Ruseishvili, Sandra Madureira
Publications and Research
This article presents the Brazilian Portuguese-Russian (BraPoRus) corpus, whose goal is to collect, analyze, and preserve for posterity the spoken heritage Russian still used today in Brazil by approximately 1,500 elderly bilingual heritage Russian–Brazilian Portuguese speakers. Their unique 100-year-old variety of moribund Russian is disappearing because it has not been passed to their descendants born in Brazil. During the COVID-19 pandemic, we remotely collected 170 h of speech samples in heritage Russian from 26 participants (mean age = 75.7 years) in naturalistic settings using Zoom or a phone call. To estimate the quality of collected data, we focus on two methodological …