Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

233 Full-Text Articles 347 Authors 192,439 Downloads 63 Institutions

All Articles in Computational Linguistics

233 full-text articles. Page 2 of 11.

AI Approaches To Understand Human Deceptions, Perceptions, And Perspectives In Social Media, Chih-Yuan Li 2023 New Jersey Institute of Technology

Dissertations

Social media platforms have created a virtual space for sharing user-generated information and for connecting and interacting among users. However, there are research and societal challenges: 1) users generate and share disinformation; 2) it is difficult to understand citizens' perceptions or opinions expressed on a wide variety of topics; and 3) information overload and echo chambers arise without an overall understanding of the different perspectives taken by different people or groups.

This dissertation addresses these three research challenges with advanced AI and Machine Learning approaches. To address fake news, a form of deception about facts, this dissertation presents Machine …


Predicting High-Cap Tech Stock Polarity: A Combined Approach Using Support Vector Machines And Bidirectional Encoders From Transformers, Ian L. Grisham 2023 East Tennessee State University

Electronic Theses and Dissertations

The abundance, accessibility, and scale of data have engendered an era where machine learning can quickly and accurately solve complex problems, identify complicated patterns, and uncover intricate trends. One research area where many have applied these techniques is the stock market. Yet, financial domains are influenced by many factors and are notoriously difficult to predict due to their volatile and multivariate behavior. However, the literature indicates that public sentiment data may exhibit significant predictive qualities and improve a model’s ability to predict intricate trends. In this study, momentum SVM classification accuracy was compared between datasets that did and did not …


Improving Sign Recognition With Phonology, Lee Kezar, Jesse Thomason, Zed Sevcikova Sehyr 2023 University of Southern California

Communication Sciences and Disorders Faculty Articles and Research

We use insights from research on American Sign Language (ASL) phonology to train models for isolated sign language recognition (ISLR), a step towards automatic sign language understanding. Our key insight is to explicitly recognize the role of phonology in sign production to achieve more accurate ISLR than existing work which does not consider sign language phonology. We train ISLR models that take in pose estimations of a signer producing a single sign to predict not only the sign but additionally its phonological characteristics, such as the handshape. These auxiliary predictions lead to a nearly 9% absolute gain in sign recognition …


Content-Based Unsupervised Fake News Detection On Ukraine-Russia War, Yucheol Shin, Yvan Sojdehei, Limin Zheng, Brad Blanchard 2023 Southern Methodist University

SMU Data Science Review

The Ukrainian-Russian war has garnered significant attention worldwide, with fake news obstructing the formation of public opinion and disseminating false information. This scholarly paper explores the use of unsupervised learning methods and the Bidirectional Encoder Representations from Transformers (BERT) to detect fake news in news articles from various sources. BERT topic modeling is applied to cluster news articles by their respective topics, followed by summarization to measure the similarity scores. The hypothesis posits that topics with larger variances are more likely to contain fake news. The proposed method was evaluated using a dataset of approximately 1000 labeled news articles related …


Single-Case Pilot Study For Longitudinal Analysis Of Referential Failures And Sentiment In Schizophrenic Speech From Client-Centered Psychotherapy Recordings, Travis A. Musich 2023 National Louis University

Dissertations

Though computational linguistic analyses have revealed the presence of distinctly characteristic language features in schizophrenic disordered speech, the relative stability of these language features in longitudinal samples is still unknown. This longitudinal pilot study analyzed schizophrenic disordered speech data from the archival therapy audio recordings of one patient spanning 23 years. End-to-end Neural Coreference Resolution software was used to analyze transcribed speech data from three therapy sessions to identify ambiguous pronouns, referred to as referential failures, which were reviewed and confirmed by multiple raters. Speech samples were analyzed using Google Cloud Natural Language API software for sentiment variables (i.e., score, …


ChatGPT As Metamorphosis Designer For The Future Of Artificial Intelligence (AI): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian) 2023 Central University of South Bihar, Panchanpur, Gaya, Bihar

Library Philosophy and Practice (e-journal)

Abstract

Purpose: The purpose of this research paper is to explore ChatGPT’s potential as an innovative design tool for the future development of artificial intelligence. Specifically, this conceptual investigation aims to analyze ChatGPT’s capabilities as a tool for designing and developing near-human intelligent systems for future use in the field of Artificial Intelligence (AI). The paper also analyzes the strengths and weaknesses of ChatGPT as a tool and identifies possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …


A Sentiment Analysis Of "Filipinx" On Twitter Using A Multinomial Naïve Bayes Classification Model, Clarisse Taboy 2023 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

On social media, the use of “Filipinx” as a gender neutral, inclusive term for “Filipino” tends to generate high user engagement, at times without regard for the original context in which the word appears. This project applies computational methods to collect a large dataset in English/Filipino from Twitter containing “Filipinx”, and to train a Naïve Bayes model to classify tweets into three sentiments: positive, neutral, and negative. My methodology takes inspiration from that of four related studies that similarly conducted sentiment analysis on English/Filipino tweets involving various topics, and whose resulting accuracy scores were compared side-by-side. Conducting sentiment analysis on …
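
As a rough illustration of the three-class setup this abstract describes, the following is a minimal sketch of a multinomial Naïve Bayes classifier with add-one smoothing. It is not the author's code: the function names `train_nb` and `predict_nb`, the whitespace tokenizer, and the six toy training sentences are all assumptions made up for this example.

```python
import math
from collections import Counter, defaultdict

def train_nb(docs, labels, alpha=1.0):
    """Fit a multinomial Naive Bayes model with add-alpha smoothing."""
    word_counts = defaultdict(Counter)  # per-class token counts
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()  # naive whitespace tokenizer (assumption)
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": Counter(labels),
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(docs),
    }

def predict_nb(model, text):
    """Return the class with the highest log posterior for the given text."""
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    vocab_size = len(model["vocab"])
    best_label, best_score = None, -math.inf
    for label, n_class in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        # log prior + sum of smoothed log likelihoods
        score = math.log(n_class / model["n_docs"])
        for t in tokens:
            score += math.log((counts[t] + model["alpha"])
                              / (total + model["alpha"] * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Invented toy data, for illustration only.
docs = [
    "love this term", "great inclusive word",
    "hate this term", "awful ugly word",
    "the term appears here", "word used in article",
]
labels = ["positive", "positive", "negative", "negative", "neutral", "neutral"]
model = train_nb(docs, labels)
print(predict_nb(model, "love great"))
```

A real study would need a proper tokenizer for mixed English/Filipino text and a held-out evaluation set; the sketch only shows the scoring rule.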


Simulating The Machine Translation Of Low-Resource Languages By Designing A Translator Between English And An Artificially Constructed Language, Michaela Snyder 2023 Western Kentucky University

Mahurin Honors College Capstone Experience/Thesis Projects

Natural language processing (NLP), or the use of computers to analyze natural language, is a field that relies heavily on syntax. It would seem intuitive that computers would thrive in this area due to their strict syntax requirements, but the syntax of natural languages leaves them unable to properly parse and generate sentences that seem normal to the average speaker. A subfield of NLP, machine translation, works mainly to computerize translation between different languages. Unfortunately, such translation is not without its weaknesses; language documentation is not created equal, and many low-resource languages—languages with relatively few kinds of documentation, most often …


Brazilian Portuguese-Russian (BraPoRus) Corpus: Automatic Transcription And Acoustic Quality Of Elderly Speech During COVID-19 Pandemic, Irina A. Sekerina, Anna Smirnova Henriques, Aleksandra Skorobogatova, Natalia Tyulina, Tatiana V. Kachkovskaia, Svetlana Ruseishvili, Sandra Madureira 2023 CUNY College of Staten Island

Publications and Research

This article presents the Brazilian Portuguese-Russian (BraPoRus) corpus, whose goal is to collect, analyze, and preserve for posterity the spoken heritage Russian still used today in Brazil by approximately 1,500 elderly bilingual heritage Russian–Brazilian Portuguese speakers. Their unique 100-year-old variety of moribund Russian is disappearing because it has not been passed to their descendants born in Brazil. During the COVID-19 pandemic, we remotely collected 170 h of speech samples in heritage Russian from 26 participants (mean age = 75.7 years) in naturalistic settings using Zoom or a phone call. To estimate the quality of collected data, we focus on two methodological …


Evaluation Of Different Machine Learning, Deep Learning And Text Processing Techniques For Hate Speech Detection, Nabil Shawkat 2023 Missouri State University

MSU Graduate Theses

Social media has become a domain that involves a lot of hate speech. Some users feel entitled to engage in abusive conversations by sending abusive messages, tweets, or photos to other users. It is critical to detect hate speech and prevent innocent users from becoming victims. In this study, I explore the effectiveness and performance of various machine learning methods employing text processing techniques to create a robust system for hate speech identification. I assess the performance of Naïve Bayes, Support Vector Machines, Decision Trees, Random Forests, Logistic Regression, and K Nearest Neighbors using three distinct datasets sourced from social …


Automatic Transcription Of Northern Prinmi Oral Art: Approaches And Challenges To Automatic Speech Recognition For Language Documentation, Connor Bechler 2023 University of Kentucky

Theses and Dissertations--Linguistics

One significant issue facing language documentation efforts is the transcription bottleneck: each documented recording must be transcribed and annotated, and these tasks are extremely labor intensive (Ćavar et al., 2016). Researchers have sought to accelerate these tasks with partial automation via forced alignment, natural language processing, and automatic speech recognition (ASR) (Neubig et al., 2020). Neural network—especially transformer-based—approaches have enabled large advances in ASR over the last decade. Models like XLSR-53 promise improved performance on under-resourced languages by leveraging massive data sets from many different languages (Conneau et al., 2020). This project extends these efforts to a novel context, applying …


‘A Category Of Their Own’: Quantitative Methods In The Use Of Pile-Sort Data In Perceptual Dialectology, Zachary Ty Gill 2023 University of Kentucky

Theses and Dissertations--Linguistics

The purpose of this study is to investigate how Mississippi Gulf Coast Creoles perceive language differences in their home area. A pile-sort task was carried out in which respondents were given stacks of cards with local communities written on them and instructed to stack together the regions where people “talk the same.” Once the piles were made, the fieldworker discussed their sortings with the respondents. The stacks were analyzed by means of a hierarchical agglomerative cluster analysis and non-parametric multidimensional scaling with k-means cluster analysis overlays to extract the perceived dialect areas. The groupings reveal that respondent strategies are based …
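
To make the pile-sort analysis more concrete, here is a minimal sketch of one common way such data can be clustered: items that many respondents pile together are treated as close, and the closest clusters are merged step by step. Everything specific here is an assumption for illustration, not the thesis's procedure: the community names "A" to "D", the three invented respondents, the distance definition, and the choice of average linkage (the abstract does not state a linkage). The multidimensional scaling and k-means steps are not shown.

```python
from itertools import combinations

def cooccurrence_distance(sorts, items):
    """Distance = 1 - share of respondents who put both items in one pile."""
    dist = {}
    for a, b in combinations(items, 2):
        together = sum(
            any(a in pile and b in pile for pile in piles) for piles in sorts
        )
        dist[frozenset((a, b))] = 1 - together / len(sorts)
    return dist

def average_linkage(dist, items, n_clusters):
    """Agglomerative clustering: merge the closest pair until n_clusters remain."""
    clusters = [frozenset([i]) for i in items]

    def cluster_distance(c1, c2):
        # Mean pairwise distance between members of the two clusters.
        pairs = [dist[frozenset((a, b))] for a in c1 for b in c2]
        return sum(pairs) / len(pairs)

    while len(clusters) > n_clusters:
        c1, c2 = min(combinations(clusters, 2),
                     key=lambda pair: cluster_distance(*pair))
        clusters = [c for c in clusters if c not in (c1, c2)] + [c1 | c2]
    return set(clusters)

# Invented pile sorts from three hypothetical respondents.
items = ["A", "B", "C", "D"]
sorts = [
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B"}, {"C", "D"}],
    [{"A", "B", "C"}, {"D"}],
]
dist = cooccurrence_distance(sorts, items)
areas = average_linkage(dist, items, n_clusters=2)
print(areas)
```

With real data, the number of clusters would be chosen from the full merge hierarchy rather than fixed in advance.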


Technology In The Classroom: The Features Language Teachers Should Consider, Sophie Cuocci, Padideh Fattahi Marnani 2022 University of Central Florida

Journal of English Learner Education

The rapid development of technology and a new generation of highly computer-literate students have led the integration of technology in schools to be considered essential. Throughout the last two decades, research has identified multiple factors leading to the successful and unsuccessful integration of technology in the classroom. Educators must consider these factors when deciding which technology tools to use and how to integrate them into their lessons. Simultaneously, the increasing number of English learners in the United States calls for the identification of teaching strategies that will best support their needs. Many language teachers now rely on teaching techniques …


Creating Data From Unstructured Text With Context Rule Assisted Machine Learning (CRAML), Stephen Meisenbacher, Peter Norlander 2022 Technical University of Munich

School of Business: Faculty Publications and Other Works

Popular approaches to building data from unstructured text come with limitations, such as scalability, interpretability, replicability, and real-world applicability. These can be overcome with Context Rule Assisted Machine Learning (CRAML), a method and no-code suite of software tools that builds structured, labeled datasets which are accurate and reproducible. CRAML enables domain experts to access uncommon constructs within a document corpus in a low-resource, transparent, and flexible manner. CRAML produces document-level datasets for quantitative research and makes qualitative classification schemes scalable over large volumes of text. We demonstrate that the method is useful for bibliographic analysis, transparent analysis of proprietary data, …


Data-Driven Neuroanatomical Subtypes In Various Stages Of Schizophrenia: Linking Cortical Thickness, Glutamate, And Language Functioning, Liangbing Liang 2022 The University of Western Ontario

Electronic Thesis and Dissertation Repository

The considerable variation in the spatial distribution of cortical thickness changes has been used to parse heterogeneity in schizophrenia. We aimed to recover a ‘cortical impoverishment’ subgroup with widespread cortical thinning. We applied hierarchical cluster analysis to cortical thickness data of three datasets in different stages of psychosis and studied the cognitive, functional, neurochemical, language and symptom profiles of the observed subgroups. Our consensus-based clustering procedure consistently produced a subgroup characterized by significantly lower cortical thickness. This ‘cortical impoverishment’ subgroup was associated with a higher symptom burden in a clinically stable sample and higher glutamate levels with language impairments in …


Phonotactic Learning With Distributional Representations, Max A. Nelson 2022 University of Massachusetts Amherst

Doctoral Dissertations

This dissertation explores the possibility that the phonological grammar manipulates phone representations based on learned distributional class memberships rather than those based on substantive linguistic features. In doing so, this work makes three primary contributions. First, I propose three novel algorithms for learning a phonological class system from the distributional statistics of a language, all of which are based on partitioning graph representations of phone distributions. Second, I propose a new method for fitting Maximum Entropy phonotactic grammars, MaxEntGrams, which offers theoretical complexity improvements over the widely-adopted approach taken by Hayes and Wilson [2008]. Third, I present a series of …


Restrictive Tier Induction, Seoyoung Kim 2022 University of Massachusetts Amherst

Doctoral Dissertations

This dissertation proposes the Restrictive Tier Learner, which automatically induces only the tiers that are absolutely necessary in capturing phonological long-distance dependencies. The core of my learner is the addition of an extra evaluation step to the existing Inductive Projection Learner (Gouskova and Gallagher 2020), where the necessity and accuracy of the candidate tiers are determined. An important building block of my learner is a typological observation, namely the dichotomy between trigram-bound and unbounded patterns. The fact that this dichotomy is attested in both consonant interactions and vowel interactions allows for a unified approach to be used. Another important piece …


From Sesame Street To Beyond: Multi-Domain Discourse Relation Classification With Pretrained BERT, Isaac R. Raff 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Research efforts in transfer learning have gained massive popularity in recent years. Pretrained language models have demonstrated the most successful results in producing high quality neural networks capable of quality inference after training across domains via transfer learning. This study expands on the domain transfer introduced in Ferracane et al. (2019), exploring neural methods for transfer learning of discourse parsing between a news source domain and a medical target domain. Ferracane et al. (2019) specifically discuss transfer learning from news articles to PubMed medical journal articles. Experiments in transfer learning in the current work expand to include three domains: Wall Street Journal articles previously annotated with …


Linguistic Abstractions In Children’s Very Early Utterances, Qihui Xu 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

How early do children produce multiword utterances? Do children's early utterances reflect abstract syntactic knowledge or are they the result of data-driven learning? We examine this issue through corpus analysis, computational modeling, and adult simulation experiments. Chapter 1 investigates when children start producing multiword utterances; we use corpora to establish the development of multiword utterances and a probabilistic computational model to account for the quantitative change of early multiword utterances. We find that multiword utterances of different lengths appear early in acquisition and increase together, and the length growth pattern can be viewed as a probabilistic and dynamic process.

Chapter …


Towards Explaining Variation In Entrainment, Andreas Weise 2022 The Graduate Center, City University of New York

Dissertations, Theses, and Capstone Projects

Entrainment refers to the tendency of human speakers to adapt to their interlocutors to become more similar to them. This affects various dimensions and occurs in many contexts, allowing for rich applications in human-computer interaction. However, it is not exhibited by every speaker in every conversation but varies widely across features, speakers, and contexts, hindering broad application. This variation, whose guiding principles are poorly understood even after decades of entrainment research, is the subject of this thesis. We begin with a comprehensive literature review that serves as the foundation of our own work and provides a reference to guide future …

