Open Access. Powered by Scholars. Published by Universities.®

Computational Linguistics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 8 of 8

Full-Text Articles in Computational Linguistics

Ideology Prediction From Scarce And Biased Supervision: Learn To Disregard The “What” And Focus On The “How”!, Chen Chen, Dylan Walker, Venkatesh Saligrama Jul 2023

Ideology Prediction From Scarce And Biased Supervision: Learn To Disregard The “What” And Focus On The “How”!, Chen Chen, Dylan Walker, Venkatesh Saligrama

Business Faculty Articles and Research

We propose a novel supervised learning approach for political ideology prediction (PIP) that is capable of predicting out-of-distribution inputs. This problem is motivated by the fact that manual data-labeling is expensive, while self-reported labels are often scarce and exhibit significant selection bias. We propose a novel statistical model that decomposes the document embeddings into a linear superposition of two vectors; a latent neutral context vector independent of ideology, and a latent position vector aligned with ideology. We train an end-to-end model that has intermediate contextual and positional vectors as outputs. At deployment time, our model predicts labels for input documents …


The Sociolinguistics Of Code-Switching In Hong Kong’S Digital Landscape: A Mixed-Methods Exploration Of Cantonese-English Alternation Patterns On Whatsapp, Wilkinson Daniel Wong Gonzales, Yuen Man Tsang Jun 2023

The Sociolinguistics Of Code-Switching In Hong Kong’S Digital Landscape: A Mixed-Methods Exploration Of Cantonese-English Alternation Patterns On Whatsapp, Wilkinson Daniel Wong Gonzales, Yuen Man Tsang

Journal of English and Applied Linguistics

This paper examines the prevalence of Cantonese-English code-mixing in Hong Kong through an under-researched digital medium. Prior research on this code-alternation practice has often been limited to exploring either the social or linguistic constraints of code-switching in spoken or written communication. Our study takes a holistic approach to analyzing code-switching in a hybrid medium that exhibits features of both spoken and written discourse. We specifically analyze the code-switching patterns of 24 undergraduates from a Hong Kong university on WhatsApp and examine how both social and linguistic factors potentially constrain these patterns. Utilizing a self-compiled sociolinguistic corpus as well as survey …


Ai Approaches To Understand Human Deceptions, Perceptions, And Perspectives In Social Media, Chih-Yuan Li May 2023

Ai Approaches To Understand Human Deceptions, Perceptions, And Perspectives In Social Media, Chih-Yuan Li

Dissertations

Social media platforms have created virtual space for sharing user generated information, connecting, and interacting among users. However, there are research and societal challenges: 1) The users are generating and sharing the disinformation 2) It is difficult to understand citizens' perceptions or opinions expressed on wide variety of topics; and 3) There are overloaded information and echo chamber problems without overall understanding of the different perspectives taken by different people or groups.

This dissertation addresses these three research challenges with advanced AI and Machine Learning approaches. To address the fake news, as deceptions on the facts, this dissertation presents Machine …


Predicting High-Cap Tech Stock Polarity: A Combined Approach Using Support Vector Machines And Bidirectional Encoders From Transformers, Ian L. Grisham May 2023

Predicting High-Cap Tech Stock Polarity: A Combined Approach Using Support Vector Machines And Bidirectional Encoders From Transformers, Ian L. Grisham

Electronic Theses and Dissertations

The abundance, accessibility, and scale of data have engendered an era where machine learning can quickly and accurately solve complex problems, identify complicated patterns, and uncover intricate trends. One research area where many have applied these techniques is the stock market. Yet, financial domains are influenced by many factors and are notoriously difficult to predict due to their volatile and multivariate behavior. However, the literature indicates that public sentiment data may exhibit significant predictive qualities and improve a model’s ability to predict intricate trends. In this study, momentum SVM classification accuracy was compared between datasets that did and did not …


Content-Based Unsupervised Fake News Detection On Ukraine-Russia War, Yucheol Shin, Yvan Sojdehei, Limin Zheng, Brad Blanchard Apr 2023

Content-Based Unsupervised Fake News Detection On Ukraine-Russia War, Yucheol Shin, Yvan Sojdehei, Limin Zheng, Brad Blanchard

SMU Data Science Review

The Ukrainian-Russian war has garnered significant attention worldwide, with fake news obstructing the formation of public opinion and disseminating false information. This scholarly paper explores the use of unsupervised learning methods and the Bidirectional Encoder Representations from Transformers (BERT) to detect fake news in news articles from various sources. BERT topic modeling is applied to cluster news articles by their respective topics, followed by summarization to measure the similarity scores. The hypothesis posits that topics with larger variances are more likely to contain fake news. The proposed method was evaluated using a dataset of approximately 1000 labeled news articles related …


Chatgpt As Metamorphosis Designer For The Future Of Artificial Intelligence (Ai): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian) Mar 2023

Chatgpt As Metamorphosis Designer For The Future Of Artificial Intelligence (Ai): A Conceptual Investigation, Amarjit Kumar Singh (Library Assistant), Dr. Pankaj Mathur (Deputy Librarian)

Library Philosophy and Practice (e-journal)

Abstract

Purpose: The purpose of this research paper is to explore ChatGPT’s potential as an innovative designer tool for the future development of artificial intelligence. Specifically, this conceptual investigation aims to analyze ChatGPT’s capabilities as a tool for designing and developing near about human intelligent systems for futuristic used and developed in the field of Artificial Intelligence (AI). Also with the helps of this paper, researchers are analyzed the strengths and weaknesses of ChatGPT as a tool, and identify possible areas for improvement in its development and implementation. This investigation focused on the various features and functions of ChatGPT that …


Toward Suicidal Ideation Detection With Lexical Network Features And Machine Learning, Ulya Bayram, William Lee, Daniel Santel, Ali Minai, Peggy Clark, Tracy Glauser, John Pestian Apr 2022

Toward Suicidal Ideation Detection With Lexical Network Features And Machine Learning, Ulya Bayram, William Lee, Daniel Santel, Ali Minai, Peggy Clark, Tracy Glauser, John Pestian

Northeast Journal of Complex Systems (NEJCS)

In this study, we introduce a new network feature for detecting suicidal ideation from clinical texts and conduct various additional experiments to enrich the state of knowledge. We evaluate statistical features with and without stopwords, use lexical networks for feature extraction and classification, and compare the results with standard machine learning methods using a logistic classifier, a neural network, and a deep learning method. We utilize three text collections. The first two contain transcriptions of interviews conducted by experts with suicidal (n=161 patients that experienced severe ideation) and control subjects (n=153). The third collection consists of interviews conducted by experts …


Non-Manual Articulators In Irish Sign Language Verbs: An Analysis With Data Mining Association Rules, Robert G. Smith, Markus Hofmann Nov 2018

Non-Manual Articulators In Irish Sign Language Verbs: An Analysis With Data Mining Association Rules, Robert G. Smith, Markus Hofmann

Conference Papers

The Signs of Ireland (SOI) corpus (Leeson et al., 2006) deploys a complex multi-tiered temporal data structure. The process of manually analyzing such data is laborious, cannot eliminate bias and often, important patterns can go completely unnoticed. In addition to this, as a result of the complex nature of grammatical structures contained in the corpus, identifying complex linguistic associations or patterns across tiers is simply too intricate a task for a human to carry out in an acceptable timeframe. This work explores the application of data mining techniques on a set of multi-tiered temporal data from the SOI corpus. Building …