Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 10 of 10

Full-Text Articles in Computer Sciences

Stereotypes And Language Models: Understanding How Language Models Encode Stereotypes, Debiasing Language Models, And Examining How Stereotypes Affect Conversations, Brian C. Wang Jun 2023

Computer Science Senior Theses

This thesis describes a variety of approaches in examining how language models encode stereotypes (understanding stereotypes from a model point-of-view), debiasing language models, and using language models to understand how stereotypes affect conversations (understanding stereotypes from a conversational point-of-view). We present a novel approach for textual clues analysis that makes language models more interpretable, combining the understanding of what stereotypes the internal structures of language models have encoded during their initial training (via attention-based analysis) and understanding what textual clues are most relevant to identifying stereotypes for models trained to detect stereotypes (via SHAP-based analysis). We find that different pre-trained …


An Algorithmic Approach To Jazz Guitar Voice-Leading Chord Fingerings, Matthew B. Keating May 2023

Computer Science Senior Theses

A problem in guitar practice is choosing chord voicings that fit together in sequence, a process known as voice leading. In jazz, a guitarist follows voice leading by maintaining stepwise or limited motion for smoother harmony. The main avenues to learn jazz guitar voice leading theory are through a guitar instructor or chord books. To our knowledge, no computational method of generating voice-leading given chord labels exists. First, we demonstrate the complexity of this problem by presenting a graph search algorithm to optimize for a simplified version of voice leading. Then, we present a novel approach to algorithmically derive tablature …


Towards A Computational Model Of Narrative On Social Media, Anne Bailey Jun 2022

Dartmouth College Undergraduate Theses

This thesis describes a variety of approaches to developing a computational model of narrative on social media. Our goal is to use such a narrative model to identify efforts to manipulate public opinion on social media platforms like Twitter. We present a model in which narratives in a collection of tweets are represented as a graph. Elements from each tweet that are relevant to potential narratives are made into nodes in the graph; for this thesis, we populate graph nodes with tweets’ authors, hashtags, named entities (people, locations, organizations, etc.), and moral foundations (central moral values framing the discussion). Two …


Machine Learning And The Network Analysis Of Ethereum Trading Data, Santosh Sivakumar Jun 2022

Dartmouth College Undergraduate Theses

Since their conception, cryptocurrencies have captured the public interest, motivating a growing body of research aimed at exploring blockchain-based transactions. That said, little work has been done to draw conclusions from transaction patterns, particularly in the realm of predicting cryptocurrency price movements. Moreover, research in the cryptocurrency sphere largely focuses on Bitcoin, paying little attention to Ethereum, second to Bitcoin in market capitalization. In this paper, we construct hourly networks for a year of Ethereum transactions, using computed graph metrics as features in a series of machine learning models. We find that regression-based approaches to predicting Ether prices/price deltas …


Symplectically Integrated Symbolic Regression Of Hamiltonian Dynamical Systems, Daniel Dipietro Jun 2022

Computer Science Senior Theses

Here we present Symplectically Integrated Symbolic Regression (SISR), a novel technique for learning physical governing equations from data. SISR employs a deep symbolic regression approach, using a multi-layer LSTM-RNN with mutation to probabilistically sample Hamiltonian symbolic expressions. Using symplectic neural networks, we develop a model-agnostic approach for extracting meaningful physical priors from the data that can be imposed on the fly into the RNN output, limiting its search space. Hamiltonians generated by the RNN are optimized and assessed using a fourth-order symplectic integration scheme; prediction performance is used to train the LSTM-RNN to generate increasingly better functions via a risk-seeking policy gradients …


Entity Based Sentiment Analysis For Textual Health Advice, Dae Lim Chung Apr 2022

Computer Science Senior Theses

This work explores entity-based sentiment analysis for textual health advice through deep learning. We fine-tuned a pretrained BERT model to analyze sentiment across five predetermined categories (food, medicine, disease, exercise, and vitality) for three sentiments: positive, negative, and neutral. An original annotated medical dataset from Dartmouth College’s Persist Lab was used to conduct the experiments. To tailor the data for entity-based sentiment analysis, we explored data transformation techniques to generate optimal training examples. During the experiments, we discovered that the wide variety …


Analyzing Behavioral Adaptation To Covid-19 And Return To Pre-Pandemic Baselines In A Cohort Of College Seniors, Vlado Vojdanovski Jan 2022

Computer Science Senior Theses

As the critical phase of the COVID-19 pandemic seems to be winding down, it is important to analyze how various populations have adjusted to COVID-19 and returned to normalcy. In this study we focus on the behavioral adjustments exhibited by a cohort of N=114 college seniors. To infer COVID-19 adjustment, we compare 2021 (the second year of COVID-19) to 2020 (the first year of COVID-19) and 2019 (the pre-pandemic baseline year). We begin with a broad comparison of the second and first COVID-19 years, finding that the second year of COVID-19 shows significant returns to pre-pandemic baselines on multiple …


Fine-Grained Detection Of Hate Speech Using Bertoxic, Yakoob Khan Jun 2021

Dartmouth College Undergraduate Theses

This thesis describes our approach towards the fine-grained detection of hate speech using deep learning. We leverage the transformer encoder architecture to propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text and utilizes additional post-processing steps to refine the prediction boundaries. The post-processing steps involve (1) labeling character offsets between consecutive toxic tokens as toxic and (2) assigning a toxic label to words that have at least one token labeled as toxic. Through experiments, we show that these two post-processing steps improve the performance of our model by 4.16% on …
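The two post-processing steps named in this abstract can be illustrated with a minimal Python sketch. It is not the thesis implementation: the function name `postprocess_toxic_spans`, the span representation (character offsets with an exclusive end), the containment test used to map tokens to words, and the order in which the two steps run are all assumptions made for illustration.

```python
from typing import List, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive


def postprocess_toxic_spans(
    token_spans: List[Span],
    token_toxic: List[bool],
    word_spans: List[Span],
) -> List[int]:
    """Return sorted toxic character offsets after two refinement steps.

    Hypothetical helper: names, input shapes, and step order are assumed.
    """
    flags = list(token_toxic)

    # Step (2): if any token inside a word is toxic, mark every token
    # of that word as toxic.
    for w_start, w_end in word_spans:
        inside = [
            i
            for i, (s, e) in enumerate(token_spans)
            if s >= w_start and e <= w_end
        ]
        if any(flags[i] for i in inside):
            for i in inside:
                flags[i] = True

    # Collect the character offsets covered by toxic tokens.
    offsets = set()
    for (s, e), is_toxic in zip(token_spans, flags):
        if is_toxic:
            offsets.update(range(s, e))

    # Step (1): characters lying between two consecutive toxic tokens
    # (for example, the space separating them) are also marked toxic.
    for i in range(len(token_spans) - 1):
        if flags[i] and flags[i + 1]:
            offsets.update(range(token_spans[i][1], token_spans[i + 1][0]))

    return sorted(offsets)


# Made-up example: "idiots" is split into the sub-word tokens
# "idiot" and "##s", and only the first piece was predicted toxic.
text = "you stupid idiots ok"
tokens = [(0, 3), (4, 10), (11, 16), (16, 17), (18, 20)]
labels = [False, True, True, False, False]
words = [(0, 3), (4, 10), (11, 17), (18, 20)]
toxic_offsets = postprocess_toxic_spans(tokens, labels, words)
```

In this made-up example, the word-level step extends the toxic label to the trailing "##s" piece, and the gap-filling step adds the space between "stupid" and "idiots", so the predicted span becomes one contiguous run of characters.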


Lexical Complexity Prediction With Assembly Models, Aadil Islam Jun 2021

Dartmouth College Undergraduate Theses

Tuning the complexity of one's writing is essential to presenting ideas in a logical, intuitive manner to audiences. This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context. We assemble a feature engineering-based model and a deep neural network model with an underlying Transformer architecture based on BERT. While BERT itself performs competitively, our feature engineering-based model helps in extreme cases, e.g., separating instances of easy and neutral difficulty. Our handcrafted features comprise a breadth of lexical, semantic, syntactic, and novel phonetic measures. Visualizations of BERT …


Analyses And Creation Of Author Stylized Text, Keith Carlson May 2021

Dartmouth College Ph.D Dissertations

Written text is one of the major ways that humans communicate their thoughts. A single thought can be expressed through many different combinations of words, and the writer must choose which they will use. We call the idea which is communicated the content of the message, and the particular words chosen to express the content, the style. The same content expressed in a different style may tell something useful about the author of the text (e.g., the author's identity), may be easier to understand for different audiences, or may evoke different emotions in the reader.

In this work we explore …