Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Counting And Sampling Small Structures In Graph And Hypergraph Data Streams, Themistoklis Haris Jun 2021

Dartmouth College Undergraduate Theses

In this thesis, we explore the problem of approximating the number of elementary substructures called simplices in large k-uniform hypergraphs. The hypergraphs are assumed to be too large to be stored in memory, so we adopt a data stream model, where the hypergraph is defined by a sequence of hyperedges.

First we propose an algorithm that (ε, δ)-estimates the number of simplices using O(m^{1+1/k} / T) bits of space. In addition, we prove that no constant-pass streaming algorithm can (ε, δ)-approximate the number of simplices using fewer than O(m^{1+1/k} / T) bits of space. Thus …


A Configurable Social Network For Running IRB-Approved Experiments, Mihovil Mandic Jun 2021

Dartmouth College Undergraduate Theses

Our world has never been more connected, and the size of the social media landscape draws a great deal of attention from academia. However, social networks are also a growing challenge for the Institutional Review Boards concerned with the subjects’ privacy. These networks contain a monumental variety of personal information of almost 4 billion people, allow for precise social profiling, and serve as a primary news source for many users. They are perfect environments for influence operations that are becoming difficult to defend against. Motivated to study online social influence via IRB-approved experiments, we designed and implemented a flexible, scalable, …


Fine-Grained Detection Of Hate Speech Using BERToxic, Yakoob Khan Jun 2021

Dartmouth College Undergraduate Theses

This thesis describes our approach towards the fine-grained detection of hate speech using deep learning. We leverage the transformer encoder architecture to propose BERToxic, a system that fine-tunes a pre-trained BERT model to locate toxic text spans in a given text and utilizes additional post-processing steps to refine the prediction boundaries. The post-processing steps involve (1) labeling character offsets between consecutive toxic tokens as toxic and (2) assigning a toxic label to words that have at least one token labeled as toxic. Through experiments, we show that these two post-processing steps improve the performance of our model by 4.16% on …
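The two post-processing steps named in this abstract can be illustrated with a minimal sketch. This is not the thesis's code: the span representation (character offsets with an exclusive end), the function names `expand_to_words` and `toxic_offsets`, the order in which the steps are applied, and the example sentence are all assumptions made for illustration.

```python
from typing import List, Set, Tuple

Span = Tuple[int, int]  # (start, end) character offsets, end exclusive (assumed format)


def expand_to_words(tokens: List[Span], toxic: List[bool], words: List[Span]) -> List[bool]:
    """Step (2): every token of a word becomes toxic if any token in that word is toxic."""
    out = list(toxic)
    for w_start, w_end in words:
        inside = [i for i, (s, e) in enumerate(tokens) if s >= w_start and e <= w_end]
        if any(toxic[i] for i in inside):
            for i in inside:
                out[i] = True
    return out


def toxic_offsets(tokens: List[Span], toxic: List[bool]) -> List[int]:
    """Step (1): collect toxic character offsets, including the gap
    (e.g. a space) between two consecutive toxic tokens."""
    offsets: Set[int] = set()
    for i, ((s, e), flag) in enumerate(zip(tokens, toxic)):
        if not flag:
            continue
        offsets.update(range(s, e))
        if i + 1 < len(tokens) and toxic[i + 1]:
            offsets.update(range(e, tokens[i + 1][0]))
    return sorted(offsets)


# Hypothetical example: "idiots" is split into two subword tokens,
# and the model flags only "utter" and the first subword.
text = "you utter idiots here"
tokens = [(0, 3), (4, 9), (10, 13), (13, 16), (17, 21)]
words = [(0, 3), (4, 9), (10, 16), (17, 21)]
raw_flags = [False, True, True, False, False]

word_flags = expand_to_words(tokens, raw_flags, words)
result = toxic_offsets(tokens, word_flags)
print(text[result[0]:result[-1] + 1])
```

In this sketch the word-level expansion runs first so that the gap filling sees the completed word; whether the thesis applies the steps in that order is not stated in the abstract.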


Lexical Complexity Prediction With Assembly Models, Aadil Islam Jun 2021

Dartmouth College Undergraduate Theses

Tuning the complexity of one's writing is essential to presenting ideas in a logical, intuitive manner to audiences. This paper describes a system submitted by team BigGreen to LCP 2021 for predicting the lexical complexity of English words in a given context. We assemble a feature engineering-based model and a deep neural network model with an underlying Transformer architecture based on BERT. While BERT itself performs competitively, our feature engineering-based model helps in extreme cases, e.g., separating instances of easy and neutral difficulty. Our handcrafted features comprise a breadth of lexical, semantic, syntactic, and novel phonetic measures. Visualizations of BERT …


Improving Existing Methods For Calculating Embodied Carbon Emissions In Trade Through Feature Discovery: An Information Theoretic Approach, Sam Morton Jun 2021

Dartmouth College Undergraduate Theses

The continued societal and ecological risks posed by climate change have spurred renewed interest in quantitative tools that can improve policy aimed at climate mitigation. In 2008, international trade accounted for up to 26% of global anthropogenic emissions, and therefore trade has garnered increased attention from policymakers seeking carbon mitigation. The concept of embodied carbon emissions in trade (EET) quantifies overall carbon emitted in the production and transport of goods for the purposes of trade. EET in theory could prove an indispensable tool to climate-concerned policymakers, but current implementations and data availability limit EET calculation to annual snapshots that extend …


Exploring The Long Tail, Joseph H. Hajjar Jun 2021

Dartmouth College Undergraduate Theses

The migration of datasets online has created a near-infinite inventory for big-name retailers such as Amazon and Netflix, giving rise to recommendation systems to assist users in navigating the massive catalog. This has also allowed for the possibility of retailers storing much less popular, uncommon items which would not appear in a more traditional brick-and-mortar setting due to the cost of storage. Nevertheless, previous work has highlighted the profit potential which lies in the so-called "long tail" of niche, unpopular items. Unfortunately, due to the limited amount of data in this subset of the inventory, recommendation systems often struggle …


Exploring The Use Of Social Media To Infer Relationships Between Demographics, Psychographics And Vaccine Hesitancy, Abhimanyu Kapur Jun 2021

Computer Science Senior Theses

The growing popularity of social media as a platform to obtain information and share one's opinions on various topics makes it a rich source of information for research. In this study, we aimed to develop a framework to infer relationships between demographic and psychographic characteristics of a user and their opinion on a specific narrative: in this case, their stance on taking the COVID-19 vaccine. Twitter was the chosen platform due to its large U.S. user base and easily available data. Demographic traits included Race, Age, Gender, and Human-vs-Organization Status. Psychographic traits included the Big Five personality traits (Conscientiousness, …


A Multi-Resolution Graph Convolution Network For Contiguous Epitope Prediction, Lisa Oh Jan 2021

Dartmouth College Master’s Theses

Computational methods for predicting binding interfaces between antigens and antibodies (epitopes and paratopes) are faster and cheaper than traditional experimental structure determination methods. A sufficiently reliable computational predictor that could scale to large sets of available antibody sequence data could thus inform and expedite many biomedical pursuits, such as better understanding immune responses to vaccination and natural infection and developing better drugs and vaccines. However, current state-of-the-art predictors produce discontiguous predictions, e.g., predicting the epitope in many different spots on an antigen, even though in reality epitopes typically comprise a single localized region. We seek to produce contiguous predicted epitopes, …