Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Applied Statistics

PDF

Institution
Keyword
Publication Year
Publication
Publication Type

Articles 1 - 30 of 124

Full-Text Articles in Computer Sciences

Judging Our New Judges: Why We Must Remove Artificial Intelligence From Our Courtrooms Now, Kieran Duffy Newcomb Jan 2024

Judging Our New Judges: Why We Must Remove Artificial Intelligence From Our Courtrooms Now, Kieran Duffy Newcomb

Honors Theses and Capstones

In this paper, I explore some of the ways in which artificial intelligence might enhance the sentencing process through recidivism prediction technology. Notably, this technology can increase the accuracy of risk predictions and the speed with which sentencing decisions are reached. I then show, however, that the recidivism prediction technology is likely to turn into what data scientist Cathy O’Neil calls a Weapon of Math Destruction. The potential harmfulness of this technology is due not to the inherent nature of the technology, but the symbiotic relationship it will have with our already harmful criminal justice system. I argue that the …


Applications Of Independent And Identically Distributed (Iid) Random Processes In Polarimetry And Climatology, Dan Kestner Jan 2024

Applications Of Independent And Identically Distributed (Iid) Random Processes In Polarimetry And Climatology, Dan Kestner

Dissertations, Master's Theses and Master's Reports

The unifying theme of this thesis is the characterization of “perfect randomness,” i.e., independent and identically distributed (IID) stochastic processes as these are applied in physical science. Two specific and mathematically distinct applications are chosen: (i) Radar and optical polarimetry; (ii) Analysis of time series in meteorology. In (i), IID process of a special kind, namely, with a distribution defined by symmetry, is used to link its multivariate Gaussian density to uniformity on the Poincaré sphere. This “statistical ellipsometry” approach is then used to relate polarimetric mismatches or imbalances to ellipsometric variables and suitably chosen cross-correlation measures. In (ii), recently …


Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia Dec 2023

Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia

Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomics regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation then the result isn’t just sustainability, but also increasing fresh produce accessibility, optimizing nutritional value, eliminating the use of ‘forever chemicals’, reducing transportation costs, and fostering global environmental benefits.

Imagine Doris, who is …


Foundations Of Memory Capacity In Models Of Neural Cognition, Chandradeep Chowdhury Dec 2023

Foundations Of Memory Capacity In Models Of Neural Cognition, Chandradeep Chowdhury

Master's Theses

A central problem in neuroscience is to understand how memories are formed as a result of the activities of neurons. Valiant’s neuroidal model attempted to address this question by modeling the brain as a random graph and memories as subgraphs within that graph. However the question of memory capacity within that model has not been explored: how many memories can the brain hold? Valiant introduced the concept of interference between memories as the defining factor for capacity; excessive interference signals the model has reached capacity. Since then, exploration of capacity has been limited, but recent investigations have delved into the …


Statistical And Machine Learning Approaches To Describe Factors Affecting Preweaning Mortality Of Piglets, Md Towfiqur Rahman, Tami M. Brown-Brandl, Gary A. Rohrer, Sudhendu R. Sharma, Vamsi Manthena, Yeyin Shi Oct 2023

Statistical And Machine Learning Approaches To Describe Factors Affecting Preweaning Mortality Of Piglets, Md Towfiqur Rahman, Tami M. Brown-Brandl, Gary A. Rohrer, Sudhendu R. Sharma, Vamsi Manthena, Yeyin Shi

Department of Biological Systems Engineering: Papers and Publications

High preweaning mortality (PWM) rates for piglets are a significant concern for the worldwide pork industries, causing economic loss and well-being issues. This study focused on identifying the factors affecting PWM, overlays, and predicting PWM using historical production data with statistical and machine learning models. Data were collected from 1,982 litters from the United States Meat Animal Research Center, Nebraska, over the years 2016 to 2021. Sows were housed in a farrowing building with three rooms, each with 20 farrowing crates, and taken care of by well-trained animal caretakers. A generalized linear model was used to analyze the various sow, …


Reducing Uncertainty In Sea-Level Rise Prediction: A Spatial-Variability-Aware Approach, Subhankar Ghosh, Shuai An, Arun Sharma, Jayant Gupta, Shashi Shekhar, Aneesh Subramanian Oct 2023

Reducing Uncertainty In Sea-Level Rise Prediction: A Spatial-Variability-Aware Approach, Subhankar Ghosh, Shuai An, Arun Sharma, Jayant Gupta, Shashi Shekhar, Aneesh Subramanian

I-GUIDE Forum

Given multi-model ensemble climate projections, the goal is to accurately and reliably predict future sea-level rise while lowering the uncertainty. This problem is important because sea-level rise affects millions of people in coastal communities and beyond due to climate change's impacts on polar ice sheets and the ocean. This problem is challenging due to spatial variability and unknowns such as possible tipping points (e.g., collapse of Greenland or West Antarctic ice-shelf), climate feedback loops (e.g., clouds, permafrost thawing), future policy decisions, and human actions. Most existing climate modeling approaches use the same set of weights globally, during either regression or …


Multi-Representation Variational Autoencoder Via Iterative Latent Attention And Implicit Differentiation, Nhu Thuat Tran, Hady Wirawan Lauw Oct 2023

Multi-Representation Variational Autoencoder Via Iterative Latent Attention And Implicit Differentiation, Nhu Thuat Tran, Hady Wirawan Lauw

Research Collection School Of Computing and Information Systems

Variational Autoencoder (VAE) offers a non-linear probabilistic modeling of user's preferences. While it has achieved remarkable performance at collaborative filtering, it typically samples a single vector for representing user's preferences, which may be insufficient to capture the user's diverse interests. Existing solutions extend VAE to model multiple interests of users by resorting a variant of self-attentive method, i.e., employing prototypes to group items into clusters, each capturing one topic of user's interests. Despite showing improvements, the current design could be more effective since prototypes are randomly initialized and shared across users, resulting in uninformative and non-personalized clusters.To fill the gap, …


Cannabidiol Tweet Miner: A Framework For Identifying Misinformation In Cbd Tweets., Jason Turner Aug 2023

Cannabidiol Tweet Miner: A Framework For Identifying Misinformation In Cbd Tweets., Jason Turner

Electronic Theses and Dissertations

As regulations surrounding cannabis continue to develop, the demand for cannabis-based products is on the rise. Despite not producing the psychoactive effects commonly associated with THC, products containing cannabidiol (CBD) have gained immense popularity in recent years as a potential treatment option for a range of conditions, particularly those associated with pain or sleep disorders. However, due to current federal policies, these products have yet to undergo comprehensive safety and efficacy testing. Fortunately, utilizing advanced natural language processing (NLP) techniques, data harvested from social networks have been employed to investigate various social trends within healthcare, such as disease tracking and …


Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan May 2023

Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan

Computer Science Senior Theses

We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …


Statistical Methods To Generate Artificial Slot Floor Data For The Advancement Of Casino Related Research, Courtney Bonner, Anastasia (Stasi) D. Baran, Jason D. Fiege, Saman Muthukumarana May 2023

Statistical Methods To Generate Artificial Slot Floor Data For The Advancement Of Casino Related Research, Courtney Bonner, Anastasia (Stasi) D. Baran, Jason D. Fiege, Saman Muthukumarana

International Conference on Gambling & Risk Taking

Abstract:

A common difficulty when researching gambling topics is the availability of high-quality data sets for development and testing. Due to the high level of secrecy within the gambling industry, if data is obtained for research purposes it is often prohibitively obfuscated, incomplete, or aggregated. Although these data have allowed for advancement in academic work, it leaves both the researchers and readers left wondering about what would be possible if more detailed data sets were available. To mitigate the paucity of data available to researchers, we present a Markov chain-based statistical process for producing artificial event data for a simulated …


Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile May 2023

Optimizing Tumor Xenograft Experiments Using Bayesian Linear And Nonlinear Mixed Modelling And Reinforcement Learning, Mary Lena Bleile

Statistical Science Theses and Dissertations

Tumor xenograft experiments are a popular tool of cancer biology research. In a typical such experiment, one implants a set of animals with an aliquot of the human tumor of interest, applies various treatments of interest, and observes the subsequent response. Efficient analysis of the data from these experiments is therefore of utmost importance. This dissertation proposes three methods for optimizing cancer treatment and data analysis in the tumor xenograft context. The first of these is applicable to tumor xenograft experiments in general, and the second two seek to optimize the combination of radiotherapy with immunotherapy in the tumor xenograft …


Uconn Baseball Batting Order Optimization, Gavin Rublewski, Gavin Rublewski May 2023

Uconn Baseball Batting Order Optimization, Gavin Rublewski, Gavin Rublewski

Honors Scholar Theses

Challenging conventional wisdom is at the very core of baseball analytics. Using data and statistical analysis, the sets of rules by which coaches make decisions can be justified, or possibly refuted. One of those sets of rules relates to the construction of a batting order. Through data collection, data adjustment, the construction of a baseball simulator, and the use of a Monte Carlo Simulation, I have assessed thousands of possible batting orders to determine the roster-specific strategies that lead to optimal run production for the 2023 UConn baseball team. This paper details a repeatable process in which basic player statistics …


Interpretable Learning In Multivariate Big Data Analysis For Network Monitoring, José Camacho, Rasmus Bro, David Kotz Apr 2023

Interpretable Learning In Multivariate Big Data Analysis For Network Monitoring, José Camacho, Rasmus Bro, David Kotz

Dartmouth Scholarship

There is an increasing interest in the development of new data-driven models useful to assess the performance of communication networks. For many applications, like network monitoring and troubleshooting, a data model is of little use if it cannot be interpreted by a human operator. In this paper, we present an extension of the Multivariate Big Data Analysis (MBDA) methodology, a recently proposed interpretable data analysis tool. In this extension, we propose a solution to the automatic derivation of features, a cornerstone step for the application of MBDA when the amount of data is massive. The resulting network monitoring approach allows …


A Graphical User Interface Using Spatiotemporal Interpolation To Determine Fine Particulate Matter Values In The United States, Kelly M. Entrekin Apr 2023

A Graphical User Interface Using Spatiotemporal Interpolation To Determine Fine Particulate Matter Values In The United States, Kelly M. Entrekin

Honors College Theses

Fine particulate matter or PM2.5 can be described as a pollution particle that has a diameter of 2.5 micrometers or smaller. These pollution particle values are measured by monitoring sites installed across the United States throughout the year. While these values are helpful, a lot of areas are not accounted for as scientists are not able to measure all of the United States. Some of these unmeasured regions could be reaching high PM2.5 values over time without being aware of it. These high values can be dangerous by causing or worsening health conditions, such as cardiovascular and lung diseases. Within …


Self-Learning Algorithms For Intrusion Detection And Prevention Systems (Idps), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn Mar 2023

Self-Learning Algorithms For Intrusion Detection And Prevention Systems (Idps), Juan E. Nunez, Roger W. Tchegui Donfack, Rohit Rohit, Hayley Horn

SMU Data Science Review

Today, there is an increased risk to data privacy and information security due to cyberattacks that compromise data reliability and accessibility. New machine learning models are needed to detect and prevent these cyberattacks. One application of these models is cybersecurity threat detection and prevention systems that can create a baseline of a network's traffic patterns to detect anomalies without needing pre-labeled data; thus, enabling the identification of abnormal network events as threats. This research explored algorithms that can help automate anomaly detection on an enterprise network using Canadian Institute for Cybersecurity data. This study demonstrates that Neural Networks with Bayesian …


Nearby Galaxies: Modelling Star Formation Histories And Contamination By Unresolved Background Galaxies, Hadi Papei Jan 2023

Nearby Galaxies: Modelling Star Formation Histories And Contamination By Unresolved Background Galaxies, Hadi Papei

Electronic Thesis and Dissertation Repository

Galaxies are complex systems of stars, gas, dust, and dark matter which evolve over billions of years, and one of the main goals of astrophysics is to understand how these complex systems form and change. Measuring the star formation history of nearby galaxies, in which thousands of stars can be resolved individually, has provided us with a clear picture of their evolutionary history and the evolution of galaxies in general.

In this work, we have developed the first public Python package, SFHPy, to measure star formation histories of nearby galaxies using their colour-magnitude diagrams. In this algorithm, an observed colour-magnitude …


Knowledge Discovery On The Integrative Analysis Of Electrical And Mechanical Dyssynchrony To Improve Cardiac Resynchronization Therapy, Zhuo He Jan 2023

Knowledge Discovery On The Integrative Analysis Of Electrical And Mechanical Dyssynchrony To Improve Cardiac Resynchronization Therapy, Zhuo He

Dissertations, Master's Theses and Master's Reports

Cardiac resynchronization therapy (CRT) is a standard method of treating heart failure by coordinating the function of the left and right ventricles. However, up to 40% of CRT recipients do not experience clinical symptoms or cardiac function improvements. The main reasons for CRT non-response include: (1) suboptimal patient selection based on electrical dyssynchrony measured by electrocardiogram (ECG) in current guidelines; (2) mechanical dyssynchrony has been shown to be effective but has not been fully explored; and (3) inappropriate placement of the CRT left ventricular (LV) lead in a significant number of patients.

In terms of mechanical dyssynchrony, we utilize an …


Investigating Collaborative Explainable Ai (Cxai)/Social Forum As An Explainable Ai (Xai) Method In Autonomous Driving (Ad), Tauseef Ibne Mamun Jan 2023

Investigating Collaborative Explainable Ai (Cxai)/Social Forum As An Explainable Ai (Xai) Method In Autonomous Driving (Ad), Tauseef Ibne Mamun

Dissertations, Master's Theses and Master's Reports

Explainable AI (XAI) systems primarily focus on algorithms, integrating additional information into AI decisions and classifications to enhance user or developer comprehension of the system's behavior. These systems often incorporate untested concepts of explainability, lacking grounding in the cognitive and educational psychology literature (S. T. Mueller et al., 2021). Consequently, their effectiveness may be limited, as they may address problems that real users don't encounter or provide information that users do not seek.

In contrast, an alternative approach called Collaborative XAI (CXAI), as proposed by S. Mueller et al (2021), emphasizes generating explanations without relying solely on algorithms. CXAI centers …


Application Of Probabilistic Ranking Systems On Women’S Junior Division Beach Volleyball, Cameron Stewart, Michael Mazel, Bivin Sadler Sep 2022

Application Of Probabilistic Ranking Systems On Women’S Junior Division Beach Volleyball, Cameron Stewart, Michael Mazel, Bivin Sadler

SMU Data Science Review

Women’s beach volleyball is one of the fastest growing collegiate sports today. The increase in popularity has come with an increase in valuable scholarship opportunities across the country. With thousands of athletes to sort through, college scouts depend on websites that aggregate tournament results and rank players nationally. This project partnered with the company Volleyball Life, who is the current market leader in the ranking space of junior beach volleyball players. Utilizing the tournament information provided by Volleyball Life, this study explored replacements to the current ranking systems, which are designed to aggregate player points from recent tournament placements. Three …


Better Understanding Genomic Architecture With The Use Of Applied Statistics And Explainable Artificial Intelligence, Jonathon C. Romero Aug 2022

Better Understanding Genomic Architecture With The Use Of Applied Statistics And Explainable Artificial Intelligence, Jonathon C. Romero

Doctoral Dissertations

With the continuous improvements in biological data collection, new techniques are needed to better understand the complex relationships in genomic and other biological data sets. Explainable Artificial Intelligence (X-AI) techniques like Iterative Random Forest (iRF) excel at finding interactions within data, such as genomic epistasis. Here, the introduction of new methods to mine for these complex interactions is shown in a variety of scenarios. The application of iRF as a method for Genomic Wide Epistasis Studies shows that the method is robust in finding interacting sets of features in synthetic data, without requiring the exponentially increasing computation time of many …


Intraday Algorithmic Trading Using Momentum And Long Short-Term Memory Network Strategies, Andrew R. Whitinger Ii May 2022

Intraday Algorithmic Trading Using Momentum And Long Short-Term Memory Network Strategies, Andrew R. Whitinger Ii

Undergraduate Honors Theses

Intraday stock trading is an infamously difficult and risky strategy. Momentum and reversal strategies and long short-term memory (LSTM) neural networks have been shown to be effective for selecting stocks to buy and sell over time periods of multiple days. To explore whether these strategies can be effective for intraday trading, their implementations were simulated using intraday price data for stocks in the S&P 500 index, collected at 1-second intervals between February 11, 2021 and March 9, 2021 inclusive. The study tested 160 variations of momentum and reversal strategies for profitability in long, short, and market-neutral portfolios, totaling 480 portfolios. …


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore Feb 2022

Session 5: Equipment Finance Credit Risk Modeling - A Case Study In Creative Model Development & Nimble Data Engineering, Edward Krueger, Landon Thompson, Josh Moore

SDSU Data Science Symposium

This presentation will focus first on providing an overview of Channel and the Risk Analytics team that performed this case study. Given that context, we’ll then dive into our approach for building the modeling development data set, techniques and tools used to develop and implement the model into a production environment, and some of the challenges faced upon launch. Then, the presentation will pivot to the data engineering pipeline. During this portion, we will explore the application process and what happens to the data we collect. This will include how we extract & store the data along with how it …


Colonial Markets, Consumers, And Trade: A Comparative Analysis Of Historic Ceramics From The Bluefields Bay Area, Westmoreland, Jamaica, Lacy Risner Jan 2022

Colonial Markets, Consumers, And Trade: A Comparative Analysis Of Historic Ceramics From The Bluefields Bay Area, Westmoreland, Jamaica, Lacy Risner

Murray State Theses and Dissertations

The ceramic assemblages from a British colonial settlement in Bluefields Bay, Jamaica, provide a unique window into the market availability, exchange routes, and consumption patterns of the eighteenth century. This study compares the historic ceramics collected from two sites in Bluefields Bay to one another and to other intra-island (Jamaica), intraregional (Lesser Antilles), and international (North America) colonial and postcolonial sites to reveal patterns of individual and global ceramic consumption and distribution in the emergent capitalist networks and markets of the colonial era. Integrating small British colonial sites into the networks of other more extensive studies focusing primarily on plantations …


Reinforcement Learning: Low Discrepancy Action Selection For Continuous States And Actions, Jedidiah Lindborg Jan 2022

Reinforcement Learning: Low Discrepancy Action Selection For Continuous States And Actions, Jedidiah Lindborg

Electronic Theses and Dissertations

In reinforcement learning the process of selecting an action during the exploration or exploitation stage is difficult to optimize. The purpose of this thesis is to create an action selection process for an agent by employing a low discrepancy action selection (LDAS) method. This should allow the agent to quickly determine the utility of its actions by prioritizing actions that are dissimilar to ones that it has already picked. In this way the learning process should be faster for the agent and result in more optimal policies.


A Simple Algorithm For Generating A New Two Sample Type-Ii Progressive Censoring With Applications, E. M. Shokr, Rashad Mohamed El-Sagheer, Mahmoud Mansour, H. M. Faied, B. S. El-Desouky Jan 2022

A Simple Algorithm For Generating A New Two Sample Type-Ii Progressive Censoring With Applications, E. M. Shokr, Rashad Mohamed El-Sagheer, Mahmoud Mansour, H. M. Faied, B. S. El-Desouky

Basic Science Engineering

In this article, we introduce a simple algorithm to generating a new type-II progressive censoring scheme for two samples. It is observed that the proposed algorithm can be applied for any continues probability distribution. Moreover, the description model and necessary assumptions are discussed. In addition, the steps of simple generation algorithm along with programming steps are also constructed on real example. The inference of two Weibull Frechet populations are discussed under the proposed algorithm. Both classical and Bayesian inferential approaches of the distribution parameters are discussed. Furthermore, approximate confidence intervals are constructed based on the asymptotic distribution of the maximum …


Integration Of Blockchain Technology Into Automobiles To Prevent And Study The Causes Of Accidents, John Kim Dec 2021

Integration Of Blockchain Technology Into Automobiles To Prevent And Study The Causes Of Accidents, John Kim

Electronic Theses, Projects, and Dissertations

Automobile collisions occur daily. We now live in an information-driven world, one where technology is quickly evolving. Blockchain technology can change the automotive industry, the safety of the motoring public and its surrounding environment by incorporating this vast array of information. It can place safety and efficiency at the forefront to pedestrians, public establishments, and provide public agencies with pertinent information securely and efficiently. Other industries where Blockchain technology has been effective in are as follows: supply chain management, logistics, and banking. This paper reviews some statistical information regarding automobile collisions, Blockchain technology, Smart Contracts, Smart Cities; assesses the feasibility …


Risk-Based Machine Learning Approaches For Probabilistic Transient Stability, Umair Shahzad Dec 2021

Risk-Based Machine Learning Approaches For Probabilistic Transient Stability, Umair Shahzad

Department of Electrical and Computer Engineering: Dissertations, Theses, and Student Research

Power systems are getting more complex than ever and are consequently operating close to their limit of stability. Moreover, with the increasing demand of renewable wind generation, and the requirement to maintain a secure power system, the importance of transient stability cannot be overestimated. Considering its significance in power system security, it is important to propose a different approach for enhancing the transient stability, considering uncertainties. Current deterministic industry practices of transient stability assessment ignore the probabilistic nature of variables (fault type, fault location, fault clearing time, etc.). These approaches typically provide a conservative criterion and can result in expensive …


Shape-Based Classification Of Partially Observed Curves, With Applications To Anthropology, Gregory J. Matthews, Karthik Bharath, Sebastian Kurtek, Juliet K. Brophy, George K. Thiruvathukal, Ofer Harel Oct 2021

Shape-Based Classification Of Partially Observed Curves, With Applications To Anthropology, Gregory J. Matthews, Karthik Bharath, Sebastian Kurtek, Juliet K. Brophy, George K. Thiruvathukal, Ofer Harel

Computer Science: Faculty Publications and Other Works

We consider the problem of classifying curves when they are observed only partially on their parameter domains. We propose computational methods for (i) completion of partially observed curves; (ii) assessment of completion variability through a nonparametric multiple imputation procedure; (iii) development of nearest neighbor classifiers compatible with the completion techniques. Our contributions are founded on exploiting the geometric notion of shape of a curve, defined as those aspects of a curve that remain unchanged under translations, rotations and reparameterizations. Explicit incorporation of shape information into the computational methods plays the dual role of limiting the set of all possible completions …


Applying Emotional Analysis For Automated Content Moderation, John Shelnutt May 2021

Applying Emotional Analysis For Automated Content Moderation, John Shelnutt

Computer Science and Computer Engineering Undergraduate Honors Theses

The purpose of this project is to explore the effectiveness of emotional analysis as a means to automatically moderate content or flag content for manual moderation in order to reduce the workload of human moderators in moderating toxic content online. In this context, toxic content is defined as content that features excessive negativity, rudeness, or malice. This often features offensive language or slurs. The work involved in this project included creating a simple website that imitates a social media or forum with a feed of user submitted text posts, implementing an emotional analysis algorithm from a word emotions dataset, designing …