Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

Full-Text Articles in Data Science

The Interaction Of Normalisation And Clustering In Sub-Domain Definition For Multi-Source Transfer Learning Based Time Series Anomaly Detection, Matthew Nicholson, Rahul Agrahari, Clare Conran, Haythem Assem, John D. Kelleher Dec 2022

Articles

This paper examines how data normalisation and clustering interact in the definition of sub-domains within multi-source transfer learning systems for time series anomaly detection. The paper introduces a distinction between (i) clustering as a primary/direct method for anomaly detection, and (ii) clustering as a method for identifying sub-domains within the source or target datasets. Reporting the results of three sets of experiments, we find that normalisation after feature extraction and before clustering results in the best performance for anomaly detection. Interestingly, we find that, in the multi-source transfer learning scenario, clustering on the target dataset and identifying sub-domains in the …
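
The ordering reported above (feature extraction, then normalisation, then clustering) can be sketched as follows. This is a minimal illustration only, not the authors' pipeline: the two-value feature set (mean, variance), the z-score scaling, the plain k-means routine, and all function names and sample series are assumptions made for the example.

```python
from statistics import fmean, pstdev, pvariance

def extract_features(series):
    """Hypothetical feature set: (mean, variance) of one time series."""
    return (fmean(series), pvariance(series))

def zscore_columns(rows):
    """Scale every feature column to zero mean and unit variance."""
    columns = list(zip(*rows))
    means = [fmean(col) for col in columns]
    # Fall back to 1.0 so a constant column does not divide by zero.
    stds = [pstdev(col) or 1.0 for col in columns]
    return [tuple((v - m) / s for v, m, s in zip(row, means, stds)) for row in rows]

def assign(points, centroids):
    """Index of the nearest centroid (squared Euclidean distance) per point."""
    def dist(p, c):
        return sum((a - b) ** 2 for a, b in zip(p, c))
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k])) for p in points]

def kmeans(points, initial_centroids, iterations=10):
    """Plain Lloyd-style k-means with caller-supplied starting centroids."""
    centroids = list(initial_centroids)
    for _ in range(iterations):
        labels = assign(points, centroids)
        for k in range(len(centroids)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:  # keep the old centroid if a cluster is empty
                centroids[k] = tuple(fmean(dim) for dim in zip(*members))
    return assign(points, centroids)

# Hypothetical series: two low-level, low-variance and two high-level, high-variance.
series_set = [
    [1.0, 1.2, 0.8],
    [1.1, 0.9, 1.0],
    [50.0, 70.0, 60.0],
    [55.0, 65.0, 75.0],
]

features = [extract_features(s) for s in series_set]   # step 1: feature extraction
normalised = zscore_columns(features)                  # step 2: normalisation
labels = kmeans(normalised, [normalised[0], normalised[-1]])  # step 3: clustering
print(labels)  # [0, 0, 1, 1]
```

Each cluster label here would correspond to one candidate sub-domain. Scaling the feature columns first keeps a feature with a large numeric range from dominating the distance used for clustering.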


The Interaction Of Different Primary Producers And Physical And Chemical Dynamics Of An Urban Shallow Lake, Majid Sahin Sep 2022

Dissertations, Theses, and Capstone Projects

An artificial urban shallow lake, Prospect Park Lake (PPL), is situated on a terminal moraine in Brooklyn, New York, and supplied with municipal water treated with ortho-phosphates. The constant input of the phosphate nutrient is the primary source of eutrophication in the lake. The numerous pools along the watercourse house various aquatic phototrophs, which influence the water quality and the state of the system, driving conditions to favor the survival of their own species. In the first half of the dissertation, the focus of the project is on analyzing how the different primary producers in different regions of PPL affect …


Data Collection And Machine Learning Methods For Automated Pedestrian Facility Detection And Mensuration, Joseph Bailey Luttrell IV Aug 2022

Dissertations

Large-scale collection of pedestrian facility (crosswalks, sidewalks, etc.) presence data is vital to the success of efforts to improve pedestrian facility management, safety analysis, and road network planning. However, this kind of data is typically not available on a large scale because manual data collection is costly in both labor and time. Therefore, researchers are currently exploring methods for automating this process using techniques such as machine learning. In our work, we mainly focus on machine learning methods for the detection of crosswalks and sidewalks from both aerial and street-view …


Leveraging Context Patterns For Medical Entity Classification, Garrett Johnston Jun 2022

Computer Science Senior Theses

The ability of patients to understand health-related text is important for optimal health outcomes. A system that can automatically annotate medical entities could help patients better understand health-related text. Such a system would also accelerate manual data annotation for this low-resource domain as well as assist in downstream medical NLP tasks such as finding textual similarity, identifying conflicting medical advice, and aspect-based sentiment analysis. In this work, we investigate a state-of-the-art entity set expansion model, BootstrapNet, for the task of medical entity classification on a new dataset of medical advice text. We also propose EP SBERT, a simple model …


Un-Fair Trojan: Targeted Backdoor Attacks Against Model Fairness, Nicholas Furth May 2022

Theses

Machine learning models have been shown to be vulnerable to various backdoor and data poisoning attacks that adversely affect model behavior. Additionally, these attacks have been shown to make unfair predictions with respect to certain protected features. In federated learning, where multiple local models contribute to a single global model and communicate only through local gradients, the issue of attacks becomes more prevalent and complex. Previously published works revolve around solving these issues both individually and jointly. However, there has been little study on the effects of attacks against model fairness. Demonstrated in this work, a flexible attack, which we call Un-Fair …


Real Time Call-Flagging System To Respond To Suicidal Ideation In Call Centers, Vishnu Menon, Joseph Carrigan, Charles Floeder, Thomas Walton, Devin Mcguire May 2022

Honors Theses

The 2021-2022 Signature Performance Design Studio team developed a live audio call-flagging system that enables faster responses and new response pathways to veteran crises by call service representatives and their management team. Using a custom-made deep learning model, a live audio streaming server, and a Teams broadcasting add-on, the system empowers Signature Performance call service representatives to make quicker and better-informed decisions to provide veterans the best care possible.


Computational Approaches To Facilitate Automated Interchange Between Music And Art, Rao Hamza Ali May 2022

Computational and Data Sciences (PhD) Dissertations

Recently, there has been a tremendous increase in generating and synthesizing music and art using various computational techniques. An area that is still under-researched, however, is how one medium can be converted into the other while maintaining the overall aesthetics. Over the last few centuries, artists, composers, and scholars have attempted to substitute one form of art for the other: by proposing techniques where music notes are synonymous with colors, by inventing instruments that combine the aesthetics of music and visual art, and by incorporating the two media in live performances. A widely accepted computational approach, for the conversion, …


Beyond Accuracy In Machine Learning., Aneseh Alvanpour May 2022

Electronic Theses and Dissertations

Machine Learning (ML) algorithms are widely used in our daily lives. The need to increase the accuracy of ML models has led to building increasingly powerful and complex algorithms known as black-box models, which do not provide any explanations about the reasons behind their output. On the other hand, there are white-box ML models, which are inherently interpretable but have lower accuracy than black-box models. To have a productive and practical algorithmic decision system, precise predictions may not be sufficient. The system may need to have transparency and be able to provide explanations, especially in applications with safety-critical contexts …


New Debiasing Strategies In Collaborative Filtering Recommender Systems: Modeling User Conformity, Multiple Biases, And Causality., Mariem Boujelbene May 2022

Electronic Theses and Dissertations

Recommender systems are widely used to personalize the user experience in a diverse set of online applications ranging from e-commerce and education to social media and online entertainment. These state-of-the-art AI systems can suffer from several biases that may occur at different stages of the recommendation life-cycle. For instance, using biased data to train recommendation models may lead to several issues, such as a discrepancy between online and offline evaluation, decreased recommendation performance, and a degraded user experience. Bias can occur during the data collection stage, where the data inherits the user-item interaction biases, such as …


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


Toward Suicidal Ideation Detection With Lexical Network Features And Machine Learning, Ulya Bayram, William Lee, Daniel Santel, Ali Minai, Peggy Clark, Tracy Glauser, John Pestian Apr 2022

Northeast Journal of Complex Systems (NEJCS)

In this study, we introduce a new network feature for detecting suicidal ideation from clinical texts and conduct various additional experiments to enrich the state of knowledge. We evaluate statistical features with and without stopwords, use lexical networks for feature extraction and classification, and compare the results with standard machine learning methods using a logistic classifier, a neural network, and a deep learning method. We utilize three text collections. The first two contain transcriptions of interviews conducted by experts with suicidal (n=161 patients that experienced severe ideation) and control subjects (n=153). The third collection consists of interviews conducted by experts …


Assessing Feature Representations For Instance-Based Cross-Domain Anomaly Detection In Cloud Services Univariate Time Series Data, Rahul Agrahari, Matthew Nicholson, Clare Conran, Haythem Assem, John D. Kelleher Jan 2022

Articles

In this paper, we compare and assess the efficacy of a number of time-series instance feature representations for anomaly detection. To assess whether there are statistically significant differences between different feature representations for anomaly detection in a time series, we calculate and compare confidence intervals on the average performance of different feature sets across a number of different model types and cross-domain time-series datasets. Our results indicate that the catch22 time-series feature set augmented with features based on rolling mean and variance performs best on average, and that the difference in performance between this feature set and the next best …
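
As a rough illustration of two ideas named in this abstract, the sketch below computes rolling mean and variance features for a univariate series and a confidence interval on an average score. It is not the authors' code: catch22 itself is not reproduced, the normal-approximation interval is an assumed choice (the abstract does not state the procedure used), and the function names, window size, and sample values are hypothetical.

```python
from math import sqrt
from statistics import fmean, pvariance, stdev

def rolling_features(series, window):
    """Return one (rolling_mean, rolling_variance) pair per full window."""
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    features = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        features.append((fmean(chunk), pvariance(chunk)))
    return features

def mean_confidence_interval(scores, z=1.96):
    """Normal-approximation interval for the mean of `scores` (needs >= 2 values)."""
    centre = fmean(scores)
    half_width = z * stdev(scores) / sqrt(len(scores))
    return (centre - half_width, centre + half_width)

# Hypothetical series with a single spike at index 3.
series = [1.0, 1.0, 1.0, 10.0, 1.0, 1.0]
feats = rolling_features(series, window=3)
print(feats[0])  # window without the spike: (1.0, 0.0)
print(feats[1])  # first window containing the spike: mean and variance both jump

# Hypothetical per-dataset scores for one feature set.
low, high = mean_confidence_interval([0.8, 0.9, 1.0])
print(low, high)
```

Comparing such intervals across feature sets is one simple way to judge whether a difference in average performance is likely to be meaningful; overlapping intervals suggest the difference may not be.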


A Citizen-Science Approach For Urban Flood Risk Analysis Using Data Science And Machine Learning, Candace Agonafir Jan 2022

Dissertations and Theses

Street flooding is problematic in urban areas, where impervious surfaces, such as concrete, brick, and asphalt, prevail, impeding the infiltration of water into the ground. During rain events, water ponds and rises to levels that cause considerable economic damage and physical harm. The main goal of this dissertation is to develop novel approaches toward the comprehension of urban flood risk using data science techniques on crowd-sourced data. This is accomplished by developing a series of data-driven models to identify flood factors of significance and localized areas of flood vulnerability in New York City (NYC). First, the infrastructural (catch basin clogs, …


Classifying Blood Glucose Levels Through Noninvasive Features, Rishi Reddy Jan 2022

Graduate Theses, Dissertations, and Problem Reports

Blood glucose monitoring is a key process in the prevention and management of certain chronic diseases, such as diabetes. Currently, those interested in monitoring their blood glucose levels are confronted with options that are primarily invasive and relatively costly. A growing topic of note is the development of non-invasive monitoring methods for blood glucose. This development holds significant promise for improving the quality of life of a large portion of the population and is met with great enthusiasm from the scientific community as well as commercial interest. This work aims to develop a potential pipeline …


Facial Landmark Feature Fusion In Transfer Learning Of Child Facial Expressions, Megan A. Witherow, Manar D. Samad, Norou Diawara, Khan M. Iftekharuddin Jan 2022

Electrical & Computer Engineering Faculty Publications

Automatic classification of child facial expressions is challenging due to the scarcity of image samples with annotations. Deep convolutional neural networks (CNNs) pretrained on adult facial expressions can be effectively fine-tuned, via transfer learning, for child facial expression classification using limited facial images of children. Recent work inspired by facial age estimation and age-invariant face recognition proposes a fusion of facial landmark features with deep representation learning to augment facial expression classification performance. We hypothesize that deep transfer learning of child facial expressions may also benefit from fusing facial landmark features. Our proposed model architecture integrates two input branches: a …