Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

2022

Machine learning

Articles 1 - 30 of 39

Full-Text Articles in Physical Sciences and Mathematics

Applying Data Science And Machine Learning To Understand Health Care Transition For Adolescents And Emerging Adults With Special Health Care Needs, Lisamarie Turk Dec 2022

Nursing ETDs

A problem of classification places adolescents and emerging adults with special health care needs among the most at risk for poor or life-threatening health outcomes. This preliminary proof-of-concept study was conducted to determine if phenotypes of health care transition (HCT) for this vulnerable population could be established. Such phenotypes could support development of future studies that require data classifications as input. Mining of electronic health record data and cluster analysis were implemented to identify phenotypes. Subsequently, a machine learning concept model was developed for predicting acute care and medical condition severity. Three clusters were identified and described (Cluster 1, n …


Pyseg: A Python Package For 2D Material Flake Localization, Segmentation, And Thickness Prediction, Diana B. Horangic Dec 2022

Student Research Projects

Thin materials are of interest for their extraordinary physical, mechanical, thermal, electrical, and optical properties. Monolayers and bilayers of 2D materials can be manufactured through a variety of exfoliation methods. To determine layer thickness, Raman spectroscopy or other methods like Rayleigh scattering are used. These methods are, however, slow, and they require equipment beyond an optical microscope. A Python package that automates flake identification was built, assuming access only to RGB data from an optical microscope. My package, pyseg, localizes flakes on a substrate and then makes a rough estimate of their thickness from first principles. It can …


The Interaction Of Normalisation And Clustering In Sub-Domain Definition For Multi-Source Transfer Learning Based Time Series Anomaly Detection, Matthew Nicholson, Rahul Agrahari, Clare Conran, Haythem Assem, John D. Kelleher Dec 2022

Articles

This paper examines how data normalisation and clustering interact in the definition of sub-domains within multi-source transfer learning systems for time series anomaly detection. The paper introduces a distinction between (i) clustering as a primary/direct method for anomaly detection, and (ii) clustering as a method for identifying sub-domains within the source or target datasets. Reporting the results of three sets of experiments, we find that normalisation after feature extraction and before clustering results in the best performance for anomaly detection. Interestingly, we find that in the multi-source transfer learning scenario clustering on the target dataset and identifying subdomains in the …


Design Of Secure Communication Schemes To Provide Authentication And Integrity Among The IoT Devices, Vidya Rao Dr. Nov 2022

Technical Collection

The fast growth in Internet-of-Things (IoT) based applications has increased the number of end devices communicating over the Internet. These end devices are built with few resources and limited battery power. Such resource-constrained devices are exposed to various security and privacy concerns when communicating over the public Internet. Thus, it becomes essential to provide lightweight security solutions to safeguard data and user privacy. Elliptic Curve Cryptography (ECC) can be used to generate the digital signature and also encrypt the data. The method can be evaluated on a real-time testbed deployed using Raspberry Pi 3 devices, and every message transmitted is subjected to ECC. …


The Interaction Of Different Primary Producers And Physical And Chemical Dynamics Of An Urban Shallow Lake, Majid Sahin Sep 2022

Dissertations, Theses, and Capstone Projects

An artificial urban shallow lake, Prospect Park Lake (PPL), is situated on a terminal moraine in Brooklyn, New York, and supplied with municipal water treated with ortho-phosphates. The constant input of the phosphate nutrient is the primary source of eutrophication in the lake. The numerous pools along the water course house various aquatic phototrophs, which influence the water quality and the state of the system, driving conditions to favor the survival of their own species. In the first half of the dissertation, the focus of the project is on analyzing how the different primary producers in different regions of PPL affect …


Tempering The Adversary: An Exploration Into The Applications Of Game Theoretic Feature Selection And Regression, Stephen Mcgee Aug 2022

All Dissertations

Most modern machine learning algorithms tend to focus on an "average-case" approach, where every data point contributes the same amount of influence towards calculating the fit of a model. This "per-data point" error (or loss) is averaged together into an overall loss and typically minimized with an objective function. However, this can be insensitive to valuable outliers. Inspired by game theory, the goal of this work is to explore the utility of incorporating an optimally-playing adversary into feature selection and regression frameworks. The adversary assigns weights to the data elements so as to degrade the modeler's performance in an optimal …
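
The contrast drawn in this abstract, between an averaged loss and a loss weighted by an adversary, can be illustrated with a minimal sketch. The excerpt does not describe the thesis's actual formulation, so everything below is an assumption made for illustration: the function names, the sample losses, and the `cap` parameter (an upper bound on any single weight) are hypothetical. The adversary here simply picks nonnegative weights summing to 1 that maximize the weighted loss.

```python
def average_loss(losses):
    """Average-case loss: every data point gets the same weight 1/n."""
    return sum(losses) / len(losses)


def adversarial_loss(losses, cap=1.0):
    """Largest weighted loss an adversary can force.

    Assumption (not from the source): weights are nonnegative, sum to 1,
    and no single weight exceeds `cap`. Under those constraints the
    optimum is greedy: put as much weight as allowed on the largest
    losses first.
    """
    if not losses:
        raise ValueError("losses must be non-empty")
    if cap <= 0 or cap * len(losses) < 1.0:
        raise ValueError("cap is too small for the weights to sum to 1")
    remaining = 1.0
    total = 0.0
    for loss in sorted(losses, reverse=True):
        weight = min(cap, remaining)
        total += weight * loss
        remaining -= weight
        if remaining <= 0:
            break
    return total


# Hypothetical per-point losses with one outlier.
losses = [0.1, 0.2, 0.3, 2.0]
mean_case = average_loss(losses)
worst_case = adversarial_loss(losses)       # all weight on the outlier
capped_case = adversarial_loss(losses, cap=0.5)
```

With `cap=1.0` the adversary concentrates all weight on the single worst point, and with `cap=1/n` it is forced back to the plain average, so the cap interpolates between the two views.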


Solving The Challenges Of Concept Drift In Data Stream Classification., Hanqing Hu Aug 2022

Electronic Theses and Dissertations

The rise of network-connected devices and applications has led to a significant increase in the volume of data that are continuously generated over time, called data streams. In real-world applications, storing the entirety of a data stream for later analysis is often not practical, due to the data stream’s potentially infinite volume. Data stream mining techniques and frameworks are therefore created to analyze streaming data as they arrive. However, compared to traditional data mining techniques, challenges unique to data stream mining also emerge, due to the high arrival rate of data streams and their dynamic nature. In this dissertation, …


Data Collection And Machine Learning Methods For Automated Pedestrian Facility Detection And Mensuration, Joseph Bailey Luttrell IV Aug 2022

Dissertations

Large-scale collection of pedestrian facility (crosswalks, sidewalks, etc.) presence data is vital to the success of efforts to improve pedestrian facility management, safety analysis, and road network planning. However, this kind of data is typically not available on a large scale due to the high labor and time costs that are the result of relying on manual data collection methods. Therefore, methods for automating this process using techniques such as machine learning are currently being explored by researchers. In our work, we mainly focus on machine learning methods for the detection of crosswalks and sidewalks from both aerial and street-view …


Applications Of Machine Learning Algorithms In Materials Science And Bioinformatics, Mohammed Quazi Jun 2022

Mathematics & Statistics ETDs

The piezoelectric response has been a measure of interest in density functional theory (DFT) for micro-electromechanical systems (MEMS) since the inception of MEMS technology. Piezoelectric-based MEMS devices find wide applications in automobiles, mobile phones, healthcare devices, and silicon chips for computers, to name a few. Piezoelectric properties of doped aluminum nitride (AlN) have been under investigation in materials science for piezoelectric thin films because of its wide range of device applicability. In this research, high-throughput ab initio simulations for 23 AlN alloys are generated using rigorous DFT calculations.

This research is the first to report strong enhancements of piezoelectric properties …


Models And Machine Learning Techniques For Improving The Planning And Operation Of Electricity Systems In Developing Regions, Santiago Correa Cardona Jun 2022

Doctoral Dissertations

The enormous innovation in computational intelligence has disrupted the traditional ways we solve the main problems of our society and allowed us to make more data-informed decisions. Energy systems and the ways we deliver electricity are not exceptions to this trend: cheap and pervasive sensing systems and new communication technologies have enabled the collection of large amounts of data that are being used to monitor and predict in real-time the behavior of this infrastructure. Bringing intelligence to the power grid creates many opportunities to integrate new renewable energy sources more efficiently, facilitate grid planning and expansion, improve reliability, optimize electricity …


Leveraging Context Patterns For Medical Entity Classification, Garrett Johnston Jun 2022

Computer Science Senior Theses

The ability of patients to understand health-related text is important for optimal health outcomes. A system that can automatically annotate medical entities could help patients better understand health-related text. Such a system would also accelerate manual data annotation for this low-resource domain as well as assist in downstream medical NLP tasks such as finding textual similarity, identifying conflicting medical advice, and aspect-based sentiment analysis. In this work, we investigate a state-of-the-art entity set expansion model, BootstrapNet, for the task of medical entity classification on a new dataset of medical advice text. We also propose EP SBERT, a simple model …


Exploring The Effectiveness Of Multiple-Exemplar Training For Visual Analysis Of AB-Design Graphs, Verena S. Bethke Jun 2022

Dissertations, Theses, and Capstone Projects

In behavior analysis, data are usually analyzed using visual analysis of the graphed data. There is a wide range of methods used to visually analyze data, from a basic ‘textbook’ style approach to the use of visual aids, decision rubrics, and computer-based approaches. In the literature, there have been some comparisons of the efficacy of different approaches. Visual analysis as a behavior can be taught using a variety of methods, independent of how the skill itself is to be performed. Teaching methods include lecture, online instruction, and equivalence-based instruction. There is not much research on the teaching of visual analysis specifically, …


A Comparison Of Machine Learning Techniques For Validating Students’ Proficiency In Mathematics, Alexander Avdeev Jun 2022

Dissertations, Theses, and Capstone Projects

A principal goal of this project was to compare several machine learning (ML) algorithms to explore and validate math proficiency classifications based on standardized test scores. The data used in these analyses came from the 6th-grade students’ mathematics assessment records of the New York State Education Department’s Testing Program (NYSTP). Our approach was to test a number of competing ML algorithms for classifying students as proficient based on their test scores and other demographic information. Our samples were drawn from the 2016 test-taking cohort of 6th-grade students (N=156,800). Five classifiers including multinomial logistic regression (MLR), XGBoost, Tree-As, Lagrangian …


Un-Fair Trojan: Targeted Backdoor Attacks Against Model Fairness, Nicholas Furth May 2022

Theses

Machine learning models have been shown to be vulnerable to various backdoor and data poisoning attacks that adversely affect model behavior. Additionally, these attacks have been shown to cause unfair predictions with respect to certain protected features. In federated learning, where multiple local models contribute to a single global model and communicate only through local gradients, the issue of attacks becomes more prevalent and complex. Previously published works revolve around solving these issues both individually and jointly. However, there has been little study on the effects of attacks against model fairness. Demonstrated in this work, a flexible attack, which we call Un-Fair …


Real Time Call-Flagging System To Respond To Suicidal Ideation In Call Centers, Vishnu Menon, Joseph Carrigan, Charles Floeder, Thomas Walton, Devin Mcguire May 2022

Honors Theses

The 2021-2022 Signature Performance Design Studio team developed a live audio call-flagging system that enables faster responses and new response pathways to veteran crises by call service representatives and their management team. Using a custom-made deep learning model, a live audio streaming server, and a Teams broadcasting add-on, the system empowers Signature Performance call service representatives to make quicker and better-informed decisions to provide veterans the best care possible.


Generating A Dataset For Comparing Linear Vs. Non-Linear Prediction Methods In Education Research, Jack Mauro, Elena Martinez, Anna Bargagliotti May 2022

Honors Thesis

Machine learning is often used to build predictive models by extracting patterns from large data sets. Such techniques are increasingly being utilized to predict outcomes in the social sciences. One such application is predicting student success. Machine learning can be applied to predicting student acceptance and success in academia. Using these tools for education-related data analysis may enable the evaluation of programs, resources, and curricula. Currently, research is needed to examine application, admissions, and retention data in order to address equity in college computer science programs. However, most student-level data sets contain sensitive data that cannot be made public. To …


Computational Approaches To Facilitate Automated Interchange Between Music And Art, Rao Hamza Ali May 2022

Computational and Data Sciences (PhD) Dissertations

Recently, there has been a tremendous increase in generating and synthesizing music and art using various computational techniques. An area that is still under-researched, however, is how one medium can be converted into the other while maintaining the overall aesthetics. Over the last few centuries, artists, composers, and scholars have attempted to substitute one form of art for the other: by proposing techniques where music notes are synonymous with colors, by inventing instruments that combine the aesthetics of music and visual art, and by incorporating the two media in live performances. A widely accepted computational approach, for the conversion, …


Dataset Evaluation For Data Trading Using Expected Loss And Homomorphic Encryption, Minsung Joo May 2022

Senior Honors Papers / Undergraduate Theses

Supervised machine learning suffers from the "garbage-in, garbage-out" phenomenon, where the performance of a model is limited by the quality of the data. While a myriad of data is collected every second, there is no general rigorous method of evaluating the quality of a given dataset. This hinders fair pricing of data in scenarios where a buyer may look to buy data for use with machine learning. In this work, I propose using the expected loss corresponding to a dataset as a measure of its quality, relying on Bayesian methods for uncertainty quantification. Furthermore, I present a secure multi-party computation …


Intraday Algorithmic Trading Using Momentum And Long Short-Term Memory Network Strategies, Andrew R. Whitinger II May 2022

Undergraduate Honors Theses

Intraday stock trading is an infamously difficult and risky strategy. Momentum and reversal strategies and long short-term memory (LSTM) neural networks have been shown to be effective for selecting stocks to buy and sell over time periods of multiple days. To explore whether these strategies can be effective for intraday trading, their implementations were simulated using intraday price data for stocks in the S&P 500 index, collected at 1-second intervals between February 11, 2021 and March 9, 2021 inclusive. The study tested 160 variations of momentum and reversal strategies for profitability in long, short, and market-neutral portfolios, totaling 480 portfolios. …


Beyond Accuracy In Machine Learning., Aneseh Alvanpour May 2022

Electronic Theses and Dissertations

Machine Learning (ML) algorithms are widely used in our daily lives. The need to increase the accuracy of ML models has led to building increasingly powerful and complex algorithms known as black-box models which do not provide any explanations about the reasons behind their output. On the other hand, there are white-box ML models which are inherently interpretable while having lower accuracy compared to black-box models. To have a productive and practical algorithmic decision system, precise predictions may not be sufficient. The system may need to have transparency and be able to provide explanations, especially in applications with safety-critical contexts …


New Debiasing Strategies In Collaborative Filtering Recommender Systems: Modeling User Conformity, Multiple Biases, And Causality., Mariem Boujelbene May 2022

Electronic Theses and Dissertations

Recommender systems are widely used to personalize the user experience in a diverse set of online applications ranging from e-commerce and education to social media and online entertainment. These state-of-the-art AI systems can suffer from several biases that may occur at different stages of the recommendation life-cycle. For instance, using biased data to train recommendation models may lead to several issues, such as the discrepancy between online and offline evaluation, decreasing the recommendation performance, and hurting the user experience. Bias can occur during the data collection stage where the data inherits the user-item interaction biases, such as …


New Accurate, Explainable, And Unbiased Machine Learning Models For Recommendation With Implicit Feedback., Khalil Damak May 2022

Electronic Theses and Dissertations

Recommender systems have become ubiquitous Artificial Intelligence (AI) tools that play an important role in filtering online information in our daily lives. Whether we are shopping, browsing movies, or listening to music online, AI recommender systems are working behind the scenes to provide us with curated and personalized content that has been predicted to be relevant to our interests. The increasing prevalence of recommender systems has challenged researchers to develop powerful algorithms that can deliver recommendations with increasing accuracy. In addition to the predictive accuracy of recommender systems, recent research has also started paying attention to their fairness, in particular …


Nucleate Boiling Under Different Gravity Values: Numerical Simulations & Data-Driven Techniques., Sandipan Banerjee May 2022

Electronic Theses and Dissertations

Nucleate boiling is important in nuclear applications and cooling applications under earth gravity conditions. It is also significant in reduced-gravity or microgravity environments, especially in space exploration applications. Although multiple studies have been performed on nucleate boiling, the effect of gravity on nucleate boiling is not well understood. This dissertation primarily deals with numerical simulations of nucleate boiling using an adaptive Moment-of-Fluid (MoF) method for a single vapor bubble (water vapor or Perfluoro-n-hexane) in saturated liquid for different gravity levels. Results concerning the growth rate of the bubble, specifically the departure diameter and departure time, are provided. The …


Early-Warning Alert Systems For Financial-Instability Detection: An HMM-Driven Approach, Xing Gu Apr 2022

Electronic Thesis and Dissertation Repository

Regulators’ early intervention is crucial when the financial system is experiencing difficulties. Financial stability must be preserved to avert banks’ bailouts, which hugely drain government’s financial resources. Detecting periods of financial crisis in advance entails the development and customisation of accurate and robust quantitative techniques. The goal of this thesis is to construct automated systems via the interplay of various mathematical and statistical methodologies to signal financial instability episodes in the near-term horizon. These signal alerts could provide regulatory bodies with the capacity to initiate an appropriate response that will thwart or at least minimise the occurrence of a financial crisis. …


Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano Apr 2022

Electrical and Computer Engineering ETDs

Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …


Toward Suicidal Ideation Detection With Lexical Network Features And Machine Learning, Ulya Bayram, William Lee, Daniel Santel, Ali Minai, Peggy Clark, Tracy Glauser, John Pestian Apr 2022

Northeast Journal of Complex Systems (NEJCS)

In this study, we introduce a new network feature for detecting suicidal ideation from clinical texts and conduct various additional experiments to enrich the state of knowledge. We evaluate statistical features with and without stopwords, use lexical networks for feature extraction and classification, and compare the results with standard machine learning methods using a logistic classifier, a neural network, and a deep learning method. We utilize three text collections. The first two contain transcriptions of interviews conducted by experts with suicidal (n=161 patients that experienced severe ideation) and control subjects (n=153). The third collection consists of interviews conducted by experts …


Volitional Control Of Lower-Limb Prosthesis With Vision-Assisted Environmental Awareness, S M Shafiul Hasan Mar 2022

FIU Electronic Theses and Dissertations

Early and reliable prediction of a user’s intention to change locomotion mode or speed is critical for a smooth and natural lower-limb prosthesis. Meanwhile, incorporating explicit environmental feedback can facilitate context-aware intelligent prostheses that allow seamless operation across a variety of gait demands. This dissertation introduces environmental awareness through computer vision and enables early and accurate prediction of intention to start, stop or change speeds while walking. Electromyography (EMG), Electroencephalography (EEG), Inertial Measurement Unit (IMU), and Ground Reaction Force (GRF) sensors were used to predict intention to start, stop or increase walking speed. Furthermore, it was investigated whether …


Telemetry Data Mining For Unmanned Aircraft Systems, Li Yu Mar 2022

Theses and Dissertations

With ever more data becoming available to the US Air Force, it is vital to develop effective methods to leverage this strategic asset. Machine learning (ML) techniques present a means of meeting this challenge, as these tools have demonstrated successful use in commercial applications. For this research, three ML methods were applied to an unmanned aircraft system (UAS) telemetry dataset with the aim of extracting useful insight related to phases of flight. It was shown that ML provides an advantage in exploratory data analysis as well as classification of phases. Neural network models demonstrated the best performance with over …


Assessing Feature Representations For Instance-Based Cross-Domain Anomaly Detection In Cloud Services Univariate Time Series Data, Rahul Agrahari, Matthew Nicholson, Clare Conran, Haythem Assem, John D. Kelleher Jan 2022

Articles

In this paper, we compare and assess the efficacy of a number of time-series instance feature representations for anomaly detection. To assess whether there are statistically significant differences between different feature representations for anomaly detection in a time series, we calculate and compare confidence intervals on the average performance of different feature sets across a number of different model types and cross-domain time-series datasets. Our results indicate that the catch22 time-series feature set augmented with features based on rolling mean and variance performs best on average, and that the difference in performance between this feature set and the next best …
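
As an illustration of the "features based on rolling mean and variance" mentioned in this abstract, the following is a minimal sketch using only the Python standard library. The window length, the use of population variance, the function name, and the sample readings are all assumptions made for the example; the excerpt does not specify how the authors compute these features, and the catch22 features themselves are not reproduced here.

```python
from statistics import fmean, pvariance


def rolling_mean_variance(series, window):
    """Return one (mean, variance) pair per full trailing window.

    Assumption (not from the source): population variance over a fixed
    window that slides one step at a time.
    """
    if window < 1 or window > len(series):
        raise ValueError("window must be between 1 and len(series)")
    features = []
    for end in range(window, len(series) + 1):
        chunk = series[end - window:end]
        features.append((fmean(chunk), pvariance(chunk)))
    return features


# Hypothetical univariate readings with one spike.
readings = [1.0, 2.0, 3.0, 10.0, 3.0]
features = rolling_mean_variance(readings, window=3)
```

Both the mean and the variance jump in the windows that contain the spike, which is the kind of local signal such features are meant to expose to a downstream anomaly detector.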


Development Of Advanced Machine Learning Models For Analysis Of Plutonium Surrogate Optical Emission Spectra, Ashwin P. Rao, Phillip R. Jenkins, John D. Auxier II, Michael B. Shattan, Anil Patnaik Jan 2022

Faculty Publications

This work investigates and applies machine learning paradigms seldom seen in analytical spectroscopy for quantification of gallium in cerium matrices via processing of laser-plasma spectra. Ensemble regressions, support vector machine regressions, Gaussian kernel regressions, and artificial neural network techniques are trained and tested on cerium-gallium pellet spectra. A thorough hyperparameter optimization experiment is conducted initially to determine the best design features for each model. The optimized models are evaluated for sensitivity and precision using the limit of detection (LoD) and root mean-squared error of prediction (RMSEP) metrics, respectively. Gaussian kernel regression yields the superlative predictive model with an RMSEP of …