- Institution
- Dartmouth College (5)
- Virginia Commonwealth University (4)
- New Jersey Institute of Technology (3)
- University of Louisville (3)
- University of Massachusetts Amherst (3)
- West Virginia University (3)
- Chapman University (2)
- City University of New York (CUNY) (2)
- Clemson University (2)
- East Tennessee State University (2)
- Louisiana State University (2)
- Marshall University (2)
- University of Kentucky (2)
- University of Montana (2)
- Western University (2)
- Air Force Institute of Technology (1)
- Colby College (1)
- DePaul University (1)
- Embry-Riddle Aeronautical University (1)
- Georgia Southern University (1)
- John Carroll University (1)
- Kennesaw State University (1)
- Michigan Technological University (1)
- Minnesota State University, Mankato (1)
- Missouri State University (1)
- The Texas Medical Center Library (1)
- The University of Southern Mississippi (1)
- University of Arkansas, Fayetteville (1)
- University of New Mexico (1)
- University of Tennessee, Knoxville (1)
- Publication
- Electronic Theses and Dissertations (5)
- Theses and Dissertations (5)
- Dissertations (4)
- Doctoral Dissertations (4)
- Computer Science Senior Theses (3)
- Graduate Theses, Dissertations, and Problem Reports (3)
- Electronic Thesis and Dissertation Repository (2)
- Graduate Student Theses, Dissertations, & Professional Papers (2)
- LSU Doctoral Dissertations (2)
- Theses and Dissertations--Computer Science (2)
- Theses, Dissertations and Capstones (2)
- All Dissertations (1)
- All Graduate Theses, Dissertations, and Other Capstone Projects (1)
- All Theses (1)
- College of Computing and Digital Media Dissertations (1)
- Computational and Data Sciences (MS) Theses (1)
- Computational and Data Sciences (PhD) Dissertations (1)
- Dartmouth College Master’s Theses (1)
- Dartmouth College Undergraduate Theses (1)
- Dissertations & Theses (Open Access) (1)
- Dissertations and Theses (1)
- Dissertations, Master's Theses and Master's Reports (1)
- Dissertations, Theses, and Capstone Projects (1)
- Doctoral Dissertations and Master's Theses (1)
- Electrical and Computer Engineering ETDs (1)
- Graduate Theses and Dissertations (1)
- Honors Theses (1)
- MSU Graduate Theses (1)
- Master of Science in Computer Science Theses (1)
- Senior Honors Papers / Undergraduate Theses (1)
Articles 1 - 30 of 57
Full-Text Articles in Data Science
Automated Identification And Mapping Of Interesting Mineral Spectra In Crism Images, Arun M. Saranathan
Doctoral Dissertations
The Compact Reconnaissance Imaging Spectrometer for Mars (CRISM) has proven to be an invaluable tool for the mineralogical analysis of the Martian surface. It has been crucial in identifying and mapping the spatial extents of various minerals. To date, these mineral spectral shapes have primarily been identified and mapped manually. Given the size of the CRISM image dataset, manual analysis of the full dataset would be arduous, if not infeasible. This dissertation attempts to address this issue by describing an automated, machine-learning-based processing pipeline for CRISM data that can be used to identify and map the unique mineral signatures present in …
Data To Science With Ai And Human-In-The-Loop, Gustavo Perez Sarabia
Doctoral Dissertations
AI has the potential to accelerate scientific discovery by enabling scientists to analyze vast datasets more efficiently than traditional methods. For example, this thesis considers the detection of star clusters in high-resolution images of galaxies taken from space telescopes, as well as studying bird migration from RADAR images. In these applications, the goal is to make measurements to answer scientific questions, such as how the star formation rate is affected by mass, or how the phenology of bird migration is influenced by climate change. However, current computer vision systems are far from perfect for conducting these measurements directly. They may …
Adaptive Multi-Label Classification On Drifting Data Streams, Martha Roseberry
Theses and Dissertations
Drifting data streams and multi-label data are both challenging problems. When multi-label data arrives as a stream, the challenges of both problems must be addressed along with additional challenges unique to the combined problem. Algorithms must be fast and flexible, able to match both the speed and evolving nature of the stream. We propose four methods for learning from multi-label drifting data streams. First, a multi-label k Nearest Neighbors with Self Adjusting Memory (ML-SAM-kNN) exploits short- and long-term memories to predict the current and evolving states of the data stream. Second, a punitive k nearest neighbors algorithm with a self-adjusting …
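The methods in this abstract build on k-nearest-neighbour classification over a memory of recent stream items. The sketch below is a hedged illustration of that starting point only: a plain sliding-window multi-label kNN. It does not reproduce ML-SAM-kNN's short- and long-term memories or its self-adjustment. The class name, window size, and 0.5 voting threshold are assumptions made for the example.

```python
from collections import Counter, deque
import math


class WindowedMultiLabelKNN:
    """Plain multi-label kNN over a fixed-size window of recent stream items.

    Hypothetical baseline, not ML-SAM-kNN: there is no short-/long-term
    memory split and no self-adjustment, only a first-in, first-out window.
    """

    def __init__(self, k=3, window=100, threshold=0.5):
        self.k = k                          # number of neighbours that vote
        self.threshold = threshold          # fraction of neighbours needed to emit a label
        self.memory = deque(maxlen=window)  # oldest items are evicted automatically

    def predict(self, x):
        if not self.memory:
            return set()
        # The k stored items closest to x in Euclidean distance
        neighbours = sorted(self.memory, key=lambda item: math.dist(item[0], x))[: self.k]
        # Count how many neighbours carry each label
        votes = Counter(label for _, labels in neighbours for label in labels)
        return {lab for lab, n in votes.items() if n / len(neighbours) >= self.threshold}

    def learn(self, x, labels):
        self.memory.append((tuple(x), frozenset(labels)))
```

In a test-then-train loop, each arriving item is first passed to `predict` and then to `learn`. The window therefore always reflects the most recent part of the stream, which is the simplest way to follow drift. A fixed window forgets useful old concepts as readily as stale ones, and that trade-off is what adaptive memories are meant to address.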
Learning Mortality Risk For Covid-19 Using Machine Learning And Statistical Methods, Shaoshi Zhang
Electronic Thesis and Dissertation Repository
This research investigates the mortality risk of COVID-19 patients across different variant waves, using data from the Centers for Disease Control and Prevention (CDC) websites. By analyzing the available data, including patient medical records, vaccination rates, and hospital capacities, we aim to discern patterns and factors associated with COVID-19-related deaths.
To explore features linked to COVID-19 mortality, we employ different feature selection techniques, such as Filter, Wrapper, and Embedded methods. Furthermore, we apply various machine learning methods, including support vector machines, decision trees, random forests, logistic regression, K-nearest neighbours, naïve Bayes methods, and artificial neural networks, to uncover underlying …
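Filter methods, the first of the three feature selection families named above, score each feature on its own, without training a model. The sketch below is a hedged example that uses absolute Pearson correlation with a 0/1 outcome as the filter score. The abstract does not say which statistics the thesis used, and the function names and toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient; returns 0.0 for a constant column."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return 0.0 if sx == 0 or sy == 0 else cov / (sx * sy)


def filter_select(rows, outcome, names, k):
    """Filter-style selection: score each feature independently, keep the top k.

    rows: one tuple of feature values per patient; outcome: 0/1 (e.g. died).
    No classifier is trained here, which is what separates filter methods
    from wrapper and embedded methods.
    """
    columns = list(zip(*rows))  # transpose rows into per-feature columns
    scores = {name: abs(pearson(col, outcome)) for name, col in zip(names, columns)}
    return sorted(scores, key=scores.get, reverse=True)[:k]
```

Wrapper methods would instead retrain a classifier on candidate feature subsets, and embedded methods would read importance from the fitted model itself, for example from the coefficients of a regularised logistic regression.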
Cm-Ii Meditation As An Intervention To Reduce Stress And Improve Attention: A Study Of Ml Detection, Spectral Analysis, And Hrv Metrics, Sreekanth Gopi
Master of Science in Computer Science Theses
Students frequently face heightened stress due to academic and social pressures, particularly in demanding fields like computer science and engineering. These challenges are often associated with serious mental health issues, including ADHD (Attention Deficit Hyperactivity Disorder), depression, and an increased risk of suicide. The average student attention span has notably decreased from 2½ minutes to just 47 seconds, and it now typically takes about 25 minutes to switch attention to a new task (Mark, 2023). Research findings suggest that over 95% of individuals who die by suicide have been diagnosed with depression (Shahtahmasebi, 2013), and almost 20% of students …
Exact Models, Heuristics, And Supervised Learning Approaches For Vehicle Routing Problems, Zefeng Lyu
Doctoral Dissertations
This dissertation presents contributions to the field of vehicle routing problems by utilizing exact methods, heuristic approaches, and the integration of machine learning with traditional algorithms. The research is organized into three main chapters, each dedicated to a specific routing problem and a unique methodology. The first chapter addresses the Pickup and Delivery Problem with Transshipments and Time Windows, a variant that permits product transfers between vehicles to enhance logistics flexibility and reduce costs. To solve this problem, we propose an efficient mixed-integer linear programming model that has been shown to outperform existing ones. The second chapter discusses a practical …
Towards Robust Long-Form Text Generation Systems, Kalpesh Krishna
Doctoral Dissertations
Text generation is an important emerging AI technology that has seen significant research advances in recent years. Due to its closeness to how humans communicate, mastering text generation technology can unlock several important applications such as intelligent chat-bots, creative writing assistance, or newer applications like task-agnostic few-shot learning. Most recently, the rapid scaling of large language models (LLMs) has resulted in systems like ChatGPT, capable of generating fluent, coherent and human-like text. However, despite their remarkable capabilities, LLMs still suffer from several limitations, particularly when generating long-form text. In particular, (1) long-form generated text is filled with factual inconsistencies to …
Spoken Language Processing And Modeling For Aviation Communications, Aaron Van De Brook
Doctoral Dissertations and Master's Theses
With recent advances in machine learning and deep learning technologies and the creation of larger aviation-specific corpora, applying natural language processing technologies, especially those based on transformer neural networks, to aviation communications is becoming increasingly feasible. Previous work has focused on machine learning applications to natural language processing, such as N-grams and word lattices. This thesis experiments with a process for pretraining transformer-based language models on aviation English corpora and compares the effectiveness and performance of language models transfer-learned from pretrained checkpoints with those trained from their base weight initializations (trained from scratch). The results suggest that transformer language …
Data-Optimized Spatial Field Predictions For Robotic Adaptive Sampling: A Gaussian Process Approach, Zachary Nathan
Computer Science Senior Theses
We introduce a framework that combines Gaussian Process models, robotic sensor measurements, and sampling data to predict spatial fields. In this context, a spatial field refers to the distribution of a variable throughout a specific area, such as temperature or pH variations over the surface of a lake. Whereas existing methods tend to analyze only the particular field(s) of interest, our approach optimizes predictions through the effective use of all available data. We validated our framework on several datasets, showing that errors can decline by up to two-thirds through the inclusion of additional colocated measurements. In support of adaptive sampling, …
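The abstract does not specify the kernel or inference scheme the framework uses. As a hedged illustration of the underlying model only, the sketch below computes the standard posterior mean of single-output Gaussian Process regression. It assumes an RBF kernel, a zero prior mean, and fixed, hand-picked hyperparameters. It does not combine colocated measurements, which would require a multi-output kernel. All function names are hypothetical.

```python
import math


def rbf(a, b, length_scale=1.0, variance=1.0):
    """Squared-exponential (RBF) covariance between two locations."""
    return variance * math.exp(-math.dist(a, b) ** 2 / (2.0 * length_scale ** 2))


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [list(row) + [b[i]] for i, row in enumerate(A)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):  # back substitution
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x


def gp_predict_mean(locations, readings, query, noise=1e-2, length_scale=1.0):
    """Posterior mean of a zero-mean GP at `query`, given noisy point readings.

    mean(query) = k(query, X) @ inv(K(X, X) + noise * I) @ y
    With a zero prior mean, predictions far from every sample decay to 0,
    so real readings would normally be centred (mean subtracted) first.
    """
    K = [[rbf(p, q, length_scale) + (noise if i == j else 0.0)
          for j, q in enumerate(locations)]
         for i, p in enumerate(locations)]
    alpha = solve(K, readings)
    return sum(rbf(query, p, length_scale) * a for p, a in zip(locations, alpha))
```

Take hypothetical surface-temperature readings `[20.0, 22.0, 21.0]` at `(0, 0)`, `(1, 0)`, and `(0, 1)`. Calling `gp_predict_mean` at a sampled location with `noise=1e-6` returns a value very close to that location's reading. Between samples, it returns a kernel-weighted blend of the readings.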
Beyond News Values On Twitter: Predicting Factors That Drive User Engagement In News, Zhiyan Zhong
Dartmouth College Master’s Theses
When deciding on what news stories to cover, traditional journalism determines news values by following several elements of newsworthiness, such as impact, timeliness, and prominence. However, these guidelines do not always seem to correspond with the success of content on social media. As people increasingly turn to social media for news, our research aims to understand and predict factors that drive user engagement with news on social media. In this study, we analyze news content published on Twitter and examine a diverse set of characteristics, such as metrics retrieved from the Twitter API and semantic features extracted with natural language processing, including …
Visual Analytics And Modeling Of Materials Property Data, Diwas Bhattarai
LSU Doctoral Dissertations
Due to significant advancements in experimental and computational techniques, materials data are abundant. Facilitating data-driven research calls for a system that manages and shares these data and supports a set of tools for effective data analysis and modeling. Generally, a given material property M can be considered a multivariate data problem. The dimensions of M are the values of the property itself, the conditions that control the property (pressure P, temperature T, and multi-component composition X), and relevant metadata I (source, date).
Here we present a comprehensive database considering both experimental and computational sources …
Invasive Buckthorn Mapping: A Uav-Based Approach Utilizing Machine Learning, Gis, And Remote Sensing Techniques In The Upper Peninsula Of Michigan, Vikranth Madeppa
Dissertations, Master's Theses and Master's Reports
An invasive species is a species that is alien, or non-native, to an ecosystem and whose introduction causes harm to the economy, the environment, or human health (E.O. 13112 of Feb 3, 1999). Invasive species pose a serious threat to ecosystems across the globe and affect the biodiversity and productivity of invaded forests. Remotely sensed data are a valuable resource for understanding and addressing issues related to invasive species. This study presents a novel approach for mapping the distribution of two invasive plant species, Common and Glossy Buckthorn, using unmanned aerial vehicles (UAVs), machine learning algorithms, geographic information systems …
Unlocking User Identity: A Study On Mouse Dynamics In Dual Gaming Environments For Continuous Authentication, Marcho Setiawan Handoko
All Graduate Theses, Dissertations, and Other Capstone Projects
With the surge in information management technology reliance and the looming presence of cyber threats, user authentication has become paramount in computer security. Traditional static or one-time authentication has its limitations, prompting the emergence of continuous authentication as a frontline approach for enhanced security. Continuous authentication taps into behavior-based metrics for ongoing user identity validation, predominantly utilizing machine learning techniques to continually model user behaviors. This study elucidates the potential of mouse movement dynamics as a key metric for continuous authentication. By examining mouse movement patterns across two contrasting gaming scenarios - the high-intensity "Team Fortress" and the low-intensity strategic …
The Basil Technique: Bias Adaptive Statistical Inference Learning Agents For Learning From Human Feedback, Jonathan Indigo Watson
Theses and Dissertations--Computer Science
We introduce a novel approach for learning behaviors using human-provided feedback that is subject to systematic bias. Our method, known as BASIL, models the feedback signal as a combination of a heuristic evaluation of an action's utility and a probabilistically-drawn bias value, characterized by unknown parameters. We present both the general framework for our technique and specific algorithms for biases drawn from a normal distribution. We evaluate our approach across various environments and tasks, comparing it to interactive and non-interactive machine learning methods, including deep learning techniques, using human trainers and a synthetic oracle with feedback distorted to varying degrees. …
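The abstract describes each feedback signal as a heuristic evaluation of an action's utility plus a bias drawn from a normal distribution with unknown parameters. The sketch below is a hedged illustration of that feedback model only. It is not the BASIL algorithm. It simulates biased feedback and then, under the simplifying assumption that the true utilities are known, recovers the bias mean and standard deviation by moment matching. BASIL itself must learn without that knowledge. The bias values 0.5 and 0.2 and all function names are arbitrary choices for the example.

```python
import random
import statistics


def simulate_feedback(utilities, bias_mean, bias_std, rng):
    """Feedback = heuristic utility of the action + bias ~ N(bias_mean, bias_std**2)."""
    return [u + rng.gauss(bias_mean, bias_std) for u in utilities]


def estimate_bias(utilities, feedback):
    """Moment estimates of the bias distribution, assuming utilities are known."""
    residuals = [f - u for f, u in zip(feedback, utilities)]
    return statistics.mean(residuals), statistics.stdev(residuals)


rng = random.Random(0)  # fixed seed so the demo is reproducible
utilities = [rng.uniform(-1.0, 1.0) for _ in range(5000)]  # hypothetical action utilities
feedback = simulate_feedback(utilities, bias_mean=0.5, bias_std=0.2, rng=rng)
mu_hat, sigma_hat = estimate_bias(utilities, feedback)
print(f"estimated bias mean {mu_hat:.3f}, std {sigma_hat:.3f}")
```

A positive bias mean like this one corresponds to a trainer who systematically rates actions higher than they deserve. An agent that ignored the bias would overvalue every action it received feedback on.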
Time Series Forecasting For Stock Market Prices, Albert Zhou
Senior Honors Projects
No abstract provided.
Application Of Big Data Technology, Text Classification, And Azure Machine Learning For Financial Risk Management Using Data Science Methodology, Oluwaseyi A. Ijogun
Electronic Theses and Dissertations
Data science plays a crucial role in enabling organizations to optimize data-driven opportunities within financial risk management. It involves identifying, assessing, and mitigating risks, ultimately safeguarding investments, reducing uncertainty, ensuring regulatory compliance, enhancing decision-making, and fostering long-term sustainability. This thesis explores three facets of data science projects: enhancing customer understanding, fraud prevention, and predictive analysis, with the goal of improving existing tools and enabling more informed decision-making. The first project leveraged big data technologies, such as Hadoop and Spark, to enhance financial risk management by accurately predicting loan defaulters and their repayment likelihood. In the second project, we investigated …
The Interaction Of Different Primary Producers And Physical And Chemical Dynamics Of An Urban Shallow Lake, Majid Sahin
Dissertations, Theses, and Capstone Projects
An artificial urban shallow lake, Prospect Park Lake (PPL), is situated on a terminal moraine in Brooklyn, New York, and is supplied with municipal water treated with orthophosphates. The constant input of this phosphate nutrient is the primary source of eutrophication in the lake. The numerous pools along the watercourse house various aquatic phototrophs, which influence the water quality and the state of the system, driving conditions toward those that favor the survival of their own species. The first half of the dissertation focuses on analyzing how the different primary producers in different regions of PPL affect …
Data Collection And Machine Learning Methods For Automated Pedestrian Facility Detection And Mensuration, Joseph Bailey Luttrell Iv
Dissertations
Large-scale collection of pedestrian facility (crosswalks, sidewalks, etc.) presence data is vital to the success of efforts to improve pedestrian facility management, safety analysis, and road network planning. However, this kind of data is typically not available on a large scale due to the high labor and time costs that are the result of relying on manual data collection methods. Therefore, methods for automating this process using techniques such as machine learning are currently being explored by researchers. In our work, we mainly focus on machine learning methods for the detection of crosswalks and sidewalks from both aerial and street-view …
Solving The Challenges Of Concept Drift In Data Stream Classification., Hanqing Hu
Electronic Theses and Dissertations
The rise of network-connected devices and applications has led to a significant increase in the volume of data that are continuously generated over time, called data streams. In real-world applications, storing an entire data stream for later analysis is often impractical because of the stream's potentially infinite volume. Data stream mining techniques and frameworks have therefore been created to analyze streaming data as they arrive. However, compared to traditional data mining, data stream mining faces unique challenges due to the high arrival rate of data streams and their dynamic nature. In this dissertation, …
Leveraging Context Patterns For Medical Entity Classification, Garrett Johnston
Computer Science Senior Theses
The ability of patients to understand health-related text is important for optimal health outcomes. A system that can automatically annotate medical entities could help patients better understand health-related text. Such a system would also accelerate manual data annotation for this low-resource domain, as well as assist in downstream medical NLP tasks such as finding textual similarity, identifying conflicting medical advice, and aspect-based sentiment analysis. In this work, we investigate a state-of-the-art entity set expansion model, BootstrapNet, for the task of medical entity classification on a new dataset of medical advice text. We also propose EP SBERT, a simple model …
Un-Fair Trojan: Targeted Backdoor Attacks Against Model Fairness, Nicholas Furth
Theses
Machine learning models have been shown to be vulnerable to various backdoor and data poisoning attacks that adversely affect model behavior. These attacks have also been shown to cause unfair predictions with respect to certain protected features. In federated learning, where multiple local models contribute to a single global model by communicating only their local gradients, such attacks become more prevalent and complex. Previously published work has addressed these issues both individually and jointly. However, there has been little study of the effects of attacks against model fairness. This work demonstrates a flexible attack, which we call Un-Fair …
Computational Approaches To Facilitate Automated Interchange Between Music And Art, Rao Hamza Ali
Computational and Data Sciences (PhD) Dissertations
Recently, there has been a tremendous increase in generating and synthesizing music and art using various computational techniques. An area that is still under-researched, however, is how one medium can be converted into the other while maintaining the overall aesthetics. Over the last few centuries, artists, composers, and scholars have attempted to substitute one form of art for the other: by proposing techniques in which musical notes are synonymous with colors, by inventing instruments that combine the aesthetics of music and visual art, and by incorporating the two media in live performances. A widely accepted computational approach for the conversion, …
Intraday Algorithmic Trading Using Momentum And Long Short-Term Memory Network Strategies, Andrew R. Whitinger Ii
Undergraduate Honors Theses
Intraday stock trading is an infamously difficult and risky strategy. Momentum and reversal strategies and long short-term memory (LSTM) neural networks have been shown to be effective for selecting stocks to buy and sell over time periods of multiple days. To explore whether these strategies can be effective for intraday trading, their implementations were simulated using intraday price data for stocks in the S&P 500 index, collected at 1-second intervals between February 11, 2021 and March 9, 2021 inclusive. The study tested 160 variations of momentum and reversal strategies for profitability in long, short, and market-neutral portfolios, totaling 480 portfolios. …
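A momentum strategy ranks stocks by their recent return, buys the recent winners, and (for short or market-neutral portfolios) sells the recent losers. A reversal strategy takes the opposite positions. The sketch below is a hedged, minimal version of the ranking step only. The abstract does not give the lookback windows, holding periods, or portfolio sizes of the 160 variations tested. The tickers, parameters, and function name are hypothetical, and transaction costs are ignored.

```python
def momentum_book(prices, lookback, n):
    """Rank tickers by trailing return and split them into winners and losers.

    prices: dict mapping ticker -> list of prices sampled at a fixed interval
    (for example, 1-second bars). Tickers with too little history are skipped.
    Returns (longs, shorts). Buying `longs` and shorting `shorts` in equal
    dollar amounts gives a market-neutral momentum portfolio; a reversal
    strategy would trade the opposite way.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Simple return over the last `lookback` intervals
    trailing = {
        ticker: p[-1] / p[-1 - lookback] - 1.0
        for ticker, p in prices.items()
        if len(p) > lookback
    }
    ranked = sorted(trailing, key=trailing.get, reverse=True)
    return ranked[:n], ranked[-n:]
```

In an intraday simulation, this ranking would be recomputed at each rebalancing time from the most recent prices. The resulting positions would then be held until the next rebalance.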
Dataset Evaluation For Data Trading Using Expected Loss And Homomorphic Encryption, Minsung Joo
Senior Honors Papers / Undergraduate Theses
Supervised machine learning suffers from the "garbage-in, garbage-out" phenomenon, where the performance of a model is limited by the quality of its data. While vast amounts of data are collected every second, there is no general, rigorous method for evaluating the quality of a given dataset. This hinders fair pricing of data in scenarios where a buyer may look to buy data for use with machine learning. In this work, I propose using the expected loss corresponding to a dataset as a measure of its quality, relying on Bayesian methods for uncertainty quantification. Furthermore, I present a secure multi-party computation …
Beyond Accuracy In Machine Learning., Aneseh Alvanpour
Electronic Theses and Dissertations
Machine Learning (ML) algorithms are widely used in our daily lives. The need to increase the accuracy of ML models has led to building increasingly powerful and complex algorithms known as black-box models which do not provide any explanations about the reasons behind their output. On the other hand, there are white-box ML models which are inherently interpretable while having lower accuracy compared to black-box models. To have a productive and practical algorithmic decision system, precise predictions may not be sufficient. The system may need to have transparency and be able to provide explanations, especially in applications with safety-critical contexts …
New Debiasing Strategies In Collaborative Filtering Recommender Systems: Modeling User Conformity, Multiple Biases, And Causality., Mariem Boujelbene
Electronic Theses and Dissertations
Recommender systems are widely used to personalize the user experience in a diverse set of online applications, ranging from e-commerce and education to social media and online entertainment. These state-of-the-art AI systems can suffer from several biases that may occur at different stages of the recommendation life cycle. For instance, using biased data to train recommendation models may lead to several issues, such as a discrepancy between online and offline evaluation, decreased recommendation performance, and a degraded user experience. Bias can occur during the data collection stage, where the data inherit user-item interaction biases, such as …
Intra-Hour Solar Forecasting Using Cloud Dynamics Features Extracted From Ground-Based Infrared Sky Images, Guillermo Terrén-Serrano
Electrical and Computer Engineering ETDs
Due to the increasing use of photovoltaic systems, power grids are vulnerable to the projection of shadows from moving clouds. An intra-hour solar forecast provides power grids with the capability of automatically controlling the dispatch of energy, reducing the additional cost for a guaranteed, reliable supply of energy (i.e., energy storage). This dissertation introduces a novel sky imager consisting of a long-wave radiometric infrared camera and a visible light camera with a fisheye lens. The imager is mounted on a solar tracker to maintain the Sun in the center of the images throughout the day, reducing the scattering effect produced …
A Predictive Model To Predict Cyberattack Using Self-Normalizing Neural Networks, Oluwapelumi Eniodunmo
Theses, Dissertations and Capstones
Cyberattacks are a never-ending war that greatly threatens secure information systems. The development of automated and intelligent systems gives hackers more computing power to steal information and destroy data or system resources, and has raised global security issues. Statistical and data mining tools have received continuous research and improvement. These tools have been adopted to create sophisticated intrusion detection systems that help information systems mitigate and defend against cyberattacks. However, advances in technology and the accessibility of information expose more identifiable elements that can be used to gain unauthorized access to systems and resources. Data mining and classification tools …
A Citizen-Science Approach For Urban Flood Risk Analysis Using Data Science And Machine Learning, Candace Agonafir
Dissertations and Theses
Street flooding is problematic in urban areas, where impervious surfaces such as concrete, brick, and asphalt prevail, impeding the infiltration of water into the ground. During rain events, water pools and rises to levels that cause considerable economic damage and physical harm. The main goal of this dissertation is to develop novel approaches to understanding urban flood risk by applying data science techniques to crowd-sourced data. This is accomplished by developing a series of data-driven models to identify significant flood factors and localized areas of flood vulnerability in New York City (NYC). First, the infrastructural (catch basin clogs, …
Classifying Blood Glucose Levels Through Noninvasive Features, Rishi Reddy
Graduate Theses, Dissertations, and Problem Reports
Blood glucose monitoring is a key process in the prevention and management of certain chronic diseases, such as diabetes. Currently, people interested in monitoring their blood glucose levels are confronted with options that are primarily invasive and relatively costly. A growing topic of note is the development of non-invasive methods for monitoring blood glucose. This development holds significant promise for improving the quality of life of a large portion of the population, and it is met with great enthusiasm from the scientific community as well as commercial interest. This work aims to develop a potential pipeline …