Open Access. Powered by Scholars. Published by Universities.®

Data Science Commons

1,483 Full-Text Articles 2,962 Authors 435,013 Downloads 189 Institutions

All Articles in Data Science

1,483 full-text articles. Page 21 of 73.

Unlocking User Identity: A Study On Mouse Dynamics In Dual Gaming Environments For Continuous Authentication, Marcho Setiawan Handoko 2023 Minnesota State University, Mankato

All Graduate Theses, Dissertations, and Other Capstone Projects

With the surge in information management technology reliance and the looming presence of cyber threats, user authentication has become paramount in computer security. Traditional static or one-time authentication has its limitations, prompting the emergence of continuous authentication as a frontline approach for enhanced security. Continuous authentication taps into behavior-based metrics for ongoing user identity validation, predominantly utilizing machine learning techniques to continually model user behaviors. This study elucidates the potential of mouse movement dynamics as a key metric for continuous authentication. By examining mouse movement patterns across two contrasting gaming scenarios - the high-intensity "Team Fortress" and the low-intensity strategic …


Making Data-Driven Decisions For Investing In Restaurant Business: A Case Study Based On Zomato Dataset, Rachna Shah 2023 Minnesota State University, Mankato

All Graduate Theses, Dissertations, and Other Capstone Projects

In today’s fast-paced world, where time is a precious commodity, the ability to order a wide array of cuisines from the comfort of your home or office impacts your quality of life. With an increasing number of food delivery services, with just a few taps on the smartphone or clicks on the computer, we can enjoy the food we want. The importance of this convenience cannot be overstated, as it allows people to save time and effort that would otherwise be spent on cooking, grocery shopping, or dining out. As the food delivery system grows and develops, its economic framework …


Optimal Design And Operation Of Integrated Hydrogen Generation And Utilization Plants, Ijiwole Solomon Ijiyinka 2023 West Virginia University

Graduate Theses, Dissertations, and Problem Reports

There are considerable efforts worldwide to reduce the use of fossil fuels for energy production. While renewable energy sources are increasingly used, fossil fuels still contribute about 80% of the energy used worldwide. As a result, the level of CO2 in the atmosphere is still increasing fast, currently exceeding about 410 parts per million (ppm). Various approaches are being investigated to reduce CO2 buildup in the atmosphere. For the electric power generation sector, two key approaches are post-combustion CO2 capture and use of hydrogen as a fuel for power generation. These two solutions can also …


Aircraft Damage Classification By Using Machine Learning Methods, Tüzün Tolga İnan 2023 Bahcesehir University

International Journal of Aviation, Aeronautics, and Aerospace

Safety is the most significant factor affecting incidents (non-fatal) and accidents (fatal) in the history of scheduled civil aviation flights. In that history, the total number of incidents and accidents up to 2022 is 1,988. This study considers the 677 that have occurred since 11 September 2001. The purpose of this study is to reveal the factors that can classify the type of aircraft damage, such as none, minor, and substantial, in all-time incidents and accidents. ML algorithms with different configurations are applied for the classification process. RFE and PCA are used to find the most …


Predicting Housing Prices Using AI, Eric Sconyers 2023 The University of Akron

Williams Honors College, Honors Research Projects

I have created an AI model that can predict housing prices in Ames, Iowa, with 70 percent accuracy. I used data from Kaggle.com, a website that provides datasets to the public so they can create AI models with the data. I found the dataset pertaining to housing prices in Ames, Iowa. With this data, I was able to create an AI model that can predict the price of these homes. The technology I used in this project was Python as the programming language, and I used the scikit-learn library, which has …


Development Of Machine Learning Based Approach To Predict Fuel Consumption And Maintenance Cost Of Heavy-Duty Vehicles Using Diesel And Alternative Fuels, Sasanka Katreddi 2023 West Virginia University

Graduate Theses, Dissertations, and Problem Reports

One of the major contributors of human-made greenhouse gases (GHG), namely carbon dioxide (CO2), methane (CH4), and nitrous oxide (NOX), is the transportation sector, with heavy-duty vehicles (HDV) contributing about 27% of the overall fraction. In addition to the rapid increase in global temperature, airborne pollutants from diesel vehicles also present a risk to human health. Even a small improvement in energy savings for the century-old, mature diesel technology could have a significant impact on minimizing greenhouse gas emissions. With the increasing focus on reducing emissions and operating costs, there is a need for efficient and …


Automating Intersection Marking Data Collection And Condition Assessment At Scale With An Artificial Intelligence-Powered System, Kun Xie, Huiming Sun, Xiaomeng Dong, Hong Yang, Hongkai Yu 2023 Old Dominion University

Civil & Environmental Engineering Faculty Publications

Intersection markings play a vital role in providing road users with guidance and information. The condition of intersection markings gradually degrades due to vehicular traffic, rain, and/or snowplowing. Degraded markings can confuse drivers, leading to an increased risk of traffic crashes. Obtaining timely, high-quality information on intersection markings lays a foundation for making informed decisions in safety management and maintenance prioritization. However, current labor-intensive and high-cost data collection practices make it very challenging to gather intersection data on a large scale. This paper develops an automated system to intelligently detect intersection markings and to assess their degradation conditions with …


Application Of Big Data Technology, Text Classification, And Azure Machine Learning For Financial Risk Management Using Data Science Methodology, Oluwaseyi A. Ijogun 2023 Georgia Southern University

Electronic Theses and Dissertations

Data science plays a crucial role in enabling organizations to optimize data-driven opportunities within financial risk management. It involves identifying, assessing, and mitigating risks, ultimately safeguarding investments, reducing uncertainty, ensuring regulatory compliance, enhancing decision-making, and fostering long-term sustainability. This thesis explores three facets of Data Science projects: enhancing customer understanding, fraud prevention, and predictive analysis, with the goal of improving existing tools and enabling more informed decision-making. The first project leveraged big data technologies, such as Hadoop and Spark, to enhance financial risk management by accurately predicting loan defaulters and their repayment likelihood. In the second project, we investigated …


Comparative Analysis Of Fullstack Development Technologies: Frontend, Backend And Database, Qozeem Odeniran 2023 Georgia Southern University

Electronic Theses and Dissertations

Accessing websites with various devices has brought changes to the field of application development. The choice of cross-platform, reusable frameworks is crucial in this era. This thesis embarks on an evaluation of front-end, back-end, and database technologies to address the status quo. Study A explores front-end development, focusing on Angular.js and React.js. Using these frameworks, comparable web applications were created and evaluated locally. Important insights were obtained through benchmark tests, Lighthouse metrics, and architectural evaluations. React.js proves to be a performance leader in spite of the possible influence of a virtual machine, opening the door for additional research. Study B …


A Deep BiLSTM Machine Learning Method For Flight Delay Prediction Classification, Desmond B. Bisandu PhD, Irene Moulitsas PhD 2023 Cranfield University

Journal of Aviation/Aerospace Education & Research

This paper proposes a classification approach for flight delays using Bidirectional Long Short-Term Memory (BiLSTM) and Long Short-Term Memory (LSTM) models. Flight delays are a major issue in the airline industry, causing inconvenience to passengers and financial losses to airlines. The BiLSTM and LSTM models, powerful deep learning techniques, have shown promising results in a classification task. In this study, we collected a dataset from the United States (US) Bureau of Transportation Statistics (BTS) of flight on-time performance information and used it to train and test the BiLSTM and LSTM models. We set three criteria for selecting highly important features …


An Explainable Deep Learning Prediction Model For Severity Of Alzheimer's Disease From Brain Images, Godwin O. Ekuma 2023 Missouri State University

MSU Graduate Theses

Deep Convolutional Neural Networks (CNNs) have become the go-to method for medical imaging classification on various imaging modalities for binary and multiclass problems. Deep CNNs extract spatial features from image data hierarchically, with deeper layers learning more relevant features for the classification application. The effectiveness of deep learning models is hampered by limited data sets, skewed class distributions, and the undesirable "black box" of neural networks, which decreases their understandability and usability in precision medicine applications. This thesis addresses the challenge of building an explainable deep learning model for a clinical application: predicting the severity of Alzheimer's disease (AD). AD …


Breast Density Classification Using Deep Learning, Conrad Thomas Testagrose 2023 University of North Florida

UNF Graduate Theses and Dissertations

Breast density screenings are an accepted means to determine a patient's predisposed risk of breast cancer development. Although the direct correlation is not fully understood, breast cancer risk increases with higher levels of mammographic breast density. Radiologists visually assess a patient's breast density using mammogram images and assign a density score based on four breast density categories outlined by the Breast Imaging Reporting and Data System (BI-RADS). There have been efforts to develop automated tools that assist radiologists with increasing workloads and help reduce intra- and inter-rater variability between radiologists. In this thesis, I explored two deep-learning-based approaches …


TEnvR: MATLAB-Based Toolbox For Environmental Research, Aleksandar I. Goranov, Rachel L. Sleighter, Dobromir A. Yordanov, Patrick G. Hatcher 2023 Old Dominion University

Chemistry & Biochemistry Faculty Publications

With the advancements in science and technology, datasets become larger and more multivariate, which warrants the need for programming tools for fast data processing and multivariate statistical analysis. Here, the MATLAB-based Toolbox for Environmental Research "TEnvR" (pronounced "ten-ver") is introduced. This novel toolbox includes 44 open-source codes for automated data analysis from a multitude of techniques, such as ultraviolet-visible, fluorescence, and nuclear magnetic resonance spectroscopies, as well as from ultrahigh resolution mass spectrometry. Provided are codes for processing data (e.g., spectral corrections, formula assignment), visualization of figures, calculation of metrics, multivariate statistics, and automated work-up of large datasets. TEnvR allows …


Machine Learning-Based Approaches For Predicting The Critical Temperature Of Superconductor, Pradip Dhakal 2023 University of Central Florida

Data Science and Data Mining

This paper focuses on utilizing multiple linear regression, lasso regression, and extreme gradient boosting algorithms to predict the critical temperature of the superconductor. The model will be evaluated using the mean square error and adjusted R-squared values, and the best model will be recommended for future work related to this study.


Variable Selection Using Lasso And Elastic Net Regression On High Dimensional Genetic Architecture Data Of Maize Flowering Time, Pradip Dhakal 2023 University of Central Florida

Data Science and Data Mining

Variable selection is one of the key components of machine learning. It removes unwanted and redundant predictors from the model, which helps prevent overfitting. Because the resulting model contains only a few significant predictors, it is less likely to learn the trend from the noise. Further, training time decreases when only a few valuable variables remain.


Campus Safety Data Gathering, Classification, And Ranking Based On Clery-Act Reports, Walaa F. Abo Elenin 2023 Georgia Southern University

Electronic Theses and Dissertations

Most existing campus safety rankings are based on criminal incident history with minimal or no consideration of campus security conditions and standard safety measures. Campus safety information published by universities/colleges is usually conceptual/qualitative rather than quantitative and is based on the criminal records of these campuses. Thus, no explicit and trusted ranking method for these campuses considers the level of compliance with the standard safety measures. A quantitative safety measure is important to compare different campuses easily and to learn about specific campus safety conditions.

In this thesis, we utilize Clery-Act reports of campuses to automatically analyze their safety conditions and generate …


Statistical Intervals For Neural Network And Its Relationship With Generalized Linear Model, Sheng Yuan 2023 University of Kentucky

Theses and Dissertations--Statistics

Neural networks have experienced widespread adoption and have become integral in cutting-edge domains like computer vision, natural language processing, and various contemporary fields. However, addressing the statistical aspects of neural networks has been a persistent challenge, with limited satisfactory results. In my research, I focused on exploring statistical intervals applied to neural networks, specifically confidence intervals and tolerance intervals. I employed variance estimation methods, such as direct estimation and resampling, to assess neural networks and their performance under outlier scenarios. Remarkably, when outliers were present, the resampling method with infinitesimal jackknife estimation yielded confidence intervals that closely aligned with nominal …


High Dimensional Data Analysis: Variable Screening And Inference, Lei Fang 2023 University of Kentucky

Theses and Dissertations--Statistics

This dissertation focuses on the problem of high dimensional data analysis, which arises in many fields including genomics, finance, and social sciences. In such settings, the number of features or variables is much larger than the number of observations, posing significant challenges to traditional statistical methods.

To address these challenges, this dissertation proposes novel methods for variable screening and inference. The first part of the dissertation focuses on variable screening, which aims to identify a subset of important variables that are strongly associated with the response variable. Specifically, we propose a robust nonparametric screening method to effectively select the predictors …


A Data Analysis On Mass Shootings In America, James Hinkle, Patrick McCool 2023 Arcadia University

Capstone Showcase

Mass shootings in America have been a recurring issue for years. In this project, we examine mass shootings that have occurred in the United States from 1966 to 2022. Through exploratory data analyses, we explore patterns and trends in shooting events, as well as various patterns in shooters, such as their mental health status, relationship status, social media usage, evidence of trauma in adulthood, and ongoing stressors during the time of the shooting. We also utilize natural language processing (NLP) tools to analyze text information in the dataset, such as the shooters' school performance, community involvement, and past signs of …


Applying Data Science And Machine Learning To Understand Health Care Transition For Adolescents And Emerging Adults With Special Health Care Needs, LisaMarie Turk 2022 University of New Mexico

Nursing ETDs

A problem of classification places adolescents and emerging adults with special health care needs among the most at risk for poor or life-threatening health outcomes. This preliminary proof-of-concept study was conducted to determine if phenotypes of health care transition (HCT) for this vulnerable population could be established. Such phenotypes could support development of future studies that require data classifications as input. Mining of electronic health record data and cluster analysis were implemented to identify phenotypes. Subsequently, a machine learning concept model was developed for predicting acute care and medical condition severity. Three clusters were identified and described (Cluster 1, n …

