Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Data Science

Articles 1 - 30 of 201

Full-Text Articles in Physical Sciences and Mathematics

Data Collector Selection Ranking-Based Method For Collaborative Multi-Tasks In Ubiquitous Environments, Belal Z. Hassan, Ahmed A. A. Gad-Elrab, Mohamed S. Farag, S. E. Abu-Youssef Aug 2024

Al-Azhar Bulletin of Science

In Ubiquitous Computing and the Internet of Things, the sensing and control of objects involve numerous devices collecting and transmitting data. However, connecting these devices without fostering collaboration leads to suboptimal system performance. As the number of connected sensing devices in the Internet of Things increases, efficient task accomplishment through collaboration becomes imperative. This paper proposes a Data Collector Selection Method for Collaborative Multi-Tasks to address this challenge, considering task preferences and uncertainty in data collectors' contributions. The proposed method incorporates three key aspects: (1) Using Fuzzy Analytical Hierarchy Process to determine optimal weights for task preferences; (2) Ranking data collectors …


Predictive Analysis Of Local House Prices: Leveraging Machine Learning For Real Estate Valuation, Joey Hernandez, Danny Chang, Santiago Gutierrez, Paul Huggins May 2024

SMU Data Science Review

This paper presents a comprehensive study examining the real estate market potential in the dynamic urban landscapes of Frisco and Plano, Texas. Combining traditional real estate analysis with cutting-edge machine learning techniques, the study aims to predict home prices and assess investment feasibility. Leveraging these findings, the study proposes a strategic focus on predictive modeling and investment potential identification, emphasizing the continual refinement of machine learning models with updated data to accurately forecast changes in the real estate market. By harnessing the predictive power of these models, investors can identify high-growth areas and optimize their investment decisions, thus capitalizing on …


A Symbolic Approach To Nonlinear Time Series Analysis, Ranjan Karki, Nibhrat Lohia, Michael B. Schulte May 2024

SMU Data Science Review

Current nonlinear time series methods such as neural networks forecast well. However, they act as a black box and are difficult to interpret, leaving the researchers and the audience with little insight into why the forecasts are the way they are. There is a need for a method that forecasts accurately while also being easy to interpret. This paper aims to develop a method to build an interpretable model for univariate and multivariate nonlinear time series data using wavelets and symbolic regression. The final method relies on multilayer perceptron (MLP) neural networks as a form of dimensionality reduction and the …


Intelligent Solutions For Retroactive Anomaly Detection And Resolution With Log File Systems, Derek G. Rogers, Chanvo Nguyen, Abhay Sharma May 2024

SMU Data Science Review

This paper explores the intricate challenges log files pose from data science and machine learning perspectives. Drawing inspiration from existing methods, LAnoBERT, PULL, LLMs, and the breadth of recent research, this paper aims to push the boundaries of machine learning for log file systems. Our study comprehensively examines the unique challenges presented in our problem setup, delineates the limitations of existing methods, and introduces innovative solutions. These contributions are organized to offer valuable insights, predictions, and actionable recommendations tailored for Microsoft's engineers working on log data analysis.


Baseball Decision-Making: Optimizing At-Bat Simulations, Varun Gopal, Krithika Kondakindi, Nibhrat Lohia, Morgan Williams May 2024

SMU Data Science Review

Pitch selection in baseball plays a crucial role, involving pitchers, catchers, and batters working together. This practice, dating back to early baseball, has seen teams try various methods to gain an advantage. This research aims to use reinforcement learning and pitch-by-pitch Statcast data to improve batting strategies. It also builds on previous statistical work (sabermetrics) to make better choices in pitch selection and plate discipline. The dataset used, including over 700,000 pitches for each full season and 200,000 pitches for the COVID-shortened 2020 season, encompasses a wealth of crucial metrics including pitch release point, velocity, and launch angle. This study …


Reevaluating Texas Energy Market Forecasts In The Wake Of Recent Extreme Weather Events, Robert A. Derner, Richard W. Butler II, Alexandria Neff, Adam R. Ruthford May 2024

SMU Data Science Review

This paper provides updated forecasts of energy demand in Texas and recognizes the impact of sustainable energy. It is important that the forecasts of the adoption of sustainable energy are reexamined after Winter Storm Uri crippled the Texas power grid and left many without power. This storm highlighted the issues the Texas power grid had and has continued to struggle with in supplying the state with energy. This paper will offer an overview of the relevant literature on the adoption of sustainable energy and relevant events that have occurred in the state of Texas that will give the reader the …


Multi-Class Emotion Classification With XGBoost Model Using Wearable EEG Headband Data, James Khamthung, Nibhrat Lohia, Seement Srivastava May 2024

SMU Data Science Review

Electroencephalography (EEG) or brainwave signals serve as a valuable source for discerning human activities, thoughts, and emotions. This study explores the efficacy of EXtreme Gradient Boosting (XGBoost) models in sentiment classification using EEG signals, specifically those captured by the MUSE EEG headband. The MUSE device, equipped with four EEG electrodes (TP9, AF7, AF8, TP10), offers a cost-effective alternative to traditional EEG setups, which often utilize over 60 channels in laboratory-grade settings. Leveraging a dataset from previous MUSE research (Bird, J. et al., 2019), emotional states (positive, neutral, and negative) were observed in a male and a female participant, each for …


Building Effective Large Language Model Agents, Sydney Holder, Shreyash Taywade May 2024

SMU Data Science Review

The advancement of large language models (LLMs) has significantly expanded the influence of artificial intelligence across various sectors. This paper explores building LLM agents to power applications and examines what is necessary to build an efficient and helpful AI assistant. The research investigates the core components necessary to create specialized agents, facilitate collaboration in problem-solving, and improve human task performance. The development and application of tools designed to augment the capabilities of LLM agents are also explored. The paper addresses the potential risks of the unknowns, such as hallucinations, which can compromise the success of agent-based solutions within LLM applications. …


Game Recommendation Analysis Using Steam Profiles And Reviews, Robert Blue, Luis Garcia, Jacob Turner May 2024

SMU Data Science Review

Smaller game studios are at a disadvantage when it comes to getting their product noticed by users. This study aims to provide insights on how recommendation engines work so that these smaller studios can have their games noticed on Steam. Steam is one of the largest video game distribution services, and its recommendation engine promotes games to its user base. This study used user information such as the number of games played, the types of games, and the hours played to build recommendation engines that identify the qualities of a game that drive recommendations.


Leveraging Transformer Models For Genre Classification, Andreea C. Craus, Ben Berger, Yves Hughes, Hayley Horn May 2024

SMU Data Science Review

As the digital music landscape continues to expand, the need for effective methods to understand and contextualize the diverse genres of lyrical content becomes increasingly critical. This research focuses on the application of transformer models in the domain of music analysis, specifically in the task of lyric genre classification. By leveraging the advanced capabilities of transformer architectures, this project aims to capture intricate linguistic nuances within song lyrics, thereby enhancing the accuracy and efficiency of genre classification. The relevance of this project lies in its potential to contribute to the development of automated systems for music recommendation and genre-based playlist …


Context Aware Music Recommendation And Playlist Generation, Elias Mann May 2024

SMU Journal of Undergraduate Research

There are many reasons people listen to music, and the type of music is largely determined by what the listener may be doing while they listen. For example, one may listen to one type of music while commuting, another while exercising, and yet another while relaxing. Without access to the physiological state of the user, current music recommendation methods rely on collaborative filtering - recommending music based on what other similar users listen to - and content based filtering - recommending songs based on their similarities to songs the user already prefers. With the rise in popularity of smart devices …


Surmounting Challenges In Aggregating Results From Static Analysis Tools, Dr. Ann Marie Reinhold, Brittany Boles, A. Redempta Manzi Muneza, Thomas McElroy, Dr. Clemente Izurieta May 2024

Military Cyber Affairs

Aggregation poses a significant challenge for software practitioners because it requires a comprehensive and nuanced understanding of raw data from diverse sources. Suites of static-analysis tools (SATs) are commonly used to assess organizational security but simultaneously introduce significant challenges. Challenges include unique results, scales, configuration environments for each SAT execution, and incompatible formats between SAT outputs. Here, we document our experiences addressing these issues. We highlight the problem of relying on a single vendor's SAT version and offer a solution for aggregating findings across multiple SATs, aiming to enhance software security practices and deter threats early with robust defensive operations.


Gender Detection In Facial Images: A Comprehensive CNN Analysis, Jose N T Ambrosio, Anas Hourani, Magdalene Moy Apr 2024

SACAD: John Heinrichs Scholarly and Creative Activity Days

This research investigates the construction of a robust gender detection system using facial features and Convolutional Neural Networks (CNNs), exploring the impact of different layer configurations on accuracy and computational efficiency. With a validation accuracy of 91%, findings illuminate the nuanced relationship between precision and computational resources, enriching discussions on facial recognition technologies.


Combating Financial Crimes With Unsupervised Learning Techniques: Clustering And Dimensionality Reduction For Anti-Money Laundering, Ahmed N. Bakry, Almohammady S. Alsharkawy, Mohamed S. Farag, Kamal R. Raslan Apr 2024

Al-Azhar Bulletin of Science

Anti-Money Laundering (AML) is a crucial task in ensuring the integrity of financial systems. One key challenge in AML is identifying high-risk groups based on their behavior. Unsupervised learning, particularly clustering, is a promising solution for this task. However, the use of hundreds of features to describe behavior results in a high-dimensional dataset that negatively impacts clustering performance. In this paper, we investigate the effectiveness of combining agglomerative hierarchical clustering with four dimensionality reduction techniques, namely Independent Component Analysis (ICA), Kernel Principal Component Analysis (KPCA), Singular Value Decomposition (SVD), and Locality Preserving Projections (LPP), to overcome the issue of high-dimensionality in AML data and …


Graph Neural Network Guided By Feature Selection And Centrality Measures For Node Classification On Homophilic And Heterophily Graphs, Asmaa M. Mahmoud, Heba F. Eid, Abeer S. Desuky, Hoda A. Ali Apr 2024

Al-Azhar Bulletin of Science

One of the most recent developments in the fields of deep learning and machine learning is Graph Neural Networks (GNNs). The core task of GNNs is the feature aggregation stage, which is carried out over a node's neighbours without taking into account whether the features are relevant. Additionally, the majority of existing node representation techniques consider only the network's topological structure while completely ignoring centrality information. In this paper, a new technique for explaining graph features depending on four different feature selection approaches and centrality measures in order to identify the important nodes and relevant node features is …


Accurate Estimation Of Ethanol Content In Fruit Juices Using CIELab Color Space And Chemometrics Via Smartphone-Based Digital Image Colorimetry, Chairul Ichsan, Yasir Amrulloh, Desti Erviana Mar 2024

Makara Journal of Science

This study aims to investigate the optimal color space and chemometric technique for digital image colorimetry to determine ethanol content (% v/v) in apple, orange, and grape juices, using potassium dichromate (K2Cr2O7) under acidic conditions. The accuracy of colorimetric–chemometric integration across various color spaces (RGB, HSV, CIELab, CMYK, CIELuv, CIEXYZ, and CIELch) was benchmarked against UV–Vis spectrophotometry using metrics such as coefficient of determination (R²), mean absolute percentage error (MAPE), and root–mean–squared error (RMSE). Various chemometric techniques (PLS, PCR, MLR, multivariable–SVR, and multivariable NN regression) were evaluated. Results demonstrate that combining the CIELab color …


Research On Boundary Reconstruction And Government Supervision Strategy For Digital Platform, Jichang Dong, Feiyang Zhan, Wei Li, Jinlu Guo, Ying Liu Mar 2024

Bulletin of Chinese Academy of Sciences (Chinese Version)

Digital platforms are the most important form of organization in the digital era. Clarifying the boundary between platform autonomy and government regulation, so that platforms can effectively exercise their order-maintenance function, is the key issue in digital economy governance. This study first introduces the basic model of platform autonomy and the regulatory challenges it faces, based on the background of the emergence of digital platform autonomy. Second, through a comparative analysis of the regulatory theories and legal policies on digital platform autonomy in the European Union and the United States, this …


Research On Chinese Data Sovereignty Policy Based On LDA Model And Policy Instruments, Han Qiao, Junru Xu Mar 2024

Bulletin of Chinese Academy of Sciences (Chinese Version)

Data sovereignty has become an important component of national sovereignty in the dual context of the digital economy development and the overall national security concept. Major countries and regions are actively carrying out data sovereignty strategic deployment and engaging in fierce competition in data resources, data technology, and data rules. This work adopts the policy text analysis method to study China’s data sovereignty policy, and employs the LDA model and policy instruments to quantitatively analyze the process evolution and thematic characteristics of China’s data sovereignty policy. Drawing on these findings, this study comprehensively considers the global data sovereignty policy and …


The Impact Of Accessible Data On Cyberstalking, Elise Kwan Jan 2024

The Journal of Purdue Undergraduate Research

No abstract provided.


Model Selection Through Cross-Validation For Supervised Learning Tasks With Manifold Data, Derek Brown Jan 2024

The Journal of Purdue Undergraduate Research

No abstract provided.


Machine Learning Of Big Data: A Gaussian Regression Model To Predict The Spatiotemporal Distribution Of Ground Ozone, Jerry Gu Jan 2024

The Journal of Purdue Undergraduate Research

Tracking pollution levels on the ground is important to the environment and public health. One of the pollutants of concern is ozone, which, at high concentrations, can cause respiratory and cardiovascular problems. The National Center for Atmospheric Research (NCAR) has published valuable ozone data obtained from ground-based sensors installed at selected locations. Because it is unfeasible to measure the exact ozone levels everywhere at any time, it would be valuable to predict the temporal-spatial distributions of ozone concentration based on existing data. This would help us better understand the patterns and trends in the data and make better decisions to …


A Computational Profile Of Invasive Lionfish In Belize: A New Insight On A Destructive Species, Joshua E. Balan Jan 2024

The Journal of Purdue Undergraduate Research

Since their discovery in the region in 2009, invasive Indonesian-native lionfish have been taking over the Belize Barrier Reef. As a result, populations of local species have dwindled as they are either eaten or outcompeted by the invaders. This has led to devastating losses ecologically and economically; massive industries in the local nations, such as fisheries and tourism, have suffered greatly. Attempting to combat this, local organizations, from nonprofits to ecotourism companies, have been manually spear-hunting them on scuba dives to cull the population. One such company, Reef Conservation Institute (ReefCI), operating out of Tom Owens Caye outside of Placencia, …


Comparison Of Support Vector Machine (SVM), K-Nearest Neighbor (K-NN), And Stochastic Gradient Descent (SGD) For Classifying Corn Leaf Disease Based On Histogram Of Oriented Gradients (HOG) Feature Extraction, Firdaus Solihin, Muhammad Syarief, Eka Mala Sari Rochman, Aeri Rachmad Dec 2023

Elinvo (Electronics, Informatics, and Vocational Education)

Image classification involves categorizing an image's pixels into specific classes based on their unique characteristics. It has diverse applications in everyday life. One such application is the classification of diseases on corn leaves. Corn is a widely consumed staple food in Indonesia, and healthy corn plants are crucial for meeting market demands. Currently, disease identification in corn plants relies on manual checks, which are time-consuming and less effective. This research aims to automate disease identification on corn leaves using the Support Vector Machine (SVM), K-Nearest Neighbor (K-NN) with K=2, and Stochastic Gradient Descent (SGD) algorithms. The classification process utilizes the …


Classification Of Organic And Inorganic Waste Types Based On Neural Networks, Fatchul Arifin, M. Habiburrahman, Wahyu Ramadhani Gusti Dec 2023

Elinvo (Electronics, Informatics, and Vocational Education)

Garbage is the residue of unused industrial production and household consumption. In Indonesia, waste is divided into 2 types, namely organic and inorganic waste. The two types of waste can be recycled in diverse ways, so they must be separated. So far, it is often difficult for the community to sort waste. This paper presents the process of recognizing and sorting waste automatically by utilizing Artificial Intelligence technology, especially Artificial Neural Networks (ANN). The ANN architecture used in this study consists of 4 layers. The number of neurons in each layer consists of 3 neurons in the input layer, 4 …


Soybean Collect Recommender Based On Distance And Productivity Cluster Using K-Means Clustering And Simple Addictive Weighting Method, Mega Wahyu Ningtyas, Feddy Setio Pribadi Dec 2023

Elinvo (Electronics, Informatics, and Vocational Education)

Soybeans are an essential agricultural product and one of the primary food sources in Indonesia, used in tempeh, tofu, soy milk, soy sauce, and other preparations. However, production yields, harvested land area, and soybean productivity in each district or city in Central Java Province vary widely. Differences in soybean productivity in each area are due to production factors such as area, use of fertilizers, seeds, and labor. This study tries to provide recommendations for soybean harvesting based on the distance and productivity of an area using K-means clustering and the simple additive weighting method. In the Central Java Province, …


Reducing Food Scarcity: The Benefits Of Urban Farming, S.A. Claudell, Emilio Mejia Dec 2023

Journal of Nonprofit Innovation

Urban farming can enhance the lives of communities and help reduce food scarcity. This paper presents a conceptual prototype of an efficient urban farming community that can be scaled for a single apartment building or an entire community across all global geoeconomic regions, including densely populated cities and rural, developing towns and communities. When deployed in coordination with smart crop choices, local farm support, and efficient transportation, the result is not just sustainability but also increased access to fresh produce, optimized nutritional value, elimination of ‘forever chemicals’, reduced transportation costs, and global environmental benefits.

Imagine Doris, who is …


Ohio Recovery Housing: Resident Risk And Outcomes Assessment, Elyjiah Potter, Bivin Sadler Dec 2023

SMU Data Science Review

Addiction and substance abuse disorder is a significant problem in the United States. Over the past two decades, the United States has faced a boom in substance abuse, which has resulted in an increase in death and disruption of families across the nation. The State of Ohio has been particularly hard hit by the crisis, with overdose rates nearly doubling the national average. Established in the mid-1970s, Sober Living Housing is an alcohol and substance use recovery model emphasizing personal responsibility, sober living, and community support. This model has been adopted by the Ohio Recovery Housing organization, which seeks …


Deep Learning Image Analysis To Isolate And Characterize Different Stages Of S-Phase In Human Cells, Kevin A. Boyd, Rudranil Mitra, John Santerre, Christopher L. Sansam Dec 2023

SMU Data Science Review

This research used deep learning for image analysis by isolating and characterizing distinct DNA replication patterns in human cells. By leveraging high-resolution microscopy images of multiple cells stained with 5-Ethynyl-2′-deoxyuridine (EdU), a replication marker, this analysis utilized Convolutional Neural Networks (CNNs) to perform image segmentation and to provide robust and reliable classification results. First, multiple cells in a field of focus were identified using a pretrained CNN called Cellpose. After identifying the location of each cell in the image, a Python script was created to crop out each cell into individual .tif files. After careful annotation, a CNN was …


Investigation Into A Practical Application Of Reinforcement Learning For The Stock Market, Philip Traxler, Sadik Aman, Will Rogers, Allyn Okun Dec 2023

SMU Data Science Review

A major problem for the financial industry is adapting trading strategies at the same rate the market evolves. This paper proposes a solution using existing Reinforcement Learning libraries to help find new strategies at a practical scale. Using a wide domain of ticker symbols, an algorithm is trained in an environment that better represents reality. The supplied decision-making algorithm is tested using recorded data from the U.S. stock market from 2000 through 2022. The results of this research show that existing techniques are statistically better than making decisions at random. With this result, this research shows …


Differentiation Of Human, Dog, And Cat Hair Fibers Using DART TOFMS And Machine Learning, Laura Ahumada, Erin R. McClure-Price, Chad Kwong, Edgard O. Espinoza, John Santerre Dec 2023

SMU Data Science Review

Hair is found in over 90% of crime scenes and has long been analyzed as trace evidence. However, recent reviews of traditional hair fiber analysis techniques, primarily morphological examination, have cast doubt on its reliability. To address these concerns, this study employed machine learning algorithms, specifically Linear Discriminant Analysis (LDA) and Random Forest, on Direct Analysis in Real Time time-of-flight mass spectra collected from human, cat, and dog hair samples. The objective was to develop a chemistry- and statistics-based classification method for unbiased taxonomic identification of hair. The results of the study showed that LDA and Random Forest were highly …