Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 11 of 11

Full-Text Articles in Artificial Intelligence and Robotics

Random Forest For High-Dimensional Data, George Ekow Quaye Aug 2024

Random Forest For High-Dimensional Data, George Ekow Quaye

Open Access Theses & Dissertations

The exponential growth of data has led to a rapid increase in high-dimensional datasets across various domains, presenting significant challenges in data analysis, particularly in predictive modeling tasks. Traditional Random Forest (RF), while robust, often struggles with datasets filled with numerous noisy or non-informative features, compromising both performance and accuracy. This study introduces an advanced algorithm, High-Dimensional Random Forests (HDRF), designed to address these challenges by integrating robust multivariate feature selection techniques directly into the decision tree construction process. Unlike standard RF, HDRF incorporates ridge regression-based variable screening at each decision split, enhancing its ability to identify and utilize the …


Graph Neural Network Guided By Feature Selection And Centrality Measures For Node Classification On Homophilic And Heterophily Graphs, Asmaa M. Mahmoud, Heba F. Eid, Abeer S. Desuky, Hoda A. Ali Apr 2024

Graph Neural Network Guided By Feature Selection And Centrality Measures For Node Classification On Homophilic And Heterophily Graphs, Asmaa M. Mahmoud, Heba F. Eid, Abeer S. Desuky, Hoda A. Ali

Al-Azhar Bulletin of Science

One of the most recent developments in the fields of deep learning and machine learning is Graph Neural Networks (GNNs). GNNs core task is the feature aggregation stage, which is carried out over the node's neighbours without taking into account whether the features are relevant or not. Additionally, the majority of these existing node representation techniques only consider the network's topology structure while completely ignoring the centrality information. In this paper, a new technique for explaining graph features depending on four different feature selection approaches and centrality measures in order to identify the important nodes and relevant node features is …


Biomarker Identification For Breast Cancer Types Using Feature Selection And Explainable Ai Methods, David E. La Rosa Giraud Jan 2023

Biomarker Identification For Breast Cancer Types Using Feature Selection And Explainable Ai Methods, David E. La Rosa Giraud

Honors Undergraduate Theses

This paper investigates the impact the LASSO, mRMR, SHAP, and Reinforcement Feature Selection techniques on random forest models for the breast cancer subtypes markers ER, HER2, PR, and TN as well as identifying a small subset of biomarkers that could potentially cause the disease and explain them using explainable AI techniques. This is important because in areas such as healthcare understanding why the model makes a specific decision is important it is a diagnostic of an individual which requires reliable AI. Another contribution is using feature selection methods to identify a small subset of biomarkers capable of predicting if a …


Homophily Outlier Detection In Non-Iid Categorical Data, Guansong Pang, Longbing Cao, Ling Chen Apr 2021

Homophily Outlier Detection In Non-Iid Categorical Data, Guansong Pang, Longbing Cao, Ling Chen

Research Collection School Of Computing and Information Systems

Most of existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications where the outlierness of different entities is dependent on each other and/or taken from different probability distributions (non-IID). This may lead to the failure of detecting important outliers that are too subtle to be identified without considering the non-IID nature. The issue is even intensified in more challenging contexts, e.g., high-dimensional data with many noisy features. This work introduces a novel outlier …


Neural Network Supervised And Reinforcement Learning For Neurological, Diagnostic, And Modeling Problems, Donald Wunsch Iii Jan 2021

Neural Network Supervised And Reinforcement Learning For Neurological, Diagnostic, And Modeling Problems, Donald Wunsch Iii

Masters Theses

“As the medical world becomes increasingly intertwined with the tech sphere, machine learning on medical datasets and mathematical models becomes an attractive application. This research looks at the predictive capabilities of neural networks and other machine learning algorithms, and assesses the validity of several feature selection strategies to reduce the negative effects of high dataset dimensionality. Our results indicate that several feature selection methods can maintain high validation and test accuracy on classification tasks, with neural networks performing best, for both single class and multi-class classification applications. This research also evaluates a proof-of-concept application of a deep-Q-learning network (DQN) to …


Network Traffic Based Botnet Detection Using Machine Learning, Anand Ravindra Vishwakarma May 2020

Network Traffic Based Botnet Detection Using Machine Learning, Anand Ravindra Vishwakarma

Master's Projects

The field of information and computer security is rapidly developing in today’s world as the number of security risks is continuously being explored every day. The moment a new software or a product is launched in the market, a new exploit or vulnerability is exposed and exploited by the attackers or malicious users for different motives. Many attacks are distributed in nature and carried out by botnets that cause widespread disruption of network activity by carrying out DDoS (Distributed Denial of Service) attacks, email spamming, click fraud, information and identity theft, virtual deceit and distributed resource usage for cryptocurrency mining. …


Classification Of Stars From Redshifted Stellar Spectra Utilizing Machine Learning, Michael J. Brice Jan 2019

Classification Of Stars From Redshifted Stellar Spectra Utilizing Machine Learning, Michael J. Brice

All Master's Theses

The classification of stellar spectra is a fundamental task in stellar astrophysics. There have been many explorations into the automated classification of stellar spectra but few that involve the Sloan Digital Sky Survey (SDSS). Stellar spectra from the SDSS are applied to standard classification methods such as K-Nearest Neighbors, Random Forest, and Support Vector Machine to automatically classify the spectra. Stellar spectra are high dimensional data and the dimensionality is reduced using standard Feature Selection methods such as Chi-Squared and Fisher score and with domain-specific astronomical knowledge because classifiers work in low dimensional space. These methods are utilized to classify …


A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Aug 2010

A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao

Dr. Huanjing Wang

One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute and we choose the most relevant features to build classification models subsequently. In order to evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …


A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Aug 2010

A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao

Computer Science Faculty Publications

One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute and we choose the most relevant features to build classification models subsequently. In order to evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …


Mining Data From Multiple Software Development Projects, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao, Naeem Seliya Dec 2009

Mining Data From Multiple Software Development Projects, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao, Naeem Seliya

Dr. Huanjing Wang

A large system often goes through multiple software project development cycles, in part due to changes in operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes are often conducted independent of the quality improvement goals, leading to the availability of a large number of attributes for analysis. Given the problems associated with variations in development process, data collection, and quality goals from one release to another emphasizes the importance of …


High-Dimensional Software Engineering Data And Feature Selection, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Nov 2009

High-Dimensional Software Engineering Data And Feature Selection, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao

Dr. Huanjing Wang

Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is very keen on learning which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set of software metrics that have the same predictive capability as a larger set of metrics – we strive to answer that question in this paper. We present a comprehensive comparison between seven commonly-used filter-based feature ranking techniques (FRT) …