Open Access. Powered by Scholars. Published by Universities.®
Artificial Intelligence and Robotics Commons™
Articles 1 - 11 of 11
Full-Text Articles in Artificial Intelligence and Robotics
Random Forest For High-Dimensional Data, George Ekow Quaye
Open Access Theses & Dissertations
The exponential growth of data has led to a rapid increase in high-dimensional datasets across various domains, presenting significant challenges in data analysis, particularly in predictive modeling tasks. Traditional Random Forest (RF), while robust, often struggles with datasets filled with numerous noisy or non-informative features, compromising both performance and accuracy. This study introduces an advanced algorithm, High-Dimensional Random Forests (HDRF), designed to address these challenges by integrating robust multivariate feature selection techniques directly into the decision tree construction process. Unlike standard RF, HDRF incorporates ridge regression-based variable screening at each decision split, enhancing its ability to identify and utilize the …
Graph Neural Network Guided By Feature Selection And Centrality Measures For Node Classification On Homophilic And Heterophily Graphs, Asmaa M. Mahmoud, Heba F. Eid, Abeer S. Desuky, Hoda A. Ali
Al-Azhar Bulletin of Science
Graph Neural Networks (GNNs) are one of the most recent developments in deep learning and machine learning. The core task of a GNN is feature aggregation, which is carried out over a node's neighbours without taking into account whether the features are relevant. Additionally, most existing node representation techniques consider only the network's topological structure and ignore centrality information entirely. In this paper, a new technique for explaining graph features depending on four different feature selection approaches and centrality measures in order to identify the important nodes and relevant node features is …
Biomarker Identification For Breast Cancer Types Using Feature Selection And Explainable AI Methods, David E. La Rosa Giraud
Honors Undergraduate Theses
This paper investigates the impact of the LASSO, mRMR, SHAP, and Reinforcement Feature Selection techniques on random forest models for the breast cancer subtype markers ER, HER2, PR, and TN, and identifies a small subset of biomarkers that could potentially cause the disease, explaining them with explainable AI techniques. This matters because in areas such as healthcare, understanding why a model makes a specific decision is essential: the decision is a diagnosis of an individual, which requires reliable AI. Another contribution is using feature selection methods to identify a small subset of biomarkers capable of predicting if a …
Homophily Outlier Detection In Non-Iid Categorical Data, Guansong Pang, Longbing Cao, Ling Chen
Research Collection School Of Computing and Information Systems
Most existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications, where the outlierness of different entities is dependent on each other and/or taken from different probability distributions (non-IID). As a result, important outliers that are too subtle to be identified without considering the non-IID nature may go undetected. The issue is intensified further in more challenging contexts, e.g., high-dimensional data with many noisy features. This work introduces a novel outlier …
Neural Network Supervised And Reinforcement Learning For Neurological, Diagnostic, And Modeling Problems, Donald Wunsch III
Masters Theses
As the medical world becomes increasingly intertwined with the tech sphere, machine learning on medical datasets and mathematical models becomes an attractive application. This research looks at the predictive capabilities of neural networks and other machine learning algorithms, and assesses the validity of several feature selection strategies to reduce the negative effects of high dataset dimensionality. Our results indicate that several feature selection methods can maintain high validation and test accuracy on classification tasks, with neural networks performing best, for both single-class and multi-class classification applications. This research also evaluates a proof-of-concept application of a deep-Q-learning network (DQN) to …
Network Traffic Based Botnet Detection Using Machine Learning, Anand Ravindra Vishwakarma
Master's Projects
The field of information and computer security is developing rapidly as new security risks are discovered every day. The moment a new piece of software or a product is launched in the market, a new exploit or vulnerability is exposed and exploited by attackers or malicious users for different motives. Many attacks are distributed in nature and carried out by botnets that cause widespread disruption of network activity by carrying out DDoS (Distributed Denial of Service) attacks, email spamming, click fraud, information and identity theft, virtual deceit and distributed resource usage for cryptocurrency mining. …
Classification Of Stars From Redshifted Stellar Spectra Utilizing Machine Learning, Michael J. Brice
All Master's Theses
The classification of stellar spectra is a fundamental task in stellar astrophysics. There have been many explorations into the automated classification of stellar spectra but few that involve the Sloan Digital Sky Survey (SDSS). Stellar spectra from the SDSS are applied to standard classification methods such as K-Nearest Neighbors, Random Forest, and Support Vector Machine to automatically classify the spectra. Stellar spectra are high-dimensional data, and the dimensionality is reduced using standard feature selection methods such as Chi-Squared and Fisher score, together with domain-specific astronomical knowledge, because classifiers work in low-dimensional space. These methods are utilized to classify …
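As a rough illustration of the Fisher score named in this abstract, the sketch below scores one feature as between-class scatter divided by within-class scatter. It is not taken from the thesis: the function name `fisher_score`, the class labels, and the two toy "flux" features are hypothetical values made up for the example.

```python
def fisher_score(values, labels):
    """Fisher score of one feature: between-class scatter over within-class scatter."""
    overall_mean = sum(values) / len(values)
    between, within = 0.0, 0.0
    for label in set(labels):
        group = [v for v, t in zip(values, labels) if t == label]
        mean = sum(group) / len(group)
        # Class size times squared distance of the class mean from the overall mean.
        between += len(group) * (mean - overall_mean) ** 2
        # Sum of squared deviations inside the class (class size times class variance).
        within += sum((v - mean) ** 2 for v in group)
    # A feature with zero within-class scatter is treated as perfectly separating.
    return between / within if within > 0 else float("inf")


# Hypothetical toy data: two stellar classes and two candidate features.
labels = ["A", "A", "A", "F", "F", "F"]
flux_line = [1.0, 1.2, 0.8, 3.0, 3.1, 2.9]   # differs strongly between the classes
flux_noise = [2.0, 1.0, 3.0, 2.0, 3.0, 1.0]  # same mean in both classes

line_score = fisher_score(flux_line, labels)
noise_score = fisher_score(flux_noise, labels)
print("informative feature:", line_score, "noise feature:", noise_score)
```

A feature whose class means are far apart relative to the spread inside each class receives a high score, so keeping only the highest-scoring features reduces dimensionality before a classifier is trained.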
A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao
Dr. Huanjing Wang
One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute; the most relevant features are then chosen to build classification models. In order to evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …
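The workflow this abstract describes (rank features, keep the most relevant, build a classifier, measure accuracy) can be sketched as follows. This is only an illustration under stated assumptions, not one of the rankers the paper compares: absolute Pearson correlation stands in as the relevance measure, a nearest-centroid classifier stands in as the model, and the synthetic data and all function names are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def rank_features(X, y):
    """Feature indices sorted from most to least relevant (by |correlation| with y)."""
    n_features = len(X[0])
    scores = [abs(pearson([row[j] for row in X], y)) for j in range(n_features)]
    return sorted(range(n_features), key=lambda j: scores[j], reverse=True)


def nearest_centroid_accuracy(X_train, y_train, X_test, y_test, features):
    """Accuracy of a nearest-centroid classifier restricted to the chosen features."""
    centroids = {}
    for label in set(y_train):
        rows = [r for r, t in zip(X_train, y_train) if t == label]
        centroids[label] = [sum(r[j] for r in rows) / len(rows) for j in features]
    correct = 0
    for row, target in zip(X_test, y_test):
        point = [row[j] for j in features]
        predicted = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += predicted == target
    return correct / len(y_test)


def make_data(n, rng):
    """Synthetic data: feature 0 carries the class signal, features 1-4 are noise."""
    X, y = [], []
    for _ in range(n):
        label = rng.randint(0, 1)
        X.append([label * 4.0 + rng.gauss(0, 1)] + [rng.gauss(0, 1) for _ in range(4)])
        y.append(label)
    return X, y


rng = random.Random(0)
X_train, y_train = make_data(200, rng)
X_test, y_test = make_data(100, rng)

ranking = rank_features(X_train, y_train)
top_k = ranking[:1]  # keep only the single most relevant feature
accuracy = nearest_centroid_accuracy(X_train, y_train, X_test, y_test, top_k)
print("ranking:", ranking, "accuracy with top feature:", accuracy)
```

Because the ranker scores each feature independently of the classifier, a different ranker can be swapped into `rank_features` and compared on the same accuracy metric, which is the kind of comparison the abstract refers to.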
A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao
Computer Science Faculty Publications
One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute; the most relevant features are then chosen to build classification models. In order to evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …
Mining Data From Multiple Software Development Projects, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao, Naeem Seliya
Dr. Huanjing Wang
A large system often goes through multiple software project development cycles, in part due to changes in operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes is often conducted independently of the quality improvement goals, leading to the availability of a large number of attributes for analysis. The problems associated with variations in development process, data collection, and quality goals from one release to another emphasize the importance of …
High-Dimensional Software Engineering Data And Feature Selection, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao
Dr. Huanjing Wang
Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is keen to learn which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set of software metrics that has the same predictive capability as a larger set of metrics; we strive to answer that question in this paper. We present a comprehensive comparison between seven commonly used filter-based feature ranking techniques (FRT) …