Open Access. Powered by Scholars. Published by Universities.®
Artificial Intelligence and Robotics Commons™
Articles 1 - 5 of 5
Full-Text Articles in Artificial Intelligence and Robotics
Homophily Outlier Detection In Non-Iid Categorical Data, Guansong Pang, Longbing Cao, Ling Chen
Research Collection School Of Computing and Information Systems
Most existing outlier detection methods assume that the outlier factors (i.e., outlierness scoring measures) of data entities (e.g., feature values and data objects) are Independent and Identically Distributed (IID). This assumption does not hold in real-world applications where the outlierness of different entities is interdependent and/or drawn from different probability distributions (non-IID). As a result, such methods may fail to detect important outliers that are too subtle to identify without considering the non-IID nature of the data. The issue is intensified further in more challenging contexts, e.g., high-dimensional data with many noisy features. This work introduces a novel outlier …
A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao
Dr. Huanjing Wang
One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute; the most relevant features are then chosen to build classification models. To evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …
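The rank-select-evaluate workflow described in this abstract can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: absolute Pearson correlation stands in for the relevance score, a nearest-centroid classifier stands in for the classification model, and the toy data and function names (`rank_features`, `select_top_k`, `centroid_accuracy`) are hypothetical.

```python
from statistics import mean, pstdev


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    sx, sy = pstdev(xs), pstdev(ys)
    if sx == 0 or sy == 0:
        return 0.0
    cov = mean([(x - mx) * (y - my) for x, y in zip(xs, ys)])
    return cov / (sx * sy)


def rank_features(rows, labels):
    """Return feature indices ordered from most to least relevant.

    Relevance is |Pearson correlation| with the target (an assumed
    filter score); ties are broken by feature index.
    """
    scores = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        scores.append((abs(pearson(column, labels)), j))
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [j for _, j in scores]


def select_top_k(rows, ranking, k):
    """Keep only the k highest-ranked feature columns."""
    keep = ranking[:k]
    return [[row[j] for j in keep] for row in rows]


def centroid_accuracy(train_x, train_y, test_x, test_y):
    """Classification accuracy of a nearest-centroid model (assumed learner)."""
    centroids = {}
    for label in set(train_y):
        members = [x for x, y in zip(train_x, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*members)]

    def predict(x):
        return min(
            centroids,
            key=lambda c: sum((a - b) ** 2 for a, b in zip(x, centroids[c])),
        )

    hits = sum(predict(x) == y for x, y in zip(test_x, test_y))
    return hits / len(test_y)


# Toy data (hypothetical): feature 0 tracks the label, feature 1 does not,
# and feature 2 is constant.
rows = [
    [1.0, 5.0, 0.3],
    [2.0, 3.0, 0.3],
    [3.0, 4.0, 0.3],
    [8.0, 5.0, 0.3],
    [9.0, 3.0, 0.3],
    [10.0, 4.0, 0.3],
]
labels = [0, 0, 0, 1, 1, 1]

ranking = rank_features(rows, labels)
reduced = select_top_k(rows, ranking, k=1)

# Hold out rows 2 and 5 for evaluation; train on the rest.
train_idx, test_idx = [0, 1, 3, 4], [2, 5]
accuracy = centroid_accuracy(
    [reduced[i] for i in train_idx], [labels[i] for i in train_idx],
    [reduced[i] for i in test_idx], [labels[i] for i in test_idx],
)
```

Because the ranker scores each feature without consulting the classifier, swapping in a different relevance score only requires changing `rank_features`; the selection and evaluation steps stay the same.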
A Comparative Study Of Filter-Based Feature Ranking Techniques, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao
Computer Science Faculty Publications
One factor that affects the success of machine learning is the presence of irrelevant or redundant information in the training data set. Filter-based feature ranking techniques (rankers) rank the features according to their relevance to the target attribute; the most relevant features are then chosen to build classification models. To evaluate the effectiveness of different feature ranking techniques, a commonly used method is to assess the classification performance of models built with the respective selected feature subsets in terms of a given performance metric (e.g., classification accuracy or misclassification rate). Since a given performance metric usually can …
Mining Data From Multiple Software Development Projects, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao, Naeem Seliya
Dr. Huanjing Wang
A large system often goes through multiple software project development cycles, in part due to changes in operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes is often conducted independently of the quality improvement goals, leading to the availability of a large number of attributes for analysis. The problems associated with variations in development process, data collection, and quality goals from one release to another emphasize the importance of …
High-Dimensional Software Engineering Data And Feature Selection, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao
Dr. Huanjing Wang
Software metrics collected during project development play a critical role in software quality assurance. Software practitioners are keen to learn which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set of software metrics that has the same predictive capability as a larger set of metrics; we strive to answer that question in this paper. We present a comprehensive comparison between seven commonly used filter-based feature ranking techniques (FRT) …