Open Access. Powered by Scholars. Published by Universities.®

Business Intelligence Commons



Full-Text Articles in Business Intelligence

Tree-Based Algorithm For Stable And Efficient Data Clustering, Hasan Aljabbouli, Abdullah Albizri, Antoine Harfouche Sep 2020


Department of Information Management and Business Analytics Faculty Scholarship and Creative Works

K-means is a well-known and widely used clustering algorithm, owing to its simplicity and convergence properties. However, one of its drawbacks is instability. This paper presents improvements to the K-means algorithm using a K-dimensional tree (Kd-tree) data structure. The Kd-tree is used to improve the choice of initial cluster centers and to reduce the number of nearest-neighbor searches the algorithm requires. The framework also includes an efficient center-insertion technique, leading to an incremental operation that overcomes the instability problem of the K-means …
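
The idea of seeding K-means from a Kd-tree can be illustrated with a minimal sketch. It is not the authors' algorithm, whose details are not given in this abstract: it assumes one simple seeding rule (split the largest bucket at the median of its widest dimension until there are k buckets, then use the bucket centroids as initial centers) followed by plain Lloyd iterations. The function names `kd_leaves` and `kmeans` are hypothetical, and the center-insertion step and the tree-accelerated nearest-neighbor search are omitted.

```python
from statistics import fmean


def centroid(points):
    """Component-wise mean of a non-empty list of equal-length tuples."""
    return tuple(fmean(axis) for axis in zip(*points))


def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kd_leaves(points, n_leaves):
    """Partition points kd-tree style into at most n_leaves buckets.

    Assumed rule (illustrative only): repeatedly split the largest bucket
    at the median of its widest dimension.
    """
    buckets = [list(points)]
    while len(buckets) < n_leaves:
        buckets.sort(key=len)
        biggest = buckets.pop()
        if len(biggest) < 2:  # nothing left to split
            buckets.append(biggest)
            break
        dims = range(len(biggest[0]))
        axis = max(
            dims,
            key=lambda d: max(p[d] for p in biggest) - min(p[d] for p in biggest),
        )
        biggest.sort(key=lambda p: p[axis])
        mid = len(biggest) // 2
        buckets += [biggest[:mid], biggest[mid:]]
    return buckets


def kmeans(points, k, max_iter=100):
    """Lloyd's K-means with deterministic, kd-tree-based initial centers."""
    centers = [centroid(b) for b in kd_leaves(points, k)]
    clusters = []
    for _ in range(max_iter):
        clusters = [[] for _ in centers]
        for p in points:
            nearest = min(range(len(centers)), key=lambda i: sq_dist(p, centers[i]))
            clusters[nearest].append(p)
        # Keep the old center if a cluster ends up empty.
        new_centers = [
            centroid(c) if c else centers[i] for i, c in enumerate(clusters)
        ]
        if new_centers == centers:  # converged
            break
        centers = new_centers
    return centers, clusters
```

Because the seeding in this sketch involves no random draws, repeated runs on the same data return the same centers, which is the kind of stability the abstract refers to.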


A Hybrid Data Mining Approach For Identifying The Temporal Effects Of Variables Associated With Breast Cancer Survival, Serhat Simsek, Ugur Kursuncu, Eyyub Kibis, Musheera Anisabdellatif, Ali Dag Jan 2020


Department of Information Management and Business Analytics Faculty Scholarship and Creative Works

Predicting breast cancer survival is crucial for practitioners to determine possible outcomes and make better treatment plans for patients. In this study, a hybrid data-mining-based methodology was constructed to identify the variables whose importance for survival changes over time. To that end, the importance of variables was determined for three different time periods (i.e., one, five, and ten years). To conduct such an analysis, the most parsimonious models were constructed by employing one regression analysis method, the Least Absolute Shrinkage and Selection Operator (LASSO), and one metaheuristic optimization method, a Genetic Algorithm (GA). Due to the high imbalance between the …