Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

University of Texas at Arlington

Data analysis

Full-Text Articles in Physical Sciences and Mathematics

Randomized And Evolutionary Approaches To Dataset Characterization, Feature Weighting, And Sampling In K-Nearest Neighbors, Suryoday Basak, May 2020

Computer Science and Engineering Theses

K-Nearest Neighbors (KNN) remains one of the most popular methods for supervised machine learning tasks. However, its performance often depends on the characteristics of the dataset and on appropriate feature scaling. This thesis explores the characteristics that make a dataset suitable for use with KNN. As part of this, two new measures of dataset dispersion, mean neighborhood target variance (MNTV) and mean neighborhood target entropy (MNTE), are developed to help estimate the performance to expect from KNN regressors and classifiers, respectively. It is empirically demonstrated that these measures of dispersion can be indicative …