Open Access. Powered by Scholars. Published by Universities.®
Physical Sciences and Mathematics Commons™
Articles 1 - 2 of 2
Full-Text Articles in Physical Sciences and Mathematics
Big Data With Cloud Computing: Discussions And Challenges, Amanpreet Kaur Sandhu
Big Data Mining and Analytics
With recent advances in computer technology, the amount of available data is growing daily. These ever-larger volumes of data create significant challenges for users. Meanwhile, cloud computing services provide a powerful environment for storing large volumes of data, eliminating requirements such as dedicated space and the maintenance of expensive computer hardware and software. Handling big data is a time-consuming task that requires large computational clusters to ensure successful data storage and processing. In this work, the definition, classification, and characteristics of big data are discussed, along with various cloud services, such as Microsoft Azure, Google Cloud, …
Randomized And Evolutionary Approaches To Dataset Characterization, Feature Weighting, And Sampling In K-Nearest Neighbors, Suryoday Basak
Computer Science and Engineering Theses
K-Nearest Neighbors (KNN) remains one of the most popular methods for supervised machine learning tasks. However, its performance often depends on the characteristics of the dataset and on appropriate feature scaling. This thesis explores the characteristics of a dataset that make it well suited to KNN. As part of this, two new measures of dataset dispersion, mean neighborhood target variance (MNTV) and mean neighborhood target entropy (MNTE), are developed to help predict the performance of KNN regressors and classifiers, respectively. It is empirically demonstrated that these measures of dispersion can be indicative …
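The abstract does not give the exact formulation of these measures, but the name "mean neighborhood target variance" suggests the following idea: for each point, compute the variance of the target values among its k nearest neighbors, then average over all points. A low value would indicate locally smooth targets, the regime where a KNN regressor should do well. The sketch below is an illustrative reconstruction under that assumption, not the thesis's actual definition; the function name `mntv` and parameter `k` are chosen here for clarity.

```python
import math

def mntv(X, y, k=3):
    """Sketch of a mean-neighborhood-target-variance style measure.

    Assumption (not taken from the thesis): for each point, find its k
    nearest neighbors by Euclidean distance, take the variance of their
    target values, and average over all points. Lower values suggest
    locally smooth targets, i.e. a dataset friendlier to KNN regression.
    """
    n = len(X)
    total = 0.0
    for i in range(n):
        # Distances from point i to every other point, nearest first.
        dists = sorted(
            (math.dist(X[i], X[j]), j) for j in range(n) if j != i
        )
        # Target values of the k nearest neighbors.
        neigh = [y[j] for _, j in dists[:k]]
        mean = sum(neigh) / k
        # Population variance of the neighborhood targets.
        total += sum((t - mean) ** 2 for t in neigh) / k
    return total / n
```

On a dataset whose targets are constant within well-separated clusters, this measure is zero; adding target noise within a neighborhood raises it, matching the intuition that higher dispersion should predict worse KNN regression performance.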