Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Numerical Analysis and Scientific Computing

2019

Feature selection

Full-Text Articles in Physical Sciences and Mathematics

Noise Clipping Algorithm Based On Relative Contribution Rate, Shuoyu Liu, Yueming Dai Dec 2019

Journal of System Simulation

Abstract: This paper presents a class noise cutting (CNC) algorithm based on relative contribution rate. The algorithm calculates the relative contribution rate of each feature to the theme and selects the most valuable feature set using a feature distinguishing rating. The corresponding candidate categories are then selected for each feature, which reduces the candidate category set, improves classification accuracy, and speeds up the classifier's response. Compared with the ECN noise cutting algorithm (Eliminating the class whose), CNC has higher accuracy and, because of its simpler feature dimension dictionary and better candidate category set, the response …


Distributed Multi-Label Learning On Apache Spark, Jorge Gonzalez Lopez Jan 2019

Theses and Dissertations

This thesis proposes a series of multi-label learning algorithms for classification and feature selection implemented on the Apache Spark distributed computing model. Five approaches for determining the optimal architecture to speed up multi-label learning methods are presented. These approaches range from local parallelization using threads to distributed computing using independent or shared memory spaces. It is shown that the optimal approach performs hundreds of times faster than the baseline method. Three distributed multi-label k nearest neighbors methods built on top of the Spark architecture are proposed: an exact iterative method that computes pair-wise distances, an approximate tree-based method that indexes …