Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Computer Sciences

Singapore Management University

Research Collection School Of Computing and Information Systems

2017

Machine Learning: Data Mining

Articles 1 - 2 of 2

Full-Text Articles in Engineering

Learning Homophily Couplings From Non-IID Data For Joint Feature Selection And Noise-Resilient Outlier Detection, Guansong Pang, Longbing Cao, Ling Chen, Huan Liu Aug 2017

This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search for feature subsets is independent of outlier scoring and can therefore be misled by noisy features. In contrast, HOUR takes a wrapper approach that iteratively optimizes feature subset selection and outlier scoring, using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors …


Embedding-Based Representation Of Categorical Data By Hierarchical Value Coupling Learning, Songlei Jian, Longbing Cao, Guansong Pang, Kai Lu, Hang Gao Aug 2017

Learning the representation of categorical data with hierarchical value coupling relationships is very challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, a coupled data embedding (CDE) method, for representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation methods, which capture only part or none of these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding representation. CDE first learns two complementary feature value couplings, which are then used to cluster …