Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Open Access. Powered by Scholars. Published by Universities.®

Data mining

Research Collection School Of Computing and Information Systems

Theory and Algorithms

Publication Year

Articles 1 - 3 of 3

Full-Text Articles in Computer Sciences

Probabilistic Value Selection For Space Efficient Model, Gunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu, Wen-Chih Peng Jul 2020

Probabilistic Value Selection For Space Efficient Model, Gunarto Sindoro Njoo, Baihua Zheng, Kuo-Wei Hsu, Wen-Chih Peng

Research Collection School Of Computing and Information Systems

An alternative to current mainstream preprocessing methods is proposed: Value Selection (VS). Unlike the existing methods such as feature selection that removes features and instance selection that eliminates instances, value selection eliminates the values (with respect to each feature) in the dataset with two purposes: reducing the model size and preserving its accuracy. Two probabilistic methods based on information theory's metric are proposed: PVS and P + VS. Extensive experiments on the benchmark datasets with various sizes are elaborated. Those results are compared with the existing preprocessing methods such as feature selection, feature transformation, and instance selection methods. Experiment results …


Efficient Representative Subset Selection Over Sliding Windows, Yanhao Wang, Yuchen Li, Kian-Lee Tan Jul 2018

Efficient Representative Subset Selection Over Sliding Windows, Yanhao Wang, Yuchen Li, Kian-Lee Tan

Research Collection School Of Computing and Information Systems

Representative subset selection (RSS) is an important tool for users to draw insights from massive datasets. Existing literature models RSS as submodular maximization to capture the "diminishing returns" property of representativeness, but often only has a single constraint, which limits its applications to many real-world problems. To capture the recency issue and support various constraints, we formulate dynamic RSS as maximizing submodular functions subject to general d -knapsack constraints (SMDK) over sliding windows. We propose a KnapWindow framework (KW) for SMDK. KW utilizes KnapStream (KS) for SMDK in append-only streams as a subroutine. It maintains a sequence of checkpoints and …


Machine Learning In Wireless Sensor Networks: Algorithms, Strategies, And Applications, Mohammad Abu Alsheikh, Shaowei Lin, Dusit Niyato, Hwee-Pink Tan Apr 2014

Machine Learning In Wireless Sensor Networks: Algorithms, Strategies, And Applications, Mohammad Abu Alsheikh, Shaowei Lin, Dusit Niyato, Hwee-Pink Tan

Research Collection School Of Computing and Information Systems

Wireless sensor networks (WSNs) monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in WSNs. The advantages and disadvantages of each proposed algorithm are …