Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Software Engineering

Research Collection School Of Computing and Information Systems

Machine learning

Articles 1 - 28 of 28

Full-Text Articles in Physical Sciences and Mathematics

Constructing Cyber-Physical System Testing Suites Using Active Sensor Fuzzing, Fan Zhang, Qianmei Wu, Bohan Xuan, Yuqi Chen, Wei Lin, Christopher M. Poskitt, Jun Sun, Binbin Chen Oct 2023

Research Collection School Of Computing and Information Systems

Cyber-physical systems (CPSs) automating critical public infrastructure face a pervasive threat of attack, motivating research into different types of countermeasures. Assessing the effectiveness of these countermeasures is challenging, however, as benchmarks are difficult to construct manually, existing automated testing solutions often make unrealistic assumptions, and blind fuzzing is ineffective at finding attacks due to the enormous search spaces and resource requirements. In this work, we propose active sensor fuzzing, a fully automated approach for building test suites without requiring any a priori knowledge about a CPS. Our approach employs active learning techniques. Applied to a real-world water treatment system, …


Experimental Comparison Of Features, Analyses, And Classifiers For Android Malware Detection, Lwin Khin Shar, Biniam Fisseha Demissie, Mariano Ceccato, Naing Tun Yan, David Lo, Lingxiao Jiang, Christoph Bienert Sep 2023

Research Collection School Of Computing and Information Systems

Android malware detection has been an active area of research. In the past decade, several machine learning-based approaches based on different types of features that may characterize Android malware behaviors have been proposed. The usually-analyzed features include API usages and sequences at various abstraction levels (e.g., class and package), extracted using static or dynamic analysis. Additionally, features that characterize permission uses, native API calls and reflection have also been analyzed. Initial works used conventional classifiers such as Random Forest to learn on those features. In recent years, deep learning-based classifiers such as Recurrent Neural Network have been explored. Considering various …


Right To Know, Right To Refuse: Towards UI Perception-Based Automated Fine-Grained Permission Controls For Android Apps, Vikas Kumar Malviya, Chee Wei Leow, Ashok Kasthuri, Naing Tun Yan, Lwin Khin Shar, Lingxiao Jiang Oct 2022

Research Collection School Of Computing and Information Systems

It is the basic right of a user to know how permissions are used within an Android app's scope and to refuse the app if granted permissions are used for activities other than the specified use, which can amount to malicious behavior. This paper proposes an approach and a vision to automatically model the permissions necessary for Android apps from the users' perspective and enable fine-grained permission controls by users, thus helping users make better-informed and more flexible permission decisions for different app functionalities, which in turn improve the security and data privacy of the app and enforce apps …


Automated Reverse Engineering Of Role-Based Access Control Policies Of Web Applications, Ha Thanh Le, Lwin Khin Shar, Domenico Bianculli, Lionel C. Briand, Cu Duy Nguyen Feb 2022

Research Collection School Of Computing and Information Systems

Access control (AC) is an important security mechanism used in software systems to restrict access to sensitive resources. Therefore, it is essential to validate the correctness of AC implementations with respect to policy specifications or intended access rights. However, in practice, AC policy specifications are often missing or poorly documented; in some cases, AC policies are hard-coded in business logic implementations. This leads to difficulties in validating the correctness of policy implementations and detecting AC defects. In this paper, we present a semi-automated framework for reverse-engineering of AC policies from Web applications. Our goal is to learn and recover role-based access …


A Survey On Deep Learning For Software Engineering, Yanming Yang, Xin Xia, David Lo Jan 2022

Research Collection School Of Computing and Information Systems

In 2006, Geoffrey Hinton proposed the concept of training "Deep Neural Networks (DNNs)" and an improved model training method to break the bottleneck of neural network development. More recently, the introduction of AlphaGo in 2016 demonstrated the powerful learning ability of deep learning and its enormous potential. Deep learning has been increasingly used to develop state-of-the-art software engineering (SE) research tools due to its ability to boost performance for various SE tasks. There are many factors, e.g., deep learning model selection, internal structure differences, and model optimization techniques, that may have an impact on the performance of DNNs applied in …


Predictive Models In Software Engineering: Challenges And Opportunities, Yanming Yang, Xin Xia, David Lo, Tingting Bi, John C. Grundy, Xiaohu Yang Jan 2022

Research Collection School Of Computing and Information Systems

Predictive models are one of the most important techniques that are widely applied in many areas of software engineering. There have been a large number of primary studies that apply predictive models and that present well-performed studies in various research domains, including software requirements, software design and development, testing and debugging, and software maintenance. This article is a first attempt to systematically organize knowledge in this area by surveying a body of 421 papers on predictive models published between 2009 and 2020. We describe the key models and approaches used, classify the different models, summarize the range of key application …


Binary Classifiers For Noisy Datasets: A Comparative Study Of Existing Quantum Machine Learning Frameworks And Some New Approaches, Nikolaos Schetakis, Davit Aghamalyan, Paul Robert Griffin, Michael Boguslavsky Nov 2021

Research Collection School Of Computing and Information Systems

This technology offer is a quantum machine learning algorithm applied to binary classification models for noisy datasets, which are prevalent in finance and other domains. By combining hybrid neural networks, quantum parametric circuits, and data re-uploading, we have improved the classification of non-convex 2-dimensional figures by understanding learning stability as noise increases in the dataset. The metric we use for assessing the performance of our quantum classifiers is the area under the receiver operating characteristic curve (ROC AUC). We are interested in collaborating with partners who have use cases for binary classification of noisy data. Also, as quantum technology is still insufficient for …
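As a reader's aid for the ROC AUC metric named above, the following is a minimal sketch in plain Python, not code from the listed work. It relies on the standard identity that ROC AUC equals the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counted as one half. The function name `roc_auc` and the sample labels and scores are illustrative assumptions.

```python
def roc_auc(labels, scores):
    """Estimate ROC AUC by pairwise comparison of positive and negative scores.

    labels: iterable of 0/1 class labels (1 = positive class).
    scores: iterable of classifier scores, higher = more likely positive.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # positive ranked above negative
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical scores for four examples (two negative, two positive).
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
auc = roc_auc(labels, scores)
```

A value of 1.0 means every positive is ranked above every negative, while 0.5 corresponds to a ranking no better than chance. The pairwise loop is quadratic and is meant only to make the definition concrete.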


RevMan: Revenue-Aware Multi-Task Online Insurance Recommendation, Yu Li, Yi Zhang, Lu Gan, Gengwei Hong, Zimu Zhou, Qiang Li Feb 2021

Research Collection School Of Computing and Information Systems

Online insurance is a new type of e-commerce with exponential growth. An effective recommendation model that maximizes the total revenue of insurance products listed in multiple customized sales scenarios is crucial for the success of online insurance business. Prior recommendation models are ineffective because they fail to characterize the complex relatedness of insurance products in multiple sales scenarios and maximize the overall conversion rate rather than the total revenue. Even worse, it is impractical to collect training data online for total revenue maximization due to the business logic of online insurance. We propose RevMan, a Revenue-aware Multi-task Network for online …


Machine Learning Integrated Design For Additive Manufacturing, Jingchao Jiang, Yi Xiong, Zhiyuan Zhang, David W. Rosen Nov 2020

Research Collection School Of Computing and Information Systems

For improving manufacturing efficiency and minimizing costs, design for additive manufacturing (AM) has been accordingly proposed. The existing design for AM methods are mainly surrogate model based. Due to the increasingly available data nowadays, machine learning (ML) has been applied to medical diagnosis, image processing, prediction, classification, learning association, etc. A variety of studies have also been carried out to use machine learning for optimizing the process parameters of AM with corresponding objectives. In this paper, a ML integrated design for AM framework is proposed, which takes advantage of ML that can learn the complex relationships between the design and …


Experimental Comparison Of Features And Classifiers For Android Malware Detection, Lwin Khin Shar, Biniam Fisseha Demissie, Mariano Ceccato, Wei Minn Oct 2020

Research Collection School Of Computing and Information Systems

Android platform has dominated the smart phone market for years now and, consequently, gained a lot of attention from attackers. Malicious apps (malware) pose a serious threat to the security and privacy of Android smart phone users. Available approaches to detect mobile malware based on machine learning rely on features extracted with static analysis or dynamic analysis techniques. Different types of machine learning classifiers (such as support vector machine and random forest) and deep learning classifiers (based on deep neural networks) are then trained on extracted features, to produce models that can be used to detect mobile malware. …


Objsim: Efficient Testing Of Cyber-Physical Systems, Jun Sun, Zijiang Yang Jul 2020

Research Collection School Of Computing and Information Systems

Cyber-physical systems (CPSs) play a critical role in automating public infrastructure and thus attract a wide range of attacks. Assessing the effectiveness of defense mechanisms is challenging as realistic sets of attacks to test them against are not always available. In this short paper, we briefly describe smart fuzzing, an automated, machine learning guided technique for systematically producing test suites of CPS network attacks. Our approach uses predictive machine learning models and meta-heuristic search algorithms to guide the fuzzing of actuators so as to drive the CPS into different unsafe physical states. The approach has been proven effective on two …


A Machine Learning Approach For Vulnerability Curation, Yang Chen, Andrew E. Santosa, Ming Yi Ang, Abhishek Sharma, Asankhaya Sharma, David Lo Jun 2020

Research Collection School Of Computing and Information Systems

Software composition analysis depends on a database of open-source library vulnerabilities, curated by security researchers using various sources, such as bug tracking systems, commits, and mailing lists. We report the design and implementation of a machine learning system to help the curation by automatically predicting the vulnerability-relatedness of each data item. It supports a complete pipeline from data collection, model training and prediction, to the validation of new models before deployment. It is executed iteratively to generate better models as new input data become available. We use self-training to significantly and automatically increase the size of the training dataset, opportunistically …


Automated Identification Of Libraries From Vulnerability Data, Chen Yang, Andrew Santosa, Asankhaya Sharma, David Lo May 2020

Research Collection School Of Computing and Information Systems

Software Composition Analysis (SCA) has gained traction in recent years with a number of commercial offerings from various companies. SCA involves a vulnerability curation process where a group of security researchers, using various data sources, populate a database of open-source library vulnerabilities, which is used by a scanner to inform the end users of vulnerable libraries used by their applications. One of the data sources used is the National Vulnerability Database (NVD). The key challenge faced by the security researchers here is in figuring out which libraries are related to each of the reported vulnerabilities in NVD. In this article, we …


Are The Code Snippets What We Are Searching For? A Benchmark And An Empirical Study On Code Search With Natural-Language Queries, Shuhan Yan, Hang Yu, Yuting Chen, Beijun Shen Feb 2020

Research Collection School Of Computing and Information Systems

Code search methods, especially those that allow programmers to raise queries in a natural language, play an important role in software development. They help to improve programmers' productivity by returning sample code snippets from the Internet and/or source-code repositories for their natural-language queries. Meanwhile, there are many code search methods in the literature that support natural-language queries. Difficulties exist in recognizing the strengths and weaknesses of each method and choosing the right one for different usage scenarios, because (1) the implementations of those methods and the datasets for evaluating them are usually not publicly available, and (2) some methods leverage …


Learning-Guided Network Fuzzing For Testing Cyber-Physical System Defences, Yuqi Chen, Christopher M. Poskitt, Jun Sun, Sridhar Adepu, Fan Zhang Jan 2020

Research Collection School Of Computing and Information Systems

The threat of attack faced by cyber-physical systems (CPSs), especially when they play a critical role in automating public infrastructure, has motivated research into a wide variety of attack defence mechanisms. Assessing their effectiveness is challenging, however, as realistic sets of attacks to test them against are not always available. In this paper, we propose smart fuzzing, an automated, machine learning guided technique for systematically finding 'test suites' of CPS network attacks, without requiring any knowledge of the system's control programs or physical processes. Our approach uses predictive machine learning models and metaheuristic search algorithms to guide the fuzzing of …


How Does Machine Learning Change Software Development Practices?, Zhiyuan Wan, Xin Xia, David Lo, Gail C. Murphy Aug 2019

Research Collection School Of Computing and Information Systems

Adding an ability for a system to learn inherently adds uncertainty into the system. Given the rising popularity of incorporating machine learning into systems, we wondered how the addition alters software development practices. We performed a mixture of qualitative and quantitative studies with 14 interviewees and 342 survey respondents from 26 countries across four continents to elicit significant differences between the development of machine learning systems and the development of non-machine-learning systems. Our study uncovers significant differences in various aspects of software engineering (e.g., requirements, design, testing, and process) and work characteristics (e.g., skill variety, problem solving and task identity). …


DeepReview: Automatic Code Review Using Deep Multi-Instance Learning, Hengyi Li, Shuting Shi, Ferdian Thung, Xuan Huo, Bowen Xu, Ming Li, David Lo Apr 2019

Research Collection School Of Computing and Information Systems

Code review, an inspection of code changes in order to identify and fix defects before integration, is essential in Software Quality Assurance (SQA). Code review is a time-consuming task since the reviewers need to understand, analyze, and provide comments manually. To alleviate the burden of reviewers, automatic code review is needed. However, this task has not been well studied before. To bridge this research gap, in this paper, we formalize automatic code review as a multi-instance learning task in which each change consisting of multiple hunks is regarded as a bag, and each hunk is described as an instance. We propose …
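To make the bag-and-instance framing above concrete, here is a minimal sketch, assuming the standard multi-instance learning convention that a bag is positive when at least one of its instances is positive. It is not the model from the listed paper, whose details are truncated here. The per-hunk scores, the `threshold` value, and the function names `bag_score` and `classify_change` are hypothetical.

```python
def bag_score(instance_scores):
    """Aggregate per-hunk (instance) scores into one per-change (bag) score.

    Uses max pooling, which encodes the assumption that a single
    problematic hunk is enough to flag the whole change.
    """
    if not instance_scores:
        raise ValueError("a bag must contain at least one instance")
    return max(instance_scores)


def classify_change(hunk_scores, threshold=0.5):
    """Return True if the change (bag) is flagged, False otherwise."""
    return bag_score(hunk_scores) >= threshold


# Hypothetical hunk-level scores for two code changes.
flagged_change = classify_change([0.1, 0.9, 0.2])   # one suspicious hunk
clean_change = classify_change([0.1, 0.2])          # no suspicious hunk
```

Only the bag-level label (the review outcome for the whole change) is needed for training in this setting; hunk-level labels are not assumed to be available.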


Identifying Elderlies At Risk Of Becoming More Depressed With Internet-Of-Things, Jiajue Ou, Huiguang Liang, Hwee Xian Tan Jul 2018

Research Collection School Of Computing and Information Systems

Depression in the elderly is common and dangerous. Current methods to monitor elderly depression, however, are costly, time-consuming and inefficient. In this paper, we present a novel depression-monitoring system that infers an elderly’s changes in depression level based on his/her activity patterns, extracted from wireless sensor data. To do so, we build predictive models to learn the relationship between depression level changes and behaviors using historical data. We also deploy the system for a group of elderly, in their homes, and run the experiments for more than one year. Our experimental study gives encouraging results, suggesting that our IoT system …


Learning From Mutants: Using Code Mutation To Learn And Monitor Invariants Of A Cyber-Physical System, Yuqi Chen, Christopher M. Poskitt, Jun Sun May 2018

Research Collection School Of Computing and Information Systems

Cyber-physical systems (CPS) consist of sensors, actuators, and controllers all communicating over a network; if any subset becomes compromised, an attacker could cause significant damage. With access to data logs and a model of the CPS, the physical effects of an attack could potentially be detected before any damage is done. Manually building a model that is accurate enough in practice, however, is extremely difficult. In this paper, we propose a novel approach for constructing models of CPS automatically, by applying supervised machine learning to data traces obtained after systematically seeding their software components with faults ("mutants"). We demonstrate the …


Anomaly Detection For A Water Treatment System Using Unsupervised Machine Learning, Jun Inoue, Yoriyuki Yamagata, Yuqi Chen, Christopher M. Poskitt, Jun Sun Nov 2017

Research Collection School Of Computing and Information Systems

In this paper, we propose and evaluate the application of unsupervised machine learning to anomaly detection for a Cyber-Physical System (CPS). We compare two methods: Deep Neural Networks (DNN) adapted to time series data generated by a CPS, and one-class Support Vector Machines (SVM). These methods are evaluated against data from the Secure Water Treatment (SWaT) testbed, a scaled-down but fully operational raw water purification plant. For both methods, we first train detectors using a log generated by SWaT operating under normal conditions. Then, we evaluate the performance of both methods using a log generated by SWaT operating under 36 …


Inferring Spread Of Readers’ Emotion Affected By Online News, Agus Sulistya, Ferdian Thung, David Lo Sep 2017

Research Collection School Of Computing and Information Systems

Depending on the reader, a news article may be viewed from many different perspectives, thus triggering different (and possibly contradicting) emotions. In this paper, we formulate a problem of predicting readers’ emotion distribution affected by a news article. Our approach analyzes affective annotations provided by readers of news articles taken from a non-English online news site. We create a new corpus from the annotated articles, and build a domain-specific emotion lexicon and word embedding features. We finally construct a multi-target regression model from a set of features extracted from online news articles. Our experiments show that by combining lexicon and …


SugarMate: Non-Intrusive Blood Glucose Monitoring With Smartphones, Weixi Gu, Yuxun Zhou, Zimu Zhou, Xi Liu, Han Zou, Pei Zhang, Costas J. Spanos, Lin Zhang Sep 2017

Research Collection School Of Computing and Information Systems

Inferring abnormal glucose events such as hyperglycemia and hypoglycemia is crucial for the health of both diabetic patients and non-diabetic people. However, regular blood glucose monitoring can be invasive and inconvenient in everyday life. We present SugarMate, a first smartphone-based blood glucose inference system as a temporary alternative to continuous blood glucose monitors (CGM) when they are uncomfortable or inconvenient to wear. In addition to the records of food, drug and insulin intake, it leverages smartphone sensors to measure physical activities and sleep quality automatically. Provided with the imbalanced and often limited measurements, a challenge of SugarMate is the inference …


Employing Smartwatch For Enhanced Password Authentication, Bing Chang, Ximing Liu, Yingjiu Li, Pingjian Wang, Wen-Tao Zhu, Zhan Wang Jun 2017

Research Collection School Of Computing and Information Systems

This paper presents an enhanced password authentication scheme by systematically exploiting the motion sensors in a smartwatch. We extract unique features from the sensor data when a smartwatch bearer types his/her password (or PIN), and train certain machine learning classifiers using these features. We then implement smartwatch-aided password authentication using the classifiers. Our scheme is user-friendly since it does not require users to perform any additional actions when typing passwords or PINs other than wearing smartwatches. We conduct a user study involving 51 participants on the developed prototype so as to evaluate its feasibility and performance. Experimental results show that …


Collective Personalized Change Classification With Multiobjective Search, Xin Xia, David Lo, Xinyu Wang, Xiaohu Yang Dec 2016

Research Collection School Of Computing and Information Systems

Many change classification techniques have been proposed to identify defect-prone changes. These techniques consider all developers' historical change data to build a global prediction model. In practice, since developers have their own coding preferences and behavioral patterns, which causes different defect patterns, a separate change classification model for each developer can help to improve performance. Jiang, Tan, and Kim refer to this problem as personalized change classification, and they propose PCC+ to solve this problem. A software project has a number of developers; for a developer, building a prediction model not only based on his/her change data, but also on …


Machine Learning In Wireless Sensor Networks: Algorithms, Strategies, And Applications, Mohammad Abu Alsheikh, Shaowei Lin, Dusit Niyato, Hwee-Pink Tan Apr 2014

Research Collection School Of Computing and Information Systems

Wireless sensor networks (WSNs) monitor dynamic environments that change rapidly over time. This dynamic behavior is either caused by external factors or initiated by the system designers themselves. To adapt to such conditions, sensor networks often adopt machine learning techniques to eliminate the need for unnecessary redesign. Machine learning also inspires many practical solutions that maximize resource utilization and prolong the lifespan of the network. In this paper, we present an extensive literature review over the period 2002-2013 of machine learning methods that were used to address common issues in WSNs. The advantages and disadvantages of each proposed algorithm are …


Software Process Evaluation: A Machine Learning Approach, Ning Chen, Steven C. H. Hoi, Xiaokui Xiao Nov 2011

Research Collection School Of Computing and Information Systems

Software process evaluation is essential to improve software development and the quality of software products in an organization. Conventional approaches based on manual qualitative evaluations (e.g., artifacts inspection) are deficient in the sense that (i) they are time-consuming, (ii) they suffer from the authority constraints, and (iii) they are often subjective. To overcome these limitations, this paper presents a novel semi-automated approach to software process evaluation using machine learning techniques. In particular, we formulate the problem as a sequence classification task, which is solved by applying machine learning algorithms. Based on the framework, we define a new quantitative indicator to …


Context-Aware Statistical Debugging: From Bug Predictors To Faulty Control Flow Paths, Lingxiao Jiang, Zhendong Su Nov 2007

Research Collection School Of Computing and Information Systems

Effective bug localization is important for realizing automated debugging. One attractive approach is to apply statistical techniques on a collection of evaluation profiles of program properties to help localize bugs. Previous research has proposed various specialized techniques to isolate certain program predicates as bug predictors. However, because many bugs may not be directly associated with these predicates, these techniques are often ineffective in localizing bugs. Relevant control flow paths that may contain bug locations are more informative than stand-alone predicates for discovering and understanding bugs. In this paper, we propose an approach to automatically generate such faulty control flow paths …


On Machine Learning Methods For Chinese Document Classification, Ji He, Ah-Hwee Tan, Chew-Lim Tan May 2003

Research Collection School Of Computing and Information Systems

This paper reports our comparative evaluation of three machine learning methods, namely k Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM) for Chinese document categorization. Based on two Chinese corpora, a series of controlled experiments evaluated their learning capabilities and efficiency in mining text classification knowledge. Benchmark experiments showed that their predictive performance was roughly comparable, especially on clean and well organized data sets. While kNN and ARAM yielded better performance than SVM on small and clean data sets, SVM and ARAM significantly outperformed kNN on noisy data. Comparing efficiency, kNN was notably more costly …