Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

Machine learning

Full-Text Articles in Physical Sciences and Mathematics

Doppler Radar-Based Non-Contact Health Monitoring For Obstructive Sleep Apnea Diagnosis: A Comprehensive Review, Vinh Phuc Tran, Adel Ali Al-Jumaily, Syed Mohammed Shamsul Islam Jan 2019

Research outputs 2014 to 2021

Today’s rapid growth of elderly populations and aging problems, coupled with the prevalence of obstructive sleep apnea (OSA) and other health-related issues, has affected many aspects of society. This has created high demand for more robust healthcare monitoring, diagnosis and treatment facilities. In sleep medicine in particular, sleep is recognized as playing a key role in both physical and mental health. The quality and duration of sleep have a direct and significant impact on people’s learning, memory, metabolism, weight, safety, mood, cardiovascular health, diseases, and immune system function. The gold standard for OSA diagnosis is the overnight sleep monitoring …


Automated Trading Systems Statistical And Machine Learning Methods And Hardware Implementation: A Survey, Boming Huang, Yuziang Huan, Li Da Xu, Lirong Zheng, Zhuo Zou Jan 2019

Information Technology & Decision Sciences Faculty Publications

Automated trading, also known as algorithmic trading, is a method of using a predesigned computer program to submit a large number of trading orders to an exchange. It is essentially a real-time decision-making system that falls within the scope of Enterprise Information Systems (EIS). With the rapid development of telecommunication and computer technology, the mechanisms underlying automated trading systems have become increasingly diversified. Considerable effort has been exerted by both academia and trading firms towards mining potential factors that may generate significantly higher profits. In this paper, we review studies on trading systems built using various methods and …


Evaluating Sequence Discovery Systems In An Abstraction-Aware Manner, Eoin Rogers, Robert J. Ross, John D. Kelleher May 2018

Conference papers

Activity discovery is a challenging machine learning problem in which we seek to uncover new or altered behavioural patterns in sensor data. In this paper we motivate and introduce a novel approach to evaluating activity discovery systems. Pre-annotated ground truths, often used to evaluate the performance of such systems on existing datasets, may exist at a different level of abstraction from the output produced by the system. We propose a method for detecting and dealing with this situation, allowing for useful ground truth comparisons. This work has applications for activity discovery, and also for related fields. For example, it …


Anatomy Of Online Hate: Developing A Taxonomy And Machine Learning Models For Identifying And Classifying Hate In Online News Media, Joni Salminen, Hind Almerekhi, Milica Milenkovic, Soon-Gyu Jung, Haewoon Kwak, Bernard J. Jansen Jan 2018

Research Collection School Of Computing and Information Systems

Online social media platforms generally attempt to mitigate hateful expressions, as these comments can be detrimental to the health of the community. However, automatically identifying hateful comments can be challenging. From a dataset of 137,098 comments posted on the YouTube and Facebook videos of an online news media organization, we manually label 5,143 hateful expressions. We then create a granular taxonomy of different types and targets of online hate and train machine learning models to automatically detect and classify the hateful comments in the full dataset. Our contribution is twofold: 1) creating a granular taxonomy for hateful online comments that includes both …


Vungle Inc. Improves Monetization Using Big-Data Analytics, Bert De Reyck, Ioannis Fragkos, Yael Grushka-Cockayne, Casey Lichtendahl, Hammond Guerin, Andre Kritzer Oct 2017

Research Collection Lee Kong Chian School Of Business

The advent of big data has created opportunities for firms to customize their products and services to unprecedented levels of granularity. Using big data to personalize an offering in real time, however, remains a major challenge. In the mobile advertising industry, once a customer enters the network, an ad-serving decision must be made in a matter of milliseconds. In this work, we describe the design and implementation of an ad-serving algorithm that incorporates machine-learning methods to make personalized ad-serving decisions within milliseconds. We developed this algorithm for Vungle Inc., one of the largest global mobile ad networks. Our approach also …


Deep Learning On Lie Groups For Skeleton-Based Action Recognition, Zhiwu Huang, C. Wan, T. Probst, L. Van Gool Jul 2017

Research Collection School Of Computing and Information Systems

In recent years, skeleton-based action recognition has become a popular 3D classification problem. State-of-the-art methods typically first represent each motion sequence as a high-dimensional trajectory on a Lie group with additional dynamic time warping, and then shallowly learn favorable Lie group features. In this paper we incorporate the Lie group structure into a deep network architecture to learn more appropriate Lie group features for 3D action recognition. Within the network structure, we design rotation mapping layers to transform the input Lie group features into desirable ones, which are better aligned in the temporal domain. To reduce the high feature …


SOAL: Second-Order Online Active Learning, Shuji Hao, Peilin Zhao, Jing Lu, Steven C. H. Hoi, Chunyan Miao, Chi Zhang Feb 2017

Research Collection School Of Computing and Information Systems

This paper investigates the problem of online active learning for training classification models from sequentially arriving data. This is more challenging than conventional online learning tasks, since the learner must not only work out how to update the classifier effectively but also decide when to query the label of an incoming instance, given a limited label budget. Existing online active learning approaches are often based on first-order online learning methods, which generally suffer from slow convergence and suboptimal exploitation of available information when querying the labeled data. To overcome these limitations, …
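
The abstract does not spell out SOAL's second-order update. For orientation only, here is a minimal sketch of the kind of first-order online active learner it contrasts against: a selective-sampling perceptron that requests a label with probability b / (b + |margin|), so confident predictions rarely consume the label budget. The function name and the parameter `b` are illustrative assumptions; this is not the SOAL algorithm.

```python
import random


def selective_sampling_perceptron(stream, b=1.0, seed=0):
    """First-order online active learner (selective-sampling perceptron).

    stream: iterable of (x, y), x a list of floats, y in {-1, +1}.
    b: query-rate parameter; larger b means more label queries.
    Returns the final weight vector and the number of labels queried.
    """
    rng = random.Random(seed)
    w = None
    queries = 0
    for x, y in stream:
        if w is None:
            w = [0.0] * len(x)
        margin = sum(wi * xi for wi, xi in zip(w, x))
        # Query with probability b / (b + |margin|): small margins are queried more often.
        if rng.random() < b / (b + abs(margin)):
            queries += 1
            # The label is only used when queried; perceptron update on a mistake.
            if y * margin <= 0:
                w = [wi + y * xi for wi, xi in zip(w, x)]
    return w, queries
```

Raising `b` spends more of the label budget; second-order methods replace the plain perceptron step with updates that also track per-feature confidence.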


A Novel Approach For Classifying Gene Expression Data Using Topic Modeling, Soon Jye Kho, Himi Yalamanchili, Michael L. Raymer, Amit Sheth Jan 2017

Kno.e.sis Publications

Understanding the role of differential gene expression in cancer etiology and cellular processes is a complex problem that continues to pose a challenge due to the sheer number of genes and inter-related biological processes involved. In this paper, we employ an unsupervised topic model, Latent Dirichlet Allocation (LDA), to mitigate overfitting of high-dimensional gene expression data and to facilitate understanding of the associated pathways. LDA has recently been applied to clustering and exploring genomic data, but not to classification and prediction. Here, we propose to use LDA in clustering as well as in classification of cancer and healthy tissues using lung cancer …


Spiteful, One-Off, And Kind: Predicting Customer Feedback Behavior On Twitter, Agus Sulistya, Abhishek Sharma, David Lo Nov 2016

Research Collection School Of Computing and Information Systems

Social media provides a convenient way for customers to express their feedback to companies. Identifying different types of customers based on their feedback behavior can help companies retain their customers. In this paper, we use a machine learning approach to predict a customer’s feedback behavior based on her first feedback tweet. First, we identify a few categories of customers based on their feedback frequency and the sentiment of the feedback. We identify three main categories: spiteful, one-off, and kind. Next, we build a model to predict the category of a customer given her first feedback. We use profile and …


Soft Confidence-Weighted Learning, Jialei Wang, Peilin Zhao, Steven C. H. Hoi Sep 2016

Research Collection School Of Computing and Information Systems

Online learning plays an important role in many big data mining problems because of its high efficiency and scalability. In the literature, many online learning algorithms using gradient information have been applied to solve online classification problems. Recently, more effective second-order algorithms have been proposed, where the correlation between the features is utilized to improve the learning efficiency. Among them, Confidence-Weighted (CW) learning algorithms are very effective; they assume that the classification model is drawn from a Gaussian distribution, which enables the model to be effectively updated with the second-order information of the data stream. Despite being studied actively, these CW algorithms cannot handle nonseparable datasets and noisy datasets very …
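
The abstract does not give the SCW update rules. To illustrate what "second-order" means here, a minimal sketch of AROW (Adaptive Regularization of Weight Vectors, Crammer et al.), a closely related confidence-weighted-family method that also tolerates non-separable data. It is swapped in for illustration and is not SCW itself; a diagonal covariance and the regularizer `r` are assumptions made for brevity.

```python
def arow_train(stream, r=1.0):
    """Diagonal AROW: a second-order online linear classifier.

    Keeps a mean weight vector mu and a diagonal covariance sigma
    (per-feature uncertainty). stream yields (x, y) with y in {-1, +1}.
    r > 0 is the regularization parameter.
    """
    mu, sigma = None, None
    for x, y in stream:
        if mu is None:
            mu = [0.0] * len(x)
            sigma = [1.0] * len(x)
        margin = y * sum(m * xi for m, xi in zip(mu, x))
        if margin < 1.0:  # positive hinge loss: update
            confidence = sum(s * xi * xi for s, xi in zip(sigma, x))  # x^T Sigma x
            beta = 1.0 / (confidence + r)
            alpha = (1.0 - margin) * beta
            # Mean step is scaled per feature by the (old) variance.
            mu = [m + alpha * y * s * xi for m, s, xi in zip(mu, sigma, x)]
            # Diagonal of Sigma - beta * Sigma x x^T Sigma: observed features gain confidence.
            sigma = [s - beta * (s * xi) ** 2 for s, xi in zip(sigma, x)]
    return mu, sigma
```

Features that have been seen often end up with small variance and therefore take smaller steps, which is the "confidence" idea the abstract describes.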


Real Time Activity Recognition Of Treadmill Usage Via Machine Learning, Nathan Blank, Matt Buckner, Christian Owen, Anna Scott Aug 2016

Rose-Hulman Undergraduate Research Publications

Our objective is to provide real-time classification of treadmill usage patterns based on accelerometer and magnetometer measurements. We collected data from treadmills in the Rose-Hulman Student Recreation Center (SRC) using Shimmer3 sensor units. We identified useful data features and classifiers for predicting treadmill usage patterns. We also prototyped a proof-of-concept wireless, real-time classification system.
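
The abstract does not list which features proved useful. A common starting point for accelerometer streams, shown here purely as an assumption, is to slide a fixed-size window over the signal and summarize the acceleration magnitude by its mean and standard deviation. The function name and the window and step sizes are hypothetical.

```python
import math


def window_features(samples, window=50, step=25):
    """samples: list of (ax, ay, az) accelerometer readings.

    Returns one (mean, std) pair of acceleration magnitude per sliding window.
    """
    mags = [math.sqrt(ax * ax + ay * ay + az * az) for ax, ay, az in samples]
    feats = []
    for start in range(0, len(mags) - window + 1, step):
        w = mags[start:start + window]
        mean = sum(w) / window
        # Population standard deviation within the window.
        std = math.sqrt(sum((m - mean) ** 2 for m in w) / window)
        feats.append((mean, std))
    return feats
```

Overlapping windows (step smaller than window) let a real-time classifier emit a new prediction every `step` samples.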


Where Is The Goldmine? Finding Promising Business Locations Through Facebook Data Analytics, Jovian Lin, Richard Oentaryo, Ee-Peng Lim, Casey Vu, Adrian Vu, Agus Kwee Jul 2016

Research Collection School Of Computing and Information Systems

If you were to open your own cafe, would you not want to effortlessly identify the most suitable location to set up your shop? Choosing an optimal physical location is a critical decision for numerous businesses, as many factors contribute to the final choice of the location. In this paper, we seek to address the issue by investigating the use of publicly available Facebook Pages data (which include user "check-ins", types of business, and business locations) to evaluate a user-selected physical location with respect to a type of business. Using a dataset of 20,877 food businesses in Singapore, we conduct analysis of …


A Comparison Of Fundamental Network Formation Principles Between Offline And Online Friends On Twitter, Felicia Natali, Feida Zhu Jan 2016

Research Collection School Of Computing and Information Systems

We investigate the differences between how some of the fundamental principles of network formation apply among offline friends and how they apply among online friends on Twitter. We consider three fundamental principles of network formation proposed by Schaefer et al.: reciprocity, popularity, and triadic closure. Overall, we discover that these principles mainly apply to offline friends on Twitter. Based on how these principles apply to offline versus online friends, we formulate rules to predict offline friendship on Twitter. We compare our algorithm with popular machine learning algorithms and Xiewei’s random walk algorithm. Our algorithm beats the machine learning algorithms on …
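
To make the three named principles concrete, a minimal sketch of pair-level features one might compute on a directed follow graph: reciprocity (both follow each other), popularity (follower counts), and triadic closure (shared neighbours). These definitions and the function name are illustrative assumptions, not the rules the paper formulates.

```python
def formation_features(follows, u, v):
    """Features for a candidate pair (u, v) in a directed follow graph.

    follows: dict mapping each user to the set of users they follow.
    Definitions are illustrative, not the paper's exact rules.
    """
    out_u, out_v = follows.get(u, set()), follows.get(v, set())
    # Reciprocity: u follows v and v follows u.
    reciprocal = int(v in out_u and u in out_v)

    def followers(x):
        # Popularity: number of users who follow x (in-degree).
        return sum(1 for out in follows.values() if x in out)

    def neighbours(x):
        # Users x follows or who follow x.
        return follows.get(x, set()) | {o for o, out in follows.items() if x in out}

    # Triadic closure: neighbours shared by u and v, excluding the pair itself.
    common = len((neighbours(u) & neighbours(v)) - {u, v})
    return {"reciprocal": reciprocal, "followers_u": followers(u),
            "followers_v": followers(v), "common_neighbours": common}
```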


Harnessing The Power Of Text Mining For The Detection Of Abusive Content In Social Media, Hao Chen, Susan Mckeever, Sarah Jane Delany Jan 2016

Conference papers

The issues of cyberbullying and online harassment have gained considerable coverage in recent years. Social media providers need to be able to detect abusive content both accurately and efficiently in order to protect their users. Our aim is to investigate the application of core text mining techniques for the automatic detection of abusive content across a range of social media sources, including blogs, forums, media-sharing, Q&A and chat, using datasets from Twitter, YouTube, MySpace, Kongregate, Formspring and Slashdot. Using supervised machine learning, we compare alternative text representations and dimension reduction approaches, including feature selection and …


Learning Relative Similarity From Data Streams: Active Online Learning Approaches, Shuji Hao, Peilin Zhao, Steven C. H. Hoi, Chunyan Miao Oct 2015

Research Collection School Of Computing and Information Systems

Relative similarity learning, an important learning scheme for information retrieval, aims to learn a bi-linear similarity function from a collection of labeled instance-pairs; the learned function should assign a high similarity value to a similar instance-pair and a low value to a dissimilar pair. Existing algorithms usually assume that the labels of all the pairs in data streams are always available for learning. However, this is not always realistic in practice, since the number of possible pairs is quadratic in the number of instances in the database, and manually labeling the pairs could be very costly and time …
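
A minimal sketch of an online update for a bilinear similarity S(a, b) = a^T W b from a triplet (p, p_pos, p_neg), in the passive-aggressive style of OASIS (Chechik et al.). This is a different, well-known relative similarity learner swapped in for illustration; it omits the active querying this paper proposes, and the function name and `C` value are assumptions.

```python
def pa_bilinear_update(W, p, p_pos, p_neg, C=0.1):
    """One passive-aggressive step on S(a, b) = a^T W b.

    The triplet says p should be more similar to p_pos than to p_neg.
    W is a d x d list of lists, modified in place. Returns the hinge loss.
    """
    def sim(a, b):
        return sum(a[i] * W[i][j] * b[j] for i in range(len(a)) for j in range(len(b)))

    loss = max(0.0, 1.0 - sim(p, p_pos) + sim(p, p_neg))
    if loss > 0.0:
        diff = [a - b for a, b in zip(p_pos, p_neg)]
        # Gradient direction V = p (p_pos - p_neg)^T, so ||V||_F^2 = ||p||^2 ||diff||^2.
        norm_sq = sum(pi * pi for pi in p) * sum(d * d for d in diff)
        if norm_sq > 0.0:
            tau = min(C, loss / norm_sq)  # aggressiveness capped by C
            for i in range(len(p)):
                for j in range(len(diff)):
                    W[i][j] += tau * p[i] * diff[j]
    return loss
```

Each update touches only W, so memory does not grow with the quadratic number of possible pairs; choosing which triplets to label is the part the paper's active approach addresses.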


Reliable Patch Trackers: Robust Visual Tracking By Exploiting Reliable Patches, Yang Li, Jianke Zhu, Steven C. H. Hoi Jun 2015

Research Collection School Of Computing and Information Systems

Most modern trackers employ a bounding box given in the first frame to track visual objects, and their tracking results are often sensitive to this initialization. In this paper, we propose a new tracking method, Reliable Patch Trackers (RPT), which attempts to identify and exploit the reliable patches that can be tracked effectively through the whole tracking process. Specifically, we present a tracking reliability metric to measure how reliably a patch can be tracked, where a probability model is proposed to estimate the distribution of reliable patches under a sequential Monte Carlo framework. As the reliable patches distributed over …


Use Of A High-Value Social Audience Index For Target Audience Identification On Twitter, Siaw Ling Lo, David Cornforth, Raymond Chiong Feb 2015

Research Collection School Of Computing and Information Systems

With the large and growing user base of social media, it is not an easy feat to identify potential customers for a business. This is mainly due to the challenge of extracting commercially viable content from the vast amount of free-form conversations. In this paper, we analyse the Twitter content of an account owner and its list of followers through various text mining methods and segment the list of followers via an index, which we term the High-Value Social Audience (HVSA) index. This HVSA index enables a company or organisation to devise its marketing and engagement plan according …


Collaborative Online Multitask Learning, Guangxia Li, Steven C. H. Hoi, Kuiyu Chang, Wenting Liu, Ramesh Jain Aug 2014

Research Collection School Of Computing and Information Systems

We study the problem of online multitask learning for solving multiple related classification tasks in parallel, aiming at classifying every sequence of data received by each task accurately and efficiently. One practical example of online multitask learning is the micro-blog sentiment detection on a group of users, which classifies micro-blog posts generated by each user into emotional or non-emotional categories. This particular online learning task is challenging for a number of reasons. First of all, to meet the critical requirements of online applications, a highly efficient and scalable classification solution that can make immediate predictions with low learning cost is …


Retrieval-Based Face Annotation By Weak Label Regularized Local Coordinate Coding, Dayong Wang, Steven C. H. Hoi, Ying He, Jianke Zhu, Tao Mei, Jiebo Luo Mar 2014

Research Collection School Of Computing and Information Systems

Auto face annotation, which aims to detect human faces from a facial image and assign them proper human names, is a fundamental research problem and beneficial to many real-world applications. In this work, we address this problem by investigating a retrieval-based annotation scheme of mining massive web facial images that are freely available over the Internet. In particular, given a facial image, we first retrieve the top n similar instances from a large-scale web facial image database using content-based image retrieval techniques, and then use their labels for auto annotation. Such a scheme has two major challenges: 1) how to …


Online Portfolio Selection: A Survey, Bin Li, Steven C. H. Hoi Jan 2014

Research Collection School Of Computing and Information Systems

Online portfolio selection is a fundamental problem in computational finance, which has been extensively studied across several research communities, including finance, statistics, artificial intelligence, machine learning, and data mining. This article aims to provide a comprehensive survey and a structural understanding of online portfolio selection techniques published in the literature. From an online machine learning perspective, we first formulate online portfolio selection as a sequential decision problem, and then we survey a variety of state-of-the-art approaches, which are grouped into several major categories, including benchmarks, Follow-the-Winner approaches, Follow-the-Loser approaches, Pattern-Matching-based approaches, and Meta-Learning Algorithms. In addition to the problem formulation …
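
To make the sequential-decision framing concrete, a minimal sketch of one classic Follow-the-Winner strategy, Exponentiated Gradient (EG; Helmbold et al.). It assumes price relatives x_t[i] = p_t[i] / p_{t-1}[i]; the learning rate `eta` is chosen for illustration. Each period the portfolio b earns b · x_t, and the weights are then shifted multiplicatively toward assets that did well.

```python
import math


def eg_portfolio(price_relatives, eta=0.05):
    """Exponentiated Gradient online portfolio selection.

    price_relatives: list of periods; x_t[i] is asset i's close divided by
    its previous close. Returns final cumulative wealth (starting at 1.0)
    and the last portfolio vector.
    """
    m = len(price_relatives[0])
    b = [1.0 / m] * m  # start from the uniform portfolio
    wealth = 1.0
    for x in price_relatives:
        period_return = sum(bi * xi for bi, xi in zip(b, x))
        wealth *= period_return
        # Multiplicative update toward assets with high relative return.
        w = [bi * math.exp(eta * xi / period_return) for bi, xi in zip(b, x)]
        total = sum(w)
        b = [wi / total for wi in w]  # renormalize onto the simplex
    return wealth, b
```

Cumulative wealth is the product of per-period returns b_t · x_t, which is the quantity the surveyed approaches try to maximize.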


Mining Weakly Labeled Web Facial Images For Search-Based Face Annotation, Dayong Wang, Steven C. H. Hoi, Ying He, Jianke Zhu Jan 2014

Research Collection School Of Computing and Information Systems

This paper investigates a framework of search-based face annotation (SBFA) by mining weakly labeled facial images that are freely available on the World Wide Web (WWW). One challenging problem for search-based face annotation scheme is how to effectively perform annotation by exploiting the list of most similar facial images and their weak labels that are often noisy and incomplete. To tackle this problem, we propose an effective unsupervised label refinement (ULR) approach for refining the labels of web facial images using machine learning techniques. We formulate the learning problem as a convex optimization and develop effective optimization algorithms to solve …


Software Process Evaluation: A Machine Learning Approach, Ning Chen, Steven C. H. Hoi, Xiaokui Xiao Nov 2011

Research Collection School Of Computing and Information Systems

Software process evaluation is essential to improving software development and the quality of software products in an organization. Conventional approaches based on manual qualitative evaluations (e.g., artifact inspection) are deficient in that (i) they are time-consuming, (ii) they suffer from authority constraints, and (iii) they are often subjective. To overcome these limitations, this paper presents a novel semi-automated approach to software process evaluation using machine learning techniques. In particular, we formulate the problem as a sequence classification task, which is solved by applying machine learning algorithms. Based on the framework, we define a new quantitative indicator to …


Active Multiple Kernel Learning For Interactive 3D Object Retrieval Systems, Steven C. H. Hoi, Rong Jin Oct 2011

Research Collection School Of Computing and Information Systems

An effective relevance feedback solution plays a key role in interactive intelligent 3D object retrieval systems. In this work, we investigate the relevance feedback problem for interactive intelligent 3D object retrieval, focusing on effective machine learning algorithms for improving the user's interaction in the retrieval task. One of the key challenges is to learn an appropriate kernel similarity measure between 3D objects through relevance feedback interaction with users. We address this challenge by presenting a novel framework of Active Multiple Kernel Learning (AMKL), which exploits multiple kernel learning techniques for relevance feedback in interactive 3D object retrieval. …


Empirical Methods For Predicting Student Retention: A Summary From The Literature, Matt Bogard May 2011

Economics Faculty Publications

The vast majority of the literature related to the empirical estimation of retention models includes a discussion of the theoretical retention framework established by Bean, Braxton, Tinto, Pascarella, Terenzini and others (see Bean, 1980; Bean, 2000; Braxton, 2000; Braxton et al., 2004; Chapman and Pascarella, 1983; Pascarella and Terenzini, 1978; St. John and Cabrera, 2000; Tinto, 1975). This body of research provides a starting point for deciding which explanatory variables to include in any model specification, as well as for identifying possible data sources. The literature separates into two major camps, including research related to the hypothesis testing …


Intentional Learning Agent Architecture, Budhitama Subagdja, Liz Sonenberg, Iyad Rahwan Jun 2009

Research Collection School Of Computing and Information Systems

Dealing with changing situations is a major issue in building agent systems. When the time is limited, knowledge is unreliable, and resources are scarce, the issue becomes more challenging. The BDI (Belief-Desire-Intention) agent architecture provides a model for building agents that addresses that issue. The model can be used to build intentional agents that are able to reason based on explicit mental attitudes, while behaving reactively in changing circumstances. However, despite the reactive and deliberative features, a classical BDI agent is not capable of learning. Plans as recipes that guide the activities of the agent are assumed to be static. …


Learning To Classify E-Mail, Irena Koprinska, Josiah Poon, James Clark, Jason Yuk Hin Chan May 2007

Research Collection School Of Computing and Information Systems

In this paper we study supervised and semi-supervised classification of e-mails. We consider two tasks: filing e-mails into folders and spam e-mail filtering. Firstly, in a supervised learning setting, we investigate the use of random forest for automatic e-mail filing into folders and spam e-mail filtering. We show that random forest is a good choice for these tasks as it runs fast on large and high dimensional databases, is easy to tune and is highly accurate, outperforming popular algorithms such as decision trees, support vector machines and naive Bayes. We introduce a new accurate feature selector with linear time complexity. …


Dynamically Optimized Context In Recommender Systems, Ghim-Eng Yap, Ah-Hwee Tan, Hwee Hwa Pang May 2005

Research Collection School Of Computing and Information Systems

Traditional approaches to recommender systems have not taken into account situational information when making recommendations, and this seriously limits the relevance of the results. This paper advocates context-awareness as a promising approach to enhance the performance of recommenders, and introduces a mechanism to realize this approach. We present a framework that separates the contextual concerns from the actual recommendation module, so that contexts can be readily shared across applications. More importantly, we devise a learning algorithm to dynamically identify the optimal set of contexts for a specific recommendation task and user. An extensive series of experiments has validated that our …


Using Symbolic Knowledge In The UMLS To Disambiguate Words In Small Datasets With A Naive Bayes Classifier, Gondy Leroy, Thomas C. Rindflesch Jan 2004

CGU Faculty Publications and Research

Current approaches to word sense disambiguation use and combine various machine-learning techniques. Most refer to characteristics of the ambiguous word and surrounding words and are based on hundreds of examples. Unfortunately, developing large training sets is time-consuming. We investigate the use of symbolic knowledge to augment machine-learning techniques for small datasets. UMLS semantic types assigned to concepts found in the sentence and relationships between these semantic types form the knowledge base. A naïve Bayes classifier was trained for 15 words with 100 examples for each. The most frequent sense of a word served as the baseline. The effect of increasingly …
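
The abstract does not specify the feature encoding. A minimal multinomial naive Bayes sketch with add-one (Laplace) smoothing, where each example's features could be surrounding words or, as in this paper, symbolic labels such as UMLS semantic types. The senses and words in the usage test are hypothetical. The prior term alone reproduces the "most frequent sense" baseline.

```python
import math
from collections import Counter, defaultdict


def train_nb(examples):
    """examples: list of (features, sense); features is a list of strings."""
    sense_counts = Counter()
    feature_counts = defaultdict(Counter)
    vocab = set()
    for features, sense in examples:
        sense_counts[sense] += 1
        feature_counts[sense].update(features)
        vocab.update(features)
    return sense_counts, feature_counts, vocab


def predict_nb(model, features):
    sense_counts, feature_counts, vocab = model
    total = sum(sense_counts.values())
    best_sense, best_score = None, -math.inf
    for sense, count in sense_counts.items():
        score = math.log(count / total)  # log prior P(sense)
        denom = sum(feature_counts[sense].values()) + len(vocab)
        for f in features:
            # Add-one smoothing keeps unseen features from zeroing the product.
            score += math.log((feature_counts[sense][f] + 1) / denom)
        if score > best_score:
            best_sense, best_score = sense, score
    return best_sense
```

Working in log space avoids underflow when many features are multiplied together.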


On Machine Learning Methods For Chinese Document Classification, Ji He, Ah-Hwee Tan, Chew-Lim Tan May 2003

Research Collection School Of Computing and Information Systems

This paper reports our comparative evaluation of three machine learning methods, namely k-Nearest Neighbor (kNN), Support Vector Machines (SVM), and Adaptive Resonance Associative Map (ARAM), for Chinese document categorization. Based on two Chinese corpora, a series of controlled experiments evaluated their learning capabilities and efficiency in mining text classification knowledge. Benchmark experiments showed that their predictive performance was roughly comparable, especially on clean and well-organized data sets. While kNN and ARAM yield better performance than SVM on small and clean data sets, SVM and ARAM significantly outperformed kNN on noisy data. Comparing efficiency, kNN was notably more costly …
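
A minimal sketch of kNN text classification with cosine similarity over bag-of-terms vectors, assuming documents are already segmented into words (Chinese text needs a segmentation step first, which is not shown). Neighbors vote with their similarity scores. Every query is compared against every training document, which illustrates why the abstract finds kNN comparatively costly at classification time. The function names are illustrative.

```python
import math
from collections import Counter


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a if t in b)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def knn_classify(train, doc_tokens, k=3):
    """train: list of (tokens, label). Returns the label with the highest
    summed cosine similarity among the k nearest training documents."""
    query = Counter(doc_tokens)
    # Brute force: score the query against every training document.
    scored = sorted(
        ((cosine(query, Counter(toks)), label) for toks, label in train),
        reverse=True,
    )[:k]
    votes = Counter()
    for sim, label in scored:
        votes[label] += sim
    return votes.most_common(1)[0][0]
```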