Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

Research Collection School Of Computing and Information Systems

Series

Classification

Articles 1 - 24 of 24

Full-Text Articles in Physical Sciences and Mathematics

Efficient Search Of Live-Coding Screencasts From Online Videos, Chengran Yang, Ferdian Thung, David Lo Mar 2022

Programming videos on the Internet are valuable resources for learning programming skills. To find relevant videos, developers typically search online video platforms (e.g., YouTube) with keywords on topics they wish to learn. Developers often look for live-coding screencasts, in which the videos’ authors perform live coding. Yet, not all programming videos are live-coding screencasts. In this work, we develop a tool named PSFinder to identify live-coding screencasts. PSFinder leverages a classifier to identify whether a video frame contains an IDE window. It uses a sampling strategy to pick a number of frames from an input video, runs the classifier on …


Can We Classify Cashless Payment Solution Implementations At The Country Level?, Dennis Ng, Robert J. Kauffman, Paul Robert Griffin Mar 2021

This research commentary proposes a 3-D implementation classification framework to assist service providers and business leaders in understanding the kinds of contexts in which more or less successful cashless payment solutions are observed at point-of-sale (PoS) settings. Three constructs characterize the framework: the digitalization of the local implementation environment; the relative novelty of a given payment technology solution in a country at a specific point in time; and the development status of the country’s national infrastructure. The framework is motivated by a need to support cross-country research in this domain. We analyze eight country mini-cases based on an eight-facet (2 …


A Unified Framework For Sparse Online Learning, Peilin Zhao, Dayong Wong, Pengcheng Wu, Steven C. H. Hoi Aug 2020

The amount of data in our society has been exploding in the era of big data. This article aims to address several open challenges in big data stream classification. Many existing studies in data mining literature follow the batch learning setting, which suffers from low efficiency and poor scalability. To tackle these challenges, we investigate a unified online learning framework for the big data stream classification task. Different from the existing online data stream classification techniques, we propose a unified Sparse Online Classification (SOC) framework. Based on SOC, we derive a second-order online learning algorithm and a cost-sensitive sparse online …


Latent Dirichlet Allocation For Textual Student Feedback Analysis, Swapna Gottipati, Venky Shankararaman, Jeff Lin Nov 2018

Education institutions collect feedback from students upon course completion and analyse it to improve curriculum design, delivery methodology and students' learning experience. A large part of feedback comes in the form of textual comments, which pose a challenge in quantifying and deriving insights. In this paper, we present a novel approach using the Latent Dirichlet Allocation (LDA) model to address this difficulty in handling textual student feedback. The analysis of the quantitative part of student feedback provides general ratings and helps to identify aspects of the teaching that are successful and those that can improve. The reasons for the failure or success, however, …


Unified Locally Linear Classifiers With Diversity-Promoting Anchor Points, Chenghao Liu, Teng Zhang, Peilin Zhao, Jianling Sun, Steven C. H. Hoi Feb 2018

Locally Linear Support Vector Machine (LLSVM) has been actively used in classification tasks due to its capability of classifying nonlinear patterns. However, existing LLSVM suffers from two drawbacks: (1) a particular and appropriate regularization for LLSVM has not yet been addressed; (2) it usually adopts a three-stage learning scheme composed of learning anchor points by clustering, learning local coding coordinates by a predefined coding scheme, and finally training classifiers. We argue that this decoupled approach oversimplifies the original optimization problem, resulting in a large deviation due to the disparate purpose of each step. To address the first issue, …


Crowdsensing And Analyzing Micro-Event Tweets For Public Transportation Insights, Thoong Hoang, Pei Hua (Xu Peihua) Cher, Philips Kokoh Prasetyo, Ee-Peng Lim Feb 2017

An efficient and commuter-friendly public transportation system is a critical part of a thriving and sustainable city. As cities experience fast-growing resident populations, their public transportation systems will have to cope with more demands for improvements. In this paper, we propose a crowdsensing and analysis framework to gather and analyze real-time commuter feedback from Twitter. We perform a series of text mining tasks: identifying those feedback comments capturing bus-related micro-events; extracting relevant entities; and predicting event and sentiment labels. We conduct a series of experiments involving more than 14K labeled tweets. The experiments show that incorporating domain knowledge …


On Profiling Bots In Social Media, Richard J. Oentaryo, Arinto Murdopo, Philips K. Prasetyo, Ee Peng Lim Nov 2016

The popularity of social media platforms such as Twitter has led to the proliferation of automated bots, creating both opportunities and challenges in information dissemination, user engagement, and quality of service. Past work on profiling bots has focused largely on malicious bots, with the assumption that these bots should be removed. In this work, however, we find many bots that are benign, and propose a new, broader categorization of bots based on their behaviors. This includes broadcast, consumption, and spam bots. To facilitate comprehensive analyses of bots and how they compare to human accounts, we develop a systematic profiling …


Collaborative Online Multitask Learning, Guangxia Li, Steven C. H. Hoi, Kuiyu Chang, Wenting Liu, Ramesh Jain Aug 2014

We study the problem of online multitask learning for solving multiple related classification tasks in parallel, aiming at classifying every sequence of data received by each task accurately and efficiently. One practical example of online multitask learning is the micro-blog sentiment detection on a group of users, which classifies micro-blog posts generated by each user into emotional or non-emotional categories. This particular online learning task is challenging for a number of reasons. First of all, to meet the critical requirements of online applications, a highly efficient and scalable classification solution that can make immediate predictions with low learning cost is …


Online Feature Selection And Its Applications, Jialei Wang, Peilin Zhao, Steven C. H. Hoi, Rong Jin Mar 2014

Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of online feature selection (OFS) in …


Predictive Handling Of Asynchronous Concept Drifts In Distributed Environments, Hock Hee Ang, Vivek Gopalkrishnan, Indre Zliobaite, Mykola Pechenizkiy, Steven C. H. Hoi Oct 2013

In a distributed computing environment, peers collaboratively learn to classify concepts of interest from each other. When external changes happen and their concepts drift, the peers should adapt to avoid an increase in misclassification errors. The problem of adaptation becomes more difficult when the changes are asynchronous, i.e., when peers experience drifts at different times. We address this problem by developing an ensemble approach, PINE, that combines reactive adaptation via drift detection, and proactive handling of upcoming changes via early warning and adaptation across the peers. With an empirical study on simulated and real-world data sets, we show that PINE handles asynchronous …


MKBoost: A Framework Of Multiple Kernel Boosting, Hao Xia, Steven C. H. Hoi Jul 2013

Multiple kernel learning (MKL) is a promising family of machine learning algorithms using multiple kernel functions for various challenging data mining tasks. Conventional MKL methods often formulate the problem as an optimization task of learning the optimal combinations of both kernels and classifiers, which usually results in challenging optimization tasks that are often difficult to solve. Different from the existing MKL methods, in this paper, we investigate a boosting framework of MKL for classification tasks, i.e., we adopt boosting to solve a variant of the MKL problem, which avoids solving the complicated optimization tasks. Specifically, we present …


Online Multiple Kernel Classification, Steven C. H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang Feb 2013

Although both online learning and kernel learning have been studied extensively in machine learning, there is limited effort in addressing the intersecting research problems of these two important topics. As an attempt to fill the gap, we address a new research problem, termed Online Multiple Kernel Classification (OMKC), which learns a kernel-based prediction function by selecting a subset of predefined kernel functions in an online learning fashion. OMKC is in general more challenging than typical online learning because both the kernel classifiers and the subset of selected kernels are unknown, and more importantly the solutions to the kernel classifiers and …


Cost-Sensitive Online Classification, Jialei Wang, Peilin Zhao, Steven C. H. Hoi Dec 2012

Both cost-sensitive classification and online learning have been extensively studied in the data mining and machine learning communities, respectively. However, very few studies address an important intersecting problem, that is, “Cost-Sensitive Online Classification”. In this paper, we formally study this problem, and propose a new framework for Cost-Sensitive Online Classification by directly optimizing cost-sensitive measures using online gradient descent techniques. Specifically, we propose two novel cost-sensitive online classification algorithms, which are designed to directly optimize two well-known cost-sensitive measures: (i) maximization of weighted sum of sensitivity and specificity, and (ii) minimization of weighted misclassification cost. We analyze the theoretical bounds of …
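
To make the idea of optimizing a weighted misclassification cost with online gradient descent concrete, here is a minimal sketch. It is not the authors' algorithm: the class-weighted hinge loss, the function name `cost_sensitive_ogd`, and the parameters `cost_pos`, `cost_neg`, and `lr` are all assumptions chosen for illustration.

```python
def cost_sensitive_ogd(stream, dim, cost_pos=2.0, cost_neg=1.0, lr=0.1):
    """Online gradient descent on a class-weighted hinge loss (illustrative only).

    stream   : iterable of (x, y) pairs, x a list of floats, y in {+1, -1}
    cost_pos : assumed cost of misclassifying a positive example
    cost_neg : assumed cost of misclassifying a negative example
    """
    w = [0.0] * dim
    total_cost = 0.0
    for x, y in stream:
        score = sum(wi * xi for wi, xi in zip(w, x))
        cost = cost_pos if y > 0 else cost_neg
        # Predict first, then accumulate the weighted cost of any mistake.
        if y * score <= 0:
            total_cost += cost
        # Hinge-loss subgradient step, scaled by the class-specific cost.
        if y * score < 1:
            step = lr * cost * y
            w = [wi + step * xi for wi, xi in zip(w, x)]
    return w, total_cost


# Toy stream: feature 0 signals the positive class, feature 1 the negative class.
toy_stream = [([1.0, 0.0], 1), ([0.0, 1.0], -1)] * 5
weights, cost = cost_sensitive_ogd(toy_stream, dim=2)
```

Scaling the update by the class cost makes errors on the more expensive class move the weights further, which is one simple way to bias an online learner toward the costly class.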


Online Feature Selection For Mining Big Data, Steven C. H. Hoi, Jialei Wang, Peilin Zhao, Rong Jin Aug 2012

Most studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which the online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate predictions using a small and fixed number of active features. …
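
One simple way to keep a classifier restricted to a small, fixed number of features is to truncate the weight vector after every update. The sketch below illustrates that idea with a perceptron; the truncation rule, the function names, and the `budget` parameter are assumptions made for illustration and are not claimed to be the method of the paper.

```python
def truncate(w, budget):
    """Keep the `budget` largest-magnitude weights and zero out the rest."""
    if budget >= len(w):
        return list(w)
    keep = set(sorted(range(len(w)), key=lambda i: abs(w[i]), reverse=True)[:budget])
    return [wi if i in keep else 0.0 for i, wi in enumerate(w)]


def sparse_online_perceptron(stream, dim, budget, lr=1.0):
    """Perceptron whose weight vector never has more than `budget` nonzeros."""
    w = [0.0] * dim
    mistakes = 0
    for x, y in stream:  # y is +1 or -1
        score = sum(wi * xi for wi, xi in zip(w, x))
        if y * score <= 0:
            mistakes += 1
            # Standard perceptron update, followed by truncation to the budget.
            w = [wi + lr * y * xi for wi, xi in zip(w, x)]
            w = truncate(w, budget)
    return w, mistakes


# Toy stream in which feature 0 carries almost all of the signal.
toy_stream = [([1.0, 0.1, 0.0], 1), ([-1.0, 0.0, 0.1], -1)] * 3
weights, mistakes = sparse_online_perceptron(toy_stream, dim=3, budget=1)
```

With `budget=1` the learner retains only the single most informative weight, so prediction needs just one active feature per instance.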


Collaborative Online Learning Of User Generated Content, Guangxia Li, Kuiyu Chang, Steven C. H. Hoi, Wenting Liu, Ramesh Jain Oct 2011

We study the problem of online classification of user generated content, with the goal of efficiently learning to categorize content generated by individual users. This problem is challenging for several reasons. First, the huge amount of user generated content demands a highly efficient and scalable classification solution. Second, the categories are typically highly imbalanced, i.e., the number of samples from a particular useful class could be few and far between compared to some others (majority class). In some applications like spam detection, identification of the minority class often has significantly greater value than that of the majority class. Last …


Double Updating Online Learning, Peilin Zhao, Steven C. H. Hoi, Rong Jin May 2011

In most kernel-based online learning algorithms, when an incoming instance is misclassified, it will be added into the pool of support vectors and assigned a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient since when a new support vector is added, we generally expect the weights of the other existing support vectors to be updated in order to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. …


A Novel Framework For Efficient Automated Singer Identification In Large Music Databases, Jialie Shen, John Shepherd, Bin Cui, Kian-Lee Tan May 2009

Over the past decade, there has been explosive growth in the availability of multimedia data, particularly image, video, and music. Because of this, content-based music retrieval has attracted attention from the multimedia database and information retrieval communities. Content-based music retrieval requires us to be able to automatically identify particular characteristics of music data. One such characteristic, useful in a range of applications, is the identification of the singer in a musical piece. Unfortunately, existing approaches to this problem suffer from either low accuracy or poor scalability. In this article, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for …


Towards Effective Content-Based Music Retrieval With Multiple Acoustic Feature Combination, Jialie Shen, John Shepherd, Ann H. H. Ngu Dec 2006

In this paper, we present a new approach to constructing music descriptors to support efficient content-based music retrieval and classification. The system applies multiple musical properties combined with a hybrid architecture based on principal component analysis (PCA) and a multilayer perceptron neural network. This architecture enables straightforward incorporation of multiple musical feature vectors, based on properties such as timbral texture, pitch, and rhythm structure, into a single low-dimensioned vector that is more effective for classification than the larger individual feature vectors. The use of supervised training enables incorporation of human musical perception that further enhances the classification process. We compare …


A Model For Anticipatory Event Detection, Qi He, Kuiyu Chang, Ee Peng Lim Nov 2006

Event detection is a very important area of research that discovers new events reported in a stream of text documents. Previous research in event detection has largely focused on finding the first story and tracking the events of a specific topic. A topic is simply a set of related events defined by user-supplied keywords with no associated semantics and little domain knowledge. We therefore introduce the Anticipatory Event Detection (AED) problem: given some user-preferred event transition in a topic, detect the occurrence of the transition for the stream of news covering the topic. We confine the events to …


Learning The Unified Kernel Machines For Classification, Steven C. H. Hoi, Michael R. Lyu, Edward Y. Chang Aug 2006

Kernel machines have been shown to be state-of-the-art learning techniques for classification. In this paper, we propose a novel general framework of learning the Unified Kernel Machines (UKM) from both labeled and unlabeled data. Our proposed framework integrates supervised learning, semi-supervised kernel learning, and active learning in a unified solution. In the suggested framework, we particularly focus our attention on designing a new semi-supervised kernel learning method, i.e., Spectral Kernel Learning (SKL), which is built on the principles of kernel target alignment and unsupervised kernel design. Our algorithm is related to an equivalent quadratic programming problem that can be efficiently …


FISA: Feature-Based Instance Selection For Imbalanced Text Classification, Aixin Sun, Ee Peng Lim, Boualem Benatallah, Mahbub Hassan Apr 2006

Support Vector Machine (SVM) classifiers are widely used in text classification tasks and these tasks often involve imbalanced training. In this paper, we specifically address the cases where negative training documents significantly outnumber the positive ones. A generic algorithm known as FISA (Feature-based Instance Selection Algorithm) is proposed to select only a subset of negative training documents for training an SVM classifier. With a smaller carefully selected training set, an SVM classifier can be more efficiently trained while delivering comparable or better classification accuracy. In our experiments on the 20-Newsgroups dataset, using only 35% negative training examples and 60% learning …


WEBARC: Website Archival Using A Structured Approach, Ee Peng Lim, Maria Marissa Dec 2005

Website archival refers to the task of monitoring and storing snapshots of website(s) for future retrieval and analysis. This task is particularly important for websites whose content changes over time, with older information constantly overwritten by newer information. In this paper, we propose WEBARC as a set of software tools to allow users to construct a logical structure for a website to be archived. Classifiers are trained to determine relevant web pages and their categories, and subsequently used in website downloading. The archival schedule can be specified and executed by a scheduler. A website viewer is also developed to …


Blocking Reduction Strategies In Hierarchical Text Classification, Ee Peng Lim, Aixin Sun, Wee-Keong Ng, Jaideep Srivastava Oct 2004

One common approach in hierarchical text classification involves associating classifiers with nodes in the category tree and classifying text documents in a top-down manner. Classification methods using this top-down approach can scale well and cope with changes to the category trees. However, all these methods suffer from blocking, which refers to documents that are wrongly rejected by the classifiers at higher levels and therefore cannot be passed to the classifiers at lower levels. We propose a classifier-centric performance measure known as blocking factor to determine the extent of the blocking. Three methods are proposed to address the blocking problem, namely, threshold reduction, restricted voting, and …
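
The blocking effect in top-down classification can be illustrated with a toy example. The sketch below is hypothetical throughout: the category tree, the keyword-overlap scorers, and the `blocked_fraction` helper are invented for illustration, and `blocked_fraction` is a simple stand-in rather than the paper's blocking factor. Lowering the shared threshold here only mimics the general idea behind threshold reduction.

```python
def keyword_scorer(keywords):
    """Build a toy scorer: the fraction of `keywords` present in a document."""
    def score(doc_words):
        return len(doc_words & keywords) / len(keywords)
    return score


def classify_top_down(doc_words, node, tree, scorers, threshold):
    """Return every category that accepts the document, descending only
    through nodes whose own classifier accepted it."""
    accepted = set()
    for child in tree.get(node, []):
        if scorers[child](doc_words) >= threshold:
            accepted.add(child)
            accepted |= classify_top_down(doc_words, child, tree, scorers, threshold)
    return accepted


def blocked_fraction(labeled_docs, tree, parent, scorers, threshold, root="root"):
    """Fraction of documents rejected by some ancestor of their true leaf."""
    blocked = 0
    for doc_words, true_leaf in labeled_docs:
        accepted = classify_top_down(doc_words, root, tree, scorers, threshold)
        node = parent[true_leaf]
        while node != root:
            if node not in accepted:
                blocked += 1
                break
            node = parent[node]
    return blocked / len(labeled_docs)


# Hypothetical two-level category tree and keyword-based classifiers.
TREE = {"root": ["sports", "tech"], "sports": ["soccer", "tennis"]}
PARENT = {"sports": "root", "tech": "root", "soccer": "sports", "tennis": "sports"}
SCORERS = {
    "sports": keyword_scorer({"match", "team", "player", "score"}),
    "tech": keyword_scorer({"software", "chip"}),
    "soccer": keyword_scorer({"goal", "penalty"}),
    "tennis": keyword_scorer({"serve", "racket"}),
}
DOCS = [
    ({"goal", "penalty", "match", "team"}, "soccer"),
    ({"serve", "racket", "player"}, "tennis"),
]

strict = blocked_fraction(DOCS, TREE, PARENT, SCORERS, threshold=0.5)
relaxed = blocked_fraction(DOCS, TREE, PARENT, SCORERS, threshold=0.25)
```

In this toy setup the tennis document matches its leaf classifier perfectly but is stopped at the "sports" node under the stricter threshold, which is exactly the kind of error a top-down scheme cannot recover from at lower levels.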


Robust Classification Of Event-Related Potential For Brain-Computer Interface, Manoj Thulasidas Sep 2004

We report the implementation of a text input application (speller) based on the P300 event-related potential. We obtain high accuracies by using an SVM classifier and a novel feature. These techniques enable us to maintain fast performance without sacrificing accuracy, thus making the speller usable in an online mode. In order to further improve the usability, we perform various studies on the data with a view to minimizing the training time required. We present data collected from nine healthy subjects, along with the high accuracies (of the order of 95% or more) measured online. We show that the …