Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 30 of 37

Full-Text Articles in Physical Sciences and Mathematics

Fine-Grained In-Context Permission Classification For Android Apps Using Control-Flow Graph Embedding, Vikas Kumar Malviya, Naing Tun Yan, Chee Wei Leow, Ailys Xynyn Tee, Lwin Khin Shar, Lingxiao Jiang Sep 2023

Research Collection School Of Computing and Information Systems

Android is the most popular operating system for mobile devices nowadays. Permissions are a very important part of the Android security architecture. Apps frequently need the users’ permission, but many of them only ask for it once (when the user uses the app for the first time) and then keep and abuse the given permissions. The desire to enhance Android permission security and the protection of users’ private data is the driving factor behind our approach to explore fine-grained context-sensitive permission usage analysis and thereby identify misuses in Android apps. In this work, we propose an approach for classifying the fine-grained permission uses for each …


Efficient Search Of Live-Coding Screencasts From Online Videos, Chengran Yang, Ferdian Thung, David Lo Mar 2022

Research Collection School Of Computing and Information Systems

Programming videos on the Internet are valuable resources for learning programming skills. To find relevant videos, developers typically search online video platforms (e.g., YouTube) with keywords on topics they wish to learn. Developers often look for live-coding screencasts, in which the videos’ authors perform live coding. Yet, not all programming videos are live-coding screencasts. In this work, we develop a tool named PSFinder to identify live-coding screencasts. PSFinder leverages a classifier to identify whether a video frame contains an IDE window. It uses a sampling strategy to pick a number of frames from an input video, runs the classifier on …


Can We Classify Cashless Payment Solution Implementations At The Country Level?, Dennis Ng, Robert J. Kauffman, Paul Robert Griffin Mar 2021

Research Collection School Of Computing and Information Systems

This research commentary proposes a 3-D implementation classification framework to assist service providers and business leaders in understanding the kinds of contexts in which more or less successful cashless payment solutions are observed at point-of-sale (PoS) settings. Three constructs characterize the framework: the digitalization of the local implementation environment; the relative novelty of a given payment technology solution in a country at a specific point in time; and the development status of the country’s national infrastructure. The framework is motivated by a need to support cross-country research in this domain. We analyze eight country mini-cases based on an eight-facet (2 …


A Unified Framework For Sparse Online Learning, Peilin Zhao, Dayong Wong, Pengcheng Wu, Steven C. H. Hoi Aug 2020

Research Collection School Of Computing and Information Systems

The amount of data in our society has been exploding in the era of big data. This article aims to address several open challenges in big data stream classification. Many existing studies in data mining literature follow the batch learning setting, which suffers from low efficiency and poor scalability. To tackle these challenges, we investigate a unified online learning framework for the big data stream classification task. Different from the existing online data stream classification techniques, we propose a unified Sparse Online Classification (SOC) framework. Based on SOC, we derive a second-order online learning algorithm and a cost-sensitive sparse online …


Latent Dirichlet Allocation For Textual Student Feedback Analysis, Swapna Gottipati, Venky Shankararaman, Jeff Lin Nov 2018

Research Collection School Of Computing and Information Systems

Education institutions collect feedback from students upon course completion and analyse it to improve curriculum design, delivery methodology and students' learning experience. A large part of feedback comes in the form of textual comments, which pose a challenge in quantifying and deriving insights. In this paper, we present a novel approach of the Latent Dirichlet Allocation (LDA) model to address this difficulty in handling textual student feedback. The analysis of the quantitative part of student feedback provides general ratings and helps to identify aspects of the teaching that are successful and those that can improve. The reasons for the failure or success, however, …


Categorizing The Content Of Github Readme Files, Gede Artha Azriadi Prana, Christoph Treude, Ferdian Thung, Thushari Atapattu, David Lo Oct 2018

Research Collection School Of Computing and Information Systems

README files play an essential role in shaping a developer’s first impression of a software repository and in documenting the software project that the repository hosts. Yet, we lack a systematic understanding of the content of a typical README file as well as tools that can process these files automatically. To close this gap, we conduct a qualitative study involving the manual annotation of 4,226 README file sections from 393 randomly sampled GitHub repositories and we design and evaluate a classifier and a set of features that can categorize these sections automatically. We find that information discussing the ‘What’ and …


Unified Locally Linear Classifiers With Diversity-Promoting Anchor Points, Chenghao Liu, Teng Zhang, Peilin Zhao, Jianling Sun, Steven C. H. Hoi Feb 2018

Research Collection School Of Computing and Information Systems

Locally Linear Support Vector Machine (LLSVM) has been actively used in classification tasks due to its capability of classifying nonlinear patterns. However, existing LLSVM suffers from two drawbacks: (1) a particular and appropriate regularization for LLSVM has not yet been addressed; (2) it usually adopts a three-stage learning scheme composed of learning anchor points by clustering, learning local coding coordinates by a predefined coding scheme, and finally learning for training classifiers. We argue that this decoupled approach oversimplifies the original optimization problem, resulting in a large deviation due to the disparate purpose of each step. To address the first issue, …


Automatic Loop-Invariant Generation And Refinement Through Selective Sampling, Jiaying Li, Jun Sun, Li Li, Quang Loc Le, Shang-Wei Lin Nov 2017

Research Collection School Of Computing and Information Systems

Automatic loop-invariant generation is important in program analysis and verification. In this paper, we propose to generate loop-invariants automatically through learning and verification. Given a Hoare triple of a program containing a loop, we start with randomly testing the program, collect program states at run-time and categorize them based on whether they satisfy the invariant to be discovered. Next, classification techniques are employed to generate a candidate loop-invariant automatically. Afterwards, we refine the candidate through selective sampling so as to overcome the lack of sufficient test cases. Only after a candidate invariant cannot be improved further through selective sampling, we …


Scalable Online Kernel Learning, Jing Lu Nov 2017

Dissertations and Theses Collection (Open Access)

One critical deficiency of traditional online kernel learning methods is their increasing and unbounded number of support vectors (SV’s), making them inefficient and non-scalable for large-scale applications. Recent studies on budget online learning have attempted to overcome this shortcoming by bounding the number of SV’s. Despite being extensively studied, budget algorithms usually suffer from several drawbacks.
First of all, although existing algorithms attempt to bound the number of SV’s at each iteration, most of them fail to bound the number of SV’s for the final averaged classifier, which is commonly used for online-to-batch conversion. To solve this problem, we propose …


Crowdsensing And Analyzing Micro-Event Tweets For Public Transportation Insights, Thoong Hoang, Pei Hua (Xu Peihua) Cher, Philips Kokoh Prasetyo, Ee-Peng Lim Feb 2017

Research Collection School Of Computing and Information Systems

An efficient and commuter-friendly public transportation system is a critical part of a thriving and sustainable city. As cities experience fast-growing resident populations, their public transportation systems will have to cope with more demands for improvements. In this paper, we propose a crowdsensing and analysis framework to gather and analyze real-time commuter feedback from Twitter. We perform a series of text mining tasks: identifying those feedback comments capturing bus-related micro-events; extracting relevant entities; and predicting event and sentiment labels. We conduct a series of experiments involving more than 14K labeled tweets. The experiments show that incorporating domain knowledge …


On Profiling Bots In Social Media, Richard J. Oentaryo, Arinto Murdopo, Philips K. Prasetyo, Ee Peng Lim Nov 2016

Research Collection School Of Computing and Information Systems

The popularity of social media platforms such as Twitter has led to the proliferation of automated bots, creating both opportunities and challenges in information dissemination, user engagement, and quality of services. Past works on profiling bots have focused largely on malicious bots, with the assumption that these bots should be removed. In this work, however, we find many bots that are benign, and propose a new, broader categorization of bots based on their behaviors. This includes broadcast, consumption, and spam bots. To facilitate comprehensive analyses of bots and how they compare to human accounts, we develop a systematic profiling …


Capstone Projects Mining System For Insights And Recommendations, Melvrivk Aik Chun Goh, Swapna Gottipati, Venky Shankararaman Dec 2015

Research Collection School Of Computing and Information Systems

In this paper, we present a classification-based system to discover knowledge and trends in higher education students’ projects. Essentially, the educational capstone projects provide an opportunity for students to apply what they have learned and prepare themselves for industry needs. Therefore, mining such projects gives insights into students’ experiences as well as industry project requirements and trends. In particular, we mine capstone projects executed by Information Systems students to discover patterns and insights related to people, organization, domain, industry needs and time. We build a capstone projects mining system (CPMS) based on classification models that leverage text mining, natural …


Should I Follow This Fault Localization Tool's Output? Automated Prediction Of Fault Localization Effectiveness, Tien-Duy B. Le, David Lo, Ferdian Thung Oct 2015

Research Collection School Of Computing and Information Systems

Debugging is a crucial yet expensive activity to improve the reliability of software systems. To reduce debugging cost, various fault localization tools have been proposed. A spectrum-based fault localization tool often outputs an ordered list of program elements sorted based on their likelihood to be the root cause of a set of failures (i.e., their suspiciousness scores). Despite the many studies on fault localization, for many bugs the root causes are often low in the ordered list. This potentially causes developers to distrust fault localization tools. Recently, Parnin and Orso highlight in their user study that many debuggers …


Active Semi-Supervised Approach For Checking App Behavior Against Its Description, Ma Siqi, Shaowei Wang, David Lo, Robert H. Deng, Cong Sun Jul 2015

Research Collection School Of Computing and Information Systems

Mobile applications have become popular in recent years. They are often allowed to access and modify users' sensitive data. However, many mobile applications are malware that inappropriately uses this sensitive data. To detect such malware, Gorla et al. propose CHABADA, which compares app behaviors against their descriptions. Data about known malware are not used in their work, which limits its effectiveness. In this work, we extend the work by Gorla et al. by proposing an active and semi-supervised approach for detecting malware. Different from CHABADA, our approach will make use of both known benign and malicious apps to predict other malicious …


Collaborative Online Multitask Learning, Guangxia Li, Steven C. H. Hoi, Kuiyu Chang, Wenting Liu, Ramesh Jain Aug 2014

Research Collection School Of Computing and Information Systems

We study the problem of online multitask learning for solving multiple related classification tasks in parallel, aiming at classifying every sequence of data received by each task accurately and efficiently. One practical example of online multitask learning is the micro-blog sentiment detection on a group of users, which classifies micro-blog posts generated by each user into emotional or non-emotional categories. This particular online learning task is challenging for a number of reasons. First of all, to meet the critical requirements of online applications, a highly efficient and scalable classification solution that can make immediate predictions with low learning cost is …


Online Feature Selection And Its Applications, Jialei Wang, Peilin Zhao, Steven C. H. Hoi, Rong Jin Mar 2014

Research Collection School Of Computing and Information Systems

Feature selection is an important technique for data mining. Despite its importance, most studies of feature selection are restricted to batch learning. Unlike traditional batch learning methods, online learning represents a promising family of efficient and scalable machine learning algorithms for large-scale applications. Most existing studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of online feature selection (OFS) in …


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

Dissertations and Theses Collection (Open Access)

User profiling, such as user affiliation prediction in online social networks, is a challenging task, with many important applications in targeted marketing and personalized recommendation. The research task here is to predict some user affiliation attributes that suggest user participation in different social groups.


Predictive Handling Of Asynchronous Concept Drifts In Distributed Environments, Hock Hee Ang, Vivek Gopalkrishnan, Indre Zliobaite, Mykola Pechenizkiy, Steven C. H. Hoi Oct 2013

Research Collection School Of Computing and Information Systems

In a distributed computing environment, peers collaboratively learn to classify concepts of interest from each other. When external changes happen and their concepts drift, the peers should adapt to avoid increase in misclassification errors. The problem of adaptation becomes more difficult when the changes are asynchronous, i.e., when peers experience drifts at different times. We address this problem by developing an ensemble approach, PINE, that combines reactive adaptation via drift detection, and proactive handling of upcoming changes via early warning and adaptation across the peers. With empirical study on simulated and real-world data sets, we show that PINE handles asynchronous …


Will Fault Localization Work For These Failures? An Automated Approach To Predict Effectiveness Of Fault Localization Tools, Tien-Duy B. Le, David Lo Sep 2013

Research Collection School Of Computing and Information Systems

Debugging is a crucial yet expensive activity to improve the reliability of software systems. To reduce debugging cost, various fault localization tools have been proposed. A spectrum-based fault localization tool often outputs an ordered list of program elements sorted based on their likelihood to be the root cause of a set of failures (i.e., their suspiciousness scores). Despite the many studies on fault localization, for many bugs the root causes are often low in the ordered list. This potentially causes developers to distrust fault localization tools. Recently, Parnin and Orso highlight in their user study that many debuggers …


An Investigation Of Decision Analytic Methodologies For Stress Identification, Yong Deng, Chao-Hsien Chu, Huayou Si, Qixun Zhang, Zhonghai Wu Sep 2013

Research Collection School Of Computing and Information Systems

In modern society, more and more people are suffering from some type of stress. Monitoring and timely detection of stress levels would be very valuable in helping a person take countermeasures. In this paper, we investigate the use of decision analytics methodologies to detect stress. We present a new feature selection method based on the principal component analysis (PCA), compare three feature selection methods, and evaluate five information fusion methods for stress detection. A driving stress data set created by the MIT Media lab is used to evaluate the relative performance of these methods. Our study shows that the …


MKBoost: A Framework Of Multiple Kernel Boosting, Hao Xia, Steven C. H. Hoi Jul 2013

Research Collection School Of Computing and Information Systems

Multiple kernel learning (MKL) is a promising family of machine learning algorithms using multiple kernel functions for various challenging data mining tasks. Conventional MKL methods often formulate the problem as an optimization task of learning the optimal combinations of both kernels and classifiers, which usually results in challenging optimization tasks that are often difficult to solve. Different from the existing MKL methods, in this paper, we investigate a boosting framework of MKL for classification tasks, i.e., we adopt boosting to solve a variant of the MKL problem, which avoids solving the complicated optimization tasks. Specifically, we present …


Online Multiple Kernel Classification, Steven C. H. Hoi, Rong Jin, Peilin Zhao, Tianbao Yang Feb 2013

Research Collection School Of Computing and Information Systems

Although both online learning and kernel learning have been studied extensively in machine learning, there is limited effort in addressing the intersecting research problems of these two important topics. As an attempt to fill the gap, we address a new research problem, termed Online Multiple Kernel Classification (OMKC), which learns a kernel-based prediction function by selecting a subset of predefined kernel functions in an online learning fashion. OMKC is in general more challenging than typical online learning because both the kernel classifiers and the subset of selected kernels are unknown, and more importantly the solutions to the kernel classifiers and …


Cost-Sensitive Online Classification, Jialei Wang, Peilin Zhao, Steven C. H. Hoi Dec 2012

Research Collection School Of Computing and Information Systems

Both cost-sensitive classification and online learning have been extensively studied in data mining and machine learning communities, respectively. However, very few studies address an important intersecting problem, that is, “Cost-Sensitive Online Classification”. In this paper, we formally study this problem, and propose a new framework for Cost-Sensitive Online Classification by directly optimizing cost-sensitive measures using online gradient descent techniques. Specifically, we propose two novel cost-sensitive online classification algorithms, which are designed to directly optimize two well-known cost-sensitive measures: (i) maximization of weighted sum of sensitivity and specificity, and (ii) minimization of weighted misclassification cost. We analyze the theoretical bounds of …


Online Feature Selection For Mining Big Data, Steven C. H. Hoi, Jialei Wang, Peilin Zhao, Rong Jin Aug 2012

Research Collection School Of Computing and Information Systems

Most studies of online learning require accessing all the attributes/features of training instances. Such a classical setting is not always appropriate for real-world applications when data instances are of high dimensionality or it is expensive to acquire the full set of attributes/features. To address this limitation, we investigate the problem of Online Feature Selection (OFS) in which the online learner is only allowed to maintain a classifier involving a small and fixed number of features. The key challenge of Online Feature Selection is how to make accurate predictions using a small and fixed number of active features. …


Collaborative Online Learning Of User Generated Content, Guangxia Li, Kuiyu Chang, Steven C. H. Hoi, Wenting Liu, Ramesh Jain Oct 2011

Research Collection School Of Computing and Information Systems

We study the problem of online classification of user generated content, with the goal of efficiently learning to categorize content generated by individual users. This problem is challenging for several reasons. First, the huge amount of user generated content demands a highly efficient and scalable classification solution. Second, the categories are typically highly imbalanced, i.e., the number of samples from a particular useful class could be few and far between compared to some others (majority class). In some applications like spam detection, identification of the minority class often has significantly greater value than that of the majority class. Last …


Double Updating Online Learning, Peilin Zhao, Steven C. H. Hoi, Rong Jin May 2011

Research Collection School Of Computing and Information Systems

In most kernel-based online learning algorithms, when an incoming instance is misclassified, it will be added into the pool of support vectors and assigned a weight, which often remains unchanged during the rest of the learning process. This is clearly insufficient since when a new support vector is added, we generally expect the weights of the other existing support vectors to be updated in order to reflect the influence of the added support vector. In this paper, we propose a new online learning method, termed Double Updating Online Learning, or DUOL for short, that explicitly addresses this problem. …


Towards Google Challenge: Combining Contextual And Social Information For Web Video Categorization, Xiao Wu, Wan-Lei Zhao, Chong-Wah Ngo Oct 2009

Research Collection School Of Computing and Information Systems

Web video categorization is a fundamental task for web video search. In this paper, we explore the Google challenge from a new perspective by combining contextual and social information under the scenario of the social web. The semantic meaning of text (title and tags), video relevance from related videos, and user interest induced from user videos are integrated to robustly determine the video category. Experiments on YouTube videos demonstrate the effectiveness of the proposed solution. The performance reaches a 60% improvement compared to the traditional text-based classifiers.


A Novel Framework For Efficient Automated Singer Identification In Large Music Databases, Jialie Shen, John Shepherd, Bin Cui, Kian-Lee Tan May 2009

Research Collection School Of Computing and Information Systems

Over the past decade, there has been explosive growth in the availability of multimedia data, particularly image, video, and music. Because of this, content-based music retrieval has attracted attention from the multimedia database and information retrieval communities. Content-based music retrieval requires us to be able to automatically identify particular characteristics of music data. One such characteristic, useful in a range of applications, is the identification of the singer in a musical piece. Unfortunately, existing approaches to this problem suffer from either low accuracy or poor scalability. In this article, we propose a novel scheme, called Hybrid Singer Identifier (HSI), for …


Towards Effective Content-Based Music Retrieval With Multiple Acoustic Feature Combination, Jialie Shen, John Shepherd, Ann H. H. Ngu Dec 2006

Research Collection School Of Computing and Information Systems

In this paper, we present a new approach to constructing music descriptors to support efficient content-based music retrieval and classification. The system applies multiple musical properties combined with a hybrid architecture based on principal component analysis (PCA) and a multilayer perceptron neural network. This architecture enables straightforward incorporation of multiple musical feature vectors, based on properties such as timbral texture, pitch, and rhythm structure, into a single low-dimensioned vector that is more effective for classification than the larger individual feature vectors. The use of supervised training enables incorporation of human musical perception that further enhances the classification process. We compare …


A Model For Anticipatory Event Detection, Qi He, Kuiyu Chang, Ee Peng Lim Nov 2006

Research Collection School Of Computing and Information Systems

Event detection is a very important area of research that discovers new events reported in a stream of text documents. Previous research in event detection has largely focused on finding the first story and tracking the events of a specific topic. A topic is simply a set of related events defined by user-supplied keywords with no associated semantics and little domain knowledge. We therefore introduce the Anticipatory Event Detection (AED) problem: given some user-preferred event transition in a topic, detect the occurrence of the transition for the stream of news covering the topic. We confine the events to …