Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 9 of 9

Full-Text Articles in Physical Sciences and Mathematics

Drip - Data Rich, Information Poor: A Concise Synopsis Of Data Mining, Muhammad Obeidat, Max North, Lloyd Burgess, Sarah North Dec 2014

Drip - Data Rich, Information Poor: A Concise Synopsis Of Data Mining, Muhammad Obeidat, Max North, Lloyd Burgess, Sarah North

Faculty and Research Publications

As production of data is exponentially growing with a drastically lower cost, the importance of data mining required to extract and discover valuable information is becoming more paramount. To be functional in any business or industry, data must be capable of supporting sound decision-making and plausible prediction. The purpose of this paper is concisely but broadly to provide a synopsis of the technology and theory of data mining, providing an enhanced comprehension of the methods by which massive data can be transferred into meaningful information.


Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Roy Ka Wei Lee, Tin Seong Kam Oct 2014

Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Roy Ka Wei Lee, Tin Seong Kam

Research Collection School Of Computing and Information Systems

The adoption of smart cards technologies and automated data collection systems (ADCS) in transportation domain had provided public transport planners opportunities to amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. However the explosive growth of temporal related databases has far outpaced the transport planners’ ability to interpret these data using conventional statistical techniques, creating an urgent need for new techniques to support the analyst in transforming the data into actionable information and knowledge. This research study thus explores and discusses the potential use of time-series data mining, a relatively new …


Collaborative Online Multitask Learning, Guangxia Li, Steven C. H. Hoi, Kuiyu Chang, Wenting Liu, Ramesh Jain Aug 2014

Collaborative Online Multitask Learning, Guangxia Li, Steven C. H. Hoi, Kuiyu Chang, Wenting Liu, Ramesh Jain

Research Collection School Of Computing and Information Systems

We study the problem of online multitask learning for solving multiple related classification tasks in parallel, aiming at classifying every sequence of data received by each task accurately and efficiently. One practical example of online multitask learning is the micro-blog sentiment detection on a group of users, which classifies micro-blog posts generated by each user into emotional or non-emotional categories. This particular online learning task is challenging for a number of reasons. First of all, to meet the critical requirements of online applications, a highly efficient and scalable classification solution that can make immediate predictions with low learning cost is …


Ar-Miner: Mining Informative Reviews For Developers From Mobile App Marketplace, Ning Chen, Jialiu Lin, Steven C. H. Hoi, Xiaokui Xiao, Boshen Zhang Jun 2014

Ar-Miner: Mining Informative Reviews For Developers From Mobile App Marketplace, Ning Chen, Jialiu Lin, Steven C. H. Hoi, Xiaokui Xiao, Boshen Zhang

Research Collection School Of Computing and Information Systems

With the popularity of smartphones and mobile devices, mobile application (a.k.a. “app”) markets have been growing exponentially in terms of number of users and downloads. App developers spend considerable effort on collecting and exploiting user feedback to improve user satisfaction, but suffer from the absence of effective user review analytics tools. To facilitate mobile app developers discover the most “informative” user reviews from a large and rapidly increasing pool of user reviews, we present “AR-Miner” — a novel computational framework for App Review Mining, which performs comprehensive analytics from raw user reviews by (i) first extracting informative user reviews by …


On Finding The Point Where There Is No Return: Turning Point Mining On Game Data, Wei Gong, Ee Peng Lim, Feida Zhu, Achananuparp Palakorn, David Lo Apr 2014

On Finding The Point Where There Is No Return: Turning Point Mining On Game Data, Wei Gong, Ee Peng Lim, Feida Zhu, Achananuparp Palakorn, David Lo

Research Collection School Of Computing and Information Systems

Gaming expertise is usually accumulated through playing or watching many game instances, and identifying critical moments in these game instances called turning points. Turning point rules (shorten as TPRs) are game patterns that almost always lead to some irreversible outcomes. In this paper, we formulate the notion of irreversible outcome property which can be combined with pattern mining so as to automatically extract TPRs from any given game datasets. We specifically extend the well-known PrefixSpan sequence mining algorithm by incorporating the irreversible outcome property. To show the usefulness of TPRs, we apply them to Tetris, a popular game. We mine …


On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen Mar 2014

On Predicting User Affiliations Using Social Features In Online Social Networks, Minh Thap Nguyen

Dissertations and Theses Collection (Open Access)

User profiling such as user affiliation prediction in online social network is a challenging task, with many important applications in targeted marketing and personalized recommendation. The research task here is to predict some user affiliation attributes that suggest user participation in different social groups.


A Computational Approach To Qualitative Analysis In Large Textual Datasets, Michael Evans Feb 2014

A Computational Approach To Qualitative Analysis In Large Textual Datasets, Michael Evans

Dartmouth Scholarship

In this paper I introduce computational techniques to extend qualitative analysis into the study of large textual datasets. I demonstrate these techniques by using probabilistic topic modeling to analyze a broad sample of 14,952 documents published in major American newspapers from 1980 through 2012. I show how computational data mining techniques can identify and evaluate the significance of qualitatively distinct subjects of discussion across a wide range of public discourse. I also show how examining large textual datasets with computational methods can overcome methodological limitations of conventional qualitative methods, such as how to measure the impact of particular cases on …


Geospatial Data Pre-Processing On Watershed Datasets: A Gis Approach, Sreedhar Nallan, Leisa Armstrong, Barry Croke, Amiya K. Tripathy Jan 2014

Geospatial Data Pre-Processing On Watershed Datasets: A Gis Approach, Sreedhar Nallan, Leisa Armstrong, Barry Croke, Amiya K. Tripathy

Research outputs 2014 to 2021

Spatial data mining helps to identify interesting patterns from the spatial data sets. However, geo spatial data requires substantial data pre-processing before data can be interrogated further using data mining techniques. Multi-dimensional spatial data has been used to explain the spatial analysis and SOLAP for pre-processing data. This paper examines some of the methods for pre-processing of the data using Arc GIS 10.2 and Spatial Analyst with a case study dataset of a watershed.


Detecting Click Fraud In Online Advertising: A Data Mining Approach, Richard Oentaryo, Ee Peng Lim, Michael Finegold, David Lo, Feida Zhu, Clifton Phua, Eng-Yeow Cheu, Ghim-Eng Yap, Kelvin Sim, Kasun Perera, Bijay Neupane, Mustafa Faisal, Zeyar Aung, Wei Lee Woon, Wei Chen, Dhaval Patel, Daniel Berrar Jan 2014

Detecting Click Fraud In Online Advertising: A Data Mining Approach, Richard Oentaryo, Ee Peng Lim, Michael Finegold, David Lo, Feida Zhu, Clifton Phua, Eng-Yeow Cheu, Ghim-Eng Yap, Kelvin Sim, Kasun Perera, Bijay Neupane, Mustafa Faisal, Zeyar Aung, Wei Lee Woon, Wei Chen, Dhaval Patel, Daniel Berrar

Research Collection School Of Computing and Information Systems

Click fraud - the deliberate clicking on advertisements with no real interest on the product or service offered - is one of the most daunting problems in online advertising. Building an elective fraud detection method is thus pivotal for online advertising businesses. We organized a Fraud Detection in Mobile Advertising (FDMA) 2012 Competition, opening the opportunity for participants to work on real-world fraud data from BuzzCity Pte. Ltd., a global mobile advertising company based in Singapore. In particular, the task is to identify fraudulent publishers who generate illegitimate clicks, and distinguish them from normal publishers. The competition was held from …