Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences

Singapore Management University

Series

Data Mining

Articles 1 - 7 of 7

Full-Text Articles in Physical Sciences and Mathematics

Automated Query Reformulation For Efficient Search Based On Query Logs From Stack Overflow, Kaibo Cao, Chunyang Chen, Sebastian Baltes, Christoph Treude, Xiang Chen May 2021

Automated Query Reformulation For Efficient Search Based On Query Logs From Stack Overflow, Kaibo Cao, Chunyang Chen, Sebastian Baltes, Christoph Treude, Xiang Chen

Research Collection School Of Computing and Information Systems

As a popular Q&A site for programming, Stack Overflow is a treasure for developers. However, the amount of questions and answers on Stack Overflow make it difficult for developers to efficiently locate the information they are looking for. There are two gaps leading to poor search results: the gap between the user's intention and the textual query, and the semantic gap between the query and the post content. Therefore, developers have to constantly reformulate their queries by correcting misspelled words, adding limitations to certain programming languages or platforms, etc. As query reformulation is tedious for developers, especially for novices, we …


Semantic Visualization For Short Texts With Word Embeddings, Van Minh Tuan Le, Hady W. Lauw Aug 2017

Semantic Visualization For Short Texts With Word Embeddings, Van Minh Tuan Le, Hady W. Lauw

Research Collection School Of Computing and Information Systems

Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a …


Recommendation Vs Sentiment Analysis: A Text-Driven Latent Factor Model For Rating Prediction With Cold-Start Awareness, Kaisong Song, Wei Gao, Shi Feng Feng, Daling Wang, Kam-Fai Wong, Chengqi Zhang Aug 2017

Recommendation Vs Sentiment Analysis: A Text-Driven Latent Factor Model For Rating Prediction With Cold-Start Awareness, Kaisong Song, Wei Gao, Shi Feng Feng, Daling Wang, Kam-Fai Wong, Chengqi Zhang

Research Collection School Of Computing and Information Systems

Review rating prediction is an important research topic. The problem was approached from either the perspective of recommender systems (RS) or that of sentiment analysis (SA). Recent SA research using deep neural networks (DNNs) has realized the importance of user and product interaction for better interpreting the sentiment of reviews. However, the complexity of DNN models in terms of the scale of parameters is very high, and the performance is not always satisfying especially when user-product interaction is sparse. In this paper, we propose a simple, extensible RS-based model, called Text-driven Latent Factor Model (TLFM), to capture the semantics of …


Efspredictor: Predicting Configuration Bugs With Ensemble Feature Selection, Bowen Xu, David Lo, Xin Xia, Ashish Sureka, Shanping Li May 2016

Efspredictor: Predicting Configuration Bugs With Ensemble Feature Selection, Bowen Xu, David Lo, Xin Xia, Ashish Sureka, Shanping Li

Research Collection School Of Computing and Information Systems

The configuration of a system determines the system behavior and wrong configuration settings can adversely impact system's availability, performance, and correctness. We refer to these wrong configuration settings as configuration bugs. The importance of configuration bugs has prompted many researchers to study it, and past studies can be grouped into three categories: detection, localization, and fixing of configuration bugs. In the work, we focus on the detection of configuration bugs, in particular, we follow the line-of-work that tries to predict if a bug report is caused by a wrong configuration setting. Automatically prediction of whether a bug is a configuration …


Using Support Vector Machine Ensembles For Target Audience Classification On Twitter, Siaw Ling Lo, Raymond Chiong, David Cornforth Apr 2015

Using Support Vector Machine Ensembles For Target Audience Classification On Twitter, Siaw Ling Lo, Raymond Chiong, David Cornforth

Research Collection School Of Computing and Information Systems

The vast amount and diversity of the content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, our aim is to investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation efforts. Topic domains were automatically discovered from contents shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using contents from different account owners of the various topic domains identified by Twitter LDA. Experimental results …


Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Tin Seong Kam, Roy Ka Wei Lee Mar 2014

Time-Series Data Mining In Transportation: A Case Study On Singapore Public Train Commuter Travel Patterns, Tin Seong Kam, Roy Ka Wei Lee

Research Collection School Of Computing and Information Systems

The adoption of smart cards technologies and automated data collection systems (ADCS) in transportation domain had provided public transport planners opportunities to amass a huge and continuously increasing amount of time-series data about the behaviors and travel patterns of commuters. However the explosive growth of temporal related databases has far outpaced the transport planners’ ability to interpret these data using conventional statistical techniques, creating an urgent need for new techniques to support the analyst in transforming the data into actionable information and knowledge. This research study thus explores and discusses the potential use of time-series data mining, a relatively new …


Learning Image‐Text Associations, Tao Jiang, Ah-Hwee Tan Feb 2009

Learning Image‐Text Associations, Tao Jiang, Ah-Hwee Tan

Research Collection School Of Computing and Information Systems

Web information fusion can be defined as the problem of collating and tracking information related to specific topics on the World Wide Web. Whereas most existing work on Web information fusion has focused on text-based multidocument summarization, this paper concerns the topic of image and text association, a cornerstone of cross-media Web information fusion. Specifically, we present two learning methods for discovering the underlying associations between images and texts based on small training data sets. The first method based on vague transformation measures the information similarity between the visual features and the textual features through a set of predefined domain-specific …