Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Research Collection School Of Computing and Information Systems

Computer Engineering

Data Mining

Articles 1 - 4 of 4

Full-Text Articles in Physical Sciences and Mathematics

Semantic Visualization For Short Texts With Word Embeddings, Van Minh Tuan Le, Hady W. Lauw Aug 2017

Semantic visualization integrates topic modeling and visualization, such that every document is associated with both a topic distribution and visualization coordinates in a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common and include tweets, search snippets, news headlines, and status updates. Because these documents are short, their semantics are difficult to model: word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to compensate for the lack of co-occurrences. This requires the development of a …


EFSPredictor: Predicting Configuration Bugs With Ensemble Feature Selection, Bowen Xu, David Lo, Xin Xia, Ashish Sureka, Shanping Li May 2016

The configuration of a system determines its behavior, and wrong configuration settings can adversely impact the system's availability, performance, and correctness. We refer to these wrong configuration settings as configuration bugs. The importance of configuration bugs has prompted many researchers to study them, and past studies can be grouped into three categories: detection, localization, and fixing of configuration bugs. In this work, we focus on the detection of configuration bugs. In particular, we follow the line of work that tries to predict whether a bug report is caused by a wrong configuration setting. Automatic prediction of whether a bug is a configuration …


Using Support Vector Machine Ensembles For Target Audience Classification On Twitter, Siaw Ling Lo, Raymond Chiong, David Cornforth Apr 2015

The vast amount and diversity of content shared on social media can pose a challenge for any business wanting to use it to identify potential customers. In this paper, we investigate the use of both unsupervised and supervised learning methods for target audience classification on Twitter with minimal annotation effort. Topic domains were automatically discovered from content shared by followers of an account owner using Twitter Latent Dirichlet Allocation (LDA). A Support Vector Machine (SVM) ensemble was then trained using content from different account owners of the various topic domains identified by Twitter LDA. Experimental results …


Learning Image‐Text Associations, Tao Jiang, Ah-Hwee Tan Feb 2009

Web information fusion can be defined as the problem of collating and tracking information related to specific topics on the World Wide Web. Whereas most existing work on Web information fusion has focused on text-based multidocument summarization, this paper concerns image and text association, a cornerstone of cross-media Web information fusion. Specifically, we present two learning methods for discovering the underlying associations between images and texts based on small training data sets. The first method, based on vague transformation, measures the information similarity between the visual features and the textual features through a set of predefined domain-specific …