Open Access. Powered by Scholars. Published by Universities.®

Digital Commons Network

Articles 1 - 30 of 37

Full-Text Articles in Entire DC Network

Capstone Projects Mining System For Insights And Recommendations, Melvrivk Aik Chun Goh, Swapna Gottipati, Venky Shankararaman Dec 2015

Research Collection School Of Computing and Information Systems

In this paper, we present a classification-based system to discover knowledge and trends in higher education students’ projects. Essentially, the educational capstone projects provide an opportunity for students to apply what they have learned and prepare themselves for industry needs. Therefore, mining such projects gives insight into students’ experiences as well as industry project requirements and trends. In particular, we mine capstone projects executed by Information Systems students to discover patterns and insights related to people, organization, domain, industry needs and time. We build a capstone projects mining system (CPMS) based on classification models that leverage text mining, natural …


Modeling Social Media Content With Word Vectors For Recommendation, Ying Ding, Jing Jiang Dec 2015

Research Collection School Of Computing and Information Systems

In social media, recommender systems are becoming more and more important. Different techniques have been designed for recommendations under various scenarios, but many of them do not use user-generated content, which potentially reflects users’ opinions and interests. Although a few studies have tried to combine user-generated content with rating or adoption data, they mostly rely on lexical similarity to calculate textual similarity. However, in social media, a diverse range of words is used. This renders the traditional ways of calculating textual similarity ineffective. In this work, we apply vector representation of words to measure the semantic similarity between texts. We …
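
Illustrative sketch (not taken from the paper): the abstract contrasts lexical similarity with similarity computed from word vectors. The Python below shows one common way to do this, averaging word vectors per text and comparing the averages by cosine similarity. The tiny `WORD_VECTORS` table, its values, and all function names are hypothetical; the abstract is truncated, so the authors' actual model may differ.

```python
import math

# Toy 3-dimensional word vectors; hypothetical values for illustration only.
WORD_VECTORS = {
    "movie": [0.9, 0.1, 0.0],
    "film": [0.8, 0.2, 0.0],
    "great": [0.1, 0.9, 0.1],
    "awesome": [0.2, 0.8, 0.1],
    "bank": [0.0, 0.1, 0.9],
}


def text_vector(text):
    """Average the vectors of the known words in a text (None if no word is known)."""
    vectors = [WORD_VECTORS[w] for w in text.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def semantic_similarity(text_a, text_b):
    """Similarity of two texts based on their averaged word vectors."""
    va, vb = text_vector(text_a), text_vector(text_b)
    if va is None or vb is None:
        return 0.0
    return cosine(va, vb)


def lexical_overlap(text_a, text_b):
    """Jaccard overlap of the word sets, as a purely lexical baseline."""
    a, b = set(text_a.lower().split()), set(text_b.lower().split())
    return len(a & b) / len(a | b) if a | b else 0.0
```

With these toy vectors, "great movie" and "awesome film" share no words, so the lexical baseline scores them as unrelated, while the vector-based score treats them as close.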


Deep Multimodal Learning For Affective Analysis And Retrieval, Lei Pang, Shiai Zhu, Chong-Wah Ngo Nov 2015

Research Collection School Of Computing and Information Systems

Social media has been a convenient platform for voicing opinions through posting messages, ranging from tweeting a short text to uploading a media file, or any combination of messages. Understanding the perceived emotions inherently underlying this user-generated content (UGC) could shed light on emerging applications such as advertising and media analytics. Existing research efforts on affective computation are mostly dedicated to single media, either text captions or visual content. Few attempts at combined analysis of multiple media have been made, even though emotion can be viewed as an expression of multimodal experience. In this paper, we explore the learning of highly …


Vireo-Tno @ Trecvid 2015: Multimedia Event Detection, Hao Zhang, Yi-Jie Lu, Maaike De Boer, Frank Ter Haar, Zhaofan Qiu, Klamer Schutte, Wessel Kraaij, Chong-Wah Ngo Nov 2015

Research Collection School Of Computing and Information Systems

This paper presents an overview and comparative analysis of our systems designed for the TRECVID 2015 [1] multimedia event detection (MED) task. We submitted 17 runs: 5 each for the zero-example, 10-example and 100-example subtasks for the Pre-Specified (PS) event detection, and 2 runs for the 10-example subtask for the Ad-Hoc (AH) event detection. We did not participate in the Interactive Run. This year we focus on three different parts of the MED task: 1) extending the size of our concept bank and combining it with improved dense trajectories; 2) exploring strategies for semantic query generation (SQG); and …


Towards Automatic Generation Of Security-Centric Descriptions For Android Apps, Mu Zhang, Yue Duan, Qian Feng, Heng Yin Oct 2015

Research Collection School Of Computing and Information Systems

To improve the security awareness of end users, Android markets directly present two classes of literal app information: 1) permission requests and 2) textual descriptions. Unfortunately, neither serves this need. A permission list is not only hard to understand but also inadequate; textual descriptions provided by developers are not security-centric and deviate significantly from the permissions. To fill this gap, we propose a novel technique to automatically generate security-centric app descriptions, based on program analysis. We implement a prototype system, DESCRIBEME, and evaluate our system using both DroidBench and real-world Android apps. Experimental results demonstrate that DESCRIBEME …


Mood Self-Assessment On Smartphones, Le Minh Khue, Eng Lieh Ouh, Stan Jarzabek Oct 2015

Research Collection School Of Computing and Information Systems

Mood has been systematically studied by psychologists for over 100 years. As mood is a subjective feeling, any study of mood must take into account and accurately capture the user’s perception of an experienced feeling. In the last 40 years, a number of pen-and-paper mood self-assessment scales have been proposed. Typically, a person is asked to separately rate various dimensions of the experienced feeling (e.g., pleasure and arousal) or mood items (interested, agitated, excited, etc.) on numeric scales (e.g., between 0 and 10). These partial ratings are then combined into an overall mood rating (or into its positive and negative affect). Pen-and-paper …


Choosing Your Weapons: On Sentiment Analysis Tools For Software Engineering Research, Robbert Jongeling, Subhajit Datta, Alexander Serebrenik Oct 2015

Research Collection School Of Computing and Information Systems

Recent years have seen increasing attention to social aspects of software engineering, including studies of emotions and sentiments experienced and expressed by software developers. Most of these studies reuse existing sentiment analysis tools such as SentiStrength and NLTK. However, these tools have been trained on product reviews and movie reviews and, therefore, their results might not be applicable in the software engineering domain. In this paper we study whether the sentiment analysis tools agree with the sentiment recognized by human evaluators (as reported in an earlier study) as well as with each other. Furthermore, we evaluate the impact …


Analyzing Educational Comments For Topics And Sentiments: A Text Analytics Approach, Gokran Ila Nitin, Swapna Gottipati, Venky Shankararaman Oct 2015

Research Collection School Of Computing and Information Systems

Universities collect qualitative and quantitative feedback from students upon course completion in order to improve course quality and students’ learning experience. Combining program-wide and module-specific questions, universities collect feedback from students on three main aspects of a course, namely teaching style, content, and learning experience. The feedback is collected through both qualitative comments and quantitative scores. Current methods for analyzing student course evaluations are manual, focus mainly on quantitative feedback, and fall short of an in-depth exploration of qualitative feedback. In this paper, we develop a student feedback mining system (SFMS) which applies text analytics and opinion mining approach …


Using Content-Level Structures For Summarizing Microblog Repost Trees, Jing Li, Wei Gao, Zhongyu Wei, Baolin Peng, Kam-Fai Wong Sep 2015

Research Collection School Of Computing and Information Systems

A microblog repost tree provides strong clues on how an event described therein develops. To help social media users capture the main clues of events on microblogging sites, we propose a novel repost tree summarization framework by effectively differentiating two kinds of messages on repost trees called leaders and followers, which are derived from content-level structure information, i.e., contents of messages and the reposting relations. To this end, a Conditional Random Fields (CRF) model is used to detect leaders across repost tree paths. We then present a variant of a random-walk-based summarization model to rank and select salient messages based on the …


Name List Only? Target Entity Disambiguation In Short Texts, Yixin Cao, Juanzi Li, Xiaofei Guo, Shuanhu Bai, Heng Ji, Jie Tang Sep 2015

Research Collection School Of Computing and Information Systems

Target entity disambiguation (TED), the task of identifying target entities of the same domain, has been recognized as a critical step in various important applications. In this paper, we propose a graph-based model called TremenRank to collectively identify target entities in short texts given a name list only. TremenRank propagates trust within the graph, allowing for an arbitrary number of target entities and texts using inverted index technology. Furthermore, we design a multi-layer directed graph to assign different trust levels to short texts for better performance. The experimental results demonstrate that our model outperforms state-of-the-art methods with an average gain …


Towards Opinion Summarization From Online Forums, Ding Ying, Jing Jiang Sep 2015

Research Collection School Of Computing and Information Systems

Summarizing opinions expressed in online forums can potentially benefit many people. However, special characteristics of this problem may require changes to standard text summarization techniques. In this work, we present our initial attempt at extractive summarization of opinionated online forum threads. Given the nature of user generated content in online discussion forums, we hypothesize that besides relevance, text quality and subjectivity also play important roles in deciding which sentences are good summary sentences. We therefore construct an annotated corpus to facilitate our study of extractive summarization of online discussion forums. We define a set of features to capture relevance, text …


Evaluation And Improvement Of Procurement Process With Data Analytics, Melvin H. C. Tan, Wee Leong Lee Sep 2015

Research Collection School Of Computing and Information Systems

Analytics can be applied in procurement to benefit organizations beyond just prevention and detection of fraud. This study aims to demonstrate how advanced data mining techniques such as text mining and cluster analysis can be used to improve visibility of procurement patterns and provide decision-makers with insight to develop more efficient sourcing strategies, in terms of cost and effort. A case study of an organization’s effort to improve its procurement process is presented in this paper. The findings from this study suggest that opportunities exist for organizations to aggregate common goods and services among the purchases made under and across …


A Joint Model Of Product Properties, Aspects And Ratings For Online Reviews, Ding Ying, Jing Jiang Sep 2015

Research Collection School Of Computing and Information Systems

Product review mining is an important task that can benefit both businesses and consumers. Lately a number of models combining collaborative filtering and content analysis to model reviews have been proposed, among which the Hidden Factors as Topics (HFT) model is a notable one. In this work, we propose a new model on top of HFT to separate product properties and aspects. Product properties are intrinsic to certain products (e.g. types of cuisines of restaurants) whereas aspects are dimensions along which products in the same category can be compared (e.g. service quality of restaurants). Our proposed model explicitly separates the …


Did You Expect Your Users To Say This?: Distilling Unexpected Micro-Reviews For Venue Owners, Wen-Haw Chong, Bingtian Dai, Ee-Peng Lim Sep 2015

Research Collection School Of Computing and Information Systems

With social media platforms such as Foursquare, users can now generate concise reviews, i.e. micro-reviews, about entities such as venues (or products). From the venue owner's perspective, analysing these micro-reviews will offer interesting insights, useful for event detection and customer relationship management. However, not all micro-reviews are equally important, especially since a venue owner should already be familiar with his venue's primary aspects. Instead, we envisage that a venue owner will be interested in micro-reviews that are unexpected to him. These can arise in many ways, such as users focusing on easily overlooked aspects (by the venue owner), making comparisons …


Tweet Sentiment: From Classification To Quantification, Wei Gao, Fabrizio Sebastiani Aug 2015

Research Collection School Of Computing and Information Systems

Sentiment classification has become a ubiquitous enabling technology in the Twittersphere, since classifying tweets according to the sentiment they convey towards a given entity (be it a product, a person, a political party, or a policy) has many applications in political science, social science, market research, and many others. In this paper we contend that most previous studies dealing with tweet sentiment classification (TSC) use a suboptimal approach. The reason is that the final goal of most such studies is not estimating the class label (e.g., Positive, Negative, or Neutral) of individual tweets, but estimating the relative frequency (a.k.a. "prevalence") …
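
Illustrative sketch (not taken from the paper): the abstract distinguishes labelling individual tweets from estimating class prevalence. The Python below shows two standard prevalence estimators, "classify and count" and the adjusted count correction, as stand-ins; the abstract is truncated, so these are not necessarily the estimators the authors evaluate. The function names and the `tpr`/`fpr` inputs (true and false positive rates measured on held-out data) are assumptions.

```python
def classify_and_count(predictions, positive="Positive"):
    """Prevalence estimate: the fraction of items the classifier labels positive."""
    return sum(1 for p in predictions if p == positive) / len(predictions)


def adjusted_count(observed_rate, tpr, fpr):
    """Correct a classify-and-count estimate using the classifier's error rates.

    The observed rate satisfies observed = tpr * p + fpr * (1 - p), where p is
    the true prevalence, so p = (observed - fpr) / (tpr - fpr).
    """
    if tpr == fpr:
        # The classifier carries no information; fall back to the raw estimate.
        return observed_rate
    estimate = (observed_rate - fpr) / (tpr - fpr)
    return min(1.0, max(0.0, estimate))  # clip to a valid proportion
```

For example, if 40% of tweets are predicted Positive by a classifier with a true positive rate of 0.8 and a false positive rate of 0.2, the adjusted estimate of the Positive prevalence is about 33%, not 40%.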


Faitcrowd: Fine Grained Truth Discovery For Crowdsourced Data Aggregation, Fenglong Ma, Yaliang Li, Qi Li, Minghui Qiu, Jing Gao, Shi Zhi, Lu Su, Bo Zhao, Jiawei Han Aug 2015

Research Collection School Of Computing and Information Systems

In crowdsourced data aggregation tasks, there exist conflicts in the answers provided by large numbers of sources on the same set of questions. The most important challenge for this task is to estimate source reliability and select answers that are provided by high-quality sources. Existing work solves this problem by simultaneously estimating sources' reliability and inferring questions' true answers (i.e., the truths). However, these methods assume that a source has the same reliability degree on all the questions, but ignore the fact that sources' reliability may vary significantly among different topics. To capture various expertise levels on different topics, we …
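
Illustrative sketch (not taken from the paper): the abstract describes existing truth discovery as jointly estimating source reliability and true answers. The Python below is a minimal version of that baseline idea, alternating weighted voting with a reliability update, using one reliability value per source. It is not the topic-aware FaitCrowd model; the function name, the smoothing constants, and the input layout are assumptions.

```python
def truth_discovery(answers, iterations=10):
    """Jointly estimate truths and source weights.

    answers: {source: {question: answer}} (hypothetical input layout).
    Returns (truths, weights).
    """
    weights = {source: 1.0 for source in answers}
    truths = {}
    for _ in range(iterations):
        # Step 1: weighted vote per question, using the current source weights.
        scores = {}
        for source, given in answers.items():
            for question, answer in given.items():
                scores.setdefault(question, {}).setdefault(answer, 0.0)
                scores[question][answer] += weights[source]
        truths = {q: max(cands, key=cands.get) for q, cands in scores.items()}
        # Step 2: a source's weight is its smoothed agreement with the truths.
        for source, given in answers.items():
            agree = sum(1 for q, a in given.items() if truths[q] == a)
            weights[source] = (agree + 1) / (len(given) + 2)
    return truths, weights
```

The limitation the abstract points out is visible here: each source gets a single weight, regardless of which topic a question belongs to.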


Gibberish, Assistant, Or Master? Using Tweets Linking To News For Extractive Single-Document Summarization, Zhongyu Wei, Wei Gao Aug 2015

Research Collection School Of Computing and Information Systems

Single-document summarization is a challenging task. In this paper, we explore effective ways of using the tweets linking to news for generating an extractive summary of each document. We reveal the very basic value of tweets that can be utilized by regarding every tweet as a vote for candidate sentences. Based on this finding, we resort to unsupervised summarization models by leveraging the linking tweets to master the ranking of candidate extracts via random walk on a heterogeneous graph. The advantage is that we can use the linking tweets to opportunistically "supervise" the summarization with no need of reference summaries. Furthermore, we …
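
Illustrative sketch (not taken from the paper): the abstract's starting point is that every linking tweet can be regarded as a vote for candidate sentences. The Python below shows only that voting idea, with each tweet voting for the sentence it shares the most words with. The word-overlap matching rule and all names are assumptions, and the random walk on a heterogeneous graph mentioned in the abstract is not reproduced.

```python
def tokenize(text):
    """Lowercase word set with surrounding punctuation stripped."""
    return {w.strip(".,!?\"'").lower() for w in text.split()} - {""}


def vote_for_sentences(sentences, tweets):
    """Each tweet casts one vote for the sentence with the largest word overlap."""
    votes = [0] * len(sentences)
    if not sentences:
        return votes
    sentence_tokens = [tokenize(s) for s in sentences]
    for tweet in tweets:
        tweet_tokens = tokenize(tweet)
        overlaps = [len(tweet_tokens & tokens) for tokens in sentence_tokens]
        best = max(range(len(sentences)), key=lambda i: overlaps[i])
        if overlaps[best] > 0:  # tweets unrelated to every sentence do not vote
            votes[best] += 1
    return votes


def extract_summary(sentences, tweets, k=1):
    """Return the k most-voted sentences, kept in document order."""
    votes = vote_for_sentences(sentences, tweets)
    ranked = sorted(range(len(sentences)), key=lambda i: -votes[i])
    return [sentences[i] for i in sorted(ranked[:k])]
```

No reference summaries are needed for this ranking, which matches the abstract's point that linking tweets can stand in for supervision.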


Topic Modeling With Document Relative Similarities, Jianguang Du, Jing Jiang, Dandan Song, Lejian Liao Jul 2015

Research Collection School Of Computing and Information Systems

Topic modeling has been widely used in text mining. Previous topic models such as Latent Dirichlet Allocation (LDA) are successful in learning hidden topics but they do not take into account metadata of documents. To tackle this problem, many augmented topic models have been proposed to jointly model text and metadata. But most existing models handle only categorical and numerical types of metadata. We identify another type of metadata that can be more natural to obtain in some scenarios. These are relative similarities among documents. In this paper, we propose a general model that links LDA with constraints derived from …


Solar: Scalable Online Learning Algorithms For Ranking, Jialei Wang, Ji Wan, Yongdong Zhang, Steven C. H. Hoi Jul 2015

Research Collection School Of Computing and Information Systems

Traditional learning to rank methods learn ranking models from training data in a batch and offline learning mode, which suffers from some critical limitations, e.g., poor scalability as the model has to be retrained from scratch whenever new training data arrives. This is clearly nonscalable for many real applications in practice where training data often arrives sequentially and frequently. To overcome the limitations, this paper presents SOLAR, a new framework of Scalable Online Learning Algorithms for Ranking, to tackle the challenge of scalable learning to rank. Specifically, we propose two novel SOLAR algorithms and analyze their IR measure bounds theoretically. …


Using Tweets To Help Sentence Compression For News Highlights Generation, Zhongyu Wei, Yang Liu, Chen Li, Wei Gao Jul 2015

Research Collection School Of Computing and Information Systems

We explore using relevant tweets of a given news article to help sentence compression for generating compressive news highlights. We extend an unsupervised dependency-tree-based sentence compression approach by incorporating tweet information to weight the tree edges in terms of informativeness and syntactic importance. The experimental results on a public corpus that contains both news articles and relevant tweets show that our proposed tweet-guided sentence compression method can improve the summarization performance significantly compared to the baseline generic sentence compression method.


A Convolution Kernel Approach To Identifying Comparisons In Text, Maksim Tkachenko, Hady W. Lauw Jul 2015

Research Collection School Of Computing and Information Systems

Comparisons in text, such as in online reviews, serve as useful decision aids. In this paper, we focus on the task of identifying whether a comparison exists between a specific pair of entity mentions in a sentence. This formulation is transformative, as previous work only seeks to determine whether a sentence is comparative, which is presumptuous in the event the sentence mentions multiple entities and is comparing only some, not all, of them. Our approach leverages not only lexical features such as salient words, but also structural features expressing the relationships among words and entity mentions. To model these features …


A Hassle-Free Unsupervised Domain Adaptation Method Using Instance Similarity Features, Jianfei Yu, Jing Jiang Jul 2015

Research Collection School Of Computing and Information Systems

We present a simple yet effective unsupervised domain adaptation method that can be generally applied to different NLP tasks. Our method uses unlabeled target domain instances to induce a set of instance similarity features. These features are then combined with the original features to represent labeled source domain instances. Using three NLP tasks, we show that our method consistently outperforms a few baselines, including SCL, an existing general unsupervised domain adaptation method widely used in NLP. More importantly, our method is very easy to implement and incurs much less computational cost than SCL.
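
Illustrative sketch (not taken from the paper): the abstract describes inducing instance similarity features from unlabeled target domain instances and combining them with the original features. The Python below shows one plausible reading, appending the cosine similarity to each of a few target exemplars onto an instance's feature vector. The use of cosine similarity, the way exemplars are chosen, and the function names are assumptions, since the abstract does not specify them.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length feature vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def augment(instance, target_exemplars):
    """Original features followed by one similarity feature per target exemplar."""
    return list(instance) + [cosine(instance, t) for t in target_exemplars]


def augment_all(instances, target_exemplars):
    """Apply the same augmentation to source (training) or target (test) instances."""
    return [augment(x, target_exemplars) for x in instances]
```

Because the same exemplars are used for both domains, a classifier trained on augmented source instances sees features that are anchored in the target domain, and no labels from the target domain are required.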


Qcri: Answer Selection For Community Question Answering - Experiment For Arabic And English, Massimo Nicosia, Simone Filice, Alberto Barron-Cedeno, Iman Saleh, Hamdy Mubarak, Wei Gao, Preslav Nakov, Giovanni Da San Martino, Alessandro Moschitti, Kareem Darwish, Lluis Marquez, Shafiq Joty, Walid Magdy Jun 2015

Research Collection School Of Computing and Information Systems

This paper describes QCRI’s participation in SemEval-2015 Task 3 “Answer Selection in Community Question Answering”, which targeted real-life Web forums, and was offered in both Arabic and English. We apply a supervised machine learning approach considering a manifold of features including, among others, word n-grams, text similarity, sentiment analysis, the presence of specific words, and the context of a comment. Our approach was the best performing one in the Arabic subtask and the third best in the two English subtasks.


Flutcha: Using Fluency To Distinguish Humans From Computers, Kotaro Hara, Mohammad Taghi Hajiaghayi, Benjamin B. Benderson May 2015

Research Collection School Of Computing and Information Systems

Improvements in image understanding technologies are making it possible for computers to pass traditional CAPTCHA tests with high probability. This suggests the need for new kinds of tasks that are easy to accomplish for humans but remain difficult for computers. In this paper, we introduce Fluency CAPTCHA (FluTCHA), a novel method to distinguish humans from computers using the fact that humans are better than machines at improving the fluency of sentences. We propose a way to let users work on FluTCHA tests and simultaneously complete useful linguistic tasks. Evaluation studies demonstrate the feasibility of using FluTCHA to distinguish humans from computers.


Rclinker: Automated Linking Of Issue Reports And Commits Leveraging Rich Contextual Information, Tien-Duy B. Le, Mario Linares Vasquez, David Lo, Denys Poshyvanyk May 2015

Research Collection School Of Computing and Information Systems

Links between issue reports and their corresponding commits in version control systems are often missing. However, these links are important for measuring the quality of a software system, predicting defects, and many other tasks. Several approaches have been designed to solve this problem by automatically linking bug reports to source code commits via comparison of textual information in commit messages and bug reports. Yet, the effectiveness of these techniques is oftentimes suboptimal when commit messages are empty or contain minimum information; this particular problem makes the process of recovering traceability links between commits and bug reports particularly challenging. In this …


Active Semi-Supervised Defect Categorization, Ferdian Thung, Xuan-Bach D. Le, David Lo May 2015

Research Collection School Of Computing and Information Systems

Defects are an inseparable part of software development and evolution. To better comprehend problems affecting a software system, developers often store historical defects and these defects can be categorized into families. IBM proposes Orthogonal Defect Categorization (ODC), which includes various classifications of defects based on a number of orthogonal dimensions (e.g., symptoms and semantics of defects, root causes of defects, etc.). To help developers categorize defects, several approaches that employ machine learning have been proposed in the literature. Unfortunately, these approaches often require developers to manually label a large number of defect examples. In practice, manually labelling a large number of …


Characterizing Silent Users In Social Media Communities, Wei Gong, Ee-Peng Lim, Feida Zhu May 2015

Research Collection School Of Computing and Information Systems

Silent users often constitute a significant proportion of an online user-generated content system. In the context of social media such as Twitter, users can opt to be silent all or most of the time. They are often called the invisible participants or lurkers. As lurkers contribute little to the online content, existing analysis often overlooks their presence and voices. However, we argue that understanding lurkers is important in many applications such as recommender systems, targeted advertising, and social sensing. This research therefore seeks to characterize lurkers in social media and propose methods to profile them. We examine 18 weeks of …


Advances In Knowledge Discovery And Data Mining Part Ii, Tru Cao, Ee Peng Lim, Zhi-Hua Zhou, Tu-Bao Ho, David Wai-Lok Cheung, Hiroshi Motoda May 2015

Research Collection School Of Computing and Information Systems

No abstract provided.


Multi-Roles Affiliation Model For General User Profiling, Lizi Liao, Heyan Huang, Yashen Wang Apr 2015

Research Collection School Of Computing and Information Systems

Online social networks release user attributes, which are important for many applications. Due to the sparsity of such user attributes online, many works focus on profiling user attributes automatically. However, in order to profile a specific user attribute, a unique model is built, and such a model usually does not fit other profiling tasks. In our work, we design a novel, flexible general user profiling model which naturally models users’ friendships with user attributes. Experiments show that our method simultaneously profiles multiple attributes with better performance.