Open Access. Powered by Scholars. Published by Universities.®

Engineering Commons

Articles 1 - 23 of 23

Full-Text Articles in Engineering

Proactive Sequential Resource (Re)Distribution For Improving Efficiency In Urban Environments, Supriyo Ghosh Dec 2017

Dissertations and Theses Collection (Open Access)

Due to the increasing population and lack of coordination, there is a mismatch in supply and demand of common resources (e.g., shared bikes, ambulances, taxis) in urban environments, which has deteriorated a wide variety of quality of life metrics such as success rate in issuing shared bikes, response times for emergency needs, waiting times in queues etc. Thus, in my thesis, I propose efficient algorithms that optimise the quality of life metrics by proactively redistributing the resources using intelligent operational (day-to-day) and strategic (long-term) decisions in the context of urban transportation and health & safety. For urban transportation, Bike Sharing …


Who Are Your Users? Comparing Media Professionals' Preconception Of Users To Data-Driven Personas, Lene Nielsen, Soon-Gyu Jung, Jisun An, Joni Salminen, Haewoon Kwak, Bernard J. Jansen Dec 2017

Research Collection School Of Computing and Information Systems

One of the reasons for using personas is to align user understandings across project teams and sites. As part of a larger persona study, at Al Jazeera English (AJE), we conducted 16 qualitative interviews with media producers, the end users of persona descriptions. We asked the participants about their understanding of a typical AJE media consumer, and the variety of answers shows that the understandings are not aligned and are built on a mix of own experiences, own self, assumptions, and data given by the company. The answers are sometimes aligned with the data-driven personas and sometimes not. The end …


Selective Value Coupling Learning For Detecting Outliers In High-Dimensional Categorical Data, Guansong Pang, Hongzuo Xu, Cao Longbing, Wentao Zhao Nov 2017

Research Collection School Of Computing and Information Systems

This paper introduces a novel framework, namely SelectVC and its instance POP, for learning selective value couplings (i.e., interactions between the full value set and a set of outlying values) to identify outliers in high-dimensional categorical data. Existing outlier detection methods work on a full data space or feature subspaces that are identified independently from subsequent outlier scoring. As a result, they are significantly challenged by overwhelming irrelevant features in high-dimensional data due to the noise brought by the irrelevant features and its huge search space. In contrast, SelectVC works on a clean and condensed data space spanned by selective …


Sourcevote: Fusing Multi-Valued Data Via Inter-Source Agreements, Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Mahmoud Barhamgi, Lina Yao, Anne H.H. Ngu Nov 2017

Research Collection School Of Computing and Information Systems

Data fusion is a fundamental research problem of identifying true values of data items of interest from conflicting multi-sourced data. Although considerable research efforts have been conducted on this topic, existing approaches generally assume every data item has exactly one true value, which fails to reflect the real world where data items with multiple true values widely exist. In this paper, we propose a novel approach, SourceVote, to estimate value veracity for multi-valued data items. SourceVote models the endorsement relations among sources by quantifying their two-sided inter-source agreements. In particular, two graphs are constructed to model inter-source relations. Then two aspects of source reliability are derived from these graphs and …


Sourcevote: Fusing Multi-Valued Data Via Inter-Source Agreements, Xiu Susie Fang, Quan Z. Sheng, Xianzhi Wang, Mahmoud Barhamgi, Lina Yao, Anne H.H. Ngu Nov 2017

Research Collection School Of Computing and Information Systems

Data fusion is a fundamental research problem of identifying true values of data items of interest from conflicting multi-sourced data. Although considerable research efforts have been conducted on this topic, existing approaches generally assume every data item has exactly one true value, which fails to reflect the real world where data items with multiple true values widely exist. In this paper, we propose a novel approach, SourceVote, to estimate value veracity for multi-valued data items. SourceVote models the endorsement relations among sources by quantifying their two-sided inter-source agreements. In particular, two graphs are constructed to model inter-source relations. Then two aspects …


Cross-Modal Recipe Retrieval With Rich Food Attributes, Jingjing Chen, Chong-Wah Ngo, Tat-Seng Chua Oct 2017

Research Collection School Of Computing and Information Systems

Food is rich in visible (e.g., colour, shape) and procedural (e.g., cutting, cooking) attributes. Proper leveraging of these attributes, particularly the interplay among ingredients, cutting and cooking methods, for health-related applications has not been previously explored. This paper investigates cross-modal retrieval of recipes, specifically to retrieve a text-based recipe given a food picture as query. As similar ingredient composition can end up with wildly different dishes depending on the cooking and cutting procedures, the difficulty of retrieval originates from fine-grained recognition of rich attributes from pictures. With a multi-task deep learning model, this paper provides insights on the feasibility of …


Personalized Microtopic Recommendation On Microblogs, Yang Li, Jing Jiang, Ting Liu, Minghui Qiu, Xiaofei Sun Sep 2017

Research Collection School Of Computing and Information Systems

Microblogging services such as Sina Weibo and Twitter allow users to create tags explicitly indicated by the # symbol. In Sina Weibo, these tags are called microtopics, and in Twitter, they are called hashtags. In Sina Weibo, each microtopic has a designated page and can be directly visited or commented on. Recommending these microtopics to users based on their interests can help users efficiently acquire information. However, it is non-trivial to recommend microtopics to users to satisfy their information needs. In this article, we investigate the task of personalized microtopic recommendation, which exhibits two challenges. First, users usually do not …


Learning Homophily Couplings From Non-Iid Data For Joint Feature Selection And Noise-Resilient Outlier Detection, Guansong Pang, Longbing Cao, Ling Chen, Huan Liu Aug 2017

Research Collection School Of Computing and Information Systems

This paper introduces a novel wrapper-based outlier detection framework (WrapperOD) and its instance (HOUR) for identifying outliers in noisy data (i.e., data with noisy features) with strong couplings between outlying behaviors. Existing subspace or feature selection-based methods are significantly challenged by such data, as their search of feature subset(s) is independent of outlier scoring and thus can be misled by noisy features. In contrast, HOUR takes a wrapper approach to iteratively optimize the feature subset selection and outlier scoring using a top-k outlier ranking evaluation measure as its objective function. HOUR learns homophily couplings between outlying behaviors (i.e., abnormal behaviors …


Indexing Metric Uncertain Data For Range Queries And Range Joins, Lu Chen, Yunjun Gao, Aoxiao Zhong, Christian S. Jensen, Gang Chen, Baihua Zheng Aug 2017

Research Collection School Of Computing and Information Systems

Range queries and range joins in metric spaces have applications in many areas, including GIS, computational biology, and data integration, where metric uncertain data exist in different forms, resulting from circumstances such as equipment limitations, high-throughput sequencing technologies, and privacy preservation. We represent metric uncertain data by using an object-level model and a bi-level model, respectively. Two novel indexes, the uncertain pivot B+-tree (UPB-tree) and the uncertain pivot B+-forest (UPB-forest), are proposed in order to support probabilistic range queries and range joins for a wide range of uncertain data types and similarity metrics. Both index structures use a small set …


Geometric Approaches For Top-K Queries [Tutorial], Kyriakos Mouratidis Aug 2017

Research Collection School Of Computing and Information Systems

Top-k processing is a well-studied problem with numerous applications that is becoming increasingly relevant with the growing availability of recommendation systems and decision-making software. The objective of this tutorial is twofold. First, we will delve into the geometric aspects of top-k processing. Second, we will cover complementary features to top-k queries, with strong practical relevance and important applications, that have a computational geometric nature. The tutorial will close with insights into the effect of dimensionality on the meaningfulness of top-k queries, and interesting similarities to nearest neighbor search.


Smartphone Sensing Meets Transport Data: A Collaborative Framework For Transportation Service Analytics, Yu Lu, Archan Misra, Wen Sun, Huayu Wu Aug 2017

Research Collection School Of Computing and Information Systems

We advocate for and introduce TRANSense, a framework for urban transportation service analytics that combines participatory smartphone sensing data with city-scale transportation-related transactional data (taxis, trains etc.). Our work is driven by the observed limitations of using each data type in isolation: (a) commonly-used anonymous city-scale datasets (such as taxi bookings and GPS trajectories) provide insights into the aggregate behavior of transport infrastructure, but fail to reveal individual-specific transport experiences (e.g., wait times in taxi queues); while (b) mobile sensing data can capture individual-specific commuting-related activities, but suffers from accuracy and energy overhead challenges due to usage artefacts and lack …


Pivot-Based Metric Indexing, Lu Chen, Yunjun Gao, Baihua Zheng, Christian S. Jensen, Hanyu Yang, Keyu Yang Aug 2017

Research Collection School Of Computing and Information Systems

The general notion of a metric space encompasses a diverse range of data types and accompanying similarity measures. Hence, metric search plays an important role in a wide range of settings, including multimedia retrieval, data mining, and data integration. With the aim of accelerating metric search, a collection of pivot-based indexing techniques for metric data has been proposed, which reduces the number of potentially expensive similarity comparisons by exploiting the triangle inequality for pruning and validation. However, no comprehensive empirical study of those techniques exists. Existing studies each offer only narrow coverage, and they use different pivot selection strategies …


Time-Aware Conversion Prediction, Wendi Ji, Xiaoling Wang, Feida Zhu Aug 2017

Research Collection School Of Computing and Information Systems

The importance of product recommendation has been well recognized as a central task in business intelligence for e-commerce websites. Interestingly, what has received less attention is the fact that different products take different time periods for conversion. “Conversion” here refers more generally to a set of pre-defined actions, including, for example, purchases or registrations in recommendation and advertising systems. The mismatch between the product’s actual conversion period and the application’s target conversion period has been the subtle culprit compromising many existing recommendation algorithms. The challenging question: what products should be recommended for a given time period to maximize …


Embedding-Based Representation Of Categorical Data By Hierarchical Value Coupling Learning, Songlei Jian, Longbing Cao, Guansong Pang, Kai Lu, Hang Gao Aug 2017

Research Collection School Of Computing and Information Systems

Learning the representation of categorical data with hierarchical value coupling relationships is very challenging but critical for the effective analysis and learning of such data. This paper proposes a novel coupled unsupervised categorical data representation (CURE) framework and its instantiation, i.e., a coupled data embedding (CDE) method, for representing categorical data by hierarchical value-to-value cluster coupling learning. Unlike existing embedding- and similarity-based representation methods which can capture only a part or none of these complex couplings, CDE explicitly incorporates the hierarchical couplings into its embedding representation. CDE first learns two complementary feature value couplings which are then used to cluster …


Semantic Visualization For Short Texts With Word Embeddings, Van Minh Tuan Le, Hady W. Lauw Aug 2017

Research Collection School Of Computing and Information Systems

Semantic visualization integrates topic modeling and visualization, such that every document is associated with a topic distribution as well as visualization coordinates on a low-dimensional Euclidean space. We address the problem of semantic visualization for short texts. Such documents are increasingly common, including tweets, search snippets, news headlines, or status updates. Due to their short lengths, it is difficult to model semantics as the word co-occurrences in such a corpus are very sparse. Our approach is to incorporate auxiliary information, such as word embeddings from a larger corpus, to supplement the lack of co-occurrences. This requires the development of a …


Mining Capstone Project Wikis For Knowledge Discovery, Swapna Gottipati, Venky Shankararaman, Melvrivk Goh Jul 2017

Research Collection School Of Computing and Information Systems

Wikis are widely used collaborative environments as sources of information and knowledge. They help students engage in collaboration, share information among members, and enable collaborative learning. In particular, Wikis play an important role in capstone projects. Wikis aid in various project-related tasks and help to organize and share information. Mining project Wikis is critical to understanding students' learning and the latest trends in industry. Mining Wikis is useful to educationists and academicians for decision-making about how to modify the educational environment to improve students' learning. The main challenge is that the content or data in project Wikis …


Discovering Newsworthy Themes From Sequenced Data: A Step Towards Computational Journalism, Qi Fan, Yuchen Li, Dongxiang Zhang, Kian-Lee Tan Jul 2017

Research Collection School Of Computing and Information Systems

Automatic discovery of newsworthy themes from sequenced data can relieve journalists from manually poring over a large amount of data in order to find interesting news. In this paper, we propose a novel k-Sketch query that aims to find k striking streaks to best summarize a subject. Our scoring function takes into account streak strikingness and streak coverage at the same time. We study the k-Sketch query processing in both offline and online scenarios, and propose various streak-level pruning techniques to find striking candidates. Among those candidates, we then develop approximate methods to discover the k most representative …


Sap: Improving Continuous Top-K Queries Over Streaming Data, Rui Zhu, Bin Wang, Xiaochun Yang, Baihua Zheng, Guoren Wang Jun 2017

Research Collection School Of Computing and Information Systems

The continuous top-k query over streaming data is a fundamental problem in databases. In this paper, we focus on the sliding window scenario, where a continuous top-k query returns the top-k objects within each query window on the data stream. Existing algorithms support this type of query by incrementally maintaining a subset of objects in the window and trying to retrieve the answer from this subset as much as possible whenever the window slides. However, since all the existing algorithms are sensitive to query parameters and data distribution, they all suffer from expensive incremental maintenance cost. In this paper, we propose …


Is The Whole Greater Than The Sum Of Its Parts?, Liangyue Li, Hanghang Tong, Yong Wang, Conglei Shi, Nan Cao, Norbou Buchler Jun 2017

Research Collection School Of Computing and Information Systems

The PART-WHOLE relationship routinely finds itself in many disciplines, ranging from collaborative teams, crowdsourcing, autonomous systems to networked systems. From the algorithmic perspective, the existing work has primarily focused on predicting the outcomes of the whole and parts, by either separate models or linear joint models, which assume the outcome of the parts has a linear and independent effect on the outcome of the whole. In this paper, we propose a joint predictive method named PAROLE to simultaneously and mutually predict the part and whole outcomes. The proposed method offers two distinct advantages over the existing work. First (Model Generality), …


Discovering Your Selling Points: Personalized Social Influential Tags Exploration, Yuchen Li, Kian-Lee Tan, Ju Fan, Dongxiang Zhang May 2017

Research Collection School Of Computing and Information Systems

Social influence has attracted significant attention owing to the prevalence of social networks (SNs). In this paper, we study a new social influence problem, called personalized social influential tags exploration (PITEX), to help any user in the SN explore how she influences the network. Given a target user, it finds a size-k tag set that maximizes this user’s social influence. We prove the problem is NP-hard to be approximated within any constant ratio. To solve it, we introduce a sampling-based framework, which has an approximation ratio of (1−ε)/(1+ε) with high probabilistic guarantee. To speed up the computation, we devise more …


A Data-Driven Approach For Benchmarking Energy Efficiency Of Warehouse Buildings, Wee Leong Lee, Kar Way Tan, Zui Young Lim May 2017

Research Collection School Of Computing and Information Systems

This study proposes a data-driven approach for benchmarking energy efficiency of warehouse buildings. Our proposed approach provides an alternative to the limitation of existing benchmarking approaches where a theoretical energy-efficient warehouse was used as a reference. Our approach starts by defining the questions needed to capture the characteristics of warehouses relating to energy consumption. Using an existing data set of warehouse buildings containing various attributes, we first cluster them into groups by their characteristics. The warehouse characteristics derived from the cluster assignments along with their past annual energy consumption are subsequently used to train a decision tree model. The decision tree provides a classification of what factors contribute to different …


Aspect Extraction From Product Reviews Using Category Hierarchy Information, Yifeng Yang, Chen Cen, Minghui Qiu, Forrest Sheng Bao Apr 2017

Research Collection School Of Computing and Information Systems

Aspect extraction is a task to abstract the common properties of objects from corpora discussing them, such as reviews of products. Recent work on aspect extraction is leveraging the hierarchical relationship between products and their categories. However, such effort focuses on the aspects of child categories but ignores those from parent categories. Hence, we propose an LDA-based generative topic model inducing the two-layer categorical information (CAT-LDA), to balance the aspects of both a parent category and its child categories. Our hypothesis is that child categories inherit aspects from parent categories, controlled by the hierarchy between them. Experimental results on 5 …


Online Growing Neural Gas For Anomaly Detection In Changing Surveillance Scenes, Qianru Sun, Hong Liu, Tatsuya Harada Apr 2017

Research Collection School Of Computing and Information Systems

Anomaly detection is still a challenging task for video surveillance due to complex environments and unpredictable human behaviors. Most existing approaches train offline detectors using manually labeled data and predefined parameters, and struggle to model changing scenes. This paper introduces a neural network based model called online Growing Neural Gas (online GNG) to perform unsupervised learning. Unlike a parameter-fixed GNG, our model updates learning parameters continuously, for which we propose several online neighbor-related strategies. Specific operations, namely neuron insertion, deletion, learning rate adaptation and stopping criteria selection, get upgraded to online modes. In the anomaly detection stage, the …