Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Full-Text Articles in Physical Sciences and Mathematics

Efficient Online Summarization Of Large-Scale Dynamic Networks, Qiang Qu, Siyuan Liu, Feida Zhu, Christian S. Jensen Dec 2016

Research Collection School Of Computing and Information Systems

Information diffusion in social networks is often characterized by huge participating communities and viral cascades of high dynamicity. To observe, summarize, and understand the evolution of dynamic diffusion processes in an informative and insightful way is a challenge of high practical value. However, few existing studies aim to summarize networks for interesting dynamic patterns. Dynamic networks raise new challenges not found in static settings, including time sensitivity, online interestingness evaluation, and summary traceability, which render existing techniques inadequate. We propose dynamic network summarization to summarize dynamic networks with millions of nodes by only capturing the few most interesting nodes or …


Careermapper: An Automated Resume Evaluation Tool, Vivian Lai, Kyong Jin Shim, Richard J. Oentaryo, Philips K. Prasetyo, Casey Vu, Ee-Peng Lim, David Lo Dec 2016

Research Collection School Of Computing and Information Systems

The advent of the Web brought about major changes in the way people search for jobs and companies look for suitable candidates. As more employers and recruitment firms turn to the Web for job candidate search, an increasing number of people turn to the Web for uploading and creating their online resumes. Resumes are often the first source of information about candidates and also the first item of evaluation in candidate selection. Thus, it is imperative that resumes are complete, free of errors and well-organized. We present an automated resume evaluation tool called 'CareerMapper'. Our tool is designed to conduct …


Validating Social Media Data For Automatic Persona Generation, Jisun An, Haewoon Kwak, Bernard J Jansen Dec 2016

Research Collection School Of Computing and Information Systems

Using personas during interactive design has considerable potential for product and content development. Unfortunately, personas have typically been a fairly static technique. In this research, we validate an approach for creating personas in real time, based on analysis of actual social media data in an effort to automate the generation of personas. We validate that social media data can be used to automatically generate personas in real time using actual YouTube social media data from a global media corporation that produces online digital content. Using the organization's YouTube channel, we collect demographic data, customer interactions, and topical …


Unsupervised Feature Selection For Outlier Detection By Modelling Hierarchical Value-Feature Couplings, Guansong Pang, Longbing Cao, Ling Chen, Huan Liu Dec 2016

Research Collection School Of Computing and Information Systems

Proper feature selection for unsupervised outlier detection can improve detection performance but is very challenging due to complex feature interactions, the mixture of relevant features with noisy/redundant features in imbalanced data, and the unavailability of class labels. Little work has been done on this challenge. This paper proposes a novel Coupled Unsupervised Feature Selection framework (CUFS for short) to filter out noisy or redundant features for subsequent outlier detection in categorical data. CUFS quantifies the outlierness (or relevance) of features by learning and integrating both the feature value couplings and feature couplings. Such value-to-feature couplings capture intrinsic data characteristics and …


Cryptographic Reverse Firewall Via Malleable Smooth Projective Hash Functions, Rongmao Chen, Guomin Yang, Willy Susilo, Fuchun Guo, Mingwu Zhang Dec 2016

Research Collection School Of Computing and Information Systems

Motivated by the revelations of Edward Snowden, post-Snowden cryptography has become a prominent research direction in recent years. At Eurocrypt 2015, Mironov and Stephens-Davidowitz proposed a novel concept named cryptographic reverse firewall (CRF) which can resist exfiltration of secret information from an arbitrarily compromised machine. In this work, we continue this line of research and present generic CRF constructions for several widely used cryptographic protocols based on a new notion named malleable smooth projective hash function. Our contributions can be summarized as follows. – We introduce the notion of malleable smooth projective hash function, which is an extension of the …


Designing A Datawarehousing And Business Analytics Course Using Experiential Learning Pedagogy, Gottipati Swapna, Venky Shankararaman Dec 2016

Research Collection School Of Computing and Information Systems

Experiential learning refers to learning from experience or learning by doing. Universities have explored various forms for implementing experiential learning such as apprenticeships, internships, cooperative education, practicums, service learning, job shadowing, fellowships and community activities. However, very little has been done in systematically trying to integrate experiential learning into the mainstream academic curriculum. Over the last two years, at the authors’ university, a new program titled UNI-X was launched to achieve this. Combining academic curriculum with experiential learning pedagogy provides a challenging environment for students to use their disciplinary knowledge and skills to tackle real world problems and issues …


Pairwise Relation Classification With Mirror Instances And A Combined Convolutional Neural Network, Jianfei Yu, Jing Jiang Dec 2016

Research Collection School Of Computing and Information Systems

Relation classification is the task of classifying the semantic relations between entity pairs in text. Observing that existing work has not fully explored using different representations for relation instances, especially in order to better handle the asymmetry of relation types, in this paper, we propose a neural network based method for relation classification that combines the raw sequence and the shortest dependency path representations of relation instances and uses mirror instances to perform pairwise relation classification. We evaluate our proposed models on two widely used datasets: SemEval-2010 Task 8 and ACE-2005. The empirical results show that our combined model together …


From Footprint To Evidence: An Exploratory Study Of Mining Social Data For Credit Scoring, Guangming Guo, Feida Zhu, Enhong Chen, Qi Liu, Le Wu, Chu Guan Dec 2016

Research Collection School Of Computing and Information Systems

With the booming popularity of online social networks like Twitter and Weibo, online user footprints are accumulating rapidly on the social web. Simultaneously, the question of how to leverage the large-scale user-generated social media data for personal credit scoring has drawn the attention of both researchers and practitioners. It has also become a topic of great importance and growing interest in the P2P lending industry. However, compared with traditional financial data, heterogeneous social data presents both opportunities and challenges for personal credit scoring. In this article, we seek a deep understanding of how to learn users’ credit labels from social …


Answering Why-Not And Why Questions On Reverse Top-K Queries, Qing Liu, Yunjun Gao, Gang Chen, Baihua Zheng, Linlin Zhou Dec 2016

Research Collection School Of Computing and Information Systems

Why-not and why questions can be posed by database users to seek clarifications on unexpected query results. Specifically, why-not questions aim to explain why certain expected tuples are absent from the query results, while why questions try to clarify why certain unexpected tuples are present in the query results. This paper systematically explores the why-not and why questions on reverse top-k queries, owing to their importance in multi-criteria decision making. We first formalize why-not questions on reverse top-k queries, which try to include the missing objects in the reverse top-k query results, and then, we propose a unified framework called …


Iterated Random Oracle: A Universal Approach For Finding Loss In Security Reduction, Fuchun Guo, Willy Susilo, Yi Mu, Rongmao Chen, Jianchang Lai, Guomin Yang Dec 2016

Research Collection School Of Computing and Information Systems

The indistinguishability security of a public-key cryptosystem can be reduced to a computational hard assumption in the random oracle model, where the solution to a computational hard problem is hidden in one of the adversary’s queries to the random oracle. Usually, there is a finding loss in finding the correct solution from the query set, especially when the decisional variant of the computational problem is also hard. The problem of finding loss must be addressed towards tight(er) reductions under this type. In EUROCRYPT 2008, Cash, Kiltz and Shoup proposed a novel approach using a trapdoor test that can solve the …


Zero++: Harnessing The Power Of Zero Appearances To Detect Anomalies In Large-Scale Data Sets, Guansong Pang, Kai Ming Ting, David Albrecht, Huidong Jin Dec 2016

Research Collection School Of Computing and Information Systems

This paper introduces a new unsupervised anomaly detector called ZERO++ which employs the number of zero appearances in subspaces to detect anomalies in categorical data. It is unique in that it works in regions of subspaces that are not occupied by data; whereas existing methods work in regions occupied by data. ZERO++ examines only a small number of low dimensional subspaces to successfully identify anomalies. Unlike existing frequency-based algorithms, ZERO++ does not involve subspace pattern searching. We show that ZERO++ is better than or comparable with the state-of-the-art anomaly detection methods over a wide range of real-world categorical and numeric …


Cast2face: Assigning Character Names Onto Faces In Movie With Actor-Character Correspondence, Guangyu Gao, Mengdi Xu, Jialie Shen, Huangdong Ma, Shuicheng Yan Dec 2016

Research Collection School Of Computing and Information Systems

Automatically identifying characters in movies has attracted researchers' interest and led to several significant and interesting applications. However, due to the vast variation in character appearance as well as the weakness and ambiguity of available annotation, it is still a challenging problem. In this paper, we investigate this problem with the supervision of actor-character name correspondence provided by the movie cast. Our proposed framework, namely, Cast2Face, is featured by: 1) we restrict the assigned names within the set of character names in the cast; 2) for each character, by using the corresponding actor and movie name as keywords, we retrieve …


Data Exfiltration Detection And Prevention: Virtually Distributed Pomdps For Practically Safer Networks, Sara Marie Mc Carthy, Arunesh Sinha, Milind Tambe, Pratyusa Manadhata Nov 2016

Research Collection School Of Computing and Information Systems

We address the challenge of detecting and addressing advanced persistent threats (APTs) in a computer network, focusing in particular on the challenge of detecting data exfiltration over Domain Name System (DNS) queries, where existing detection sensors are imperfect and lead to noisy observations about the network’s security state. Data exfiltration over DNS queries involves unauthorized transfer of sensitive data from an organization to a remote adversary through a DNS data tunnel to a malicious web domain. Given the noisy sensors, previous work has illustrated that standard approaches fail to satisfactorily rise to the challenge of detecting exfiltration attempts. Instead, we …


M(2)-Abks: Attribute-Based Multi-Keyword Search Over Encrypted Personal Health Records In Multi-Owner Setting, Yinbin Miao, Jianfeng Ma, Ximeng Liu, Fushan Wei, Zhiquan Liu, Xu An Wang Nov 2016

Research Collection School Of Computing and Information Systems

Online personal health record (PHR) services are increasingly inclined to shift data storage and search operations to the cloud server so as to enjoy elastic resources and lessen the computational burden. As multiple patients' data is always stored in the cloud server simultaneously, it is a challenge to guarantee the confidentiality of PHR data and allow data users to search encrypted data in an efficient and privacy-preserving way. To this end, we design a secure cryptographic primitive called attribute-based multi-keyword search over encrypted personal health records in multi-owner setting to support both fine-grained access control and multi-keyword search …


Landmark Reranking For Smart Travel Guide Systems By Combining And Analyzing Diverse Media, Junge Shen, Jialie Shen, Tao Mei, Xinbo Gao Nov 2016

Research Collection School Of Computing and Information Systems

Advanced networking technologies and massive online social media have stimulated a booming growth of heterogeneous travel information in recent years. By employing such information, smart travel guide systems, such as landmark ranking systems, have been proposed to offer diverse online travel services. It is essential for a landmark ranking system to structure, analyze, and search the heterogeneous travel information to produce human-expected results. Therefore, the most fundamental yet challenging problems can currently be summarized as: 1) how to fuse heterogeneous tourism information and 2) how to model landmark ranking. In this paper, a novel landmark search system is introduced based on …


Learning Sentence Embeddings With Auxiliary Tasks For Cross-Domain Sentiment Classification, Jianfei Yu, Jing Jiang Nov 2016

Research Collection School Of Computing and Information Systems

In this paper, we study cross-domain sentiment classification with neural network architectures. We borrow the idea from Structural Correspondence Learning and use two auxiliary tasks to help induce a sentence embedding that supposedly works well across domains for sentiment classification. We also propose to jointly learn this sentence embedding together with the sentiment classifier itself. Experimental results demonstrate that our proposed joint model outperforms several state-of-the-art methods on five benchmark datasets.


Summarization Of Egocentric Videos: A Comprehensive Survey, Ana Garcia Del Molino, Cheston Tan, Joo-Hwee Lim, Ah-Hwee Tan Nov 2016

Research Collection School Of Computing and Information Systems

The introduction of wearable video cameras (e.g., GoPro) in the consumer market has promoted video life-logging, motivating users to generate large amounts of video data. This increasing flow of first-person video has led to a growing need for automatic video summarization adapted to the characteristics and applications of egocentric video. With this paper, we provide the first comprehensive survey of the techniques used specifically to summarize egocentric videos. We present a framework for first-person view summarization and compare the segmentation methods and selection algorithms used by the related work in the literature. Next, we describe the existing egocentric video datasets …


A Method Of Integrating Correlation Structures For A Generalized Recursive Route Choice Model, Tien Mai Nov 2016

Research Collection School Of Computing and Information Systems

We propose a way to estimate a generalized recursive route choice model. The model generalizes other existing recursive models in the literature, i.e., (Fosgerau et al., 2013b; Mai et al., 2015c), while being more flexible since it allows the choice at each stage to be any member of the network multivariate extreme value (network MEV) model (Daly and Bierlaire, 2006). The estimation of the generalized model requires defining a contraction mapping and performing contraction iterations to solve Bellman’s equation. Given the fact that the contraction mapping is defined based on the choice probability generating functions (CPGF) (Fosgerau et al., …


Hierarchical Visualization Of Video Search Results For Topic-Based Browsing, Yu-Gang Jiang, Jiajun Wang, Qiang Wang, Wei Liu, Chong-Wah Ngo Nov 2016

Research Collection School Of Computing and Information Systems

Existing video search engines return a ranked list of videos for each user query, which is not convenient for browsing the results of query topics that have multiple facets, such as the "early life," "personal life," and "presidency" of a query "Barack Obama." Organizing video search results into semantically structured hierarchies with nodes covering different topic facets can significantly improve the browsing efficiency for such queries. In this paper, we introduce a hierarchical visualization approach for video search result browsing, which can help users quickly understand the multiple facets of a query topic in a very well-organized manner. Given a …


Spiteful, One-Off, And Kind: Predicting Customer Feedback Behavior On Twitter, Agus Sulistya, Abhishek Sharma, David Lo Nov 2016

Research Collection School Of Computing and Information Systems

Social media provides a convenient way for customers to express their feedback to companies. Identifying different types of customers based on their feedback behavior can help companies to maintain their customers. In this paper, we use a machine learning approach to predict a customer’s feedback behavior based on her first feedback tweet. First, we identify a few categories of customers based on their feedback frequency and the sentiment of the feedback. We identify three main categories: spiteful, one-off, and kind. Next, we build a model to predict the category of a customer given her first feedback. We use profile and …


On Profiling Bots In Social Media, Richard J. Oentaryo, Arinto Murdopo, Philips K. Prasetyo, Ee Peng Lim Nov 2016

Research Collection School Of Computing and Information Systems

The popularity of social media platforms such as Twitter has led to the proliferation of automated bots, creating both opportunities and challenges in information dissemination, user engagement, and quality of services. Past work on profiling bots has focused largely on malicious bots, with the assumption that these bots should be removed. In this work, however, we find many bots that are benign, and propose a new, broader categorization of bots based on their behaviors. This includes broadcast, consumption, and spam bots. To facilitate comprehensive analyses of bots and how they compare to human accounts, we develop a systematic profiling …


Cost Sensitive Online Multiple Kernel Classification, Doyen Sahoo, Peilin Zhao, Steven C. H. Hoi Nov 2016

Research Collection School Of Computing and Information Systems

Learning from data streams has been an important open research problem in the era of big data analytics. This paper investigates supervised machine learning techniques for mining data streams with application to online anomaly detection. Unlike conventional machine learning tasks, machine learning from data streams for online anomaly detection has several challenges: (i) data arriving sequentially and increasing rapidly; (ii) highly class-imbalanced distributions; and (iii) complex anomaly patterns that could evolve dynamically. To tackle these challenges, we propose a novel Cost-Sensitive Online Multiple Kernel Classification (CSOMKC) scheme for comprehensively mining data streams and demonstrate its application to online anomaly detection. Specifically, …


Content Sampling, Household Informedness, And The Consumption Of Digital Information Goods, Ai Phuong Hoang, Robert J. Kauffman Nov 2016

Research Collection School Of Computing and Information Systems

Technology and media are delivering content that is transforming society. Providers must compete for consumer attention to sell their digital information goods effectively. This is challenging, since there is a high level of uncertainty associated with the consumption of such goods. Service providers often use free programming to share product information. We examine the effectiveness of content sampling strategy used for on-demand series dramas, a unique class of entertainment goods. The data were extracted from a large set of household video-on-demand (VoD) viewing records and combined with external data sources. We extended a propensity score matching (PSM) approach to handle …


Aspect-Based Helpfulness Prediction For Online Product Reviews, Yinfei Yang, Cen Chen, Forrest Sheng Bao Nov 2016

Research Collection School Of Computing and Information Systems

Product reviews greatly influence purchase decisions in online shopping. A common burden of online shopping is that consumers have to search for the right answers through massive reviews, especially on popular products. Hence, estimating and predicting the helpfulness of reviews become important tasks to directly improve shopping experience. In this paper, we propose a new approach to helpfulness prediction by leveraging aspect analysis of reviews. Our hypothesis is that a helpful review will cover many aspects of a product at different emphasis levels. The first step to tackle this problem is to extract proper aspects. Because related products share common …


Attractiveness Versus Competition: Towards An Unified Model For User Visitation, Thanh-Nam Doan, Ee-Peng Lim Oct 2016

Research Collection School Of Computing and Information Systems

Modeling user check-in behavior provides useful insights about venues as well as the users visiting them. These insights can be used in urban planning and recommender system applications. Unlike previous works that focus on modeling distance effect on user’s choice of check-in venues, this paper studies check-in behaviors affected by two venue-related factors, namely, area attractiveness and neighborhood competitiveness. The former refers to the ability of an area with multiple venues to collectively attract check-ins from users, while the latter represents the ability of a venue to compete with its neighbors in the same area for check-ins. We first embark …


Tracking Virality And Susceptibility In Social Media, Tuan Anh Hoang, Ee-Peng Lim Oct 2016

Research Collection School Of Computing and Information Systems

In social media, the magnitude of information propagation hinges on the virality and susceptibility of users spreading and receiving the information respectively, as well as the virality of information items. These users' and items' behavioral factors evolve dynamically while interacting with one another. Previous works, however, measure the factors statically and independently in a restricted case: each user has only a single adoption on each item, and/or users' exposure to items is observable. In this work, we investigate the inter-relationship among the factors and users' multiple adoptions on items to propose both new static and temporal models …


Inferring Links Between Concerns And Methods With Multi-Abstraction Vector Space Model, Yun Zhang, David Lo, Xin Xia, Tien-Duy B. Le, Giuseppe Scanniello, Jianling Sun Oct 2016

Research Collection School Of Computing and Information Systems

Concern localization refers to the process of locating code units that match a particular textual description. It takes as input textual documents such as bug reports and feature requests and outputs a list of candidate code units that are relevant to the bug reports or feature requests. Many information retrieval (IR) based concern localization techniques have been proposed in the literature. These techniques typically represent code units and textual descriptions as a bag of tokens at one level of abstraction, e.g., each token is a word, or each token is a topic. In this work, we propose a multi-abstraction concern …


Online Adaptive Passive-Aggressive Methods For Non-Negative Matrix Factorization And Its Applications, Chenghao Liu, Steven C. H. Hoi, Peilin Zhao, Jianling Sun, Ee-Peng Lim Oct 2016

Research Collection School Of Computing and Information Systems

This paper aims to investigate efficient and scalable machine learning algorithms for resolving Non-negative Matrix Factorization (NMF), which is important for many real-world applications, particularly for collaborative filtering and recommender systems. Unlike traditional batch learning methods, a recently proposed online learning technique named "NN-PA" tackles NMF by applying the popular Passive-Aggressive (PA) online learning, and achieves promising results. Despite its simplicity and high efficiency, NN-PA suffers from at least two critical limitations: (i) it only exploits the first-order information and thus may converge slowly especially at the beginning of online learning tasks; (ii) it is sensitive to some key …


Behavior Analysis In Social Networks: Challenges, Technologies, And Trends, Meng Wang, Ee-Peng Lim, Lei Li, Mehmet Orgun Oct 2016

Research Collection School Of Computing and Information Systems

Research on social networks has advanced significantly, which can be attributed to the prevalence of online social websites and instant messaging systems as well as the popularity of mobile apps that support easy access to online social networks. These social networks are usually characterized by complex network structures and rich contextual information. They have now become the key platforms for, among others, content dissemination, professional networking, recommendation, alerting, and political campaigns. As online social network users perform activities on the social networks, they leave data traces of human behavior which allow the latter to be studied at scale. …


Arise-Pie: A People Information Integration Engine Over The Web, Vincent W. Zheng, Tao Hoang, Penghe Chen, Yuan Fang, Xiaoyan Yang Oct 2016

Research Collection School Of Computing and Information Systems

Searching for people information on the Web is a common practice in life. However, it is time consuming to search for such information manually. In this paper, we aim to develop an automatic people information search system, named ARISE-PIE. To build such a system, we tackle two major technical challenges: data harvesting and data integration. For data harvesting, we study how to leverage a search engine to help crawl the relevant Web pages for a target entity; then we propose a novel learning-to-query model that can automatically select a set of "best" queries to maximize collective utility (e.g., precision …