Open Access. Powered by Scholars. Published by Universities.®

Computer Sciences Commons

Articles 1 - 30 of 41

Full-Text Articles in Computer Sciences

Anomaly Detection Through Enhanced Sentiment Analysis On Social Media Data, Zhaoxia Wang, Victor Joo, Chuan Tong, Xin Xin, Hoong Chor Chin Dec 2014

Research Collection School Of Computing and Information Systems

Anomaly detection in sentiment analysis refers to detecting abnormal opinions, sentiment patterns or special temporal aspects of such patterns in a collection of data. The anomalies detected may be due to sudden sentiment changes hidden in large amounts of text. If these anomalies are undetected or poorly managed, the consequences may be severe, e.g., a business whose customers reveal negative sentiments and will no longer support the establishment. Social media platforms, such as Twitter, provide a vast source of information, which includes user feedback, opinion and information on most issues. Many organizations also leverage social media platforms to publish information …


High-Dimensional Data Stream Classification Via Sparse Online Learning, Dayong Wang, Pengcheng Wu, Peilin Zhao, Yue Wu, Chunyan Miao, Steven C. H. Hoi Dec 2014

Research Collection School Of Computing and Information Systems

The amount of data in our society has been exploding in the era of big data today. In this paper, we address several open challenges of big data stream classification, including high volume, high velocity, high dimensionality, and high sparsity. Many existing studies in data mining literature solve data stream classification tasks in a batch learning setting, which suffers from poor efficiency and scalability when dealing with big data. To overcome the limitations, this paper investigates an online learning framework for big data stream classification tasks. Unlike some existing online data stream classification techniques that are often based on first-order …


Extracting Interest Tags From Twitter User Biographies, Ying Ding, Jing Jiang Dec 2014

Research Collection School Of Computing and Information Systems

Twitter, one of the most popular social media platforms, has been studied from different angles. One of the important sources of information in Twitter is users’ biographies, which are short self-introductions written by users in free form. Biographies often describe users’ background and interests. However, to the best of our knowledge, there has not been much work trying to extract information from Twitter biographies. In this work, we study how to extract information revealing users’ personal interests from Twitter biographies. A sequential labeling model is trained with automatically constructed labeled data. The popular patterns expressing user interests are extracted and …


Generative Modeling Of Entity Comparisons In Text, Maksim Tkachenko, Hady W. Lauw Nov 2014

Research Collection School Of Computing and Information Systems

Users frequently rely on online reviews for decision making. In addition to allowing users to evaluate the quality of individual products, reviews also support comparison shopping. One key user activity is to compare two (or more) products based on a specific aspect. However, making a comparison across two different reviews, written by different authors, is not always equitable due to the different standards and preferences of individual authors. Therefore, we focus instead on comparative sentences, whereby two products are compared directly by a review author within a single sentence. We study the problem of comparative relation mining. Given a set …


A First Look At Global News Coverage Of Disasters By Using The GDELT Dataset, Haewoon Kwak, Jisun An Nov 2014

Research Collection School Of Computing and Information Systems

In this work, we reveal the structure of global news coverage of disasters and its determinants by using a large-scale news coverage dataset collected by the GDELT (Global Data on Events, Location, and Tone) project that monitors news media in over 100 languages from the whole world. Significant variables in our hierarchical (mixed-effect) regression model, such as population, political stability, damage, and more, are well aligned with a series of previous research. However, we find strong regionalism in news geography, highlighting the necessity of comprehensive datasets for the study of global news coverage.


Modloc: Localizing Multiple Objects In Dynamic Indoor Environment, Xiaonan Guo, Dian Zhang, Kaishun Wu, Lionel M. Ni Nov 2014

Research Collection School Of Computing and Information Systems

Radio frequency (RF) based technologies play an important role in indoor localization, since Radio Signal Strength (RSS) can be easily measured by various wireless devices without additional cost. Among these, radio map based technologies (also referred to as fingerprinting technologies) are attractive due to high accuracy and easy deployment. However, these technologies have not been extensively applied in real environments because of two fatal limitations. First, it is hard to localize multiple objects. When the number of target objects is unknown, constructing a radio map of multiple objects is almost impossible. Second, environment changes will generate different multipath signals and severely disturb …


Linguistic Analysis Of Toxic Behavior In An Online Video Game, Haewoon Kwak, Telefonica Nov 2014

Research Collection School Of Computing and Information Systems

In this paper we explore the linguistic components of toxic behavior by using crowdsourced data from over 590 thousand cases of accused toxic players in a popular match-based competition game, League of Legends. We perform a series of linguistic analyses to gain a deeper understanding of the role communication plays in the expression of toxic behavior. We characterize linguistic behavior of toxic players and compare it with that of typical players in an online competition game. We also find empirical support describing how a player transitions from typical to toxic behavior. Our findings can be helpful to automatically detect and …


Dynamic Clustering Of Contextual Multi-Armed Bandits, Trong T. Nguyen, Hady W. Lauw Nov 2014

Research Collection School Of Computing and Information Systems

With the prevalence of the Web and social media, users increasingly express their preferences online. In learning these preferences, recommender systems need to balance the trade-off between exploitation, by providing users with more of the "same", and exploration, by providing users with something "new" so as to expand the systems' knowledge. Multi-armed bandit (MAB) is a framework to balance this trade-off. Most of the previous work in MAB either models a single bandit for the whole population, or one bandit for each user. We propose an algorithm to divide the population of users into multiple clusters, and to customize the …


Online Passive Aggressive Active Learning And Its Applications, Jing Lu, Peilin Zhao, Steven C. H. Hoi Nov 2014

Research Collection School Of Computing and Information Systems

We investigate online active learning techniques for classification tasks in data stream mining applications. Unlike traditional learning approaches (either batch or online learning) that often require requesting the class label of each incoming instance, online active learning queries only a subset of informative incoming instances to update the classification model, which aims to maximize classification performance using minimal human labeling effort during the entire online stream data mining task. In this paper, we present a new family of algorithms for online active learning called Passive-Aggressive Active (PAA) learning algorithms by adapting the popular Passive-Aggressive algorithms in an online active …


Partisan Sharing: Facebook Evidence And Societal Consequences, Jisun An, Daniele Quercia, Jon Crowcroft Oct 2014

Research Collection School Of Computing and Information Systems

The hypothesis of selective exposure assumes that people seek out information that supports their views and eschew information that conflicts with their beliefs, and that this has negative consequences for our society. A few researchers have recently found counter-evidence of selective exposure in social media: users are exposed to politically diverse articles. However, no work has looked at what happens after exposure, particularly how individuals react to such exposure. Users might well be exposed to diverse articles but share only the partisan ones. To test this, we study partisan sharing on Facebook: the tendency for users to predominantly share like-minded news …


Cost-Sensitive Online Classification, Jialei Wang, Peilin Zhao, Steven C. H. Hoi Oct 2014

Research Collection School Of Computing and Information Systems

Both cost-sensitive classification and online learning have been extensively studied in data mining and machine learning communities, respectively. However, very few studies address an important intersecting problem, that is, “Cost-Sensitive Online Classification”. In this paper, we formally study this problem, and propose a new framework for Cost-Sensitive Online Classification by directly optimizing cost-sensitive measures using online gradient descent techniques. Specifically, we propose two novel cost-sensitive online classification algorithms, which are designed to directly optimize two well-known cost-sensitive measures: (i) maximization of weighted sum of sensitivity and specificity, and (ii) minimization of weighted misclassification cost. We analyze the theoretical bounds of …
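The two cost-sensitive measures named in this abstract can be illustrated with a small sketch. This is not the authors' algorithm: it only evaluates the two measures on a fixed set of predictions. The function name, the +1/-1 label encoding, and the convention that the two weights in each measure sum to 1 are assumptions made for illustration.

```python
from typing import Sequence, Tuple


def cost_sensitive_measures(
    y_true: Sequence[int],
    y_pred: Sequence[int],
    eta_p: float = 0.5,  # assumed weight on sensitivity; specificity gets 1 - eta_p
    c_p: float = 0.5,    # assumed cost of a missed positive; false positive costs 1 - c_p
) -> Tuple[float, float]:
    """Return (weighted sum of sensitivity and specificity, weighted misclassification cost).

    Labels are assumed to be +1 (positive) and -1 (negative).
    """
    tp = fn = tn = fp = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == 1:
            if pred == 1:
                tp += 1
            else:
                fn += 1
        else:
            if pred == 1:
                fp += 1
            else:
                tn += 1

    # Guard against a class being absent from the sample.
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0

    weighted_sum = eta_p * sensitivity + (1.0 - eta_p) * specificity
    weighted_cost = c_p * fn + (1.0 - c_p) * fp
    return weighted_sum, weighted_cost


if __name__ == "__main__":
    print(cost_sensitive_measures([1, 1, -1, -1], [1, -1, -1, -1]))
```

A higher weighted sum is better, while a lower weighted cost is better; raising `eta_p` or `c_p` penalizes mistakes on the positive class more heavily.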


Online Probabilistic Learning For Fuzzy Inference System, Richard Jayadi Oentaryo, Meng Joo Er, San Linn, Xiang Li Sep 2014

Research Collection School Of Computing and Information Systems

Online learning is a key methodology for expert systems to gracefully cope with dynamic environments. In the context of neuro-fuzzy systems, research efforts have been directed toward developing online learning methods that can update both system structure and parameters on the fly. However, the current online learning approaches often rely on heuristic methods that lack a formal statistical basis and exhibit limited scalability in the face of large data streams. In light of these issues, we develop a new Sequential Probabilistic Learning for Adaptive Fuzzy Inference System (SPLAFIS) that synergizes the Bayesian Adaptive Resonance Theory (BART) and Rule-Wise Decoupled Extended …


Interestingness-Driven Diffusion Process Summarization In Dynamic Networks, Qiang Qu, Siyuan Liu, Christian Jensen, Feida Zhu, Christos Faloutsos Sep 2014

Research Collection School Of Computing and Information Systems

The widespread use of social networks enables the rapid diffusion of information, e.g., news, among users in very large communities. It is a substantial challenge to be able to observe and understand such diffusion processes, which may be modeled as networks that are both large and dynamic. A key tool in this regard is data summarization. However, few existing studies aim to summarize graphs/networks for dynamics. Dynamic networks raise new challenges not found in static settings, including time sensitivity and the need for online interestingness evaluation and summary traceability, which render existing techniques inapplicable. We study the topic of dynamic …


Sharing Political News: The Balancing Act Of Intimacy And Socialization In Selective Exposure, Jisun An, Daniele Quercia, Meeyoung Cha, Krishna Gummadi, Jon Crowcroft Sep 2014

Research Collection School Of Computing and Information Systems

One might think that, compared to traditional media, social media sites allow people to choose more freely what to read and what to share, especially for politically oriented news. However, reading and sharing habits originate from deeply ingrained behaviors that might be hard to change. To test the extent to which this is true, we propose a Political News Sharing (PoNS) model that holistically captures four key aspects of social psychology: gratification, selective exposure, socialization, and trust & intimacy. Using real instances of political news sharing in Twitter, we study the predictive power of these features. As one might expect, …


Semantic Visualization For Spherical Representation, Tuan M. V. Le, Hady W. Lauw Aug 2014

Research Collection School Of Computing and Information Systems

Visualization of high-dimensional data such as text documents is widely applicable. The traditional means is to find an appropriate embedding of the high-dimensional representation in a low-dimensional visualizable space. As topic modeling is a useful form of dimensionality reduction that preserves the semantics in documents, recent approaches aim for a visualization that is consistent with both the original word space, as well as the semantic topic space. In this paper, we address the semantic visualization problem. Given a corpus of documents, the objective is to simultaneously learn the topic distributions as well as the visualization coordinates of documents. We propose …


Automated Prediction Of Glasgow Outcome Scale For Traumatic Brain Injury, Bolan Su, Thien Anh Dinh, A. K. Ambastha, Tianxia Gong, Tomi Silander, Shijian Lu, C. C. Tchoyoson Lim, Boon Chuan Pang, Cheng Kiang Lee, Tze-Yun Leong, Chew Lim Tan Aug 2014

Research Collection School Of Computing and Information Systems

Clinical features found in brain CT scan images are widely used in traumatic brain injury (TBI) as indicators for Glasgow Outcome Scale (GOS) prediction. However, due to the lack of automated methods to measure and quantify the CT scan image features, the computerized prediction of GOS in TBI has not been well studied. This paper introduces an automated GOS prediction system for traumatic brain CT images. Different from most existing systems that perform the prognosis based on pre-processed data, our system directly works on brain CT scan images based on the image features. Our system can also be extended to …


Online Multiple Kernel Regression, Doyen Sahoo, Steven C. H. Hoi, Bin Li Aug 2014

Research Collection School Of Computing and Information Systems

Kernel-based regression represents an important family of learning techniques for solving challenging regression tasks with non-linear patterns. Despite being studied extensively, most of the existing work suffers from two major drawbacks: (i) they are often designed for solving regression tasks in a batch learning setting, making them not only computationally inefficient but also poorly scalable in real-world applications where data arrives sequentially; and (ii) they usually assume a fixed kernel function is given prior to the learning task, which could result in poor performance if the chosen kernel is inappropriate. To overcome these drawbacks, this paper presents a novel …


Diversity-Oriented Bi-Objective Hyper-Heuristics For Patrol Scheduling, Mustafa Misir, Hoong Chuin Lau Aug 2014

Research Collection School Of Computing and Information Systems

The patrol scheduling problem is concerned with assigning security teams to different stations for distinct time intervals while respecting a limited number of contractual constraints. The objective is to minimise the total distance travelled while maximising the coverage of the stations with respect to their security requirement levels. This paper introduces a hyper-heuristic strategy focusing on generating diverse solutions for a bi-objective patrol scheduling problem. While a variety of hyper-heuristics have been applied to a large suite of problem domains usually in the form of single-objective optimisation, we suggest an alternative approach for solving the patrol scheduling problem with two …


A Mathematical Model And Metaheuristics For Time Dependent Orienteering Problem, Aldy Gunawan, Zhi Yuan, Hoong Chuin Lau Aug 2014

Research Collection School Of Computing and Information Systems

This paper presents a generalization of the Orienteering Problem, the Time-Dependent Orienteering Problem (TDOP), which is based on the real-life application of providing automatic tour guidance in a large leisure facility such as a theme park. In this problem, the travel time between two nodes depends on the time when the trip starts. We formulate the problem as an integer linear programming (ILP) model. We then develop various heuristics in a step-by-step fashion: greedy construction, local search and variable neighborhood descent, and two versions of iterated local search. The proposed metaheuristics were tested on modified benchmark instances, randomly …


Direct Neighbor Search, Jilian Zhang, Kyriakos Mouratidis, Hwee Hwa Pang Aug 2014

Research Collection School Of Computing and Information Systems

In this paper we study a novel query type, called direct neighbor query. Two objects in a dataset are direct neighbors (DNs) if a window selection may exclusively retrieve these two objects. Given a source object, a DN search computes all of its direct neighbors in the dataset. The DNs define a new type of affinity that differs from existing formulations (e.g., nearest neighbors, nearest surrounders, reverse nearest neighbors, etc.) and finds application in domains where user interests are expressed in the form of windows, i.e., multi-attribute range selections. Drawing on key properties of the DN relationship, we develop an …
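The direct neighbor definition above can be checked by brute force, as a rough sketch under stated assumptions: windows are taken to be closed, axis-aligned rectangles, and all objects are distinct points. Under those assumptions, a window retrieves exactly two points if and only if their minimum bounding rectangle contains no third point, since every window covering both points must cover that rectangle. The function names are hypothetical, and this quadratic scan is not the algorithm developed in the paper.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def is_direct_neighbor(a: Point, b: Point, data: Sequence[Point]) -> bool:
    """True if the bounding rectangle of a and b contains no other point of data."""
    lo = [min(x, y) for x, y in zip(a, b)]
    hi = [max(x, y) for x, y in zip(a, b)]
    for p in data:
        if p == a or p == b:
            continue  # assumes points are distinct, so equality identifies a or b
        if all(l <= v <= h for l, v, h in zip(lo, p, hi)):
            return False  # a third point falls inside the smallest possible window
    return True


def direct_neighbors(source: Point, data: Sequence[Point]) -> List[Point]:
    """Brute-force DN search: test every other point against the source."""
    return [p for p in data if p != source and is_direct_neighbor(source, p, data)]


if __name__ == "__main__":
    points = [(0, 0), (2, 2), (1, 1), (3, 0)]
    print(direct_neighbors((0, 0), points))
```

In this example, (2, 2) is not a direct neighbor of (0, 0) because (1, 1) lies inside their bounding rectangle, whereas (1, 1) and (3, 0) both are.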


Predicting The Popularity Of Web 2.0 Items Based On User Comments, Xiangnan He, Ming Gao, Min-Yen Kan, Yiqun Liu, Kazunari Sugiyama Jul 2014

Research Collection School Of Computing and Information Systems

In the current Web 2.0 era, the popularity of Web resources fluctuates ephemerally, based on trends and social interest. As a result, content-based relevance signals are insufficient to meet users' constantly evolving information needs in searching for Web 2.0 items. Incorporating future popularity into ranking is one way to counter this. However, predicting popularity as a third party (as in the case of general search engines) is difficult in practice, due to their limited access to item view histories. To enable popularity prediction externally without excessive crawling, we propose an alternative solution by leveraging user comments, which are more accessible …


Understanding The Paradigm Shift To Computational Social Science In The Presence Of Big Data, Ray M. Chang, Robert J. Kauffman, Young Ok Kwon Jul 2014

Research Collection School Of Computing and Information Systems

The era of big data has created new opportunities for researchers to achieve high relevance and impact amid changes and transformations in how we study social science phenomena. With the emergence of new data collection technologies, advanced data mining and analytics support, there seem to be fundamental changes occurring in the research questions we can ask and the research methods we can apply. The contexts include social networks and blogs, political discourse, corporate announcements, digital journalism, mobile telephony, home entertainment, online gaming, financial services, online shopping, social advertising, and social commerce. The changing costs of data collection and …


Manifold Learning For Jointly Modeling Topic And Visualization, Tuan Minh Van Le, Hady W. Lauw Jul 2014

Research Collection School Of Computing and Information Systems

Classical approaches to visualization directly reduce a document's high-dimensional representation into visualizable two or three dimensions, using techniques such as multidimensional scaling. More recent approaches consider an intermediate representation in topic space, between word space and visualization space, which preserves the semantics by topic modeling. We call the latter the semantic visualization problem, as it seeks to jointly model topic and visualization. While previous approaches aim to preserve the global consistency, they do not consider the local consistency in terms of the intrinsic geometric structure of the document manifold. We therefore propose an unsupervised probabilistic model, called Semafore, which aims to …


Learning Relative Similarity By Stochastic Dual Coordinate Ascent, Pengcheng Wu, Ding Yi, Peilin Zhao, Chunyan Miao, Steven C. H. Hoi Jul 2014

Research Collection School Of Computing and Information Systems

Learning relative similarity from pairwise instances is an important problem in machine learning and has a wide range of applications. Despite being studied for years, some existing methods solved by Stochastic Gradient Descent (SGD) techniques generally suffer from slow convergence. In this paper, we investigate the application of the Stochastic Dual Coordinate Ascent (SDCA) technique to tackle the optimization task of relative similarity learning by extending from vector to matrix parameters. Theoretically, we prove the optimal linear convergence rate for the proposed SDCA algorithm, beating the well-known sublinear convergence rate of the previous best metric learning algorithms. Empirically, we conduct extensive …


SOML: Sparse Online Metric Learning With Application To Image Retrieval, Xingyu Gao, Steven C. H. Hoi, Yongdong Zhang, Ji Wan, Jintao Li Jul 2014

Research Collection School Of Computing and Information Systems

Image similarity search plays a key role in many multimedia applications, where multimedia data (such as images and videos) are usually represented in high-dimensional feature space. In this paper, we propose a novel Sparse Online Metric Learning (SOML) scheme for learning sparse distance functions from large-scale high-dimensional data and explore its application to image retrieval. In contrast to many existing distance metric learning algorithms that are often designed for low-dimensional data, the proposed algorithms are able to learn sparse distance metrics from high-dimensional data in an efficient and scalable manner. Our experimental results show that the proposed method achieves better …


Air Indexing For On-Demand XML Data Broadcast, Weiwei Sun, Rongrui Qin, Jinjin Wu, Baihua Zheng Jun 2014

Research Collection School Of Computing and Information Systems

XML data broadcast is an efficient way to disseminate semi-structured information in wireless mobile environments. In this paper, we propose a novel two-tier index structure to facilitate access to XML documents in an on-demand broadcast system. It provides the clients with an overall image of all the XML documents available at the server side and hence enables the clients to locate complete result sets accordingly. A pruning strategy is developed to cut down the index size and a two-tier structure is proposed to further remove any redundant information. In addition, two index distribution strategies, namely naive distribution and partial …


On Efficient Reverse Skyline Query Processing, Yunjun Gao, Qing Liu, Baihua Zheng, Gang Chen Jun 2014

Research Collection School Of Computing and Information Systems

Given a D-dimensional data set P and a query point q, a reverse skyline query (RSQ) returns all the data objects in P whose dynamic skyline contains q. It is important for many real-life applications such as business planning and environmental monitoring. Currently, the state-of-the-art algorithm for answering the RSQ is the reverse skyline using skyline approximations (RSSA) algorithm, which is based on the precomputed approximations of the skylines. Although RSSA has some desirable features, e.g., applicability to arbitrary data distributions and dimensions, it requires multiple accesses of the same nodes, incurring redundant I/O and CPU costs. In …
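The reverse skyline definition above can be made concrete with a brute-force sketch. It assumes the usual notion of dynamic dominance, which the abstract does not spell out: with respect to a reference point p, a point o dominates q if o is at least as close to p as q in every dimension (by absolute coordinate difference) and strictly closer in at least one. The function names are hypothetical, and this quadratic scan is neither RSSA nor the paper's proposed algorithm.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def dynamically_dominates(other: Point, q: Point, p: Point) -> bool:
    """True if `other` dominates `q` with respect to reference point `p`."""
    d_other = [abs(o - c) for o, c in zip(other, p)]
    d_q = [abs(v - c) for v, c in zip(q, p)]
    no_worse = all(a <= b for a, b in zip(d_other, d_q))
    strictly_better = any(a < b for a, b in zip(d_other, d_q))
    return no_worse and strictly_better


def reverse_skyline(data: Sequence[Point], q: Point) -> List[Point]:
    """Return every p in data whose dynamic skyline contains q.

    q is in the dynamic skyline of p when no other data point
    dynamically dominates q with respect to p.
    """
    result = []
    for i, p in enumerate(data):
        others = (o for j, o in enumerate(data) if j != i)
        if not any(dynamically_dominates(o, q, p) for o in others):
            result.append(p)
    return result


if __name__ == "__main__":
    print(reverse_skyline([(1, 1), (5, 5), (2, 2)], (3, 3)))
```

Here (1, 1) is excluded because (2, 2) is closer to it than the query point (3, 3) in both dimensions, so the query is not on its dynamic skyline.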


Graph-Based Semi-Supervised Learning: Realizing Pointwise Smoothness Probabilistically, Yuan Fang, Kevin Chen-Chuan Chang, Hady W. Lauw Jun 2014

Research Collection School Of Computing and Information Systems

As the central notion in semi-supervised learning, smoothness is often realized on a graph representation of the data. In this paper, we study two complementary dimensions of smoothness: its pointwise nature and probabilistic modeling. While no existing graph-based work exploits them in conjunction, we encompass both in a novel framework of Probabilistic Graph-based Pointwise Smoothness (PGP), building upon two foundational models of data closeness and label coupling. This new form of smoothness axiomatizes a set of probability constraints, which ultimately enables class prediction. Theoretically, we provide an error and robustness analysis of PGP. Empirically, we conduct extensive experiments to show …


AR-Miner: Mining Informative Reviews For Developers From Mobile App Marketplace, Ning Chen, Jialiu Lin, Steven C. H. Hoi, Xiaokui Xiao, Boshen Zhang Jun 2014

Research Collection School Of Computing and Information Systems

With the popularity of smartphones and mobile devices, mobile application (a.k.a. “app”) markets have been growing exponentially in terms of number of users and downloads. App developers spend considerable effort on collecting and exploiting user feedback to improve user satisfaction, but suffer from the absence of effective user review analytics tools. To help mobile app developers discover the most “informative” user reviews from a large and rapidly increasing pool of user reviews, we present “AR-Miner”, a novel computational framework for App Review Mining, which performs comprehensive analytics from raw user reviews by (i) first extracting informative user reviews by …


How Many Eyeballs Does A Bug Need? An Empirical Validation Of Linus' Law, Subhajit Datta, Proshanta Sarkar, Sutirtha Das, Sonu Sreshtha, Prasanth Lade, Subhashis Majumder May 2014

Research Collection School Of Computing and Information Systems

Linus’ Law reflects on a key characteristic of open source software development: developers’ tendency to closely work together in the bug resolution process. In this paper we empirically examine Linus’ Law using a data-set of 1,000+ Android bugs, owned by 70+ developers. Our results indicate that encouraging developers to work closely with one another has nuanced implications; while one form of contact may help reduce bug resolution time, another form can have quite the opposite effect. We present statistically significant evidence in support of our results and discuss their relevance at the individual and organizational levels.