Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 30 of 54

Full-Text Articles in Physical Sciences and Mathematics

Kb-Anonymity: A Model For Anonymized Behavior-Preserving Test And Debugging Data, Aditya Budi, David Lo, Lingxiao Jiang, Lucia Lucia Dec 2011

Kb-Anonymity: A Model For Anonymized Behavior-Preserving Test And Debugging Data, Aditya Budi, David Lo, Lingxiao Jiang, Lucia Lucia

David LO

It is often very expensive and practically infeasible to generate test cases that can exercise all possible program states in a program. This is especially true for a medium or large industrial system. In practice, industrial clients of the system often have a set of input data collected either before the system is built or after the deployment of a previous version of the system. Such data are highly valuable as they represent the operations that matter in a client's daily business and may be used to extensively test the system. However, such data often carries sensitive information and cannot …


Nort: Runtime Anomaly-Based Monitoring Of Malicious Behavior For Windows, Narcisa Andrea Milea, Siau-Cheng Khoo, David Lo, Cristi Pop Dec 2011

Nort: Runtime Anomaly-Based Monitoring Of Malicious Behavior For Windows, Narcisa Andrea Milea, Siau-Cheng Khoo, David Lo, Cristi Pop

David LO

Protecting running programs from exploits has been the focus of many host-based intrusion detection systems. To this end various formal methods have been developed that either require manual construction of attack signatures or modelling of normal program behavior to detect exploits. In terms of the ability to discover new attacks before the infection spreads, the former approach has been found to be lacking in flexibility. Consequently, in this paper, we present an anomaly monitoring system, NORT, that verifies on-the-fly whether running programs comply to their expected normal behavior. The model of normal behavior is based on a rich set of …


Code Search Via Topic-Enriched Dependence Graph Matching, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2011

Code Search Via Topic-Enriched Dependence Graph Matching, Shaowei Wang, David Lo, Lingxiao Jiang

David LO

Source code contains textual, structural, and semantic information, which can all be leveraged for effective search. Some studies have proposed semantic code search where users can specify query topics in a natural language. Other studies can search through system dependence graphs. In this paper, we propose a semantic dependence search engine that integrates both kinds of techniques and can retrieve code snippets based on expressive user queries describing both topics and dependencies. Users can specify their search targets in a free form format describing desired topics (i.e., high-level semantic or functionality of the target code); a specialized graph query language …


Search-Based Fault Localization, Shaowei Wang, David Lo, Lingxiao Jiang, - Lucia, Hoong Chuin Lau Dec 2011

Search-Based Fault Localization, Shaowei Wang, David Lo, Lingxiao Jiang, - Lucia, Hoong Chuin Lau

David LO

Many spectrum-based fault localization measures have been proposed in the literature. However, no single fault localization measure completely outperforms others: a measure which is more accurate in localizing some bugs in some programs is less accurate in localizing other bugs in other programs. This paper proposes to compose existing spectrum-based fault localization measures into an improved measure. We model the composition of various measures as an optimization problem and present a search-based approach to explore the space of many possible compositions and output a heuristically near optimal composite measure. We employ two search-based strategies including genetic algorithm and simulated annealing …


Towards More Accurate Retrieval Of Duplicate Bug Reports, Chengnian Sun, David Lo, Siau-Cheng Khoo, Jing Jiang Dec 2011

Towards More Accurate Retrieval Of Duplicate Bug Reports, Chengnian Sun, David Lo, Siau-Cheng Khoo, Jing Jiang

David LO

In a bug tracking system, different testers or users may submit multiple reports on the same bugs, referred to as duplicates, which may cost extra maintenance efforts in triaging and fixing bugs. In order to identify such duplicates accurately, in this paper we propose a retrieval function (REP) to measure the similarity between two bug reports. It fully utilizes the information available in a bug report including not only the similarity of textual content in summary and description fields, but also similarity of non-textual fields such as product, component, version, etc. For more accurate measurement of textual similarity, we extend …


Bug Signature Minimization And Fusion, David Lo, Hong Cheng, Xiaoyin Wang Dec 2011

Bug Signature Minimization And Fusion, David Lo, Hong Cheng, Xiaoyin Wang

David LO

Debugging is a time-consuming activity. To help in debugging, many approaches have been proposed to pinpoint the location of errors given labeled failures and correct executions. While such approaches have been shown to be accurate, at times the location alone is not sufficient in helping programmers understand why the bug happens and how to fix it. Furthermore, a single location might not be powerful enough to discriminate failures from correct executions. To address the above challenges, there have been recent studies on extracting bug signatures which are composed of multiple locations appearing together in a particular order signifying an occurrence …


Finding Relevant Answers In Software Forums, Swapna Gottopati, David Lo, Jing Jiang Dec 2011

Finding Relevant Answers In Software Forums, Swapna Gottopati, David Lo, Jing Jiang

David LO

Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use traditional information retrieval approach to extract webpages containing relevant keywords. However, in software forums, often there are many threads containing similar keywords where each thread could contain a lot of posts as many as 1,000 or more. Manually finding relevant answers from these long threads is a painstaking task to …


Mining Top-K Large Structural Patterns In A Massive Network, Feida Zhu, Qiang Qu, David Lo, Xifeng Yan, Jiawei Han, Philip S. Yu Dec 2011

Mining Top-K Large Structural Patterns In A Massive Network, Feida Zhu, Qiang Qu, David Lo, Xifeng Yan, Jiawei Han, Philip S. Yu

David LO

With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider- Mine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1 − ϵ. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call “spiders”. With the spider structure, our approach adopts a …


Recommending People In Developers' Collaboration Network, Didi Surian, Nian Liu, David Lo, Hanghang Tong, Ee Peng Lim, Christos Faloutsos Dec 2011

Recommending People In Developers' Collaboration Network, Didi Surian, Nian Liu, David Lo, Hanghang Tong, Ee Peng Lim, Christos Faloutsos

David LO

Many software developments involve collaborations of developers across the globe. This is true for both open-source and closed-source development efforts. Developers collaborate on different projects of various types. As with any other teamwork endeavors, finding compatibility among members in a development team is helpful towards the realization of the team’s goal. Compatible members tend to share similar programming style and naming strategy, communicate well with one another, etc. However, finding the right person to work with is not an easy task. In this work, we extract information available from Sourceforge.Net, the largest database of open source software, and build developer …


Concern Localization Using Information Retrieval: An Empirical Study On Linux Kernel, Shaowei Wang, David Lo, Zhenchang Xing, Lingxiao Jiang Dec 2011

Concern Localization Using Information Retrieval: An Empirical Study On Linux Kernel, Shaowei Wang, David Lo, Zhenchang Xing, Lingxiao Jiang

David LO

Many software maintenance activities need to find code units (functions, files, etc.) that implement a certain concern (features, bugs, etc.). To facilitate such activities, many approaches have been proposed to automatically link code units with concerns described in natural languages, which are termed as concern localization and often employ Information Retrieval (IR) techniques. There has not been a study that evaluates and compares the effectiveness of latest IR techniques on a large dataset. This study fills this gap by investigating ten IR techniques, some of which are new and have not been used for concern localization, on a Linux kernel …


Towards Succinctness In Mining Scenario-Based Specifications, David Lo, Shahar Maoz Dec 2011

Towards Succinctness In Mining Scenario-Based Specifications, David Lo, Shahar Maoz

David LO

Specification mining methods are used to extract candidate specifications from system execution traces. A major challenge for specification mining is succinctness. That is, in addition to the soundness, completeness, and scalable performance of the specification mining method, one is interested in producing a succinct result, which conveys a lot of information about the system under investigation but uses a short, machine and human-readable representation. In this paper we address the succinctness challenge in the context of scenario-based specification mining, whose target formalism is live sequence charts (LSC), an expressive extension of classical sequence diagrams. We do this by adapting three …


Automated Detection Of Likely Design Flaws In Layered Architectures, Aditya Budi, - Lucia, David Lo, Lingxiao Jiang, Shaowei Wang Dec 2011

Automated Detection Of Likely Design Flaws In Layered Architectures, Aditya Budi, - Lucia, David Lo, Lingxiao Jiang, Shaowei Wang

David LO

Layered architecture prescribes a good principle for separating concerns to make systems more maintainable. One example of such layered architectures is the separation of classes into three groups: Boundary, Control, and Entity, which are referred to as the three analysis class stereotypes in UML. Classes of different stereotypes are interacting with one another, when properly designed, the overall interaction would be maintainable, flexible, and robust. On the other hand, poor design would result in less maintainable system that is prone to errors. In many software projects, the stereotypes of classes are often missing, thus detection of design flaws becomes non-trivial. …


Non-Redundant Sequential Rules,Theory And Algorithm, David Lo, Siau-Cheng Khoo, Limsoon Wong Nov 2011

Non-Redundant Sequential Rules,Theory And Algorithm, David Lo, Siau-Cheng Khoo, Limsoon Wong

David LO

A sequential rule expresses a relationship between two series of events happening one after another. Sequential rules are potentially useful for analyzing data in sequential format, ranging from purchase histories, network logs and program execution traces. In this work, we investigate and propose a syntactic characterization of a non-redundant set of sequential rules built upon past work on compact set of representative patterns. A rule is redundant if it can be inferred from another rule having the same support and confidence. When using the set of mined rules as a composite filter, replacing a full set of rules with a …


Mining Temporal Rules From Program Execution Traces, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Mining Temporal Rules From Program Execution Traces, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

Specification mining is a process of extracting specifications, often from program execution traces. These specifications can in turn be used to aid program understanding, monitoring and verification. There are a number of dynamic-analysis-based specification mining tools in the literature, however none so far extract past time temporal expressions in the form of rules stating: "whenever a series of events occurs, previously another series of events has happened". Rules of this format are commonly found in practice and useful for various purposes. Most rule-based specification mining tools only mine future-time temporal expression. Many past-time temporal rules like "whenever a resource is …


Efficient Mining Of Iterative Patterns For Software Specification Discovery, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Efficient Mining Of Iterative Patterns For Software Specification Discovery, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

Studies have shown that program comprehension takes up to 45% of software development costs. Such high costs are caused by the lack-of documented specification and further aggravated by the phenomenon of software evolution. There is a need for automated tools to extract specifications to aid program comprehension. In this paper, a novel technique to efficiently mine common software temporal patterns from traces is proposed. These patterns shed light on program behaviors, and are termed iterative patterns. They capture unique characteristic of software traces, typically not found in arbitrary sequences. Specifically, due to loops, interesting iterative patterns can occur multiple times …


Identifying Bug Signatures Using Discriminative Graph Mining, Hong Cheng, David Lo, Yang Zhou, Xiaoyin Wang, Xifeng Yan Nov 2011

Identifying Bug Signatures Using Discriminative Graph Mining, Hong Cheng, David Lo, Yang Zhou, Xiaoyin Wang, Xifeng Yan

David LO

Bug localization has attracted a lot of attention recently. Most existing methods focus on pinpointing a single statement or function call which is very likely to contain bugs. Although such methods could be very accurate, it is usually very hard for developers to understand the context of the bug, given each bug location in isolation. In this study, we propose to model software executions with graphs at two levels of granularity: methods and basic blocks. An individual node represents a method or basic block and an edge represents a method call, method return or transition (at the method or basic …


Smartic: Specification Mining Architecture With Trace Filtering And Clustering, David Lo, Siau-Cheng Khoo Nov 2011

Smartic: Specification Mining Architecture With Trace Filtering And Clustering, David Lo, Siau-Cheng Khoo

David LO

Improper management of software evolution, compounded by imprecise, and changing requirements, along with the "short time to market" requirement, commonly leads to a lack of up-to-date specifications. This can result in software that is characterized by bugs, anomalies and even security threats. Software specification mining is a new technique to address this concern by inferring specifications automatically. In this paper, we propose a novel API specification mining architecture called SMArTIC Specification Mining Architecture with Trace fIltering and Clustering) to improve the accuracy, robustness and scalability of specification miners. This architecture is constructed based on two hypotheses: (1) Erroneous traces should …


Mining Software Specifications, David Lo, Siau-Cheng Khoo Nov 2011

Mining Software Specifications, David Lo, Siau-Cheng Khoo

David LO

No abstract provided.


Model Checking In The Absence Of Code, Model And Properties, David Lo, Siau-Cheng Khoo Nov 2011

Model Checking In The Absence Of Code, Model And Properties, David Lo, Siau-Cheng Khoo

David LO

Model checking is a major approach in ensuring software correctness. It verifies a model converted from code against some formal properties. However, difficulties and programmers ’ reluctance to formalize formal properties have been some hurdles to its widespread industrial adoption. Also, with the advent of commercial off-the-shelf (COTS) components provided by third party vendors, model checking is further challenged as often only a binary version of the code is provided by vendors. Interestingly, latest instrumentation tools like PIN and Valgrind have enable execution traces to be collected dynamically from a running program. In this preliminary study, we investigate what can …


Matching Dependence-Related Queries In The System Dependence Graph., Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, Jeffrey Xu Yu Nov 2011

Matching Dependence-Related Queries In The System Dependence Graph., Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, Jeffrey Xu Yu

David LO

In software maintenance and evolution, it is common that developers want to apply a change to a number of similar places. Due to the size and complexity of the code base, it is challenging for developers to locate all the places that need the change. A main challenge in locating the places that need the change is that, these places share certain common dependence conditions but existing code searching techniques can hardly handle dependence relations satisfactorily. In this paper, we propose a technique that enables developers to make queries involving dependence conditions and textual conditions on the system dependence graph …


Mining Iterative Generators And Representative Rules For Software Specification Discovery, David Lo, Jinyan Li, Limsoon Wong, Siau-Cheng Khoo Nov 2011

Mining Iterative Generators And Representative Rules For Software Specification Discovery, David Lo, Jinyan Li, Limsoon Wong, Siau-Cheng Khoo

David LO

Billions of dollars are spent annually on software-related cost. It is estimated that up to 45 percent of software cost is due to the difficulty in understanding existing systems when performing maintenance tasks (i.e., adding features, removing bugs, etc.). One of the root causes is that software products often come with poor, incomplete, or even without any documented specifications. In an effort to improve program understanding, Lo et al. have proposed iterative pattern mining which outputs patterns that are repeated frequently within a program trace, or across multiple traces, or both. Frequent iterative patterns reflect frequent program behaviors that likely …


Mining Past-Time Temporal Rules: A Dynamic Analysis Approach, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Mining Past-Time Temporal Rules: A Dynamic Analysis Approach, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

No abstract provided.


Mining Antagonistic Communities From Social Networks, Kuan Zhang, David Lo, Ee Peng Lim Nov 2011

Mining Antagonistic Communities From Social Networks, Kuan Zhang, David Lo, Ee Peng Lim

David LO

During social interactions in a community, there are often sub-communities that behave in opposite manner. These antagonistic sub-communities could represent groups of people with opposite tastes, factions within a community distrusting one another, etc. Taking as input a set of interactions within a community, we develop a novel pattern mining approach that extracts for a set of antagonistic sub-communities. In particular, based on a set of user specified thresholds, we extract a set of pairs of sub-communities that behave in opposite ways with one another. To prevent a blow up in these set of pairs, we focus on extracting a …


Automatic Steering Of Behavioral Model Inference, David Lo, Leonardo Mariani, Mauro Pezze Nov 2011

Automatic Steering Of Behavioral Model Inference, David Lo, Leonardo Mariani, Mauro Pezze

David LO

Many testing and analysis techniques use finite state models to validate and verify the quality of software systems. Since the specification of such models is complex and time-consuming, researchers defined several techniques to extract finite state models from code and traces. Automatically generating models requires much less effort than designing them, and thus eases the verification and validation of large software systems. However, when models are inferred automatically, the precision of the mining process is critical. Behavioral models mined with imprecise processes can include many spurious behaviors, and can thus compromise the results of testing and analysis techniques that use …


Extracting Paraphrases Of Technical Terms From Noisy Parallel Software Corpus, Xiaoyin Wang, David Lo, Jing Jiang, Lu Zhang, Hong Mei Nov 2011

Extracting Paraphrases Of Technical Terms From Noisy Parallel Software Corpus, Xiaoyin Wang, David Lo, Jing Jiang, Lu Zhang, Hong Mei

David LO

In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves an existing method by upto 58%


Efficient Mining Of Recurrent Rules From A Sequence Database, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Efficient Mining Of Recurrent Rules From A Sequence Database, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

We study a novel problem of mining significant recurrent rules from a sequence database. Recurrent rules have the form "whenever a series of precedent events occurs, eventually a series of consequent events occurs". Recurrent rules are intuitive and characterize behaviors in many domains. An example is in the domain of software specifications, in which the rules capture a family of program properties beneficial to program verification and bug detection. Recurrent rules generalize existing work on sequential and episode rules by considering repeated occurrences of premise and consequent events within a sequence and across multiple sequences, and by removing the "window" …


Efficient Topological Olap On Information Networks, Qiang Qu, Feida Zhu, Xifeng Yan, Jiawei Han, Philip Yu, Hongyan Li Nov 2011

Efficient Topological Olap On Information Networks, Qiang Qu, Feida Zhu, Xifeng Yan, Jiawei Han, Philip Yu, Hongyan Li

David LO

We propose a framework for efficient OLAP on information networks with a focus on the most interesting kind, the topological OLAP (called “T-OLAP”), which incurs topological changes in the underlying networks. T-OLAP operations generate new networks from the original ones by rolling up a subset of nodes chosen by certain constraint criteria. The key challenge is to efficiently compute measures for the newly generated networks and handle user queries with varied constraints. Two effective computational techniques, T-Distributiveness and T-Monotonicity are proposed to achieve efficient query processing and cube materialization. We also provide a T-OLAP query processing framework into which these …


Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi Nov 2011

Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi

David LO

In statistics and data mining communities, there have been many measures proposed to gauge the strength of association between two variables of interest, such as odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is associated positively to a cure of a disease or whether a particular marketing strategy is associated positively to an increase in revenue, etc. This paper models the problem of locating faults as association between the execution or non-execution of particular program elements with failures. There have been …


Mining Message Sequence Graphs, Sandeep Kumar, Siau-Cheng Khoo, Abhik Roychoudhury, David Lo Nov 2011

Mining Message Sequence Graphs, Sandeep Kumar, Siau-Cheng Khoo, Abhik Roychoudhury, David Lo

David LO

Dynamic specification mining involves discovering software behavior from traces for the purpose of program comprehension and bug detection. However, in concurrent/distributed programs, the inherent partial order relationships among events occurring across processes pose a big challenge to specification mining. In this paper, we propose a framework for mining partial orders so as to understand concurrent program behavior. Our miner takes in a set of concurrent program traces, and produces a message sequence graph (MSG) to represent the concurrent program behavior. An MSG represents a graph where the nodes of the graph are partial orders, represented as Message Sequence Charts. Mining …


Mining Modal Scenarios From Execution Traces, David Lo, Shahar Maoz, Siau-Cheng Khoo Nov 2011

Mining Modal Scenarios From Execution Traces, David Lo, Shahar Maoz, Siau-Cheng Khoo

David LO

Specification mining is a dynamic analysis process aimed at automatically inferring suggested specifications of a program from its execution traces. We describe a method, a framework, and a tool, for mining inter-object scenario-based specifications in the form of a UML2-compliant variant of Damm and Harel's Live Sequence Charts (LSC), which extends the classical partial order semantics of sequence diagrams with temporal liveness and symbolic class level lifelines, in order to generate compact and expressive specifications. Moreover, we use previous research work and tools developed for LSC to visualize, analyze, manipulate, test, and thus evaluate the scenario-based specifications we mine. Our …