Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 17 of 17

Full-Text Articles in Physical Sciences and Mathematics

Kb-Anonymity: A Model For Anonymized Behavior-Preserving Test And Debugging Data, Aditya Budi, David Lo, Lingxiao Jiang, Lucia Lucia Dec 2011

David LO

It is often very expensive and practically infeasible to generate test cases that exercise all possible states of a program. This is especially true for a medium or large industrial system. In practice, industrial clients of the system often have a set of input data collected either before the system is built or after the deployment of a previous version of the system. Such data are highly valuable as they represent the operations that matter in a client's daily business and may be used to extensively test the system. However, such data often carry sensitive information and cannot …


Code Search Via Topic-Enriched Dependence Graph Matching, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2011

David LO

Source code contains textual, structural, and semantic information, all of which can be leveraged for effective search. Some studies have proposed semantic code search where users can specify query topics in a natural language. Other studies can search through system dependence graphs. In this paper, we propose a semantic dependence search engine that integrates both kinds of techniques and can retrieve code snippets based on expressive user queries describing both topics and dependencies. Users can specify their search targets in a free-form format describing desired topics (i.e., high-level semantics or functionality of the target code); a specialized graph query language …


Finding Relevant Answers In Software Forums, Swapna Gottopati, David Lo, Jing Jiang Dec 2011

David LO

Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containing relevant keywords. However, in software forums, there are often many threads containing similar keywords, and each thread may contain 1,000 posts or more. Manually finding relevant answers from these long threads is a painstaking task to …


Concern Localization Using Information Retrieval: An Empirical Study On Linux Kernel, Shaowei Wang, David Lo, Zhenchang Xing, Lingxiao Jiang Dec 2011

David LO

Many software maintenance activities need to find code units (functions, files, etc.) that implement a certain concern (features, bugs, etc.). To facilitate such activities, many approaches have been proposed to automatically link code units with concerns described in natural language; this task is termed concern localization and often employs Information Retrieval (IR) techniques. There has not been a study that evaluates and compares the effectiveness of the latest IR techniques on a large dataset. This study fills this gap by investigating ten IR techniques, some of which are new and have not been used for concern localization, on a Linux kernel …
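
As a rough illustration of the IR-based linking this abstract describes, the sketch below ranks code units against a concern description using TF-IDF weighting and cosine similarity. This vector space model is an assumption chosen for illustration; the visible part of the abstract does not name the ten techniques studied, and the function names and sample data are hypothetical.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


def rank_code_units(concern, code_units):
    """Rank code units by TF-IDF cosine similarity to a concern description.

    `code_units` maps a hypothetical unit name to its text.
    Returns a list of (name, score) pairs, best match first.
    """
    term_counts = {name: Counter(tokenize(text)) for name, text in code_units.items()}
    n_docs = len(term_counts)

    # Document frequency: in how many code units each term appears.
    doc_freq = Counter()
    for counts in term_counts.values():
        doc_freq.update(counts.keys())

    def weigh(counts):
        # Smoothed IDF so terms absent from every code unit do not divide by zero.
        return {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in counts.items()
        }

    def cosine(u, v):
        dot = sum(w * v.get(term, 0.0) for term, w in u.items())
        norm_u = math.sqrt(sum(w * w for w in u.values()))
        norm_v = math.sqrt(sum(w * w for w in v.values()))
        return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

    query = weigh(Counter(tokenize(concern)))
    scores = [(name, cosine(query, weigh(counts))) for name, counts in term_counts.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Hypothetical toy data, not taken from the Linux kernel study.
units = {
    "sched.c": "schedule task cpu runqueue",
    "ext4.c": "inode block file journal",
}
ranking = rank_code_units("file journal corruption", units)
```

In this toy setup the unit that shares terms with the concern ranks first, and a unit with no shared terms scores zero. A real study would need proper tokenization of identifiers and a much larger corpus.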


Automated Detection Of Likely Design Flaws In Layered Architectures, Aditya Budi, - Lucia, David Lo, Lingxiao Jiang, Shaowei Wang Dec 2011

David LO

Layered architecture prescribes a good principle for separating concerns to make systems more maintainable. One example of such layered architectures is the separation of classes into three groups: Boundary, Control, and Entity, which are referred to as the three analysis class stereotypes in UML. Classes of different stereotypes interact with one another; when properly designed, the overall interaction is maintainable, flexible, and robust. On the other hand, poor design results in a less maintainable system that is prone to errors. In many software projects, the stereotypes of classes are often missing, thus detection of design flaws becomes non-trivial. …


Identifying Bug Signatures Using Discriminative Graph Mining, Hong Cheng, David Lo, Yang Zhou, Xiaoyin Wang, Xifeng Yan Nov 2011

David LO

Bug localization has attracted a lot of attention recently. Most existing methods focus on pinpointing a single statement or function call which is very likely to contain bugs. Although such methods could be very accurate, it is usually very hard for developers to understand the context of the bug, given each bug location in isolation. In this study, we propose to model software executions with graphs at two levels of granularity: methods and basic blocks. An individual node represents a method or basic block and an edge represents a method call, method return or transition (at the method or basic …


Mining Antagonistic Communities From Social Networks, Kuan Zhang, David Lo, Ee Peng Lim Nov 2011

David LO

During social interactions in a community, there are often sub-communities that behave in opposite ways. These antagonistic sub-communities could represent groups of people with opposite tastes, factions within a community distrusting one another, etc. Taking as input a set of interactions within a community, we develop a novel pattern mining approach that extracts a set of antagonistic sub-communities. In particular, based on a set of user-specified thresholds, we extract a set of pairs of sub-communities that behave in opposite ways to one another. To prevent a blow-up in this set of pairs, we focus on extracting a …


Automatic Steering Of Behavioral Model Inference, David Lo, Leonardo Mariani, Mauro Pezze Nov 2011

David LO

Many testing and analysis techniques use finite state models to validate and verify the quality of software systems. Since the specification of such models is complex and time-consuming, researchers defined several techniques to extract finite state models from code and traces. Automatically generating models requires much less effort than designing them, and thus eases the verification and validation of large software systems. However, when models are inferred automatically, the precision of the mining process is critical. Behavioral models mined with imprecise processes can include many spurious behaviors, and can thus compromise the results of testing and analysis techniques that use …


Efficient Topological OLAP On Information Networks, Qiang Qu, Feida Zhu, Xifeng Yan, Jiawei Han, Philip Yu, Hongyan Li Nov 2011

David LO

We propose a framework for efficient OLAP on information networks with a focus on the most interesting kind, the topological OLAP (called “T-OLAP”), which incurs topological changes in the underlying networks. T-OLAP operations generate new networks from the original ones by rolling up a subset of nodes chosen by certain constraint criteria. The key challenge is to efficiently compute measures for the newly generated networks and handle user queries with varied constraints. Two effective computational techniques, T-Distributiveness and T-Monotonicity, are proposed to achieve efficient query processing and cube materialization. We also provide a T-OLAP query processing framework into which these …


Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi Nov 2011

David LO

In the statistics and data mining communities, many measures have been proposed to gauge the strength of association between two variables of interest, such as odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and Gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is positively associated with a cure for a disease or whether a particular marketing strategy is positively associated with an increase in revenue. This paper models the problem of locating faults as an association between the execution or non-execution of particular program elements and failures. There have been …
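
To make the modeling idea concrete, the sketch below scores a single program element from a 2x2 table of execution versus outcome, using the standard textbook definitions of odds ratio, Yule's Q, and Yule's Y. The function name, argument names, and counts are hypothetical, and this is not claimed to be the paper's exact formulation or experimental setup.

```python
import math


def association_scores(exec_fail, exec_pass, skip_fail, skip_pass):
    """Score one program element from a 2x2 execution/outcome table.

    exec_fail: failing runs that executed the element
    exec_pass: passing runs that executed the element
    skip_fail: failing runs that did not execute it
    skip_pass: passing runs that did not execute it
    """
    ad = exec_fail * skip_pass  # concordant cross product
    bc = exec_pass * skip_fail  # discordant cross product
    if ad + bc == 0:
        raise ValueError("table is degenerate: both cross products are zero")

    # Odds ratio is unbounded when the discordant product is zero.
    odds_ratio = ad / bc if bc else math.inf
    # Yule's Q and Y rescale the odds ratio into the range [-1, 1].
    yule_q = (ad - bc) / (ad + bc)
    yule_y = (math.sqrt(ad) - math.sqrt(bc)) / (math.sqrt(ad) + math.sqrt(bc))
    return {"odds_ratio": odds_ratio, "yule_q": yule_q, "yule_y": yule_y}


# Hypothetical counts for one statement across 20 test runs.
scores = association_scores(exec_fail=8, exec_pass=2, skip_fail=1, skip_pass=9)
```

Elements whose execution is more strongly associated with failing runs receive higher scores, so sorting elements by any one of these values yields a ranked list of suspicious locations.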


Efficient Mining Of Closed Repetitive Gapped Subsequences From A Sequence Database, Bolin Ding, David Lo, Jiawei Han, Siau-Cheng Khoo Nov 2011

David LO

There is a huge wealth of sequence data available, for example, customer purchase histories, program execution traces, DNA, and protein sequences. Analyzing this wealth of data to mine important knowledge is certainly a worthwhile goal. In this paper, as a step forward to analyzing patterns in sequences, we introduce the problem of mining closed repetitive gapped subsequences and propose efficient solutions. Given a database of sequences where each sequence is an ordered list of events, the pattern we would like to mine is called a repetitive gapped subsequence, which is a subsequence (possibly with gaps between two successive events within it) …


Mining Interesting Link Formation Rules In Social Networks, Cane Wing-Ki Leung, Ee Peng Lim, David Lo, Jianshu Weng Nov 2011

David LO

Link structures are important patterns one looks out for when modeling and analyzing social networks. In this paper, we propose the task of mining interesting Link Formation rules (LF-rules) containing link structures known as Link Formation patterns (LF-patterns). LF-patterns capture various dyadic and/or triadic structures among groups of nodes, while LF-rules capture the formation of a new link from a focal node to another node as a postcondition of existing connections between the two nodes. We devise a novel LF-rule mining algorithm, known as LFR-Miner, based on frequent subgraph mining for our task. In addition to using a support-confidence framework …


Mining Hierarchical Scenario-Based Specifications, David Lo, Shahar Maoz Nov 2011

David LO

Scalability over long traces, as well as comprehensibility and expressivity of results, are major challenges for dynamic analysis approaches to specification mining. In this work we present a novel use of object hierarchies over traces of inter-object method calls, as an abstraction/refinement mechanism that enables user-guided, top-down or bottom-up mining of layered scenario-based specifications, broken down by hierarchies embedded in the system under investigation. We do this using data mining methods that provide statistically significant sound and complete results modulo user-defined thresholds, in the context of Damm and Harel’s live sequence charts (LSC); a visual, modal, scenario-based, inter-object language. Thus, …


Mining Quantified Temporal Rules: Formalism, Algorithms, And Evaluation, David Lo, Ganesan Ramalingam, Venkatesh-Prasad Ranganath, Kapil Vaswani Nov 2011

David LO

Libraries usually impose constraints on how clients should use them. Often these constraints are not well-documented. In this paper, we address the problem of recovering such constraints automatically, a problem referred to as specification mining. Given some client programs that use a given library, we identify constraints on the library usage that are (almost) satisfied by the given set of clients. The class of rules we target for mining combines simple binary temporal operators with state predicates (involving equality constraints) and quantification. This is a simple yet expressive subclass of temporal properties that allows us to capture many common API usage …


Hierarchical Inter-Object Traces For Specification Mining, David Lo, Shahar Maoz Nov 2011

David LO

Major challenges of dynamic analysis approaches to specification mining include scalability over long traces as well as comprehensibility and expressivity of results. We present a novel use of object hierarchies over inter-object traces as an abstraction/refinement mechanism enabling scalable, incremental, top-down mining of scenario-based specifications.


Classification Of Software Behaviors For Failure Detection: A Discriminative Pattern Mining Approach, David Lo, Hong Cheng, Jiawei Han, Siau-Cheng Khoo, Chengnian Sun Nov 2011

David LO

Software is a ubiquitous component of our daily life. We often depend on the correct working of software systems. Due to the difficulty and complexity of software systems, bugs and anomalies are prevalent. Bugs have caused billions of dollars in losses, in addition to privacy and security threats. In this work, we address software reliability issues by proposing a novel method to classify software behaviors based on past history or runs. With the technique, it is possible to generalize past known errors and mistakes to capture failures and anomalies. Our technique first mines a set of discriminative features capturing repetitive series …


Mining Past-Time Temporal Rules From Execution Traces, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

David LO

Specification mining is a process of extracting specifications, often from program execution traces. These specifications can in turn be used to aid program understanding, monitoring, and verification. There are a number of dynamic-analysis-based specification mining tools in the literature; however, none so far extracts past-time temporal expressions in the form of rules stating: whenever a series of events occurs, previously another series of events has happened. Rules of this format are commonly found in practice and are useful for various purposes. Most rule-based specification mining tools only mine future-time temporal expressions. Many past-time temporal rules like whenever a resource is …
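
The rule shape described above ("whenever a series of events occurs, previously another series of events has happened") can be illustrated with a small checker. This is a minimal sketch under simplifying assumptions: the trigger is a single event rather than a series, the required earlier series may contain gaps, and all names and event labels are hypothetical. It checks one given rule against one trace and does not attempt the mining step itself.

```python
def is_subsequence(pattern, events):
    """True if `pattern` occurs in `events` in order, with gaps allowed."""
    remaining = iter(events)
    # Each membership test consumes the iterator up to the matched event,
    # which enforces left-to-right order.
    return all(event in remaining for event in pattern)


def holds_past_time_rule(trace, trigger, required_before):
    """Check: whenever `trigger` occurs, `required_before` happened earlier.

    `required_before` is a list of events that must appear, in order,
    somewhere in the prefix of the trace preceding each trigger.
    """
    for position, event in enumerate(trace):
        if event == trigger and not is_subsequence(required_before, trace[:position]):
            return False
    return True


# Hypothetical traces: reading a file should be preceded by opening it.
good_trace = ["open", "read", "close"]
bad_trace = ["read", "open", "close"]
good_result = holds_past_time_rule(good_trace, "read", ["open"])
bad_result = holds_past_time_rule(bad_trace, "read", ["open"])
```

Here the rule holds on `good_trace` and is violated on `bad_trace`. A trace that never contains the trigger satisfies the rule vacuously under this simplified reading.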