Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 30 of 72

Full-Text Articles in Physical Sciences and Mathematics

Kb-Anonymity: A Model For Anonymized Behavior-Preserving Test And Debugging Data, Aditya Budi, David Lo, Lingxiao Jiang, Lucia Lucia Dec 2011

Kb-Anonymity: A Model For Anonymized Behavior-Preserving Test And Debugging Data, Aditya Budi, David Lo, Lingxiao Jiang, Lucia Lucia

David LO

It is often very expensive and practically infeasible to generate test cases that can exercise all possible program states in a program. This is especially true for a medium or large industrial system. In practice, industrial clients of the system often have a set of input data collected either before the system is built or after the deployment of a previous version of the system. Such data are highly valuable as they represent the operations that matter in a client's daily business and may be used to extensively test the system. However, such data often carries sensitive information and cannot …


Nort: Runtime Anomaly-Based Monitoring Of Malicious Behavior For Windows, Narcisa Andrea Milea, Siau-Cheng Khoo, David Lo, Cristi Pop Dec 2011

Nort: Runtime Anomaly-Based Monitoring Of Malicious Behavior For Windows, Narcisa Andrea Milea, Siau-Cheng Khoo, David Lo, Cristi Pop

David LO

Protecting running programs from exploits has been the focus of many host-based intrusion detection systems. To this end various formal methods have been developed that either require manual construction of attack signatures or modelling of normal program behavior to detect exploits. In terms of the ability to discover new attacks before the infection spreads, the former approach has been found to be lacking in flexibility. Consequently, in this paper, we present an anomaly monitoring system, NORT, that verifies on-the-fly whether running programs comply to their expected normal behavior. The model of normal behavior is based on a rich set of …


Code Search Via Topic-Enriched Dependence Graph Matching, Shaowei Wang, David Lo, Lingxiao Jiang Dec 2011

Code Search Via Topic-Enriched Dependence Graph Matching, Shaowei Wang, David Lo, Lingxiao Jiang

David LO

Source code contains textual, structural, and semantic information, which can all be leveraged for effective search. Some studies have proposed semantic code search where users can specify query topics in a natural language. Other studies can search through system dependence graphs. In this paper, we propose a semantic dependence search engine that integrates both kinds of techniques and can retrieve code snippets based on expressive user queries describing both topics and dependencies. Users can specify their search targets in a free form format describing desired topics (i.e., high-level semantic or functionality of the target code); a specialized graph query language …


Search-Based Fault Localization, Shaowei Wang, David Lo, Lingxiao Jiang, - Lucia, Hoong Chuin Lau Dec 2011

Search-Based Fault Localization, Shaowei Wang, David Lo, Lingxiao Jiang, - Lucia, Hoong Chuin Lau

David LO

Many spectrum-based fault localization measures have been proposed in the literature. However, no single fault localization measure completely outperforms others: a measure which is more accurate in localizing some bugs in some programs is less accurate in localizing other bugs in other programs. This paper proposes to compose existing spectrum-based fault localization measures into an improved measure. We model the composition of various measures as an optimization problem and present a search-based approach to explore the space of many possible compositions and output a heuristically near optimal composite measure. We employ two search-based strategies including genetic algorithm and simulated annealing …


Towards More Accurate Retrieval Of Duplicate Bug Reports, Chengnian Sun, David Lo, Siau-Cheng Khoo, Jing Jiang Dec 2011

Towards More Accurate Retrieval Of Duplicate Bug Reports, Chengnian Sun, David Lo, Siau-Cheng Khoo, Jing Jiang

David LO

In a bug tracking system, different testers or users may submit multiple reports on the same bugs, referred to as duplicates, which may cost extra maintenance efforts in triaging and fixing bugs. In order to identify such duplicates accurately, in this paper we propose a retrieval function (REP) to measure the similarity between two bug reports. It fully utilizes the information available in a bug report including not only the similarity of textual content in summary and description fields, but also similarity of non-textual fields such as product, component, version, etc. For more accurate measurement of textual similarity, we extend …


Bug Signature Minimization And Fusion, David Lo, Hong Cheng, Xiaoyin Wang Dec 2011

Bug Signature Minimization And Fusion, David Lo, Hong Cheng, Xiaoyin Wang

David LO

Debugging is a time-consuming activity. To help in debugging, many approaches have been proposed to pinpoint the location of errors given labeled failures and correct executions. While such approaches have been shown to be accurate, at times the location alone is not sufficient in helping programmers understand why the bug happens and how to fix it. Furthermore, a single location might not be powerful enough to discriminate failures from correct executions. To address the above challenges, there have been recent studies on extracting bug signatures which are composed of multiple locations appearing together in a particular order signifying an occurrence …


Finding Relevant Answers In Software Forums, Swapna Gottopati, David Lo, Jing Jiang Dec 2011

Finding Relevant Answers In Software Forums, Swapna Gottopati, David Lo, Jing Jiang

David LO

Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use traditional information retrieval approach to extract webpages containing relevant keywords. However, in software forums, often there are many threads containing similar keywords where each thread could contain a lot of posts as many as 1,000 or more. Manually finding relevant answers from these long threads is a painstaking task to …


Recommending People In Developers' Collaboration Network, Didi Surian, Nian Liu, David Lo, Hanghang Tong, Ee Peng Lim, Christos Faloutsos Dec 2011

Recommending People In Developers' Collaboration Network, Didi Surian, Nian Liu, David Lo, Hanghang Tong, Ee Peng Lim, Christos Faloutsos

David LO

Many software developments involve collaborations of developers across the globe. This is true for both open-source and closed-source development efforts. Developers collaborate on different projects of various types. As with any other teamwork endeavors, finding compatibility among members in a development team is helpful towards the realization of the team’s goal. Compatible members tend to share similar programming style and naming strategy, communicate well with one another, etc. However, finding the right person to work with is not an easy task. In this work, we extract information available from Sourceforge.Net, the largest database of open source software, and build developer …


Concern Localization Using Information Retrieval: An Empirical Study On Linux Kernel, Shaowei Wang, David Lo, Zhenchang Xing, Lingxiao Jiang Dec 2011

Concern Localization Using Information Retrieval: An Empirical Study On Linux Kernel, Shaowei Wang, David Lo, Zhenchang Xing, Lingxiao Jiang

David LO

Many software maintenance activities need to find code units (functions, files, etc.) that implement a certain concern (features, bugs, etc.). To facilitate such activities, many approaches have been proposed to automatically link code units with concerns described in natural languages, which are termed as concern localization and often employ Information Retrieval (IR) techniques. There has not been a study that evaluates and compares the effectiveness of latest IR techniques on a large dataset. This study fills this gap by investigating ten IR techniques, some of which are new and have not been used for concern localization, on a Linux kernel …


Towards Succinctness In Mining Scenario-Based Specifications, David Lo, Shahar Maoz Dec 2011

Towards Succinctness In Mining Scenario-Based Specifications, David Lo, Shahar Maoz

David LO

Specification mining methods are used to extract candidate specifications from system execution traces. A major challenge for specification mining is succinctness. That is, in addition to the soundness, completeness, and scalable performance of the specification mining method, one is interested in producing a succinct result, which conveys a lot of information about the system under investigation but uses a short, machine and human-readable representation. In this paper we address the succinctness challenge in the context of scenario-based specification mining, whose target formalism is live sequence charts (LSC), an expressive extension of classical sequence diagrams. We do this by adapting three …


Automated Detection Of Likely Design Flaws In Layered Architectures, Aditya Budi, - Lucia, David Lo, Lingxiao Jiang, Shaowei Wang Dec 2011

Automated Detection Of Likely Design Flaws In Layered Architectures, Aditya Budi, - Lucia, David Lo, Lingxiao Jiang, Shaowei Wang

David LO

Layered architecture prescribes a good principle for separating concerns to make systems more maintainable. One example of such layered architectures is the separation of classes into three groups: Boundary, Control, and Entity, which are referred to as the three analysis class stereotypes in UML. Classes of different stereotypes are interacting with one another, when properly designed, the overall interaction would be maintainable, flexible, and robust. On the other hand, poor design would result in less maintainable system that is prone to errors. In many software projects, the stereotypes of classes are often missing, thus detection of design flaws becomes non-trivial. …


A Multi-Platform Application Suite For Enhancing South Asian Language Pedagogy, Tao Bai, Christopher K. Chung, Konstantin Läufer, Daisy Rockwell, George K. Thiruvathukal Nov 2011

A Multi-Platform Application Suite For Enhancing South Asian Language Pedagogy, Tao Bai, Christopher K. Chung, Konstantin Läufer, Daisy Rockwell, George K. Thiruvathukal

George K. Thiruvathukal

This interdisciplinary project explores the potential for handheld/wireless (H/W) technology in the context of language education within and beyond the classroom. Specifically, we have designed and implemented a suite of multi-platform (desktop/laptop, handheld, and browser) applications to enhance the teaching of South Asian languages such as Hindi-Urdu. Such languages are very difficult to learn, let alone write, and H/W devices (with their handwriting/drawing capabilities) can play a significant role in overcoming the learning curve. The initial application suite includes a character/word tracer, a word splitter/joiner, a smart flashcard with audio, contextual augmented stories for reading comprehension, and a poetic metronome. …


Enhancing The Cs Curriculum With With Aspect-Oriented Software Development (Aosd) And Early Experience, Konstantin Läufer, George K. Thiruvathukal, Tzilla Elrad Nov 2011

Enhancing The Cs Curriculum With With Aspect-Oriented Software Development (Aosd) And Early Experience, Konstantin Läufer, George K. Thiruvathukal, Tzilla Elrad

George K. Thiruvathukal

Aspect-oriented software development (AOSD) is evolving as an important step beyond existing software development approaches such as object-oriented development. An aspect is a module that captures a crosscutting concern, behavior that cuts across different units of abstraction in a software application; expressed as a module, such behavior can be enabled and disabled transparently and non-invasively, without changing the application code itself. Increasing industry demand for expertise in AOSD gives rise to the pedagogical challenge of covering this methodology and its foundations in the computer science curriculum. We present our curricular initiative to incorporate a novel course in AOSD in the …


The Extreme Software Development Series: An Open Curricular Framework For Applied Capstone Courses, Konstantin Läufer, George K. Thiruvathukal Nov 2011

The Extreme Software Development Series: An Open Curricular Framework For Applied Capstone Courses, Konstantin Läufer, George K. Thiruvathukal

George K. Thiruvathukal

We describe an open, flexible curricular framework for offering a collection of advanced undergraduate and graduate courses in software development. The courses offered within this framework are further unified by combining solid foundations with current technology and play the role of capstone courses in a modern software development track. Our initiative has been very successful with all stakeholders involved.


Non-Redundant Sequential Rules,Theory And Algorithm, David Lo, Siau-Cheng Khoo, Limsoon Wong Nov 2011

Non-Redundant Sequential Rules,Theory And Algorithm, David Lo, Siau-Cheng Khoo, Limsoon Wong

David LO

A sequential rule expresses a relationship between two series of events happening one after another. Sequential rules are potentially useful for analyzing data in sequential format, ranging from purchase histories, network logs and program execution traces. In this work, we investigate and propose a syntactic characterization of a non-redundant set of sequential rules built upon past work on compact set of representative patterns. A rule is redundant if it can be inferred from another rule having the same support and confidence. When using the set of mined rules as a composite filter, replacing a full set of rules with a …


Mining Temporal Rules From Program Execution Traces, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Mining Temporal Rules From Program Execution Traces, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

Specification mining is a process of extracting specifications, often from program execution traces. These specifications can in turn be used to aid program understanding, monitoring and verification. There are a number of dynamic-analysis-based specification mining tools in the literature, however none so far extract past time temporal expressions in the form of rules stating: "whenever a series of events occurs, previously another series of events has happened". Rules of this format are commonly found in practice and useful for various purposes. Most rule-based specification mining tools only mine future-time temporal expression. Many past-time temporal rules like "whenever a resource is …


Efficient Mining Of Iterative Patterns For Software Specification Discovery, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Efficient Mining Of Iterative Patterns For Software Specification Discovery, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

Studies have shown that program comprehension takes up to 45% of software development costs. Such high costs are caused by the lack-of documented specification and further aggravated by the phenomenon of software evolution. There is a need for automated tools to extract specifications to aid program comprehension. In this paper, a novel technique to efficiently mine common software temporal patterns from traces is proposed. These patterns shed light on program behaviors, and are termed iterative patterns. They capture unique characteristic of software traces, typically not found in arbitrary sequences. Specifically, due to loops, interesting iterative patterns can occur multiple times …


Identifying Bug Signatures Using Discriminative Graph Mining, Hong Cheng, David Lo, Yang Zhou, Xiaoyin Wang, Xifeng Yan Nov 2011

Identifying Bug Signatures Using Discriminative Graph Mining, Hong Cheng, David Lo, Yang Zhou, Xiaoyin Wang, Xifeng Yan

David LO

Bug localization has attracted a lot of attention recently. Most existing methods focus on pinpointing a single statement or function call which is very likely to contain bugs. Although such methods could be very accurate, it is usually very hard for developers to understand the context of the bug, given each bug location in isolation. In this study, we propose to model software executions with graphs at two levels of granularity: methods and basic blocks. An individual node represents a method or basic block and an edge represents a method call, method return or transition (at the method or basic …


Smartic: Specification Mining Architecture With Trace Filtering And Clustering, David Lo, Siau-Cheng Khoo Nov 2011

Smartic: Specification Mining Architecture With Trace Filtering And Clustering, David Lo, Siau-Cheng Khoo

David LO

Improper management of software evolution, compounded by imprecise, and changing requirements, along with the "short time to market" requirement, commonly leads to a lack of up-to-date specifications. This can result in software that is characterized by bugs, anomalies and even security threats. Software specification mining is a new technique to address this concern by inferring specifications automatically. In this paper, we propose a novel API specification mining architecture called SMArTIC Specification Mining Architecture with Trace fIltering and Clustering) to improve the accuracy, robustness and scalability of specification miners. This architecture is constructed based on two hypotheses: (1) Erroneous traces should …


Mining Software Specifications, David Lo, Siau-Cheng Khoo Nov 2011

Mining Software Specifications, David Lo, Siau-Cheng Khoo

David LO

No abstract provided.


Model Checking In The Absence Of Code, Model And Properties, David Lo, Siau-Cheng Khoo Nov 2011

Model Checking In The Absence Of Code, Model And Properties, David Lo, Siau-Cheng Khoo

David LO

Model checking is a major approach in ensuring software correctness. It verifies a model converted from code against some formal properties. However, difficulties and programmers ’ reluctance to formalize formal properties have been some hurdles to its widespread industrial adoption. Also, with the advent of commercial off-the-shelf (COTS) components provided by third party vendors, model checking is further challenged as often only a binary version of the code is provided by vendors. Interestingly, latest instrumentation tools like PIN and Valgrind have enable execution traces to be collected dynamically from a running program. In this preliminary study, we investigate what can …


Matching Dependence-Related Queries In The System Dependence Graph., Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, Jeffrey Xu Yu Nov 2011

Matching Dependence-Related Queries In The System Dependence Graph., Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, Jeffrey Xu Yu

David LO

In software maintenance and evolution, it is common that developers want to apply a change to a number of similar places. Due to the size and complexity of the code base, it is challenging for developers to locate all the places that need the change. A main challenge in locating the places that need the change is that, these places share certain common dependence conditions but existing code searching techniques can hardly handle dependence relations satisfactorily. In this paper, we propose a technique that enables developers to make queries involving dependence conditions and textual conditions on the system dependence graph …


Mining Iterative Generators And Representative Rules For Software Specification Discovery, David Lo, Jinyan Li, Limsoon Wong, Siau-Cheng Khoo Nov 2011

Mining Iterative Generators And Representative Rules For Software Specification Discovery, David Lo, Jinyan Li, Limsoon Wong, Siau-Cheng Khoo

David LO

Billions of dollars are spent annually on software-related cost. It is estimated that up to 45 percent of software cost is due to the difficulty in understanding existing systems when performing maintenance tasks (i.e., adding features, removing bugs, etc.). One of the root causes is that software products often come with poor, incomplete, or even without any documented specifications. In an effort to improve program understanding, Lo et al. have proposed iterative pattern mining which outputs patterns that are repeated frequently within a program trace, or across multiple traces, or both. Frequent iterative patterns reflect frequent program behaviors that likely …


Mining Past-Time Temporal Rules: A Dynamic Analysis Approach, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Mining Past-Time Temporal Rules: A Dynamic Analysis Approach, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

No abstract provided.


Mining Antagonistic Communities From Social Networks, Kuan Zhang, David Lo, Ee Peng Lim Nov 2011

Mining Antagonistic Communities From Social Networks, Kuan Zhang, David Lo, Ee Peng Lim

David LO

During social interactions in a community, there are often sub-communities that behave in opposite manner. These antagonistic sub-communities could represent groups of people with opposite tastes, factions within a community distrusting one another, etc. Taking as input a set of interactions within a community, we develop a novel pattern mining approach that extracts for a set of antagonistic sub-communities. In particular, based on a set of user specified thresholds, we extract a set of pairs of sub-communities that behave in opposite ways with one another. To prevent a blow up in these set of pairs, we focus on extracting a …


Automatic Steering Of Behavioral Model Inference, David Lo, Leonardo Mariani, Mauro Pezze Nov 2011

Automatic Steering Of Behavioral Model Inference, David Lo, Leonardo Mariani, Mauro Pezze

David LO

Many testing and analysis techniques use finite state models to validate and verify the quality of software systems. Since the specification of such models is complex and time-consuming, researchers defined several techniques to extract finite state models from code and traces. Automatically generating models requires much less effort than designing them, and thus eases the verification and validation of large software systems. However, when models are inferred automatically, the precision of the mining process is critical. Behavioral models mined with imprecise processes can include many spurious behaviors, and can thus compromise the results of testing and analysis techniques that use …


Extracting Paraphrases Of Technical Terms From Noisy Parallel Software Corpus, Xiaoyin Wang, David Lo, Jing Jiang, Lu Zhang, Hong Mei Nov 2011

Extracting Paraphrases Of Technical Terms From Noisy Parallel Software Corpus, Xiaoyin Wang, David Lo, Jing Jiang, Lu Zhang, Hong Mei

David LO

In this paper, we study the problem of extracting technical paraphrases from a parallel software corpus, namely, a collection of duplicate bug reports. Paraphrase acquisition is a fundamental task in the emerging area of text mining for software engineering. Existing paraphrase extraction methods are not entirely suitable here due to the noisy nature of bug reports. We propose a number of techniques to address the noisy data problem. The empirical evaluation shows that our method significantly improves an existing method by upto 58%


Efficient Mining Of Recurrent Rules From A Sequence Database, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Efficient Mining Of Recurrent Rules From A Sequence Database, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

We study a novel problem of mining significant recurrent rules from a sequence database. Recurrent rules have the form "whenever a series of precedent events occurs, eventually a series of consequent events occurs". Recurrent rules are intuitive and characterize behaviors in many domains. An example is in the domain of software specifications, in which the rules capture a family of program properties beneficial to program verification and bug detection. Recurrent rules generalize existing work on sequential and episode rules by considering repeated occurrences of premise and consequent events within a sequence and across multiple sequences, and by removing the "window" …


Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi Nov 2011

Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi

David LO

In statistics and data mining communities, there have been many measures proposed to gauge the strength of association between two variables of interest, such as odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is associated positively to a cure of a disease or whether a particular marketing strategy is associated positively to an increase in revenue, etc. This paper models the problem of locating faults as association between the execution or non-execution of particular program elements with failures. There have been …


Mining Message Sequence Graphs, Sandeep Kumar, Siau-Cheng Khoo, Abhik Roychoudhury, David Lo Nov 2011

Mining Message Sequence Graphs, Sandeep Kumar, Siau-Cheng Khoo, Abhik Roychoudhury, David Lo

David LO

Dynamic specification mining involves discovering software behavior from traces for the purpose of program comprehension and bug detection. However, in concurrent/distributed programs, the inherent partial order relationships among events occurring across processes pose a big challenge to specification mining. In this paper, we propose a framework for mining partial orders so as to understand concurrent program behavior. Our miner takes in a set of concurrent program traces, and produces a message sequence graph (MSG) to represent the concurrent program behavior. An MSG represents a graph where the nodes of the graph are partial orders, represented as Message Sequence Charts. Mining …