Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Open Access. Powered by Scholars. Published by Universities.®

Articles 1 - 15 of 15

Full-Text Articles in Physical Sciences and Mathematics

Mining Top-K Large Structural Patterns In A Massive Network, Feida Zhu, Qiang Qu, David Lo, Xifeng Yan, Jiawei Han, Philip S. Yu Dec 2011

Mining Top-K Large Structural Patterns In A Massive Network, Feida Zhu, Qiang Qu, David Lo, Xifeng Yan, Jiawei Han, Philip S. Yu

David LO

With ever-growing popularity of social networks, web and bio-networks, mining large frequent patterns from a single huge network has become increasingly important. Yet the existing pattern mining methods cannot offer the efficiency desirable for large pattern discovery. We propose Spider- Mine, a novel algorithm to efficiently mine top-K largest frequent patterns from a single massive network with any user-specified probability of 1 − ϵ. Deviating from the existing edge-by-edge (i.e., incremental) pattern-growth framework, SpiderMine achieves its efficiency by unleashing the power of small patterns of a bounded diameter, which we call “spiders”. With the spider structure, our approach adopts a …


Towards Succinctness In Mining Scenario-Based Specifications, David Lo, Shahar Maoz Dec 2011

Towards Succinctness In Mining Scenario-Based Specifications, David Lo, Shahar Maoz

David LO

Specification mining methods are used to extract candidate specifications from system execution traces. A major challenge for specification mining is succinctness. That is, in addition to the soundness, completeness, and scalable performance of the specification mining method, one is interested in producing a succinct result, which conveys a lot of information about the system under investigation but uses a short, machine and human-readable representation. In this paper we address the succinctness challenge in the context of scenario-based specification mining, whose target formalism is live sequence charts (LSC), an expressive extension of classical sequence diagrams. We do this by adapting three …


Efficient Mining Of Iterative Patterns For Software Specification Discovery, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Efficient Mining Of Iterative Patterns For Software Specification Discovery, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

Studies have shown that program comprehension takes up to 45% of software development costs. Such high costs are caused by the lack-of documented specification and further aggravated by the phenomenon of software evolution. There is a need for automated tools to extract specifications to aid program comprehension. In this paper, a novel technique to efficiently mine common software temporal patterns from traces is proposed. These patterns shed light on program behaviors, and are termed iterative patterns. They capture unique characteristic of software traces, typically not found in arbitrary sequences. Specifically, due to loops, interesting iterative patterns can occur multiple times …


Smartic: Specification Mining Architecture With Trace Filtering And Clustering, David Lo, Siau-Cheng Khoo Nov 2011

Smartic: Specification Mining Architecture With Trace Filtering And Clustering, David Lo, Siau-Cheng Khoo

David LO

Improper management of software evolution, compounded by imprecise, and changing requirements, along with the "short time to market" requirement, commonly leads to a lack of up-to-date specifications. This can result in software that is characterized by bugs, anomalies and even security threats. Software specification mining is a new technique to address this concern by inferring specifications automatically. In this paper, we propose a novel API specification mining architecture called SMArTIC Specification Mining Architecture with Trace fIltering and Clustering) to improve the accuracy, robustness and scalability of specification miners. This architecture is constructed based on two hypotheses: (1) Erroneous traces should …


Mining Software Specifications, David Lo, Siau-Cheng Khoo Nov 2011

Mining Software Specifications, David Lo, Siau-Cheng Khoo

David LO

No abstract provided.


Mining Past-Time Temporal Rules: A Dynamic Analysis Approach, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Mining Past-Time Temporal Rules: A Dynamic Analysis Approach, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

No abstract provided.


Efficient Mining Of Recurrent Rules From A Sequence Database, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

Efficient Mining Of Recurrent Rules From A Sequence Database, David Lo, Siau-Cheng Khoo, Chao Liu

David LO

We study a novel problem of mining significant recurrent rules from a sequence database. Recurrent rules have the form "whenever a series of precedent events occurs, eventually a series of consequent events occurs". Recurrent rules are intuitive and characterize behaviors in many domains. An example is in the domain of software specifications, in which the rules capture a family of program properties beneficial to program verification and bug detection. Recurrent rules generalize existing work on sequential and episode rules by considering repeated occurrences of premise and consequent events within a sequence and across multiple sequences, and by removing the "window" …


Towards Better Quality Specification Miners, David Lo, Siau-Cheng Khoo Nov 2011

Towards Better Quality Specification Miners, David Lo, Siau-Cheng Khoo

David LO

Softwares are often built without specification. Tools to automatically extract specification from software are needed and many techniques have been proposed. One type of these specifications – temporal API specification – is often specified in the form of automaton (i.e., FSA/PFSA). There have been many work on mining software temporal specification using dynamic analysis techniques; i.e., analysis of software program traces. Unfortunately, the issues of scalability, robustness and accuracy of these techniques have not been comprehensively addressed. In this paper, we describe a framework that enables assessments of the performance of a specification miner in generating temporal specification of software …


Mining Patterns And Rules For Software Specification Discovery, David Lo, Siau-Cheng Khoo Nov 2011

Mining Patterns And Rules For Software Specification Discovery, David Lo, Siau-Cheng Khoo

David LO

Software specifications are often lacking, incomplete and outdated in the industry. Lack and incomplete specifications cause various software engineering problems. Studies have shown that program comprehension takes up to 45% of software development costs. One of the root causes of the high cost is the lack-of documented specification. Also, outdated and incomplete specification might potentially cause bugs and compatibility issues. In this paper, we describe novel data mining techniques to mine or reverse engineer these specifications from the pool of software engineering data. A large amount of software data is available for analysis. One form of software data is program …


Mining Specifications In Diversified Formats From Execution Traces, David Lo Nov 2011

Mining Specifications In Diversified Formats From Execution Traces, David Lo

David LO

Software evolves; this phenomenon causes increase in maintenance efforts, problem in comprehending the ever-changing code base and difficulty in verifying software correctness. As software changes, often the documented specification is not updated. Outdated specification adds challenge to the understanding of the code base during maintenance tasks. Also, software changes might induce bugs, anomalies and even security threats. To address the above issues, we propose an array of specification mining techniques to mine software specifications in diversified formats from program execution traces. Case studies on various systems show that the extracted specifications shed light on the behaviors of systems under analysis. …


Data Mining For Software Engineering, Tao Xie, Suresh Thummalapenta, David Lo, Chao Liu Nov 2011

Data Mining For Software Engineering, Tao Xie, Suresh Thummalapenta, David Lo, Chao Liu

David LO

To improve software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks. However, mining SE data poses several challenges. The authors present various algorithms to effectively mine sequences, graphs, and text from such data.


Specification Mining: A Concise Introduction, David Lo, Siau-Cheng Khoo, Chao Liu, Jiawei Han Nov 2011

Specification Mining: A Concise Introduction, David Lo, Siau-Cheng Khoo, Chao Liu, Jiawei Han

David LO

No abstract provided.


Terapixel Imaging Of Cosmological Simulations, Yu Feng, Rupert Croft, Tiziana Di Matteo, Nishikanta Khandai, Randy Sargent, Illah Nourbakhsh, Paul Dille, Chris Bartley, Volker Springel, Anirban Jana, Jeffrey Gardner Nov 2011

Terapixel Imaging Of Cosmological Simulations, Yu Feng, Rupert Croft, Tiziana Di Matteo, Nishikanta Khandai, Randy Sargent, Illah Nourbakhsh, Paul Dille, Chris Bartley, Volker Springel, Anirban Jana, Jeffrey Gardner

Randy Sargent

The increasing size of cosmological simulations has led to the need for new visualization techniques. We focus on smoothed particle hydrodynamic (SPH) simulations run with the GADGET code and describe methods for visually accessing the entire simulation at full resolution. The simulation snapshots are rastered and processed on supercomputers into images that are ready to be accessed through a Web interface (GigaPan). This allows any scientist with a Web browser to interactively explore simulation data sets in both spatial and temporal dimensions and data sets which in their native format can be hundreds of terabytes in size or more. We …


The Implementation Of A Visual Language Interface For An Object-Oriented Multi-Media Database System, Mun Kew Leong, Boon-Siong Choo, Chun-Hong Kok, Jyh-Jang Lim, Arcot Desai Narasimhalu Jan 2011

The Implementation Of A Visual Language Interface For An Object-Oriented Multi-Media Database System, Mun Kew Leong, Boon-Siong Choo, Chun-Hong Kok, Jyh-Jang Lim, Arcot Desai Narasimhalu

Arcot Desai NARASIMHALU

This paper documents the ongoing implementation of the VILD visual language interface to the object-oriented multimedia database system, MDBMS. We set forth the infrastructure on which VILD has been developed, and describe in detail the three sections of the language system which have been completed: the Schema Editor to define the database, the Frame Editor to edit and enter data, and the Browser to view the data. We conclude with the query implementation of VILD.


Facial Identification System Using Multiple Retrievals Techniques, Jian Kang Wu, Arcot Desai Narasimhalu Jan 2011

Facial Identification System Using Multiple Retrievals Techniques, Jian Kang Wu, Arcot Desai Narasimhalu

Arcot Desai NARASIMHALU

During a police investigation, officers often have to sort through hundreds of photographs to identify a suspect. To aid this task, we at the Institute of Systems Science developed and implemented a flexible database system that can retrieve faces using personal information, fuzzy and free-text descriptors, and classification trees.