Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Databases and Information Systems

2011

Articles 31 - 60 of 268

Full-Text Articles in Physical Sciences and Mathematics

Efficient Mining Of Iterative Patterns For Software Specification Discovery, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

David LO

Studies have shown that program comprehension takes up to 45% of software development costs. Such high costs are caused by the lack of documented specifications and further aggravated by the phenomenon of software evolution. There is a need for automated tools to extract specifications to aid program comprehension. In this paper, a novel technique to efficiently mine common software temporal patterns from traces is proposed. These patterns shed light on program behaviors, and are termed iterative patterns. They capture unique characteristics of software traces, typically not found in arbitrary sequences. Specifically, due to loops, interesting iterative patterns can occur multiple times …


Smartic: Specification Mining Architecture With Trace Filtering And Clustering, David Lo, Siau-Cheng Khoo Nov 2011

David LO

Improper management of software evolution, compounded by imprecise and changing requirements, along with the "short time to market" requirement, commonly leads to a lack of up-to-date specifications. This can result in software that is characterized by bugs, anomalies and even security threats. Software specification mining is a new technique to address this concern by inferring specifications automatically. In this paper, we propose a novel API specification mining architecture called SMArTIC (Specification Mining Architecture with Trace fIltering and Clustering) to improve the accuracy, robustness and scalability of specification miners. This architecture is constructed based on two hypotheses: (1) Erroneous traces should …


Mining Software Specifications, David Lo, Siau-Cheng Khoo Nov 2011

David LO

No abstract provided.


Matching Dependence-Related Queries In The System Dependence Graph., Xiaoyin Wang, David Lo, Jiefeng Cheng, Lu Zhang, Hong Mei, Jeffrey Xu Yu Nov 2011

David LO

In software maintenance and evolution, developers commonly want to apply a change to a number of similar places. Due to the size and complexity of the code base, it is challenging for developers to locate all the places that need the change. A main challenge in locating these places is that they share certain common dependence conditions, but existing code searching techniques can hardly handle dependence relations satisfactorily. In this paper, we propose a technique that enables developers to make queries involving dependence conditions and textual conditions on the system dependence graph …


Mining Iterative Generators And Representative Rules For Software Specification Discovery, David Lo, Jinyan Li, Limsoon Wong, Siau-Cheng Khoo Nov 2011

David LO

Billions of dollars are spent annually on software-related costs. It is estimated that up to 45 percent of software cost is due to the difficulty in understanding existing systems when performing maintenance tasks (e.g., adding features, removing bugs, etc.). One of the root causes is that software products often come with poor, incomplete, or even no documented specifications. In an effort to improve program understanding, Lo et al. have proposed iterative pattern mining, which outputs patterns that are repeated frequently within a program trace, or across multiple traces, or both. Frequent iterative patterns reflect frequent program behaviors that likely …


Mining Past-Time Temporal Rules: A Dynamic Analysis Approach, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

David LO

No abstract provided.


Mining Antagonistic Communities From Social Networks, Kuan Zhang, David Lo, Ee Peng Lim Nov 2011

David LO

During social interactions in a community, there are often sub-communities that behave in opposite manners. These antagonistic sub-communities could represent groups of people with opposite tastes, factions within a community distrusting one another, etc. Taking as input a set of interactions within a community, we develop a novel pattern mining approach that extracts a set of antagonistic sub-communities. In particular, based on a set of user-specified thresholds, we extract a set of pairs of sub-communities that behave in opposite ways to one another. To prevent a blow-up in this set of pairs, we focus on extracting a …


Efficient Mining Of Recurrent Rules From A Sequence Database, David Lo, Siau-Cheng Khoo, Chao Liu Nov 2011

David LO

We study a novel problem of mining significant recurrent rules from a sequence database. Recurrent rules have the form "whenever a series of precedent events occurs, eventually a series of consequent events occurs". Recurrent rules are intuitive and characterize behaviors in many domains. An example is in the domain of software specifications, in which the rules capture a family of program properties beneficial to program verification and bug detection. Recurrent rules generalize existing work on sequential and episode rules by considering repeated occurrences of premise and consequent events within a sequence and across multiple sequences, and by removing the "window" …


Efficient Topological OLAP On Information Networks, Qiang Qu, Feida Zhu, Xifeng Yan, Jiawei Han, Philip Yu, Hongyan Li Nov 2011

David LO

We propose a framework for efficient OLAP on information networks with a focus on the most interesting kind, the topological OLAP (called “T-OLAP”), which incurs topological changes in the underlying networks. T-OLAP operations generate new networks from the original ones by rolling up a subset of nodes chosen by certain constraint criteria. The key challenge is to efficiently compute measures for the newly generated networks and handle user queries with varied constraints. Two effective computational techniques, T-Distributiveness and T-Monotonicity, are proposed to achieve efficient query processing and cube materialization. We also provide a T-OLAP query processing framework into which these …


Comprehensive Evaluation Of Association Measures For Fault Localization, Lucia Lucia, David Lo, Lingxiao Jiang, Aditya Budi Nov 2011

David LO

In the statistics and data mining communities, many measures have been proposed to gauge the strength of association between two variables of interest, such as odds ratio, confidence, Yule-Y, Yule-Q, Kappa, and Gini index. These association measures have been used in various domains, for example, to evaluate whether a particular medical practice is positively associated with a cure of a disease or whether a particular marketing strategy is positively associated with an increase in revenue. This paper models the problem of locating faults as an association between the execution or non-execution of particular program elements and failures. There have been …
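As a rough illustration of the idea in this abstract, three of the named measures (odds ratio, Yule-Q, Yule-Y) can be computed from a 2x2 table of executed/not-executed versus failed/passed runs. The function name, argument names, and the handling of empty cells below are assumptions made for this sketch, not the paper's formulation.

```python
import math

def association_scores(exec_fail, exec_pass, noexec_fail, noexec_pass):
    """Score one program element from a 2x2 contingency table of run counts.

    exec_fail   -- failing runs that executed the element
    exec_pass   -- passing runs that executed the element
    noexec_fail -- failing runs that did not execute the element
    noexec_pass -- passing runs that did not execute the element
    """
    numerator = exec_fail * noexec_pass
    denominator = exec_pass * noexec_fail
    if denominator == 0:
        # Assumption for this sketch: treat an empty off-diagonal cell
        # as maximal positive association.
        return {"odds_ratio": math.inf, "yule_q": 1.0, "yule_y": 1.0}
    odds_ratio = numerator / denominator
    # Yule's Q and Y are standard rescalings of the odds ratio into [-1, 1].
    yule_q = (odds_ratio - 1) / (odds_ratio + 1)
    root = math.sqrt(odds_ratio)
    yule_y = (root - 1) / (root + 1)
    return {"odds_ratio": odds_ratio, "yule_q": yule_q, "yule_y": yule_y}

# Hypothetical counts: the element runs in 8 of 10 failing runs
# and in 2 of 10 passing runs.
scores = association_scores(exec_fail=8, exec_pass=2, noexec_fail=2, noexec_pass=8)
```

Elements would then be ranked by a chosen score, with higher values suggesting a stronger link between executing the element and failure.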


Towards Better Quality Specification Miners, David Lo, Siau-Cheng Khoo Nov 2011

David LO

Software is often built without a specification. Tools to automatically extract specifications from software are needed, and many techniques have been proposed. One type of these specifications – temporal API specification – is often specified in the form of an automaton (i.e., FSA/PFSA). There has been much work on mining software temporal specifications using dynamic analysis techniques, i.e., analysis of software program traces. Unfortunately, the issues of scalability, robustness and accuracy of these techniques have not been comprehensively addressed. In this paper, we describe a framework that enables assessments of the performance of a specification miner in generating temporal specification of software …


Mining Interesting Link Formation Rules In Social Networks, Cane Wing-Ki Leung, Ee Peng Lim, David Lo, Jianshu Weng Nov 2011

David LO

Link structures are important patterns one looks out for when modeling and analyzing social networks. In this paper, we propose the task of mining interesting Link Formation rules (LF-rules) containing link structures known as Link Formation patterns (LF-patterns). LF-patterns capture various dyadic and/or triadic structures among groups of nodes, while LF-rules capture the formation of a new link from a focal node to another node as a postcondition of existing connections between the two nodes. We devise a novel LF-rule mining algorithm, known as LFR-Miner, based on frequent subgraph mining for our task. In addition to using a support-confidence framework …


Mining Closed Discriminative Dyadic Sequential Patterns, David Lo, Hong Cheng, - Lucia Nov 2011

David LO

A lot of data are in sequential formats. In this study, we are interested in sequential data that goes in pairs. There are many interesting datasets in this format coming from various domains including parallel textual corpora, duplicate bug reports, and other pairs of related sequences of events. Our goal is to mine a set of closed discriminative dyadic sequential patterns from a database of sequence pairs each belonging to one of the two classes +ve and -ve. These dyadic sequential patterns characterize the discriminating facets contrasting the two classes. They are potentially good features to be used for the …


Mining Patterns And Rules For Software Specification Discovery, David Lo, Siau-Cheng Khoo Nov 2011

David LO

Software specifications are often lacking, incomplete and outdated in the industry. Lacking and incomplete specifications cause various software engineering problems. Studies have shown that program comprehension takes up to 45% of software development costs. One of the root causes of the high cost is the lack of documented specifications. Also, outdated and incomplete specifications might cause bugs and compatibility issues. In this paper, we describe novel data mining techniques to mine or reverse engineer these specifications from the pool of software engineering data. A large amount of software data is available for analysis. One form of software data is program …


Mining Specifications In Diversified Formats From Execution Traces, David Lo Nov 2011

David LO

Software evolves; this phenomenon causes an increase in maintenance effort, problems in comprehending the ever-changing code base and difficulty in verifying software correctness. As software changes, often the documented specification is not updated. An outdated specification makes it harder to understand the code base during maintenance tasks. Also, software changes might induce bugs, anomalies and even security threats. To address the above issues, we propose an array of specification mining techniques to mine software specifications in diversified formats from program execution traces. Case studies on various systems show that the extracted specifications shed light on the behaviors of systems under analysis. …


Data Mining For Software Engineering, Tao Xie, Suresh Thummalapenta, David Lo, Chao Liu Nov 2011

David LO

To improve software productivity and quality, software engineers are increasingly applying data mining algorithms to various software engineering tasks. However, mining SE data poses several challenges. The authors present various algorithms to effectively mine sequences, graphs, and text from such data.


Specification Mining: A Concise Introduction, David Lo, Siau-Cheng Khoo, Chao Liu, Jiawei Han Nov 2011

David LO

No abstract provided.


Terapixel Imaging Of Cosmological Simulations, Yu Feng, Rupert Croft, Tiziana Di Matteo, Nishikanta Khandai, Randy Sargent, Illah Nourbakhsh, Paul Dille, Chris Bartley, Volker Springel, Anirban Jana, Jeffrey Gardner Nov 2011

Randy Sargent

The increasing size of cosmological simulations has led to the need for new visualization techniques. We focus on smoothed particle hydrodynamic (SPH) simulations run with the GADGET code and describe methods for visually accessing the entire simulation at full resolution. The simulation snapshots are rastered and processed on supercomputers into images that are ready to be accessed through a Web interface (GigaPan). This allows any scientist with a Web browser to interactively explore simulation data sets in both spatial and temporal dimensions and data sets which in their native format can be hundreds of terabytes in size or more. We …


On-Line Banking Systems: Are They Sustainable?, Satish Mahadevan Srinivasan, Sachin Pawaskar, Abhishek Tripathi, Lotfollah Najjar Nov 2011

Information Systems and Quantitative Analysis Faculty Proceedings & Presentations

Although on-line banking has grown in recent years, customers have not participated enthusiastically, either in the past or at present. The sustainability of a bank supporting an on-line banking service depends on its capacity to attract new customers, retain existing customers, and extend its services to the current and future customer base. This investigation examines whether there is any significant difference among the factors, namely transactional security, information design, navigational design, visual design, web site trust, web site satisfaction and e-loyalty, over the sustainability of on-line banking for …


A Visual Analytics System For Metropolitan Transportation, Siyuan Liu, Ce Liu, Qiong Luo, Lionel M. Ni, Huamin Qu Nov 2011

LARC Research Publications

With the increasing availability of metropolitan transportation data, such as those from vehicle GPSs (Global Positioning Systems) and road-side sensors, it becomes viable for authorities, operators, and individuals to analyze the data for a better understanding of the transportation system and possibly improved utilization and planning of the system. We report our experience in building the VAST (Visual Analytics for Smart Transportation) system. Our key observation is that metropolitan transportation data are inherently visual as they are spatiotemporal around road networks. Therefore, we visualize traffic data together with digital maps and support analytical queries through this interactive visual …


Are There Contagion Effects In Information Technology And Business Process Outsourcing?, Arti Mann, Robert J. Kauffman, Kunsoo Han, Barrie R. Nault Nov 2011

Research Collection School Of Computing and Information Systems

We model the diffusion of IT outsourcing using announcements about IT outsourcing deals. We estimate a lognormal diffusion curve to test whether IT outsourcing follows a pure diffusion process or there are contagion effects involved. The methodology permits us to study the consequences of outsourcing events, especially mega-deals with IT contract amounts that exceed US$1 billion. Mega-deals act, we theorize, as precipitating events that create a strong basis for contagion effects and are likely to affect decision-making by other firms in an industry. Then, we evaluate the role of different communication channels in the diffusion process of IT outsourcing by …


A Brain-Inspired Model Of Hierarchical Planner, Budhitama Subagdja, Ah-Hwee Tan Nov 2011

Research Collection School Of Computing and Information Systems

Hierarchical planning is an approach to planning that composes and executes hierarchically arranged plans to solve problems. Most symbolic hierarchical planners have been devised to allow the knowledge to be described expressively. However, a great challenge is to automatically seek and acquire new plans on the fly. This paper presents a novel neural-based model of hierarchical planning that can seek and acquire new plans on-line if the necessary knowledge is lacking. Inspired by findings in neuropsychology, plans can be inherently learnt, retrieved, and manipulated simultaneously rather than discretely processed as in most symbolic approaches. Using a multi-channel adaptive resonance …


Software Process Evaluation: A Machine Learning Approach, Ning Chen, Steven C. H. Hoi, Xiaokui Xiao Nov 2011

Research Collection School Of Computing and Information Systems

Software process evaluation is essential to improve software development and the quality of software products in an organization. Conventional approaches based on manual qualitative evaluations (e.g., artifacts inspection) are deficient in the sense that (i) they are time-consuming, (ii) they suffer from the authority constraints, and (iii) they are often subjective. To overcome these limitations, this paper presents a novel semi-automated approach to software process evaluation using machine learning techniques. In particular, we formulate the problem as a sequence classification task, which is solved by applying machine learning algorithms. Based on the framework, we define a new quantitative indicator to …


Consistent Community Identification In Complex Networks, Haewoon Kwak, Young-Ho Eom, Yoonchan Choi, Hawoong Jeong Nov 2011

Research Collection School Of Computing and Information Systems

We have found that known community identification algorithms produce inconsistent communities when the node ordering changes at input. We use the pairwise membership probability and consistency to quantify the level of consistency across multiple runs of an algorithm. Based on these two metrics, we address the consistency problem without compromising the modularity. The key insight of the algorithm is to use pairwise membership probabilities as link weights. It offers a new tool in the study of community structures and their evolutions.
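A minimal sketch of the pairwise membership idea follows. It assumes, since the abstract does not give the definition, that the pairwise membership probability of two nodes is the fraction of runs in which they land in the same community. The function name and the toy partitions are hypothetical.

```python
from itertools import combinations

def pairwise_membership_probability(runs):
    """Estimate how often each node pair shares a community.

    runs -- list of partitions, each a dict mapping node -> community label.
            Labels only need to be comparable within a single run.
    Returns a dict mapping (u, v) with u < v to the fraction of runs
    in which u and v were assigned the same label.
    """
    nodes = sorted(runs[0])
    probabilities = {}
    for u, v in combinations(nodes, 2):
        together = sum(1 for labels in runs if labels[u] == labels[v])
        probabilities[(u, v)] = together / len(runs)
    return probabilities

# Hypothetical output of four runs of some community detection algorithm
# on the same graph with different input node orderings.
runs = [
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 1, "b": 1, "c": 1, "d": 0},
    {"a": 0, "b": 0, "c": 1, "d": 1},
    {"a": 2, "b": 2, "c": 0, "d": 0},
]
weights = pairwise_membership_probability(runs)
```

Under this reading, the resulting values could serve as link weights for a further run of a weighted community detection algorithm, which is the step the abstract calls the key insight.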


Unsupervised Multiple Kernel Learning, Jinfeng Zhuang, Jialei Wang, Steven C. H. Hoi, Xiangyang Lan Nov 2011

Research Collection School Of Computing and Information Systems

Traditional multiple kernel learning (MKL) algorithms are essentially supervised learning in the sense that the kernel learning task requires the class labels of training data. However, class labels may not always be available prior to the kernel learning task in some real world scenarios, e.g., an early preprocessing step of a classification task or an unsupervised learning task such as dimension reduction. In this paper, we investigate a problem of Unsupervised Multiple Kernel Learning (UMKL), which does not require class labels of training data as needed in a conventional multiple kernel learning task. Since a kernel essentially defines pairwise similarity …


Coping With Distance: An Empirical Study Of Communication On The Jazz Platform, Renuka Sindhgatta, Bikram Sengupta, Subhajit Datta Nov 2011

Research Collection School Of Computing and Information Systems

Global software development - which is characterized by teams separated by physical distance and/or time-zone differences - has traditionally posed significant communication challenges. Often these have caused delays in completing tasks, or created misalignment across sites leading to re-work. In recent years, however, a new breed of development environments with rich collaboration features has emerged to facilitate cross-site work in distributed projects. In this paper we revisit the question "does distance matter?" in the context of the IBM Jazz Platform -- a state-of-the-art collaborative development environment. We study the ecosystem of a large distributed team of around 300 members across 35 …


Enabling GPU Acceleration With Messaging Middleware, Randall E. Duran, Li Zhang, Tom Hayhurst Nov 2011

Research Collection School Of Computing and Information Systems

Graphics processing units (GPUs) offer great potential for accelerating processing for a wide range of scientific and business applications. However, complexities associated with using GPU technology have limited its use in applications. This paper reviews earlier approaches improving GPU accessibility, and explores how integration with middleware messaging technologies can further improve the accessibility and usability of GPU-enabled platforms. The results of a proof-of-concept integration between an open-source messaging middleware platform and a general-purpose GPU platform using the CUDA framework are presented. Additional applications of this technique are identified and discussed as potential areas for further research.


Finding Relevant Answers In Software Forums, Swapna Gottopati, David Lo, Jing Jiang Nov 2011

Research Collection School Of Computing and Information Systems

Online software forums provide a huge amount of valuable content. Developers and users often ask questions and receive answers from such forums. The availability of a vast amount of thread discussions in forums provides ample opportunities for knowledge acquisition and summarization. For a given search query, current search engines use a traditional information retrieval approach to extract webpages containing relevant keywords. However, in software forums, there are often many threads containing similar keywords, where each thread could contain many posts, as many as 1,000 or more. Manually finding relevant answers from these long threads is a painstaking task to …


Learning Human Emotion Patterns For Modeling Virtual Humans, Shu Feng, Ah-Hwee Tan Nov 2011

Research Collection School Of Computing and Information Systems

Emotion modeling is a crucial part in modeling virtual humans. Although various emotion models have been proposed, most of them focus on designing specific appraisal rules. As there is no unified framework for emotional appraisal, the appraisal variables have to be defined beforehand and evaluated in a subjective way. In this paper, we propose an emotion model based on machine learning methods by taking the following position: an emotion model should mirror actual human emotion in the real world and connect tightly with human inner states, such as drives, motivations and personalities. Specifically, a self-organizing neural model called Emotional Appraisal …


The Knowledge-Driven Exploration Of Integrated Biomedical Knowledge Sources Facilitates The Generation Of New Hypotheses, Vinh Nguyen, Olivier Bodenreider, Todd Minning, Amit P. Sheth Oct 2011

Kno.e.sis Publications

Knowledge gained from the scientific literature can complement newly obtained experimental data in helping researchers understand the pathological processes underlying diseases. However, unless the scientific literature and experimental data are semantically integrated, it is generally difficult for scientists to exploit the two sources effectively. We argue that, in addition to the semantic integration of heterogeneous knowledge sources, the usability of the integrated resource by scientists is dependent upon the availability of knowledge visualization and exploration tools. Moreover, the integration techniques must be scalable and the exploration interfaces must be easy to use by bench scientists. The end goal of such …