Open Access. Powered by Scholars. Published by Universities.®

Physical Sciences and Mathematics Commons

Articles 1 - 10 of 10

Full-Text Articles in Physical Sciences and Mathematics

Mining Data From Multiple Software Development Projects, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao, Naeem Seliya Dec 2009

Computer Science Faculty Publications

A large system often goes through multiple software project development cycles, in part due to changes in operation and development environments. For example, rapid turnover of the development team between releases can influence software quality, making it important to mine software project data over multiple system releases when building defect predictors. Data collection of software attributes is often conducted independently of the quality improvement goals, leading to the availability of a large number of attributes for analysis. The problems associated with variations in development process, data collection, and quality goals from one release to another emphasize the importance of …


High-Dimensional Software Engineering Data And Feature Selection, Huanjing Wang, Taghi M. Khoshgoftaar, Kehan Gao Nov 2009

Computer Science Faculty Publications

Software metrics collected during project development play a critical role in software quality assurance. A software practitioner is keen to learn which software metrics to focus on for software quality prediction. While a concise set of software metrics is often desired, a typical project collects a very large number of metrics. Minimal attention has been devoted to finding the minimum set of software metrics that has the same predictive capability as a larger set of metrics; we strive to answer that question in this paper. We present a comprehensive comparison between seven commonly used filter-based feature ranking techniques (FRT) …


Robust Lifetime Measurement In Large-Scale P2p Systems With Non-Stationary Arrivals, Xiaoming Wang, Zhongmei Yao, Yueping Zhang, Dmitri Loguinov Sep 2009

Computer Science Faculty Publications

Characterizing user churn has become an important topic in studying P2P networks, both in theoretical analysis and system design. Recent work has shown that direct sampling of user lifetimes may lead to certain bias (arising from missed peers and round-off inconsistencies) and proposed a technique that estimates lifetimes based on sampled residuals. In this paper, however, we show that under non-stationary arrivals, which are often present in real systems, residual-based sampling does not correctly reconstruct user lifetimes and suffers a varying degree of bias, which in some cases makes estimation completely impossible. We overcome this problem using two contributions: a …


An Empirical Investigation Of Filter Attribute Selection Techniques For Software Quality Classification, Kehan Gao, Taghi M. Khoshgoftaar, Huanjing Wang Aug 2009

Computer Science Faculty Publications

Attribute selection is an important activity in data preprocessing for software quality modeling and other data mining problems. Software quality models have been used to improve the fault detection process. Finding faulty components in a software system during the early stages of the software development process can lead to a more reliable final product and can reduce development and maintenance costs. Some studies have shown that the prediction accuracy of the models improves when irrelevant and redundant features are removed from the original data set. In this study, we investigated four filter attribute selection techniques, Automatic Hybrid Search (AHS), …


Residual-Based Estimation Of Peer And Link Lifetimes In P2p Networks, Xiaoming Wang, Zhongmei Yao, Dmitri Loguinov Jun 2009

Computer Science Faculty Publications

Existing methods of measuring lifetimes in P2P systems usually rely on the so-called Create-Based Method (CBM), which divides a given observation window into two halves and samples users "created" in the first half every Δ time units until they die or the observation period ends. Despite its frequent use, this approach has no rigorous accuracy or overhead analysis in the literature. To shed more light on its performance, we first derive a model for CBM and show that small window size or large Δ may lead to highly inaccurate lifetime distributions. We then show that create-based sampling exhibits an inherent …


Node Isolation Model And Age-Based Neighbor Selection In Unstructured P2p Networks, Zhongmei Yao, Derek Leonard, Dmitri Loguinov Feb 2009

Computer Science Faculty Publications

Previous analytical studies of unstructured P2P resilience have assumed exponential user lifetimes and only considered age-independent neighbor replacement. In this paper, we overcome these limitations by introducing a general node-isolation model for heavy-tailed user lifetimes and arbitrary neighbor-selection algorithms. Using this model, we analyze two age-biased neighbor-selection strategies and show that they significantly improve the residual lifetimes of chosen users, which dramatically reduces the probability of user isolation and graph partitioning compared with uniform selection of neighbors. In fact, the second strategy based on random walks on age-proportional graphs demonstrates that, for lifetimes with infinite variance, the system monotonically increases …


Correlation Of Music Charts And Search Engine Rankings, Martin Klein, Olena Hunsicker, Michael Nelson Jan 2009

Computer Science Faculty Publications

We investigate the question of whether expert rankings of real-world entities correlate with search engine (SE) rankings of corresponding web resources. We compare Billboard's "Hot 100 Airplay" music charts with SE rankings of associated web resources. Out of nine comparisons, we found two strong, two moderate, two weak, and one negative correlation. The remaining two comparisons were inconclusive.


Object Reuse And Exchange, Michael L. Nelson, Carl Lagoze, Herbert Van De Sompel, Pete Johnston, Robert Sanderson, Simeon Warner, Jürgen Sieck (Ed.), Michael A. Herzog (Ed.) Jan 2009

Computer Science Faculty Publications

The Open Archives Initiative Object Reuse and Exchange (OAI-ORE) project defines standards for the description and exchange of aggregations of Web resources. The OAI-ORE abstract data model conforms to the Architecture of the World Wide Web and leverages concepts from the Semantic Web, including RDF descriptions and Linked Data. In this paper we provide a brief review of a motivating example and its serialization in Atom.


Exploring Out-Of-Turn Interactions With Websites, Saverio Perugini, Naren Ramakrishnan, Manuel A. Pérez-Quiñones, Mary E. Pinney, Mary Beth Rosson Jan 2009

Computer Science Faculty Publications

Hierarchies are ubiquitous on the web for structuring online catalogs and indexing multidimensional attributed data sets. They are a natural metaphor for information seeking if their levelwise structure mirrors the user's conception of the underlying domain. In other cases, they can be frustrating, especially if multiple drill‐downs are necessary to arrive at information of interest. To support a broad range of users, site designers often expose multiple faceted classifications or provide within‐page pruning mechanisms. We present a new technique, called out-of-turn interaction, that increases the richness of user interaction at hierarchical sites, without enumerating all possible completion paths in the …


User Interface Design, Moritz Stefaner, Sebastien Ferre, Saverio Perugini, Jonathan Koren, Yi Zhang Jan 2009

Computer Science Faculty Publications

As detailed in Chap. 1, system implementations for dynamic taxonomies and faceted search allow a wide range of query possibilities on the data. Only when these are made accessible through appropriate user interfaces can the resulting applications support a variety of search, browsing, and analysis tasks. User interface design in this area is confronted with specific challenges. This chapter presents an overview of both established and novel principles and solutions.